Grouping Alternative Source Terms
The purpose of grouping alternative source terms is to catch as many variations of these terms in your projects as possible. Stemming can only get you so far: stemming is logical, real language is not. Source-side synonyms also let you enter and catch things that might not "make sense", such as:
- misspelled versions of source terms,
- spelling variants (dialog/dialogue, localize/localise, gray/grey, heiss/heiß, waehrend/während etc.),
- different-language versions of source terms (I often add French or German translations of a Dutch term to a source term entry as "synonyms", so that these are caught too),
- and things like: schriftelijke waarschuwing / geschreven waarschuwing = written warning (different ways of saying the same thing in your source language are recognised and translated the same way, for consistency).
Chemical substances often have several names: systematic names from different nomenclatures, trade names, common names, etc. Here is a nice example from IATE, clearly showing the use that source-side synonyms can have:
Using these synonyms prevents different names for the same concept from being scattered around your glossary. Source-side alternatives can create coherence in your glossary. They can also be handy when you translate in both directions of a language pair. When you swap the Look up direction (Edit | Options | Auto-assembling | Automatic glossary look up: Change Left terms to Right terms), the source-side synonyms become alternative target terms.
Multiple source terms = multiple target terms
Null;Nullpunkt > zero;nil
Auto;Fahrzeug;Wagen > car;auto
Note: Carefully check whether every individual source term corresponds to each and every target term.
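The check in the note above can be made concrete by listing every source/target pair that an entry implies. The following is a minimal Python sketch, not part of CafeTran: the function name `expand_entry` is hypothetical, and it assumes the entry is written in the "source > target" notation used in the examples here (a real glossary file is tab-delimited).

```python
from itertools import product


def expand_entry(entry, sep=";"):
    """Return every (source, target) pair implied by one glossary entry.

    Assumes the article's display notation "source terms > target terms".
    """
    source_side, target_side = entry.split(" > ")
    sources = source_side.split(sep)
    targets = target_side.split(sep)
    # Every source alternative is combined with every target alternative.
    return list(product(sources, targets))


if __name__ == "__main__":
    for source, target in expand_entry("Auto;Fahrzeug;Wagen > car;auto"):
        print(f"{source} -> {target}")
```

Each printed pair is a translation the entry claims to be valid. If one of them is wrong in your texts, the entry is better split into separate lines.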
Inflected source terms = one (or more) target terms
geschriebene;geschriebener;geschriebenen > written
heiß;heißes > hot
gato;gata > cat
Note: Here you can also use the pipe character "|" on the source side to enable stemming. Writing the forms out, however, gives you more control (e.g. to distinguish between heißes and heißer, which can have different translations).
When you want to use stemming, you could enter:
The recognition and auto-assembling will be exactly the same (as in the example above):
Example for English > French
A nice example can be found here:
The original poster asked: Is there a way to use more than just one source and/or target term in a single Glossary entry?
For example, let's say I have the following English and French terms:
EN: representative;elected official;district representative
FR: élu;représentant élu;député
Those are three English and three French terms which, depending on the context, could all be "synonyms".
Using stemming rather than prefix matching?
In CafeTran, "prefix matching" can be used for memories. It is an automatic function that matches the beginnings of all words in a segment. With stemming, you define the root of a word yourself by marking it with a pipe character (|).
Enabling stemming
In memories, stemming goes hand in hand with prefix matching and has to be activated in the TM start-up options. In tab-delimited glossaries, it works automatically when the program comes across a pipe character in your source term.
Furthermore, stemming in memories can be applied to each word in a segment, e.g. fristgerecht|e Kündigung|en, whereas in glossaries the term is treated as a whole. In your example I would therefore put the pipe only in the first word, e.g. fristgerecht|e Kündigung, to catch other forms of fristgerechte (fristgerechter, fristgerechten).
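To illustrate the idea of a pipe-marked stem in a glossary term, here is a rough Python sketch. It is not CafeTran's actual matching algorithm, which is not documented here. It assumes that the part before the pipe in the first word may be followed by any ending, that the remaining words must match exactly, and that splitting on whitespace is good enough (punctuation and case are ignored). The function name `matches_stemmed_term` is hypothetical.

```python
def matches_stemmed_term(term, text):
    """Return True if `text` contains `term`, honouring a pipe in the first word.

    Assumption: "fristgerecht|e" means "any word starting with fristgerecht".
    Without a pipe, the first word must match exactly.
    """
    first, _, rest = term.partition(" ")
    rest_words = rest.split()
    if "|" in first:
        stem = first.split("|", 1)[0]

        def first_ok(word):
            return word.startswith(stem)
    else:
        def first_ok(word):
            return word == first

    words = text.split()  # naive tokenisation on whitespace only
    for i, word in enumerate(words):
        following = words[i + 1 : i + 1 + len(rest_words)]
        if first_ok(word) and following == rest_words:
            return True
    return False


if __name__ == "__main__":
    term = "fristgerecht|e Kündigung"
    for segment in ("eine fristgerechte Kündigung",
                    "wegen fristgerechter Kündigung",
                    "eine fristlose Kündigung"):
        print(segment, "->", matches_stemmed_term(term, segment))
```

Under these assumptions the first two segments match and the third does not, which mirrors the fristgerechte / fristgerechter case described above.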
Splitting up source-side alternatives
When you want to split up source-side alternatives so that every glossary line contains only one source term to the left of the tab, you can use this AWK command provided by Mark Jalbert:
awk -F'\t' '{split ( $1, a, ";" ); for (i=1; i<=length(a); i++) print a[i] "\t" $2}' example.txt > output-file.txt
- Place your glossary on your Desktop.
- Start the Terminal.
- Go to the Desktop.
- Execute the AWK command.
Input:
Output:
TIP: Use Excel to swap the columns if you want to split up the alternatives on the right side of the tab too.
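If AWK is not at hand, the same split can be sketched in Python. This is an illustrative alternative, not the command above: the function name `split_alternatives` and its `column` parameter are hypothetical. Choosing `column=1` splits the target side directly, so no column swap is needed. Unlike the AWK one-liner, which prints only the first two columns, this sketch keeps any further columns unchanged.

```python
def split_alternatives(lines, column=0, sep=";"):
    """Yield one tab-delimited line per alternative found in `column`.

    column=0 splits the source side, column=1 the target side.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            # Line has no such column: pass it through untouched.
            yield "\t".join(fields)
            continue
        for alternative in fields[column].split(sep):
            new_fields = fields.copy()
            new_fields[column] = alternative
            yield "\t".join(new_fields)


if __name__ == "__main__":
    sample = ["gato;gata\tcat", "heiß;heißes\thot"]
    for row in split_alternatives(sample):
        print(row)
```

To process a file, pass an open file object as `lines`, for example `split_alternatives(open("example.txt", encoding="utf-8"))`, and write the yielded rows to a new file.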
Compacting your glossaries with source-side and target-side alternatives
Instead of these 12 glossary entries:
LA | VE |
LA | ventilatie-eenheid |
Lüftungsanlage | ventilatie-eenheid |
Lüftungsanlage | VE |
WLA | RVE |
WLA | residentiële ventilatie-eenheid |
Wohnraumlüftungsanlage | residentiële ventilatie-eenheid |
Wohnraumlüftungsanlage | RVE |
NWLA | NRVE |
NWLA | niet-residentiële ventilatie-eenheid |
Nichtwohnraumlüftungsanlage | niet-residentiële ventilatie-eenheid |
Nichtwohnraumlüftungsanlage | NRVE |
You can compact your glossary to 3 entries like this:
LA;Lüftungsanlage | VE;ventilatie-eenheid |
WLA;Wohnraumlüftungsanlage | residentiële ventilatie-eenheid;RVE |
NWLA;Nichtwohnraumlüftungsanlage | niet-residentiële ventilatie-eenheid;NRVE |
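The compaction shown above can also be sketched in code. The following Python example is only an illustration under one assumption: entries that share a source term or a target term belong to the same concept and may be merged. The names `compact` and `add_unique` are hypothetical. Because merging implies that every source term in a group translates to every target term in it, the result should still be checked by hand.

```python
def add_unique(items, new_items):
    """Append each new item that is not yet in the list, keeping order."""
    for item in new_items:
        if item not in items:
            items.append(item)


def compact(pairs):
    """Merge (source, target) pairs that share a source or a target term."""
    groups = []  # each group is a tuple: (list of sources, list of targets)
    for source, target in pairs:
        hits = [g for g in groups if source in g[0] or target in g[1]]
        if hits:
            merged = hits[0]
            # Fold any further matching groups into the first one.
            for other in hits[1:]:
                add_unique(merged[0], other[0])
                add_unique(merged[1], other[1])
            groups = [g for g in groups
                      if g is merged or not any(g is h for h in hits)]
        else:
            merged = ([], [])
            groups.append(merged)
        add_unique(merged[0], [source])
        add_unique(merged[1], [target])
    return [(";".join(s), ";".join(t)) for s, t in groups]


if __name__ == "__main__":
    entries = [
        ("LA", "VE"),
        ("LA", "ventilatie-eenheid"),
        ("Lüftungsanlage", "ventilatie-eenheid"),
        ("Lüftungsanlage", "VE"),
        ("WLA", "RVE"),
        ("WLA", "residentiële ventilatie-eenheid"),
    ]
    for source_side, target_side in compact(entries):
        print(source_side + "\t" + target_side)
```

The alternatives appear in the order in which they were first seen, so the order within an entry may differ from a hand-made compaction.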
Splitting up source-side and target-side alternatives
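perl -CSDA -w <<'EOF' - in.txt > out.txt
use strict;
while (<>) {
    chomp;
    # Split the line into source side and target side at the tab.
    my ($x, $y) = split "\t";
    my @xx = split ';', $x;
    my @yy = split ';', $y;
    # Print one line for every source/target combination.
    for $x (@xx) {
        for $y (@yy) {
            printf "%s\t%s\n", $x, $y;
        }
    }
}
EOF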