Using source-side synonyms

Grouping Alternative Source Terms

The purpose of grouping alternative source terms is to catch as many variations of these terms in your projects as possible. Stemming can only get you so far: stemming is logical, real language is not. Source-side synonyms also let you enter and catch things that might not "make sense", such as:

misspelled versions of source terms,
spelling variants (dialog/dialogue, localize/localise, gray/grey, heiss/heiß, waehrend/während etc.),
different-language versions of source terms (I often add French or German translations of a Dutch term to a source term entry as "synonyms", so that I catch these too),
and different ways of saying the same thing in your source language, e.g. schriftelijke waarschuwing / geschreven waarschuwing = written warning (both are recognised and translated the same way, for consistency).

Chemical substances often have several names: systematic names from different nomenclatures, trade names, common names etc. Here is a nice example from IATE, clearly showing the use that source-side synonyms can have:

Using these synonyms prevents different names for the same concept from being scattered around your glossary, so source-side alternatives make your glossary more coherent. They are also handy when you translate in both directions of a language pair: when you swap the look-up direction (Edit | Options | Auto-assembling | Automatic glossary look up: Change Left terms to Right terms), the source-side synonyms become alternative target terms.

Multiple source terms = multiple target terms

Null;Nullpunkt > zero;nil
Auto;Fahrzeug;Wagen > car;auto

Note: Carefully check whether every individual source term corresponds to each and every target term.
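The check in this note matters because an entry with several terms on each side stands for every source/target combination. A minimal Python sketch can list those combinations for review. The function name `expand_entry` is hypothetical, and the `>` separator only mirrors the notation used in the examples above, not the glossary file format.

```python
from itertools import product


def expand_entry(entry: str) -> list[tuple[str, str]]:
    """List every source/target pair implied by 'a;b > x;y' notation."""
    source_side, target_side = entry.split(">")
    sources = [term.strip() for term in source_side.split(";")]
    targets = [term.strip() for term in target_side.split(";")]
    # Every source term is paired with every target term.
    return list(product(sources, targets))


for source, target in expand_entry("Auto;Fahrzeug;Wagen > car;auto"):
    print(f"{source} -> {target}")
```

If any printed pair is not a translation you would accept, the entry is better split into separate glossary lines.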

Inflected source terms = one (or more) target terms

geschriebene;geschriebener;geschriebenen > written
heiß;heißes > hot
gato;gata > cat

Note: Here you can also use the pipe character "|" on the source side to enable stemming. Writing the forms out, however, gives you more control (e.g. to distinguish between heißes and heißer, which can have different translations).

When you want to use stemming, you could enter:

The recognition and auto-assembling will be exactly the same (as in the example above):

Example for English > French

A nice example can be found here:

The original poster asked: Is there a way to use more than just one source and/or target term in a single Glossary entry?
For example, let's say I have the following English and French terms:

EN: representative;elected official;district representative
FR: élu;représentant élu;député

Those are three English and three French terms which, depending on the context, could all be "synonyms".

Using stemming rather than prefix matching?

In CafeTran "prefix matching" can be used for memories. It is an automatic function that matches the beginnings of all words in a segment. Via stemming, you can also define the root of a word using pipe characters (|).

Enabling stemming

In memories, stemming goes along with prefix matching and has to be activated in the TM start-up options. In tab-delimited glossaries, it works automatically when the program comes across pipe characters in your source term.

Furthermore, stemming in memories can be applied to each word in a segment, e.g. fristgerecht|e Kündigung|en, whereas in glossaries the term is treated as a whole. So in your example I would put the pipe only in the first word, e.g. fristgerecht|e Kündigung, to catch other forms of fristgerechte (fristgerechter, fristgerechten).
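To make the idea of a pipe-marked stem concrete, here is a rough Python sketch. It is not CafeTran's actual matching algorithm. It assumes that the text before the pipe is the stem, that any word beginning with that stem counts as a match, and that the rest of the term must match literally. The function name `stem_pattern` is hypothetical.

```python
import re


def stem_pattern(term: str) -> re.Pattern:
    """Build a regex for a glossary term whose first word may carry a pipe."""
    first, _, rest = term.partition(" ")
    if "|" in first:
        # Assumption: text before the pipe is the stem; any ending may follow.
        stem = first.split("|", 1)[0]
        pattern = r"\b" + re.escape(stem) + r"\w*"
    else:
        # No pipe: the first word has to match exactly.
        pattern = r"\b" + re.escape(first) + r"\b"
    if rest:
        # The remaining words are treated as a literal, unstemmed tail.
        pattern += r"\s+" + re.escape(rest) + r"\b"
    return re.compile(pattern)


pattern = stem_pattern("fristgerecht|e Kündigung")
for phrase in ("fristgerechte Kündigung",
               "fristgerechter Kündigung",
               "fristgerechten Kündigung"):
    print(phrase, bool(pattern.search(phrase)))
```

Under these assumptions the three inflected forms of the first word all match, while a plural such as Kündigungen does not, because only the first word carries a pipe.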

Splitting up source-side alternatives

When you want to split up source-side alternatives so that every glossary line only contains one source term at the left side of the tab, you can use this AWK command provided by Mark Jalbert:

awk -F'\t' '{n = split($1, a, ";"); for (i = 1; i <= n; i++) print a[i] "\t" $2}' example.txt > output-file.txt

Place your glossary on your Desktop.
Start the Terminal.
Go to the Desktop.
Execute the AWK command.

Input:

Output:

TIP: Use Excel to swap columns if you want to resolve the right side of the tab too.
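If you prefer not to use AWK and Excel, the same two steps (splitting the source side, and swapping columns to reach the target side) can be sketched in Python. This is an illustrative equivalent, not part of CafeTran. The function names are hypothetical and the sample line is taken from the compacted glossary shown further down this page.

```python
def split_source_side(lines):
    """Yield one 'source<TAB>target' line per source-side alternative."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip empty lines
        # Like the AWK command, only the first two columns are used.
        target_side = fields[1] if len(fields) > 1 else ""
        for source in fields[0].split(";"):
            yield f"{source}\t{target_side}"


def swap_columns(lines):
    """Swap the first two tab-separated columns of each line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        target_side = fields[1] if len(fields) > 1 else ""
        yield f"{target_side}\t{fields[0]}"


glossary = ["LA;Lüftungsanlage\tVE;ventilatie-eenheid"]

# Step 1: resolve the left side of the tab.
left_resolved = list(split_source_side(glossary))

# Step 2: swap, resolve what is now the left side, and swap back.
fully_resolved = list(swap_columns(split_source_side(swap_columns(left_resolved))))

for line in fully_resolved:
    print(line)
```

To process a real file, pass an open file object (opened with encoding="utf-8") instead of the `glossary` list.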

Compacting your glossaries with source-side and target-side alternatives

Instead of these 12 glossary entries:

LA	VE
LA	ventilatie-eenheid
Lüftungsanlage	ventilatie-eenheid
Lüftungsanlage	VE
WLA	RVE
WLA	residentiële ventilatie-eenheid
Wohnraumlüftungsanlage	residentiële ventilatie-eenheid
Wohnraumlüftungsanlage	RVE
NWLA	NRVE
NWLA	niet-residentiële ventilatie-eenheid
Nichtwohnraumlüftungsanlage	niet-residentiële ventilatie-eenheid
Nichtwohnraumlüftungsanlage	NRVE

You can compact your glossary to 3 entries like this:

LA;Lüftungsanlage	VE;ventilatie-eenheid
WLA;Wohnraumlüftungsanlage	residentiële ventilatie-eenheid;RVE
NWLA;Nichtwohnraumlüftungsanlage	niet-residentiële ventilatie-eenheid;NRVE

Splitting up source-side and target-side alternatives

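perl -CSDA -w <<'EOF' - in.txt > out.txt
use strict;
while (<>) {
    chomp;
    # Split the line into its source and target columns.
    my ($x, $y) = split "\t";
    my @xx = split ';', $x;
    my @yy = split ';', $y;
    # Print one line for every source/target combination.
    for $x (@xx) {
        for $y (@yy) {
            printf "%s\t%s\n", $x, $y;
        }
    }
}
EOF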
