Using source-side synonyms

Grouping Alternative Source Terms

The purpose of grouping alternative source terms is to catch as many variations of these terms in your projects as possible. Stemming can only get you so far: stemming is logical, real language is not. Source-side synonyms also let you enter and catch things that might not "make sense", such as:

  • misspelled versions of source terms,
  • spelling variants (dialog/dialogue, localize/localise, gray/grey, heiss/heiß, waehrend/während etc.),
  • different-language versions of source terms (I often add French or German translations of a Dutch term to a source term entry as "synonyms", so that I catch these too),
  • and different ways of saying the same thing in your source language, e.g. schriftelijke waarschuwing / geschreven waarschuwing = written warning (both are recognised and translated the same way, for consistency).

Chemical substances often have several names: systematic names from different nomenclatures, trade names, common names etc. Here is a nice example from IATE that clearly shows how useful source-side synonyms can be:

ameisensaeure.png

Using these synonyms prevents different names for the same concept from being scattered around your glossary, so source-side alternatives make your glossary more coherent. They can also be handy when you translate in both directions of a language pair. When you swap the look-up direction (Edit | Options | Auto-assembling | Automatic glossary look up: Change Left terms to Right terms), the source-side synonyms become alternative target terms.

Multiple source terms = multiple target terms

Null;Nullpunkt > zero;nil
Auto;Fahrzeug;Wagen > car;auto

Note: Carefully check whether every individual source term corresponds to each and every target term.
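One way to carry out this check is to list every source/target pair that an entry implies. Below is a minimal Python sketch, assuming the entry is written as in the examples above, with ">" standing in for the tab. The function name implied_pairs is made up for illustration and is not part of CafeTran.

```python
def implied_pairs(entry, sep=">"):
    """Return every (source, target) pair implied by one glossary entry.

    Assumption: alternatives on both sides are separated by ';' and the
    two sides are separated by `sep` (a tab in a real glossary file).
    """
    source_side, target_side = entry.split(sep, 1)
    sources = [s.strip() for s in source_side.split(";") if s.strip()]
    targets = [t.strip() for t in target_side.split(";") if t.strip()]
    return [(s, t) for s in sources for t in targets]


# List the pairs of the second example so that each one can be reviewed.
for source, target in implied_pairs("Auto;Fahrzeug;Wagen > car;auto"):
    print(f"{source} = {target}")
```

For the second example this lists six pairs, among them Fahrzeug = auto, which is the kind of pair worth looking at twice before you keep it in one entry.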

Inflected source terms = one (or more) target terms

geschriebene;geschriebener;geschriebenen > written
heiß;heißes > hot
gato;gata > cat

Note: Here you can also use the pipe character "|" on the source side to enable stemming. Writing the forms out, however, gives you more control (e.g. to distinguish between heißes and heißer, which can have different translations).

When you want to use stemming, you could enter:

2.png

The recognition and auto-assembling will be exactly the same (as in the example above):

1.png

Example for English > French

A nice example can be found here:

The original poster asked: Is there a way to use more than just one source and/or target term in a single Glossary entry?
For example, let's say I have the following English and French terms:

EN: representative;elected official;district representative
FR: élu;représentant élu;député

Those are three English and three French terms which, depending on the context, could all be "synonyms".

Using stemming rather than prefix matching?

In CafeTran "prefix matching" can be used for memories. It is an automatic function that matches the beginnings of all words in a segment. Via stemming, you can also define the root of a word using pipe characters (|).

Enabling stemming

In memories, stemming goes along with prefix matching and has to be activated in the TM start-up options. In tab-delimited glossaries, it works automatically when the program comes across pipe characters in your source term.

Furthermore, stemming in memories can be applied to each word in a segment, e.g. fristgerecht|e Kündigung|en, whereas in glossaries the term is treated as a whole. In an example like this I would therefore put the pipe only in the first word, i.e. fristgerecht|e Kündigung, to catch the other forms of fristgerechte (fristgerechter, fristgerechten).
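To make the idea of a pipe-marked stem more concrete, here is a rough Python sketch that turns such a term into a search pattern. It assumes that the part before the pipe is the fixed stem and that any word ending may follow it; CafeTran's actual matching rules are not described on this page and may well be stricter. The function name stem_pattern is hypothetical.

```python
import re


def stem_pattern(term):
    """Build a regex from a glossary term with optional '|' stem markers.

    Assumption (not CafeTran's documented behaviour): in a word such as
    'fristgerecht|e', everything before the pipe is the stem and any run
    of letters may follow it. Words without a pipe must match exactly.
    """
    parts = []
    for word in term.split():
        if "|" in word:
            stem = word.split("|", 1)[0]
            parts.append(re.escape(stem) + r"\w*")
        else:
            parts.append(re.escape(word))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")


pattern = stem_pattern("fristgerecht|e Kündigung")
for text in ("fristgerechte Kündigung", "fristgerechter Kündigung",
             "fristgerechten Kündigung", "fristlose Kündigung"):
    print(text, "->", bool(pattern.search(text)))
```

Under these assumptions the three inflected forms of fristgerechte are found, while fristlose Kündigung is not.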

Splitting up source-side alternatives

When you want to split up source-side alternatives so that every glossary line only contains one source term at the left side of the tab, you can use this AWK command provided by Mark Jalbert:

awk -F'\t' '{ n = split($1, a, ";"); for (i = 1; i <= n; i++) print a[i] "\t" $2 }' example.txt > output-file.txt

  • Place your glossary on your Desktop.
  • Start the Terminal.
  • Go to the Desktop.
  • Execute the AWK command.

Input:

input.png

Output:

output.png

TIP: Use Excel to swap columns if you want to resolve the right side of the tab too.
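If you would rather stay in the Terminal than open Excel, the column swap can be sketched in Python as well. This assumes a tab-delimited glossary with the source term in the first column and the target term in the second; the function name swap_columns is hypothetical.

```python
def swap_columns(lines):
    """Swap the first two tab-separated fields of each glossary line.

    Lines without a tab are passed through unchanged; any further
    columns keep their position.
    """
    swapped = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            fields[0], fields[1] = fields[1], fields[0]
        swapped.append("\t".join(fields))
    return swapped


# Two entries from the examples above, with the columns swapped.
print(swap_columns(["Null;Nullpunkt\tzero;nil", "gato;gata\tcat"]))
```

After swapping, the former target column sits on the left of the tab, so the same splitting command can be applied to it; swap once more afterwards to restore the original direction.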

Compacting your glossaries with source-side and target-side alternatives

Instead of these 12 glossary entries:

LA	VE
LA	ventilatie-eenheid
Lüftungsanlage	ventilatie-eenheid
Lüftungsanlage	VE
WLA	RVE
WLA	residentiële ventilatie-eenheid
Wohnraumlüftungsanlage	residentiële ventilatie-eenheid
Wohnraumlüftungsanlage	RVE
NWLA	NRVE
NWLA	niet-residentiële ventilatie-eenheid
Nichtwohnraumlüftungsanlage	niet-residentiële ventilatie-eenheid
Nichtwohnraumlüftungsanlage	NRVE

You can compact your glossary to 3 entries like this:

LA;Lüftungsanlage	VE;ventilatie-eenheid
WLA;Wohnraumlüftungsanlage	residentiële ventilatie-eenheid;RVE
NWLA;Nichtwohnraumlüftungsanlage	niet-residentiële ventilatie-eenheid;NRVE
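The compacting step can also be automated. The following Python sketch is one possible approach, not a CafeTran feature: it merges source terms only when they share exactly the same set of target terms, so that splitting the compact entries again gives back the original pairs. The function name compact is hypothetical.

```python
def compact(entries):
    """Merge (source, target) pairs into entries with ';'-separated alternatives.

    Sources are merged only when they share exactly the same set of
    targets, so expanding the result again yields the original pairs.
    """
    # Collect the targets of every source term, keeping first-seen order.
    targets_by_source = {}
    for source, target in entries:
        targets = targets_by_source.setdefault(source, [])
        if target not in targets:
            targets.append(target)

    # Group the sources whose target sets are identical.
    sources_by_targets = {}
    for source, targets in targets_by_source.items():
        key = frozenset(targets)
        sources_by_targets.setdefault(key, ([], targets))[0].append(source)

    return [";".join(sources) + "\t" + ";".join(targets)
            for sources, targets in sources_by_targets.values()]


pairs = [
    ("LA", "VE"), ("LA", "ventilatie-eenheid"),
    ("Lüftungsanlage", "ventilatie-eenheid"), ("Lüftungsanlage", "VE"),
]
print(compact(pairs))
```

For the four LA/Lüftungsanlage pairs this produces the single entry LA;Lüftungsanlage, a tab, and VE;ventilatie-eenheid. Source terms whose target sets differ stay in separate entries.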

Splitting up source-side and target-side alternatives

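The following Perl script, run from the Terminal, writes one line for every combination of a source-side and a target-side alternative (in.txt is the input glossary, out.txt the result):

perl -CSDA -w <<'EOF' - in.txt > out.txt
use strict;
while (<>) {
    chomp;
    # Split the line into the source and target column at the tab.
    my ($x, $y) = split /\t/;
    # Split each column into its alternatives.
    my @xx = split /;/, $x;
    my @yy = split /;/, $y;
    # Print one line per source/target combination.
    for my $s (@xx) {
        for my $t (@yy) {
            printf "%s\t%s\n", $s, $t;
        }
    }
}
EOF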

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License