Terminology

Introduction: CafeTran's handling of terminology

Conceptually, CafeTran distinguishes between three types of text elements: terms, fragments (subsegments) and segments. You could compare these units with the well-known division in chemistry between atoms, molecules and compounds.

Terms can consist of one or more words. They are like the atoms in nature: car, car phone, company car parking space
Fragments (also known as subsegments) can consist of one or more terms (and thus one or more words): travel by car

The boundaries between these two concepts aren't rigid. However, if you think that a word (or a group of words) constitutes a term, you should store it in a CafeTran Glossary.

Glossaries can be structured very simply, with two columns, like the vocabulary lists on folded paper that you used in school, or as complex, table-like termbases with lots of extra information (client, subject, alternatives etc.).

Fragments should be saved in so-called Fragment Memories, which are TMX files. You can allow fuzzy matching of the fragments in such a memory. That is, the fragments will be recognised in your source segments (in your source text) even when they aren't present in exactly the same form: similar forms, with a different ending, spelling etc., will be recognised too.
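To get a feeling for what fuzzy fragment recognition means, here is a minimal sketch in Python. It is not CafeTran's own matching algorithm (which isn't described on this page). It simply slides a window over the words of a segment and compares each window with the stored fragment using the standard difflib module. The function names and the threshold of 0.8 are assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity ratio between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def find_fuzzy(fragment, segment, threshold=0.8):
    """Find word windows in the segment that resemble the fragment.

    The window has as many words as the fragment. The threshold is an
    illustrative assumption, not a CafeTran setting.
    """
    words = segment.split()
    size = len(fragment.split())
    hits = []
    for start in range(len(words) - size + 1):
        candidate = " ".join(words[start:start + size])
        score = similarity(fragment, candidate)
        if score >= threshold:
            hits.append((candidate, score))
    return hits


# The stored fragment is "travel by car"; the segment has a different ending.
for candidate, score in find_fuzzy("travel by car", "She travelled by car to Berlin"):
    print(candidate, round(score, 2))
```

In this sketch the fragment "travel by car" is still found in a segment that contains "travelled by car", which is the kind of recognition that fuzzy matching is meant to give you.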

Terms and fragments together form segments (like the compounds in chemistry): larger chunks of information that can occur on their own (but in the context of the text), e.g. as headers, titles, table cells, list items or between an uppercase letter and a full stop (also known as a sentence).

Note that a segment can consist of one or more words, one or more terms and even of one or more fragments. Segments are stored in a Translation Memory (Segments Memory would be a good name too), which is also a TMX file. But this TMX file is used for matching at the segment level, whereas the Fragments Memory is used for matching at the fragment level.
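If you are curious what such a TMX file looks like inside, the following Python sketch writes a minimal one with a single translation unit. It follows the general TMX 1.4 layout (header, body, tu, tuv, seg). The header values, the function name and the German target text are assumptions made for illustration; the exact attributes and properties that CafeTran writes to its own memories may differ.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this attribute name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="de"):
    """Build a minimal TMX document from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Illustrative header values, not the ones CafeTran itself uses.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Hypothetical fragment pair; the translation is only an example.
print(build_tmx([("travel by car", "mit dem Auto reisen")]))
```

The same structure can hold a fragment or a whole segment in each seg element; what differs is the level at which the memory is used for matching.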


There are two ways to store terms in CafeTran:

  • In Termbases, which are TMX files. Use them if you want to have fuzzy term recognition.

  • In Glossaries, which are plain-text files with tabs. Use them if you want to have compact files that are easy to manage.

Watch the screencast

CafeTran offers two different ways for storing term pairs:

  1. Tab-delimited text glossaries
  2. TMX files for terms

We strongly encourage you to use tab-delimited text glossaries for storing your terminology, since they have many advantages over TMX files. They can be sorted and edited very easily, using a text editor, a spreadsheet program or an advanced editor for tab-delimited files (Windows: Ron's Editor; OS X: Xtabulator, nView; Java: JavaCSVeditor).
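Because a glossary is just text with tabs, you can also process it with a few lines of script. The sketch below, in Python, reads a simple two-column glossary (source term, tab, target term), sorts it by source term and writes it back out as tab-delimited text. The sample entries and function names are hypothetical, and the sketch assumes the plain two-column layout only, without extra columns or any special CafeTran notation.

```python
import csv
import io

# Hypothetical glossary content: source term, TAB, target term.
SAMPLE = "device\tGerät\ncar\tAuto\ncompany car\tFirmenwagen\n"


def load_glossary(text):
    """Parse tab-delimited text into a list of (source, target) pairs."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip empty or incomplete lines instead of failing on them.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def sort_glossary(pairs):
    """Sort pairs alphabetically by source term, ignoring case."""
    return sorted(pairs, key=lambda pair: pair[0].casefold())


def dump_glossary(pairs):
    """Turn the pairs back into tab-delimited text, one pair per line."""
    return "".join(f"{source}\t{target}\n" for source, target in pairs)


pairs = sort_glossary(load_glossary(SAMPLE))
print(dump_glossary(pairs), end="")
```

To use this on a real file, you would read its content with open(..., encoding="utf-8") and pass the text to load_glossary. Keep a backup of the original glossary before overwriting it.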

Creating a glossary

When you want to create a glossary (or Memory for Terms) you can add terms on the fly (as you go along with your translation project) or you can use CafeTran's feature to collect all frequent words in your translation project. You can even let CafeTran ignore words that are in a stop list (a list with stop words) or at the source side of an existing glossary (or Memory for Terms). In order to see frequent source terms in their context, you can right-click on them in the tabbed pane and filter on them. They are then shown and highlighted in the grid.
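The idea behind collecting frequent words can be sketched in a few lines of Python. This is not how CafeTran implements the feature; it only illustrates the principle of counting words while ignoring those in a stop list or on the source side of an existing glossary. The function name, the min_count parameter and the sample text are assumptions, and the sketch handles single-word terms only.

```python
import re
from collections import Counter


def frequent_words(text, stop_words=(), known_terms=(), min_count=2):
    """Count words in the text, skipping stop words and known source terms.

    Returns (word, count) pairs, most frequent first. Only single-word
    entries in known_terms are taken into account in this sketch.
    """
    ignored = {w.casefold() for w in stop_words}
    ignored |= {t.casefold() for t in known_terms}
    words = re.findall(r"\w+", text.casefold())
    counts = Counter(w for w in words if w not in ignored)
    return [(w, c) for w, c in counts.most_common() if c >= min_count]


# Hypothetical project text, stop list and existing glossary source terms.
text = "The device has a display. The display shows the device status. Reset the device."
candidates = frequent_words(
    text,
    stop_words={"the", "a", "has"},
    known_terms={"device"},
)
print(candidates)
```

Here "device" is skipped because it is already in the glossary, so the remaining frequent word, "display", is the candidate worth adding.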

See: http://cafetran.wikidot.com/extracting-frequent-words

What to add to your glossary?

You should add any term that you expect to appear again in your current or in future translation projects. Think of terms as aids both for your memory and your (typing) hands: whatever is in your glossary will pop up automatically, so you do not have to retype it.

When you start with CafeTran and have no glossaries, you should enter several terms from each and every new segment. At this stage, you do not have to make any decisions about how you want to add your terms (case-sensitive or not, with pipes or not, with source-side or target-side alternatives or not, …). Just select them and add them.

Continue adding new term pairs for every segment until you have translated about 30% of your project. By then you should have a starting glossary and you should see most of the source terms in new segments being recognised. From that point on, keep adding term pairs that aren't recognised and that are likely to occur again.

In the beginning you will spend a lot of extra time adding words to your glossary (even though CafeTran makes this as easy as it possibly can be), but CafeTran will reward you later on.

You shouldn't restrict your glossaries to dictionary-style terms only. You should also add terms together with the articles, adjectives, prepositions and nouns that accompany them. The more you add, the less you will have to type in future, and your consistency will improve.

So, do not only add:
Source Target
device Gerät

But also:

Source Target

Please continue here: Glossaries
Or here: TMX files for terms


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License