Working with external databases

External Databases are now called Total Recall databases, so make sure to check the articles on Total Recall too.

Database tables

CafeTran can store any linguistic data in multilingual database tables. You can define various table columns such as Context, Subject, Client and Notes in Edit > Options > Database options:

0.png

If you leave any of the fields empty, the column for that field will not be created. Make sure that the Source and Target language column names match the language codes used in the Project and Memory.

Creating a table

  • Go to the menu External DB > New table.
1.png
  • Type the name of the table. The name must be unique and it cannot contain punctuation characters:
2.png
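
The naming rule above can be sketched as a small check. This is only an illustration, not CafeTran's own validation: the function name is hypothetical, and it assumes that "punctuation" means every character in Python's string.punctuation (which includes the underscore) and that uniqueness is case-insensitive. CafeTran itself may be more lenient.

```python
import string


def is_valid_table_name(name, existing_names):
    """Return True if name is non-empty, free of punctuation and not already used.

    Assumption: any character in string.punctuation is forbidden, and two names
    that differ only in letter case count as the same table.
    """
    if not name:
        return False
    if any(ch in string.punctuation for ch in name):
        return False
    taken = {existing.lower() for existing in existing_names}
    return name.lower() not in taken
```

For example, is_valid_table_name("DGT2014", ["Legal"]) passes, while "my-table" fails because of the hyphen.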

You can empty the fields that you don't want to use:

3.png

Next, a confirmation dialogue box is displayed, where you can switch languages. If you would like to …

4.png

Okay, the database has been created now. Let's fill it with useful content:

5.png
  • Select a TMX file that contains your source and target language:
6.png

A new dialogue box is displayed:

7.png
  • Click OK and watch the progress indicator:
8.png

After a while (depending on the size of the TMX file and the specs of your computer), you can start using the database for concordancing.
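
To give a rough idea of what such an import involves, here is a minimal sketch that reads translation units from TMX text and stores them in a two-column table named after the language codes. It is not CafeTran's importer: SQLite (Python's built-in sqlite3 module) stands in for H2 so that the sketch runs without extra drivers, and the function names, the table name "demo" and the sample TMX are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>database</seg></tuv>
      <tuv xml:lang="de"><seg>Datenbank</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>table</seg></tuv>
      <tuv xml:lang="de"><seg>Tabelle</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Yield (source, target) pairs for units that contain both languages."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files use a plain "lang" attribute instead of xml:lang.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                texts[lang.lower()] = "".join(seg.itertext())
        src = texts.get(source_lang.lower())
        tgt = texts.get(target_lang.lower())
        if src and tgt:
            yield src, tgt


def load_pairs(conn, table, source_lang, target_lang, pairs):
    """Create the table if needed and insert all pairs.

    Table and column names are interpolated into the SQL text, so they must
    come from a trusted source (identifiers cannot be bound as parameters).
    """
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        f'("{source_lang}" TEXT, "{target_lang}" TEXT)'
    )
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', pairs)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_pairs(conn, "demo", "en", "de", read_tmx_pairs(SAMPLE_TMX, "en", "de"))
```

Note how the column names are the language codes themselves, which is why the Source and Target column names have to match the codes in the TMX file.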

  • Select a source word and click on the DS (Database Source) icon:
9.png

Within seconds, your gigantic database is queried and all matches are displayed:

https://www.youtube.com/watch?v=uhBS5MLNIBg

The database used in the demonstration movie contains 2 million TUs from the DGT.

Opening a table

  • Go to External DB > Tables menu and choose a table.
  • A tabbed window with the table will open:
9b.png
  • You can browse through the table contents with the Back and Forward buttons in the toolbar. To search the table, press the Search button in the toolbar. By default, the search looks in the source language column. To search the target language column instead, select the Edit > Search > Database Target check box.
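
Conceptually, switching between source and target search only changes which column the query looks at. The sketch below illustrates this with SQLite standing in for the real database engine; the function name, table name and sample rows are hypothetical, and the SQL that CafeTran actually issues is not documented here.

```python
import sqlite3


def search_table(conn, table, column, text):
    """Return all rows whose given column contains text as a substring.

    The % and _ characters in text are not escaped, so they act as SQL
    wildcards. Table and column names must come from a trusted source.
    """
    sql = f'SELECT * FROM "{table}" WHERE "{column}" LIKE ?'
    return conn.execute(sql, (f"%{text}%",)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "demo" ("en" TEXT, "de" TEXT)')
conn.executemany(
    'INSERT INTO "demo" VALUES (?, ?)',
    [
        ("external database", "externe Datenbank"),
        ("project memory", "Projektspeicher"),
    ],
)

source_hits = search_table(conn, "demo", "en", "database")  # source column
target_hits = search_table(conn, "demo", "de", "speicher")  # target column
```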

Adding an entry

  • Choose External DB > Add new entry, or the "Add term to database" command in the Translation menu. The latter opens a window where you can type the source and target text and other properties such as a note and a field.
  • Save and refresh the table after finishing adding entries.

Editing an entry

  • Select the row with the entry and type in the text areas below table rows.
  • Save and refresh the table after finishing.

Deleting an entry

  • Select the row with the entry and press External DB > Delete entry in the menu.

Deleting a table

Go to External DB > Show table info… and press the Delete button.

Storing memory in database

Both segment and term memories can be stored in database tables. This is usually done at the end of a project, after reviewing the project memory segments and completing any term lists.

  • Select a memory tab.
  • Go to Memory > Store memory in External DB… menu.
  • Choose a database table for segments or terms storage.
  • Press OK.

Loading database to memory

Segments or terms that are stored in the database can be loaded into Memory for automatic translation of documents or sending to TMX files. To load a database table to memory:

9a.png

9c.png

9e.png

9f.png

  • Go to External DB > Load table to memory.
  • In the Database Memory window choose a table to load.
  • Set any filtering options if necessary. For example, to filter segments by subject, go to the Memory Filter window and, in the Segment Properties section, type the name of the subject column in the Name field and a subject in the Value field.
  • If you are loading the table contents to Memory only for Autotranslation, make sure to check the Read Only box. This ensures faster matching and requires much less RAM.
  • Press OK.
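
The filtering step can be pictured as a query that keeps only the rows whose property column holds a given value. The sketch below is an assumption-laden illustration, not CafeTran's implementation: SQLite stands in for the database engine, and the function name, table name and sample data are hypothetical.

```python
import sqlite3


def load_to_memory(conn, table, source_col, target_col,
                   filter_name=None, filter_value=None):
    """Return (source, target) pairs, optionally filtered on one column.

    filter_name plays the role of the Name field and filter_value the role
    of the Value field in the Memory Filter window.
    """
    sql = f'SELECT "{source_col}", "{target_col}" FROM "{table}"'
    params = ()
    if filter_name is not None:
        sql += f' WHERE "{filter_name}" = ?'
        params = (filter_value,)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "demo" ("en" TEXT, "de" TEXT, "Subject" TEXT)')
conn.executemany(
    'INSERT INTO "demo" VALUES (?, ?, ?)',
    [
        ("contract", "Vertrag", "Legal"),
        ("valve", "Ventil", "Engineering"),
    ],
)

legal_only = load_to_memory(conn, "demo", "en", "de", "Subject", "Legal")
everything = load_to_memory(conn, "demo", "en", "de")
```

Only the filtered rows end up in memory, which is the point of filtering a very large table before loading it.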

Use the same steps to export a database table to a TMX file. After loading finishes, go to Memory > Save memory to save the data in the TMX file format. For this purpose, the Read Only box must be left unchecked.

Working with references

An advanced database function lets you set and view a reference for a given unit of linguistic data. A typical reference is an image, a document file or a web address. Creating a database table with a reference field is simple:

  • Go to Edit > Options > Database and type a name of the reference column in the field called "Reference column name". This field is empty by default because it is up to the user what reference type she works with. The name can be general such as Reference, Picture or Address. You can also give the name in your language.
  • Go to the menu External DB > New table.
  • Type the name of the table and press OK. The name of the table must be unique and it cannot contain punctuation characters.

The created table will contain one additional field holding a reference address and two buttons for setting and viewing the reference respectively. The "Add entry to Database" window will also contain an extra field to let you set a reference for the new entry easily.

Database backup

As with all important data, regular backups of the database files are recommended. By default, the files are located in the resources/databases folder of CafeTran's installation. You should periodically save the databases folder and its contents to your preferred backup medium.
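
A backup of this kind can be automated with a few lines of script. The following is a minimal sketch, assuming you pass in the actual location of your databases folder and a backup destination of your choice; the function name and the archive naming scheme are hypothetical. It is probably safest to run it while CafeTran is closed, so that no database file is being written during the copy.

```python
import shutil
import time
from pathlib import Path


def backup_databases(databases_dir, backup_dir):
    """Zip the whole databases folder into backup_dir and return the archive path."""
    databases_dir = Path(databases_dir).resolve()
    backup_dir = Path(backup_dir).resolve()
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the file name keeps earlier backups from being overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = backup_dir / f"cafetran-databases-{stamp}"

    archive = shutil.make_archive(
        str(base_name),
        "zip",
        root_dir=str(databases_dir.parent),
        base_dir=databases_dir.name,
    )
    return Path(archive)


# Hypothetical usage; adjust both paths to your own installation:
# backup_databases("/path/to/CafeTran/resources/databases", "/path/to/backups")
```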

Database architecture

PLEASE NOTE: Currently CafeTran cannot use an external database for auto-assembling. Use of databases is limited to pre-translation and concordance searches. Therefore you should use external databases to save RAM only when you work with very large collections of segments (TUs), consisting of several millions of segments.

CafeTran offers database functionality for access, storage and searching of translation segments as well as terminology. It is a perfect solution to organize your language data based on various categories.

The default database engine is an efficient H2 SQL database developed by Thomas Müller (CH). However, it is also possible to connect to other preferred databases.

CafeTran has been tested to work with popular databases such as H2, MySQL, Oracle 10g, HSQLDB 2.0 (used in OpenOffice), MS Access, and Derby (Java DB). This gives you the opportunity to connect to terminology bases that reside on other machines, for instance. If you are interested in connecting a database that is not listed above, please contact user support for more information.

When you have installed and created your database by using its administration tools, follow these steps to connect it to CafeTran:

  • Install a Java driver for your database.

This is a special .jar file that should be copied to CafeTran's lib folder. The file comes with your database or can be downloaded separately. The following .jar files are the Java drivers for specific databases:

MySQL: mysql-connector-java-5.1.13.jar (rename the file to mysql.jar)
HSQLDB: hsqldb.jar
Oracle 10g: ojdbc14.jar (rename the file to ojdbc.jar)
MS Access: the driver is installed in the Windows system along with the MS Access database

As you can see, some of the .jar driver files should be renamed before you copy them to CafeTran's lib folder. Renaming lets you install any future versions of the drivers without updating the program.
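
Copying and renaming can be done by hand in your file manager. If you prefer to script it, a minimal sketch could look like this; the function name and the paths in the usage comment are hypothetical, and you need to supply the real location of CafeTran's lib folder yourself.

```python
import shutil
from pathlib import Path


def install_driver(jar_path, lib_dir, new_name=None):
    """Copy a driver .jar into lib_dir, optionally under a new file name."""
    jar_path = Path(jar_path)
    target = Path(lib_dir) / (new_name or jar_path.name)
    shutil.copy2(jar_path, target)  # keeps the original file untouched
    return target


# Hypothetical usage for the MySQL driver named above:
# install_driver("Downloads/mysql-connector-java-5.1.13.jar",
#                "/path/to/CafeTran/lib", new_name="mysql.jar")
```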

  • Connect CafeTran to the database of your choice.
  • Go to Edit > Options > Database.
  • Choose New database… from the drop down list.
  • Fill in the following fields:
  • Driver file: leave this field empty.
  • Driver class:

MySQL database: com.mysql.jdbc.Driver
HSQLDB database: org.hsqldb.jdbcDriver
Oracle 10g database: oracle.jdbc.driver.OracleDriver
MS Access database: sun.jdbc.odbc.JdbcOdbcDriver

  • User name: type your user name here.
  • Password: type your password here.
  • Connection URL:

MySQL database:
jdbc:mysql://localhost:3306/your_database_name?useUnicode=yes&characterEncoding=UTF-8

HSQLDB database:
jdbc:hsqldb:file:./your_database_name;shutdown=true

Oracle 10g database:
jdbc:oracle:thin:@localhost:1521:xe

MS Access database:
jdbc:odbc:Driver={Microsoft Access Driver (*.mdb)};DBQ=C:/path/to/your/file.mdb

The connection URL must be entered on a single line, and it depends on your database settings. The examples above use default values.
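
If you set up several connections, it can help to generate the driver class and URL from a few parameters instead of typing them by hand. The sketch below only rearranges the values listed above; the function name and dictionary layout are hypothetical, and for Oracle the name argument is assumed to be the last part of the URL (xe in the example).

```python
# Driver class and URL template per database, taken from the lists above.
JDBC_SETTINGS = {
    "mysql": (
        "com.mysql.jdbc.Driver",
        "jdbc:mysql://{host}:{port}/{name}?useUnicode=yes&characterEncoding=UTF-8",
    ),
    "hsqldb": (
        "org.hsqldb.jdbcDriver",
        "jdbc:hsqldb:file:./{name};shutdown=true",
    ),
    "oracle": (
        "oracle.jdbc.driver.OracleDriver",
        "jdbc:oracle:thin:@{host}:{port}:{name}",
    ),
    "access": (
        "sun.jdbc.odbc.JdbcOdbcDriver",
        # Doubled braces produce literal braces in str.format().
        "jdbc:odbc:Driver={{Microsoft Access Driver (*.mdb)}};DBQ={name}",
    ),
}

# Default ports as used in the example URLs above.
DEFAULT_PORTS = {"mysql": 3306, "oracle": 1521}


def connection_settings(kind, name, host="localhost", port=None):
    """Return (driver_class, connection_url) for one of the listed databases.

    For MS Access, name is the path to the .mdb file; host and port are ignored
    by the templates that do not use them.
    """
    driver, template = JDBC_SETTINGS[kind]
    if port is None:
        port = DEFAULT_PORTS.get(kind)
    return driver, template.format(host=host, port=port, name=name)


driver, url = connection_settings("mysql", "your_database_name")
```

The resulting strings are what you would paste into the Driver class and Connection URL fields.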

  • Press OK and save the configuration file under some name.
  • Now select the file from the drop down list.
  • Press OK to close the Options window.

The database tables, if any are present, should be visible in the External DB > Tables menu, and you are ready to work with the connected database.

The H2 Console lets you access a SQL database using a browser interface.
See here: http://www.h2database.com/html/quickstart.html#h2_console

Java DB is mainly used by Java developers to go along with their Java apps. I decided to go for the H2 database as the default DB since my tests proved it a bit faster. You can still use Java DB with CT. See more info on the system installation here:
http://db.apache.org/derby/papers/DerbyTut/install_software.html#derby_install

Then you can configure it to work with CT in Options > DB tab > Database connection > New database…

Apart from language fields, CT currently supports four additional text-based fields. It is up to the user how to define those fields if any of them is needed. You just define the additional column names in the Options > Database tab as you like.

That was exactly the question I asked myself before implementing the database :). It is said that TMX files are meant only for exchange between users or tools. Of course, TMX files may be treated as databases in themselves, and that was the original approach in CafeTran. However, this implies loading all TMX units into RAM in order to do anything with them. With a typical SQL database, you don't have to load all the units into RAM to run queries or to add, change or delete an entry. The table-like database interface is also more comfortable for these operations.

In CafeTran, database units are loaded into RAM only to perform automatic matching (for speed reasons) or to save them in a TMX file. The loading speed is more or less the same as when you load units from a TMX file. For example, you don't need to load them all into RAM to change something in a target segment.

Apart from this, some users keep their terminology in SQL databases and access it with specific software. CafeTran has a generic database implementation, which means it can also access other databases provided they have a Java driver (most do). I tested HSQL (the database used by OpenOffice), Derby and H2 databases, and it works just fine.

Update

H2 database entries are now indexed better, which allows access to millions of entries without any significant delay.

New database tables do not have to be indexed manually: the index is created and updated automatically. However, if you have existing tables made by a previous version of CT, you need to index them by running the External DB > Show table info… > Create index command. Indexing itself takes some time, but after that the search should be lightning fast.
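
For readers who manage their own database, the general idea behind such an index can be sketched in SQL terms. This is not what the Create index command does internally, which is not documented here: the sketch uses SQLite as a stand-in for H2, and the function names, index naming scheme and table name are hypothetical.

```python
import sqlite3


def create_index(conn, table, column):
    """Create an index on one column unless it already exists; return its name."""
    index_name = f"idx_{table}_{column}"
    conn.execute(
        f'CREATE INDEX IF NOT EXISTS "{index_name}" ON "{table}" ("{column}")'
    )
    return index_name


def index_names(conn, table):
    """List the names of all indexes defined on the given table (SQLite only)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "demo" ("en" TEXT, "de" TEXT)')
created = create_index(conn, "demo", "en")
```

An index is built once and then maintained by the database on every insert, which matches the description above: a one-time cost followed by faster lookups.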

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License