Using Spotlight to search TMs

The information in this article is valid, but the article will be completely rewritten in order to make it easier to understand. A new screencast will be recorded too.

Using CafeTran's Desktop Search Tool (DST) interface on a Mac with MDfind and Grep

NOTE: The solution provided in this article only applies to OS X.

For Windows a solution, TMLookup, is under development.

STOP PRESS:

The newest build displays the hits nicely aligned:

ghi.png

Will update this article tomorrow.

Example: Searching for bakterielle in 420 files of 2500 lines, containing the DGT DE > NL, results displayed in less than 1 second:

def.png

Convert the TMX to tab-delimited text. Keep the file size small (because of restrictions of Spotlight). (You can chop up larger text files into chunks, using split, see below.)

Insert this line in Edit > Options > Desktop Search Tool:

/bin/sh -c "mdfind -onlyin /Users/YouUserName/YourFolder/ \"{}\" | xargs grep -h —max-count=20 —extended-regexp —ignore-case '{}'"

Tick the Terminal Tool checkbox.

Thanks to Alain Côté for providing this info:

Remove the duplicates from this huge TXT file in a terminal (sort hugefilename.txt | uniq > slimfilename.txt)

Split this (smaller but still huge) file into very small ones, in a terminal.

Example: I had 1,7 million lines in my language pair, and the "split" command can create a maximum of 676 files, so I moved the file into a new directory (DGT), and from there I split it into small files of 2650 lines. Like this:

split -l2650 slimfilename.txt minifilesname-

This way you get file names from AA to ZZ (26x26).

And Jean-Christophe Helary writes:

sort -u hugefilename.txt > slimfilename.txt
should also work.

In this movie the DGT DE > NL is used, 1567699 TUs, converted to 1567699 lines in a tab-delimited file. Old movie: Watch video

New movie

Michael has sent me a tab-delimited file that contains 29,155,531 lines (= TUs from exported TMX files).

All I did was place this huge file (6.6 GB) in my DST folder, to have Spotlight index it (in very little time).

This screencast shows you just how fast the hits are found: http://youtu.be/Rffl2R6v9UE

Tip: You can use Rainbow to quickly convert folders with TMX files to tab-delimited. It is Java and cross-platform too.

Converting a TMX file to tab-delimited format

In CafeTran: Saving memory (Memory > Save as..) with a .txt extension converts any TMX to TXT tab delimited (removing duplicates in the process).

Other tools to convert TMX files to tab-delimited:

  • TMLookup, Windows-only, removing of duplicates is supported
  • Apsic Xbench, Windows-only, removing of duplicates is supported

Tip: You can also put tab-delimited glossaries that you have found on the web, in your special DST folder. By doing this you always have access to all the terms in these glossaries, without the need of loading them to RAM or running out of glossary priorities.


- To verify the number of lines in the tmx file : wc -l filename
- Then you cut it into x files of y lines with the split command.
- You add the txt extension to all your resulting tiny files with this command: for f in * ; do mv "$f" "$f.txt"; done

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License