Compacting reference material

Language pairs created during translation in Transit NXT can be used as reference material for future projects. It is a good idea to store the language pairs from previous projects in a single centralized folder, so that you only need to add that folder as reference material in new projects. Depending on your workflow and volume, this reference material will grow over time, perhaps more than you would like, but there is something we can do to reduce its size.

The same segment might appear several times across the reference material, but for segment-based reference you do not really need more than one occurrence of each segment. Transit NXT can eliminate duplicates so that only a single copy is kept of segments that occur multiple times or differ only slightly; this is what we call compacting. Each compacted language pair contains at most 15,000 segments, so compacted files never become too big.
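To make the idea more concrete, here is a minimal Python sketch of what compacting does conceptually, assuming that a segment is simply a (source, target) text pair. It is not Transit NXT's actual algorithm: it only removes exact duplicates (Transit also merges segments that differ slightly), and the function name `compact` and the tuple representation are hypothetical. The 15,000 cap is the figure mentioned above.

```python
from typing import Iterable

# Cap mentioned in the post: at most 15,000 segments per compacted file.
MAX_SEGMENTS_PER_FILE = 15_000

# Hypothetical representation of a reference segment: (source text, target text).
Segment = tuple[str, str]


def compact(segments: Iterable[Segment],
            max_per_file: int = MAX_SEGMENTS_PER_FILE) -> list[list[Segment]]:
    """Keep the first occurrence of each identical segment pair, then split
    the unique segments into chunks of at most max_per_file segments.
    Only exact duplicates are removed; near-duplicates are kept."""
    seen: set[Segment] = set()
    unique: list[Segment] = []
    for seg in segments:
        if seg not in seen:
            seen.add(seg)
            unique.append(seg)
    # Each chunk stands in for one compacted reference file.
    return [unique[i:i + max_per_file] for i in range(0, len(unique), max_per_file)]


if __name__ == "__main__":
    pairs = [("Hello", "Hola"), ("Bye", "Adiós"), ("Hello", "Hola"), ("Thanks", "Gracias")]
    print(compact(pairs, max_per_file=2))
    # [[('Hello', 'Hola'), ('Bye', 'Adiós')], [('Thanks', 'Gracias')]]
```

The duplicate ("Hello", "Hola") is dropped, and the three remaining segments are spread over two "files" because of the small cap used in the example.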

The advantage is that you will have smaller reference language pairs, which are faster to transfer and might speed up pretranslation and fuzzy match lookup, as well as a lighter reference material folder or batch. However, there is a relative disadvantage. Transit NXT stands out from the crowd of CAT tools in that its “translation memory” is document-based rather than stored in a database, which means that you can always check the whole context (i.e. the full document) of a reference segment… unless it has been compacted.

Unlike the original language pairs, the files in the compacted reference material do not correspond to any real document that you have translated; instead, they contain segments from different files. That means you will not be able to open a reference file in Transit NXT to check the context as you would with a non-compacted language pair. However, if you never open the document a reference segment comes from to check its full context, you might not even notice this disadvantage.

So, those are the things you need to consider when deciding whether you need to compact your reference material. Is your reference material so voluminous that fuzzy match lookup and pretranslation take too long? Do you frequently check the reference document in its entirety?

If you decide to go ahead, here is how. Select Reference material | Compact reference material from the resource bar.

Launch the 'compact reference material' dialog

The Compact reference material window will be displayed. There you must specify the folder that contains the language pairs you want to compact (you can also select a project or individual files, but I recommend using a folder) and the path, including the file name, of the files that will be generated. If you know the source language, select it; if you don’t, Transit NXT will do it for you.

Paths to reference material and to where compacted files must be put

The compacted reference files will be put in the target folder:

Generated compacted files

From now on, add this folder only (MyCompactRefMat in my example) as reference material for the project in the project settings.

Adding the compacted files as reference material to the project

Bear in mind that you should repeat this process periodically as new projects are created, and that using compacted reference material requires a bit of extra management to keep things organized. For example, you would need one folder to store language pairs that have already been compacted, another to store the compacted language pairs, and yet another to store language pairs that have not been compacted yet (those from projects created after the last compacting run).
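If you like to script your file housekeeping, the three-folder layout described above could be handled with a small Python sketch like the following. The folder names and both functions are hypothetical, and the sketch assumes that once a compacting run has included the language pairs from the "not yet compacted" folder, you move them to the "already compacted" folder so they are not picked up twice. Compacting itself still happens in Transit NXT.

```python
import shutil
from pathlib import Path

# Hypothetical folder names; adapt them to your own setup.
PENDING = "RefMat_NotYetCompacted"     # language pairs from projects after the last run
ARCHIVE = "RefMat_AlreadyCompacted"    # language pairs already included in a compacting run
COMPACTED = "MyCompactRefMat"          # output of Transit NXT's compacting


def set_up_folders(root: Path) -> None:
    """Create the three folders if they do not exist yet."""
    for name in (PENDING, ARCHIVE, COMPACTED):
        (root / name).mkdir(parents=True, exist_ok=True)


def archive_pending(root: Path) -> list[str]:
    """Move every file from the pending folder to the archive folder.
    Call this only after a successful compacting run that included them.
    Returns the names of the moved files in alphabetical order."""
    moved: list[str] = []
    for item in sorted((root / PENDING).iterdir()):
        if item.is_file():
            shutil.move(str(item), str(root / ARCHIVE / item.name))
            moved.append(item.name)
    return moved
```

Keeping the original language pairs in the archive folder, rather than deleting them, means you can still open them to check the full document context if you ever need it.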

Thanks for reading, and please do not hesitate to send your comments or questions, or to ask for tips on specific topics you would like covered.

Special thanks to Karen Ellis for reviewing this post.


About Manuel Souto Pico

Linguist and translation technologist.
This entry was posted in intermediate level, project management, reference material, Transit NXT, translation.

One Response to Compacting reference material

  1. Although you lose the document context when you compact reference material in Transit, you can in fact have Transit save some context using the option “Also save context information”.
