Using public corpora in Transit NXT

Welcome to a new tooltip about Transit NXT. Today we would like to mention a very useful resource. The translation workload of some large international institutions generates a great deal of reference material, which anyone can use when the institution releases it in a suitable form, such as the Translation Memory eXchange format (TMX). This is the case for the multilingual document collections of the United Nations and the European Commission (Uncorpora and DGT, respectively).

You can find other similar resources online, though not as massive. Some have been released into the public domain by large institutions, such as the European Medicines Agency, the European Central Bank and several other European institutions. Others come from communities and groups of volunteer translators who localize free and open source software into their local language and then release the translation memory back to the community. The OPUS corpus is an initiative to centralize this kind of public resource.

These vast resources can be downloaded and converted into Transit NXT's language pair format with the Import TMX function, which you will find in the resources bar (button Reference material > TMX interface), as we saw in the tooltip "How to use a translation memory from another tool".

Converting TMX to a Transit NXT language pair

Once you have done that, you can add the collection of language pairs to any project and potentially obtain concordance and fuzzy matches, provided, of course, that you translate in one of the language combinations it contains.
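Since matches are only possible for the language combinations a corpus actually contains, it can be useful to check a downloaded TMX file before importing it. The following is a minimal sketch in Python, not part of Transit NXT: the function name tmx_languages and the sample data are made up for illustration, and it assumes a standard TMX file in which each tuv element carries an xml:lang attribute (or the older lang attribute). It reads the file as a stream, so it should cope with very large corpora.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_languages(source):
    """Count how many translation units contain each language code.

    `source` is a file path or a binary file object (hypothetical helper,
    not a Transit NXT feature).
    """
    counts = Counter()
    # Stream the file instead of loading it whole: public corpora are large.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tu":
            langs = set()
            for tuv in elem.iter("tuv"):
                # TMX 1.4 uses xml:lang; older files may use plain lang.
                lang = tuv.get(XML_LANG) or tuv.get("lang")
                if lang:
                    langs.add(lang.lower())
            counts.update(langs)
            elem.clear()  # free the segments of this unit
    return counts


# Made-up sample standing in for a downloaded TMX file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="EN"><seg>Hello</seg></tuv>
      <tuv xml:lang="ES"><seg>Hola</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="EN"><seg>Thanks</seg></tuv>
      <tuv xml:lang="DE"><seg>Danke</seg></tuv>
    </tu>
  </body>
</tmx>"""

counts = tmx_languages(io.BytesIO(SAMPLE))
print(dict(counts))
```

To check a real file, pass its path instead of the sample, for example tmx_languages("corpus.tmx"), where the file name is a placeholder. If your source or target language is missing from the result, importing that file will not yield matches for your project.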

And that’s all for now. Thanks for reading, and please do not hesitate to send your comments or questions or to ask for specific tooltips.

About Manuel Souto Pico

Linguist and translation technologist.
Posted in: intermediate level, project management, reference material, Transit NXT, translation.

2 Responses to Using public corpora in Transit NXT

  1. Patricia says:

    Thank you, Manuel. Very interesting!

  2. Pingback: Multi-directional translation memory | Transit/TermStar NXT Tooltips
