Noise and silence
In a recent article about terminology extraction, Uwe Muegge states that “the two biggest issues with terminology extraction tools are noise (invalid term candidates) and silence (i.e., missing legitimate term candidates)”. He goes on to say that this is especially true for tools that take a language-independent approach to terminology extraction. These claims do not quite apply to Transit NXT, even though it is one such language-independent tool. First, because of the way it extracts terms, it does not produce silence. Secondly, the Extract terminology dialogue comes with two filters that allow you to reduce the noise considerably. In addition, there is one feature that actually enables Transit to behave in a language-specific way. All three features improve extraction performance remarkably, and we will explain them in this post.
Filters, and a wee bit of set theory
The filters in the Extract terminology dialogue allow you to reduce the number of term candidates to a subset that complies with the conditions you set with the filters.
Filter 1: Number of occurrences
The first filter allows you to use the number of occurrences of words or phrases in the text to restrict the number of suggestions in the left-hand pane.
As pivotal terms are likely to appear more than once in a publication, you might want to raise the first filter, View suggestions with at least n occurrences, above 1, so that only words or phrases that occur more than once are suggested as term candidates. As for View suggestions with no more than n occurrences, think of prepositions, articles, conjunctions, etc. They are very likely to be in plentiful supply in your text and do not constitute terminology, so you can lower this limit to keep such very frequent words out of the suggestions.
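To make the idea behind the occurrence filter concrete, here is a minimal sketch in Python. It is not how Transit NXT works internally: it assumes a naive, language-independent approach in which every contiguous sequence of one to five words counts as a candidate, and the helpers `count_candidates` and `filter_by_occurrences` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter


def count_candidates(text: str, max_words: int = 5) -> Counter:
    """Count every contiguous word sequence of 1..max_words words.

    Hypothetical helper: a naive n-gram counter used only to illustrate
    the filter, not Transit NXT's actual extraction algorithm.
    """
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def filter_by_occurrences(counts: Counter, at_least: int, no_more_than: int) -> dict:
    """Keep only candidates whose frequency lies within [at_least, no_more_than]."""
    return {term: c for term, c in counts.items() if at_least <= c <= no_more_than}


text = ("The pump housing is attached to the base. "
        "Check the pump housing for leaks before the test.")
suggestions = filter_by_occurrences(count_candidates(text), at_least=2, no_more_than=3)
print(sorted(suggestions))
# ['housing', 'pump', 'pump housing', 'the pump', 'the pump housing']
```

Raising the lower bound to 2 removes everything that occurs only once, while the upper bound of 3 drops "the", which occurs four times. Phrases such as "the pump" still get through, which shows that frequency alone does not separate terms from non-terms.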
Filter 2: Number of words
The second filter enables you to control phrases and compounds. If you set View suggestions with at least n words to 2, the left window pane will no longer list single words, but only compounds or phrases. At the same time, compound terms are unlikely to consist of more than four or five words, so you are probably on the safe side if you limit View suggestions with no more than n words to 5.
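The word-count filter can be pictured in the same simplified way. This sketch assumes that a candidate's length is simply its number of whitespace-separated words; `filter_by_word_count` and the sample candidates are hypothetical and only illustrate the effect of the two settings.

```python
def filter_by_word_count(candidates: list[str], at_least: int, no_more_than: int) -> list[str]:
    """Keep candidates with between at_least and no_more_than words (inclusive).

    Hypothetical helper: words are counted by splitting on whitespace,
    which is an assumption made for this illustration.
    """
    return [c for c in candidates if at_least <= len(c.split()) <= no_more_than]


candidates = [
    "pump",                                       # 1 word: dropped when at_least=2
    "pump housing",                               # 2 words: kept
    "hydraulic pump housing seal",                # 4 words: kept
    "the pump housing is attached to the base",   # 8 words: dropped by the limit of 5
]
print(filter_by_word_count(candidates, at_least=2, no_more_than=5))
# ['pump housing', 'hydraulic pump housing seal']
```

With a lower bound of 2, single words disappear and only compounds and phrases remain; the upper bound of 5 discards long stretches of running text that are very unlikely to be terms.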
You will need to experiment with these filters. How well they work depends on the source language and the volume of text from which you are extracting terms. The good thing is that as soon as you raise or lower the values for the filters described above, the content in the left window pane changes accordingly. You are in full control and can see whether your changes have the desired effects. Another advantage of this immediate update is that you can add terms to the Specialist terminology pane in several passes using different filter settings and, only when you are happy with what you have found, import the extracted terms into a TermStar dictionary.
Common terms list
If you read our previous post about automatic term extraction, you will remember that I skipped a step: after you click OK in the Extract terminology dialogue, a pop-up window appears. Transit prompts you to choose the scope of the common terms list.
In this step, you save the content of the left window pane, the common words, as a stop list for subsequent term extractions. In other words, the words and phrases in this list will not appear again as term suggestions in subsequent extractions. When you choose Global, all subsequent terminology extractions use this list; when you choose Customer, only extractions for the selected customer are affected; and when you choose Project, the list only applies to the active project. As simple as this seems, it is incredibly powerful, because this is the moment when you actually teach Transit language and turn it from an inarticulate, language-unaware tool into your smart, language-literate helper.
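The effect of the three scopes can be sketched as follows. This is only a model of the behaviour described above, not Transit's data structure: it assumes that an extraction is checked against the union of the Global list, the list of the current customer and the list of the active project. `StopLists`, `suggest` and the customer and project names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StopLists:
    """Hypothetical model of common terms lists saved with different scopes."""
    global_terms: set[str] = field(default_factory=set)
    by_customer: dict[str, set[str]] = field(default_factory=dict)
    by_project: dict[str, set[str]] = field(default_factory=dict)

    def save(self, words: set[str], scope: str, key: str | None = None) -> None:
        """Store common words under the Global, Customer or Project scope."""
        if scope == "Global":
            self.global_terms.update(words)
        elif scope == "Customer":
            self.by_customer.setdefault(key, set()).update(words)
        elif scope == "Project":
            self.by_project.setdefault(key, set()).update(words)
        else:
            raise ValueError(f"unknown scope: {scope}")

    def active(self, customer: str, project: str) -> set[str]:
        """Assumption: all lists that apply to this extraction are combined."""
        return (self.global_terms
                | self.by_customer.get(customer, set())
                | self.by_project.get(project, set()))


def suggest(candidates: list[str], lists: StopLists, customer: str, project: str) -> list[str]:
    """Drop every candidate that appears in an applicable stop list."""
    blocked = lists.active(customer, project)
    return [c for c in candidates if c not in blocked]


lists = StopLists()
lists.save({"the", "and", "of"}, "Global")
lists.save({"pump"}, "Customer", key="ACME")
lists.save({"housing"}, "Project", key="Manual-2")

candidates = ["the", "pump", "housing", "pump housing", "valve"]
print(suggest(candidates, lists, customer="ACME", project="Manual-1"))
# ['housing', 'pump housing', 'valve']
print(suggest(candidates, lists, customer="Other", project="Manual-2"))
# ['pump', 'pump housing', 'valve']
```

The Global list affects every extraction, while the customer and project lists only kick in when that customer or project is the current one. Each saved list narrows future suggestions a little further.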
I have just realised that even two posts about the new automatic terminology extraction in Transit NXT aren’t enough to cover everything. We haven’t touched on the context pane yet, for example, and I have not explained how you can edit terms before you transfer them to the Specialist terminology pane.
So, stick around with us and come visit again. Try out the new features and let us know what you think can be improved or added. We are eager to hear your suggestions!