QC in the age of MT
Attack of the MT
These days everyone is getting increasingly anxious about sinister artificial intelligence technologies that will eventually take all our jobs. Truck drivers and translators are surely among the most endangered professions. Some would say this is an exaggeration, but I think language industry professionals will agree with me. And truck drivers too. Modern Machine Translation (MT) solutions can process tons of text in no time, and in most cases they work for peanuts. Moreover, Machine Translation now does a decent job. Still, there is good news too: this is no reason for translation companies to ditch their professional staff and buy a bunch of servers instead.
Just like humans do
It is hard to admit, but modern robots can make mistakes. We built them to be our tireless, infallible steel helpers, but we miscalculated by a wide margin. Neural networks loosely mimic the human brain, and as a result they can produce unpredictable results. Just like humans. Even the most rigorous training process does not guarantee that AI will generate flawless translations. A neural network is a black box: how a trained network arrives at a particular output is not fully transparent, and it is hard to predict how it will respond to new, unfamiliar input. "With machine learning, the engineer never knows precisely how the computer accomplishes its tasks" (Wired). This opacity fuels debates about who is responsible for accidents involving self-driving cars, because it is hard to hold software engineers accountable for the decisions made by a continuously evolving neural network.
Online retail is one of the industries where translation can play a vital role. For example, eBay translation specialists offer the following classification of errors made by Machine Translation software (Source: eBay):
- Errors with economic consequences
- Economic consequences come from errors that prevent a customer from doing business with the company. eBay customers start shopping by entering a search query for the items they want to buy, so the query must be translated in a way that finds the best possible results. A translated query that returns no results is a severe error, because the customer ends up not buying anything.
- Errors due to offensive or inappropriate words
- ...the translation for “female jacket” should sound like “feminine jacket” or “jacket for women”. If you are referring to an animal, the word is more on the anatomic side, expressing the idea of something being physically female.
- Errors with legal or safety consequences
- Errors of a legal or safety nature can come from converting units of measurement from one system (English units) to another (metric). This kind of issue is critical in medical translations: a medication dosage given in English units and turned into a metric unit while keeping the same number could put a person’s life at risk!
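The dosage scenario in the last item is one of the few that a script can partly screen. Here is a minimal Python sketch, not a production tool. It rests on three assumptions: source and target segments are already aligned, quantities appear in the same order in both, and the unit lists (which are hypothetical and deliberately tiny) are good enough for the text at hand. It flags a quantity whose unit switched from an English unit to a metric one while the number stayed the same, which is the dangerous pattern. A human still has to decide what the correct value is.

```python
import re

# Hypothetical, deliberately small unit lists for illustration only.
ENGLISH_UNITS = {"oz", "fl oz", "lb", "gr", "tsp"}
METRIC_UNITS = {"g", "mg", "kg", "ml", "l"}

# A number (optionally with a decimal part) followed by a unit word.
QUANTITY = re.compile(r"(\d+(?:[.,]\d+)?)\s*(fl oz|[a-z]+)", re.IGNORECASE)


def quantities(text):
    """Return (number, unit) pairs; a comma is assumed to be a decimal separator."""
    return [(num.replace(",", "."), unit.lower()) for num, unit in QUANTITY.findall(text)]


def suspicious_unit_swaps(source, target):
    """Flag quantities whose unit became metric while the number stayed unchanged."""
    flagged = []
    # Assumes quantities occur in the same order in source and target.
    for (s_num, s_unit), (t_num, t_unit) in zip(quantities(source), quantities(target)):
        if s_unit in ENGLISH_UNITS and t_unit in METRIC_UNITS and s_num == t_num:
            flagged.append((f"{s_num} {s_unit}", f"{t_num} {t_unit}"))
    return flagged


print(suspicious_unit_swaps("Take 2 fl oz twice a day", "Tome 2 ml dos veces al día"))
# [('2 fl oz', '2 ml')]
```

A correctly converted target ("Tome 59 ml ...") produces an empty list. Any other kind of unit error passes silently, so this check narrows the reviewer's work rather than replacing the reviewer.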
Humans Strike Back
Articles and books on machine translation date back to the 1960s, for example "Language and Machines: Computers in Translation and Linguistics" (The National Academies Press, 1966) or "The Present Status of Automatic Translation of Languages" (Advances in Computers, vol. 1, 1960). More than 50 years have passed, but Machine Translation technology is still far from perfect.
Nevertheless, automation is unstoppable: all boring jobs have to be automated to let humans focus on more sophisticated and creative tasks. But right now we humans have very important work on our hands: we have to teach machines and evaluate their performance. With the wide adoption of machine translation technologies, translation Quality Control (QC) has become more important than ever. MT software is still not mature: these systems are cleverer than before, but they still need assistance, guidance, tutoring and rigorous control. There is only one area in which modern MT technologies are on par with humans: they are capable of making really weird and stupid mistakes. Exactly like humans do.
Basically, the problem is that machine translation software does not understand the meaning of the text. A neural network is usually trained on a large set of sample translations. It then uses this data to ape its teacher, the human translator, much as a child imitates what adults say. As a result, MT software can only be as good as its teachers. The translation industry badly needs highly qualified "teachers" to create and revise the sample "training" materials for neural networks and other types of MT engines.
And this is only the beginning. The translated text must be proofread by professional and attentive editors, who have to check many aspects: accuracy, grammar, terminology, formatting and more. Here I pay special attention to very important but often overlooked errors with economic, legal and safety consequences.
Top 7 Machine Translation Error Types
Translating a text is quite a sophisticated task. Translators often have to deal with a host of requirements and follow a multitude of rules, from the number of characters per line in software interface translation to date and number formatting in engineering and financial documents. These requirements can vary widely from one project to another, and the checklists and guidelines change frequently. At times they are so vague and murky that even humans have a hard time deciphering them.
All these tasks are usually left for the post-editing phase. While linguists try to catch grammar and terminology errors, our QC teams focus on the "technical side": formatting, conversion of measurement units, consistency. Below are the most common types of errors that we find after (!) the translation, editing and proofreading phases.
It is quite surprising, but we regularly encounter large fragments of untranslated text in final files. The reason is purely technical. In some cases, MT and Translation Memory (TM) software cannot adequately process particular formatting elements such as text in schemes and diagrams, footers and text boxes. Also, some types of documents (for instance, engineering or medical ones) may contain "text as image" elements. Without OCR, this text is completely invisible to translation software.
Such an obvious mistake may seem hard to make, but translators, editors, and proofreaders usually work in specialized software that hides distracting formatting. They do not review the layout of the translated documents, which the translation software generates automatically. That is why many language service vendors add Quality Control specialists to their teams. A QC editor is a "middleman" who connects linguists, DTP (desktop publishing) professionals, and Project Managers.
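Some untranslated fragments can be screened automatically, but not all of them. The Python sketch below assumes the bilingual content has already been exported as (source, target) pairs. The export step is not shown. The sketch flags segments whose target is empty or identical to the source. Text hidden in images or skipped text boxes never reaches the bilingual file at all, so a check like this cannot see it. Only a look at the final layout will.

```python
import re


def untranslated_segments(pairs):
    """Return indices of segments whose target is empty or identical to the source.

    `pairs` is a list of (source, target) strings. Segments without any
    letters (numbers, codes) are skipped, since keeping them unchanged is
    usually correct.
    """
    flagged = []
    for i, (source, target) in enumerate(pairs):
        # Normalize whitespace so spacing differences do not hide a copy.
        src = " ".join(source.split())
        tgt = " ".join(target.split())
        if not re.search(r"[^\W\d_]", src):  # no alphabetic characters at all
            continue
        if not tgt or tgt.casefold() == src.casefold():
            flagged.append(i)
    return flagged


pairs = [
    ("Press the red button", "Drücken Sie die rote Taste"),
    ("Figure 3: Pump assembly", "Figure 3: Pump assembly"),
    ("42", "42"),
    ("Footer text", ""),
]
print(untranslated_segments(pairs))  # [1, 3]
```

Identical segments are not always errors (product names and part numbers often stay as they are), so the result is a review list rather than a verdict.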
One should be extremely careful when formatting numbers. This is not as insignificant as it seems at first sight. For example, in American English "10.395" means ten and three hundred ninety-five thousandths, but in Spanish as written in many Latin American countries the same string reads as ten thousand three hundred ninety-five. Sometimes numerals are spelled out as words in the translation, and vice versa. In such cases, our QC editor asks linguists and translators to check whether the numbers are written correctly. Automatic QA tools cannot reliably catch these errors, because catching them requires understanding the context.
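The ambiguity is easy to demonstrate. In the minimal Python sketch below, the two locale entries are illustrative assumptions. es-AR is shown because Argentina uses a decimal comma, but not every Spanish-speaking country does. The same string yields two different values, and nothing in the string itself tells you which one the author meant. That is why the context has to decide, not the tool.

```python
from decimal import Decimal

# Illustrative conventions only: (decimal separator, group separator).
CONVENTIONS = {
    "en-US": (".", ","),
    "es-AR": (",", "."),
}


def parse_number(text, locale):
    """Read a numeric string the way a reader of `locale` would."""
    decimal_sep, group_sep = CONVENTIONS[locale]
    # Drop group separators first, then turn the decimal separator into ".".
    normalized = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


def readings(text):
    """Show how the same string is read under each known convention."""
    return {locale: parse_number(text, locale) for locale in CONVENTIONS}


print(readings("10.395"))
# {'en-US': Decimal('10.395'), 'es-AR': Decimal('10395')}
```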
To Translate Or Not To Translate
Sometimes they do not translate what has to be translated
Sometimes they do translate what does not have to be translated
Translators are like that.
These days many documents contain numerous technical elements that must not be translated: meta tags, markup tags, software menus, URLs, references. In the most difficult cases, only specific parts of these snippets have to be translated. The guidelines for such cases are often quite detailed and difficult to implement in MT and TM software. MT engines can barely process meta tags like [INFO DATA=“new value”] correctly, where only "new value" has to be translated.
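One common way to handle such tags is to protect the markup and send only the translatable value to the engine. The Python sketch below assumes the tag syntax is exactly as in the example above, with either straight or curly quotes. `translate` is a stand-in for whatever MT or TM call a project actually uses, and the tiny glossary in the usage line is hypothetical.

```python
import re

# Matches [INFO DATA="..."] with straight or curly quotes; group 2 is the value.
TAG = re.compile(r'\[INFO DATA=(["“])(.*?)(["”])\]')


def translate_tag_values(text, translate):
    """Send only each DATA value to `translate`; the tag markup is copied unchanged."""
    def replace(match):
        open_quote, value, close_quote = match.groups()
        return f"[INFO DATA={open_quote}{translate(value)}{close_quote}]"
    return TAG.sub(replace, text)


glossary = {"new value": "nouvelle valeur"}  # hypothetical stand-in for an MT engine
print(translate_tag_values("[INFO DATA=“new value”]", lambda v: glossary.get(v, v)))
# [INFO DATA=“nouvelle valeur”]
```

Real projects usually come with many tag shapes, each with its own rule for what is translatable, which is exactly why the guidelines are hard to implement.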
Text that should be replaced
In some cases, parts of the document should not be translated but localized. A translator or a DTP professional has to replace mailing addresses, phone numbers and other country-specific information with localized contact templates. It is possible to "teach" TM and MT tools to follow these rules; nevertheless, incorrectly translated addresses and local contacts are among the most widespread errors caught by our QC editors.
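Once a mapping exists, the replacement itself is mechanical. The hard part is making sure nothing is missed. The minimal Python sketch below uses a hypothetical mapping and placeholder template names. It swaps known source contacts for localized templates, then reports any North American phone numbers still left in the text. It assumes those numbers are written as "+1" followed by digits.

```python
import re

# Hypothetical mapping from source-market contacts to localized templates.
LOCAL_CONTACTS = {
    "+1 555 555 0100": "{DE_SUPPORT_PHONE}",
    "support@example.com": "{DE_SUPPORT_EMAIL}",
}

# Loose pattern for numbers written as "+1" followed by digits, spaces or hyphens.
NA_PHONE = re.compile(r"\+1[\d -]{9,}\d")


def localize_contacts(text):
    """Replace known contacts with templates and report any +1 numbers left over."""
    for source_contact, template in LOCAL_CONTACTS.items():
        text = text.replace(source_contact, template)
    return text, NA_PHONE.findall(text)


localized, leftovers = localize_contacts("Call +1 555 555 0100 or +1 555 555 0199.")
print(localized)   # Call {DE_SUPPORT_PHONE} or +1 555 555 0199.
print(leftovers)   # ['+1 555 555 0199']
```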
Conversion of measurement units
Conversion of measurement units is also difficult to fully automate. The conversion rules are defined by client preferences, industry standards, and regulations. Generally, measurement units have to be converted (for example, imperial into metric), and the converted values are often enclosed in brackets and placed after the original values. Sometimes, however, units should not be converted at all. For instance, display sizes are usually given in inches and do not have to be converted into cm or mm. Pipe sizes are another good example. NPS (Nominal Pipe Size) values should never be converted from inches to metric; instead, the translator should use the metric equivalent designation, DN (diamètre nominal).
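Rules like these can be written down as data plus a few small functions. The minimal Python sketch below assumes a hypothetical client rule set with only three cases. Display sizes stay in inches. Other lengths get the metric value in brackets. Pipe sizes are mapped to DN using the standard equivalents for common sizes. Note that NPS 2 becomes DN 50, not 50.8 mm, because the designation is replaced rather than converted.

```python
MM_PER_INCH = 25.4

# Standard NPS -> DN designations for common pipe sizes (not a numeric conversion).
NPS_TO_DN = {
    "1/2": 15, "3/4": 20, "1": 25, "1-1/4": 32, "1-1/2": 40, "2": 50,
    "2-1/2": 65, "3": 80, "4": 100, "6": 150, "8": 200, "10": 250, "12": 300,
}


def render_length(value_in, kind):
    """Render an inch value according to a hypothetical client rule set."""
    if kind == "display":
        return f"{value_in:g} in"  # display sizes stay in inches
    # Default rule: keep the original value, add the metric one in brackets.
    return f"{value_in:g} in ({value_in * MM_PER_INCH:.1f} mm)"


def render_pipe_size(nps):
    """Replace an NPS designation with its DN equivalent instead of converting it."""
    return f"DN {NPS_TO_DN[nps]}"


print(render_length(55, "display"))  # 55 in
print(render_length(2, "length"))    # 2 in (50.8 mm)
print(render_pipe_size("2"))         # DN 50
```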
Date formats differ between languages and locales. For example, 3/5 is March 5 in U.S. English but 3 May in British English. It is a simple error, but one that is difficult to catch without seeing the source file.
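A script cannot know which reading is correct, but it can at least find the dates that need a human decision. The Python sketch below assumes dates are written with a four-digit year. It compares only two conventions: U.S. month/day and British day/month. A date is flagged as ambiguous when both conventions accept it and read it as different days.

```python
from datetime import datetime

# Month/day order for the two conventions compared here.
DATE_FORMATS = {"en-US": "%m/%d/%Y", "en-GB": "%d/%m/%Y"}


def read_date(text, locale):
    """Interpret a slash-separated date the way a reader of `locale` would."""
    return datetime.strptime(text, DATE_FORMATS[locale]).date()


def is_ambiguous(text):
    """True if both conventions accept the date but read it as different days."""
    readings = set()
    for fmt in DATE_FORMATS.values():
        try:
            readings.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # not a valid date under this convention
    return len(readings) > 1


print(read_date("3/5/2024", "en-US"))  # 2024-03-05
print(read_date("3/5/2024", "en-GB"))  # 2024-05-03
print(is_ambiguous("3/5/2024"), is_ambiguous("3/25/2024"))  # True False
```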
Translation software is totally helpless when the source documents contain errors and inconsistencies, or when the source files are multilingual (contain fragments in two or more languages). TM and MT tools cannot flag even the most obvious errors and inconsistencies in the source documents; instead, they faithfully reproduce them in the target translation.
This is only a brief list of the errors that MT, TM and automatic QA software cannot catch and resolve. What is the solution? Humans are the answer. We still need human editors who thoroughly and intelligently check the translation against the source materials. An experienced QC editor can identify even language errors and source inconsistencies, and give MT development teams vitally important feedback.