The sections below give objective criteria for evaluating the usability of machine translation software output.
See main article: Round-trip translation.
Do repeated translations converge on a single expression in both languages? That is, does the translation method show stationarity or produce a canonical form? And does the translation become stationary without losing the original meaning? This metric has been criticized as correlating poorly with BLEU (BiLingual Evaluation Understudy) scores.[1]
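Below is a minimal sketch of the stationarity test. It assumes deterministic translators passed in as plain functions. `make_word_translator`, `round_trip_until_stationary` and the toy dictionaries are hypothetical stand-ins for a real machine translation system. In the toy run, "better" and "improve" share one French word, so the chain settles on a canonical form that has already drifted from the input.

```python
from typing import Callable, Dict, List, Optional, Tuple

Translator = Callable[[str], str]

def make_word_translator(table: Dict[str, str]) -> Translator:
    """Hypothetical word-by-word translator, used only for illustration."""
    def translate(text: str) -> str:
        return " ".join(table.get(word, word) for word in text.lower().split())
    return translate

def round_trip_until_stationary(
    text: str,
    forward: Translator,
    backward: Translator,
    max_rounds: int = 10,
) -> Tuple[List[str], Optional[int]]:
    """Translate forward and back until the result stops changing.

    Returns the chain of distinct source-language versions (starting with the
    input) and the round in which a round trip first reproduced its own input,
    or None if that did not happen within max_rounds.
    """
    chain = [text]
    for round_no in range(1, max_rounds + 1):
        result = backward(forward(chain[-1]))
        if result == chain[-1]:
            return chain, round_no  # stationary: a canonical form was reached
        chain.append(result)
    return chain, None

# Toy dictionaries (hypothetical); "better" and "improve" collapse to one French word.
EN_FR = {"better": "améliorer", "improve": "améliorer", "a": "un", "day": "jour", "early": "tôt"}
FR_EN = {"améliorer": "improve", "un": "a", "jour": "day", "tôt": "early"}

chain, stable_round = round_trip_until_stationary(
    "Better a day early", make_word_translator(EN_FR), make_word_translator(FR_EN)
)
print(chain)         # ['Better a day early', 'improve a day early']
print(stable_round)  # 2
```

The check compares exact strings. It can therefore detect stationarity, but it cannot tell whether the meaning survived.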
Is the system adaptive to colloquialism, argot or slang? The French language has many rules for creating words in the speech and writing of popular culture. Two such rules are: (a) the reversal of the syllables of a word, as in femme becoming meuf (this is called verlan); and (b) the attachment of the suffix -ard to a noun or verb to form a new noun or adjective. For example, the noun faluche means "student hat". The word faluchard, formed from faluche, can colloquially mean, depending on context, "a group of students", "a gathering of students" or "behavior typical of a student". As of 28 December 2006, the Google translator did not derive words constructed by rules such as (b), as shown here:
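Il y a une chorale falucharde mercredi, venez nombreux, les faluchards chantent des paillardes!
> There is a choral society falucharde Wednesday, come many, the faluchards sing loose-living women!
(Approximate human translation: "There is a faluchard choir on Wednesday, come in large numbers, the faluchards are singing bawdy songs!")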
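Rule (b) is regular enough that a system could at least recognize such derivations. Below is a minimal sketch that assumes a toy lexicon. The hypothetical `analyze_ard_derivation` strips an inflected -ard suffix and looks up what remains. It also tries restoring a final -e, because faluche becomes faluch- before the suffix. Recognizing the base word is only a first step: choosing among the context-dependent senses listed above would need more than a suffix rule.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy lexicon; a real system would consult a full French dictionary.
LEXICON: Dict[str, str] = {
    "faluche": "student hat",
    "chorale": "choir",
}

# Inflected forms of the suffix, longest first so "-ardes" is tried before "-arde".
ARD_SUFFIXES = ("ardes", "arde", "ards", "ard")

def analyze_ard_derivation(word: str, lexicon: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (base word, suffix) if word looks like a known base plus -ard, else None."""
    w = word.lower()
    if w in lexicon:
        return None  # already a known word, no derivation needed
    for suffix in ARD_SUFFIXES:
        if w.endswith(suffix):
            stem = w[: -len(suffix)]
            # The base may have dropped a final -e (faluche -> faluch + ard).
            for base in (stem, stem + "e"):
                if base in lexicon:
                    return base, "-" + suffix
    return None

print(analyze_ard_derivation("faluchards", LEXICON))  # ('faluche', '-ards')
print(analyze_ard_derivation("paillardes", LEXICON))  # None
```

A real system would also have to reject false matches, such as ordinary words that merely end in the letters -ard.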
French argot has three levels of usage:[2]
The United States National Institute of Standards and Technology conducts annual evaluations (https://www.nist.gov/speech/tests/mt/) of machine translation systems based on the BLEU-4 criterion (https://www.nist.gov/speech/tests/mt/doc/mt06_evalplan.v4.pdf). A combined framework called IQmt, which incorporates BLEU and the additional metrics NIST, GTM, ROUGE and METEOR, has been implemented by Gimenez and Amigo (http://www.lsi.upc.edu/~nlp/IQMT/IQMT.v1.0.pdf).
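BLEU-4 (Papineni et al., 2002) multiplies a brevity penalty by the geometric mean of the modified (clipped) 1- to 4-gram precisions. BLEU as defined there aggregates counts over a whole test corpus. The following is only a simplified, sentence-level sketch for intuition, assuming whitespace tokenization and no smoothing. `sentence_bleu4` is a hypothetical name, and reported scores should come from an established scoring tool.

```python
import math
from collections import Counter
from typing import List, Sequence

def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu4(candidate: List[str], references: List[List[str]]) -> float:
    """Unsmoothed BLEU-4 for one tokenized sentence against one or more references."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing the geometric mean is zero
        log_precision_sum += math.log(clipped / total) / 4  # uniform weights 1/4
    # Brevity penalty uses the reference length closest to the candidate (ties: shorter).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)

# Every n-gram matches, but the candidate is 4 tokens against a 6-token reference.
score = sentence_bleu4("the cat sat on".split(), ["the cat sat on the mat".split()])
print(round(score, 4))  # 0.6065, i.e. exp(1 - 6/4)
```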
Is the output grammatical or well-formed in the target language? Using an interlingua should help here: with a fixed interlingua, one should be able to write a grammatical mapping from the interlingua to the target language. Consider the following Arabic input and the English translation produced by the Google translator as of 27 December 2006 (http://www.google.com/language_tools?hl=en). The Google output does not parse under any reasonable English grammar:
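وعن حوادث التدافع عند شعيرة رمي الجمرات -التي كثيرا ما يسقط فيها العديد من الضحايا- أشار الأمير نايف إلى إدخال "تحسينات كثيرة في جسر الجمرات ستمنع بإذن الله حدوث أي تزاحم".
(Approximate human translation: "Regarding stampedes during the ritual of stoning the Jamarat, in which many victims often fall, Prince Nayef pointed to the introduction of 'many improvements to the Jamarat Bridge that will, God willing, prevent any crowding.'")
Semantics preservation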
Do repeated re-translations preserve the semantics of the original sentence? For example, consider the following English input passed multiple times into and out of French using the Google translator as of 27 December 2006:
Better a day earlier than a day late.
> Améliorer un jour plus tôt qu'un jour tard.
> To improve one day earlier than a day late.
> Pour améliorer un jour plus tôt qu'un jour tard.
> To improve one day earlier than a day late.
As noted above, this kind of round-trip translation is a very unreliable method of evaluation.
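The recorded example can be checked mechanically. The sketch below uses hypothetical helper names to find where the English back-translations stop changing and to measure plain word-set overlap with the source. The chain is stationary, and it still shares five of the nine distinct words in the two sentences, yet the proverb's meaning is gone. This is one way to see why convergence and surface overlap are weak tests of semantics preservation.

```python
import re
from typing import List, Optional

def stationary_index(outputs: List[str]) -> Optional[int]:
    """Index of the first output identical to the one before it, or None."""
    for i in range(1, len(outputs)):
        if outputs[i] == outputs[i - 1]:
            return i
    return None

def token_jaccard(a: str, b: str) -> float:
    """Word-set overlap: a crude surface measure, not a semantic one."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

source = "Better a day earlier than a day late."
# English back-translations recorded in the example above.
back_translations = [
    "To improve one day earlier than a day late.",
    "To improve one day earlier than a day late.",
]
idx = stationary_index(back_translations)
print(idx)                                             # 1
print(back_translations[idx] == source)                # False
print(round(token_jaccard(source, back_translations[idx]), 3))  # 0.556
```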
Trustworthiness and security
An interesting peculiarity of Google Translate as of 24 January 2008 (corrected as of 25 January 2008) is the following result when translating from English to Spanish. It looked like a joke embedded in the English-Spanish dictionary, made more poignant by Heath Ledger's death on 22 January 2008:
Heath Ledger is dead
> Tom Cruise está muerto ("Tom Cruise is dead")
This raises the issue of trustworthiness when a machine translation system is embedded in a life-critical system in which the translation has input to a safety-critical decision-making process. It also raises the issue of whether, in a given deployment, the machine translation software is safe from hackers.
It is not known whether this behavior of Google Translate was the result of a joke or hack, or an unintended consequence of a method such as statistical machine translation. Reporters from CNET Networks asked Google for an explanation on 24 January 2008; Google said only that it was an "internal issue with Google Translate".[3] The mistranslation was the subject of much hilarity and speculation on the Internet.[4][5]
If it was an unintended consequence of a method such as statistical machine translation, and not a joke or hack, then the event demonstrates a potential source of critical unreliability in statistical machine translation.
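The following greatly simplified sketch shows how a purely count-based method can confidently emit the wrong entity. It is purely illustrative: Google never explained the cause, and nothing here claims this is what happened. The hypothetical phrase table simply picks the target phrase most often aligned with a source phrase. Real statistical systems also use language models and reordering, and the placeholder names and counts are invented. If noisy alignments outvote correct ones, the wrong name wins.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def build_phrase_table(aligned_pairs: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each source phrase was aligned with each target phrase."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for source, target in aligned_pairs:
        table[source][target] += 1
    return table

def translate_phrase(phrase: str, table: Dict[str, Counter]) -> str:
    """Pick the most frequently aligned target phrase; pass unknown phrases through."""
    candidates = table.get(phrase)
    if not candidates:
        return phrase
    return candidates.most_common(1)[0][0]

# Hypothetical, deliberately skewed alignments: three noisy pairs outvote two correct ones.
pairs = [("actor a", "actor a")] * 2 + [("actor a", "actor b")] * 3
table = build_phrase_table(pairs)
print(translate_phrase("actor a", table))        # actor b
print(translate_phrase("unseen phrase", table))  # unseen phrase
```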
In human translation, and in interpreting in particular, the translator's selectivity is often remarked on when one of the two parties served by the interpreter knows both languages.
This raises the question of whether a particular translation can be considered verifiable. Here, a round-trip translation that converges would serve as a kind of verification.
See also
- Comparison of machine translation applications
- Evaluation of machine translation
- Round-trip translation
- Translation
References
- Gimenez, Jesus and Enrique Amigo. (2005) IQmt: A framework for machine translation evaluation.
- NIST. Annual machine translation system evaluations and evaluation plan.
- Papineni, Kishore, Salim Roukos, Todd Ward and Wei-Jing Zhu. (2002) BLEU: A Method for automatic evaluation of machine translation. Proc. 40th Annual Meeting of the ACL, July, 2002, pp. 311–318.
Notes and References
- Somers, Harold. "Round-trip translation: What is it good for?" Proceedings of the Australasian Language Technology Workshop (ALTW 2005), Sydney, 2005, pp. 127–133.
- "The Agony of Argot", Chitlins & Camembert, October 28, 2005. http://chitlinsandcamembert.blogspot.com/2005/10/agony-of-argot.html
- McCarthy, Caroline. "Google Translate bug mixes up Heath Ledger, Tom Cruise". http://www.news.com/8301-13577_3-9857280-36.html?tag=newsmap
- '"Tom Cruise" is Spanish for "Heath Ledger"', gawker.com, January 24, 2008. http://gawker.com/5002510/tom-cruise-is-spanish-for-heath-ledger
- "Tom Cruise está muerto", Ray Leon Blog Project, January 24, 2008. http://rayhey2.blogspot.com/2008/01/tom-cruise-est-muerto.html