As a kid, I was fascinated with online machine translation. As an English teacher in 2019, I found it my biggest pet peeve.
Some examples in this blog might be vulgar or offensive to some audiences.
I used to believe computer translators would destroy human translator jobs. Then, some of the strangest things I heard while teaching English as a foreign language were the faults of computer translation apps. Still, we see advertisements for gadgets that claim to translate speech in real time, and one study claims that Google Translate could pass UCLA’s English Proficiency Exam. So, what is the current state of Machine Translation (MT), and what will be the future?
Neural Machine Translation is rapidly gaining popularity, and it is now Google Translate’s preferred method. It combines the previously popular method of looking at professional translations (known as Statistical Machine Translation) with machine learning. To be successful, this requires large amounts of professional translation data, so it is much more accurate between Western European languages than outside of them.
A close look at any online translation service shows us that they are very helpful for simple translations but quickly deteriorate as input becomes long or complicated. It’s common for computer translators to analyze large collections of writing to create translations, and English has the largest amount of available data, so translations into English are more accurate than translations from English. Additionally, a recent competition between Google Translate and professional human translators in South Korea reported that 90% of the translations given by computers didn’t sound like the work of a native speaker, let alone an educated one.
There are many other problems MT still has to overcome:
- Differences in word order
- Differences in required information
- Homophones — their, they’re, or there?
- Homographs — ‘bass’ as in fish or music?
- Morphology — be, am, are, is, were, was, been
- Irregularities — is “We hit it hard” present or past tense?
- Differences in linguistic customs and preferences — like passive voice
- Garden Path Sentences — “Time flies like an arrow; Fruit flies like fruit”
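Two of the problems above, word order and homographs, can be shown with a toy example. This is only a sketch: the tiny English-to-Spanish lexicon and the function name are made up for illustration, and real MT systems are far more sophisticated than a word-for-word lookup.

```python
# Hypothetical one-entry-per-word lexicon, invented for this illustration.
TOY_LEXICON = {
    "the": "el",
    "red": "rojo",
    "car": "coche",
    "i": "yo",
    "caught": "pesqué",
    "a": "un",
    "bass": "bajo",  # the instrument; the fish would be "lubina"
}


def translate_word_for_word(sentence: str) -> str:
    """Replace each word with its lexicon entry, keeping the English word order."""
    words = sentence.lower().split()
    # Unknown words are wrapped in angle brackets instead of being guessed.
    return " ".join(TOY_LEXICON.get(word, f"<{word}>") for word in words)


# Word order: natural Spanish puts the adjective after the noun ("el coche rojo").
print(translate_word_for_word("the red car"))      # el rojo coche

# Homograph: with no context, the fish becomes a musical instrument.
print(translate_word_for_word("I caught a bass"))  # yo pesqué un bajo
```

Both outputs are made of real Spanish words, and both are wrong in ways a human translator would catch immediately.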
Even using spell and grammar check in one language can be frustrating at times. Perhaps MT will not compare to human translators until computers are as smart as humans, achieving Natural-Language Understanding.
Aside from the aforementioned shortcomings, linguists are concerned with the social and cultural implications. First, how appropriate is the language for the situation? Real linguistic competence requires cultural knowledge and situational awareness, which further confines appropriate language, even inside of linguistic borders. “Coger” means to get, catch, or take in most countries outside of Mexico, where it is a vulgar word for intercourse. American fans of Drag Race don’t see a problem saying the name Ginger Minj because they don’t know it is a vulgar word for female genitalia in the UK. In short, MT won’t help us be very social.
There are large differences between academic or formal registers and the everyday, colloquial speech that native speakers use. Currently in French, many speakers are dropping the word ‘ne’ in negative sentences, but it is required in French writing. Similarly, in English, it can be hard to hear the difference between “I can” and “I can’t” in speech.
Next, how do we even define language? Do we accept the spoken norms or only the written standards? Take for example the sentence “He eat more pizza than me.” Many native English speakers would say we need to add an S to “eat,” and not realize that “me” is also academically incorrect. Both uses are recognized as standard in multiple American English dialects. However, “than I” sounds distinctly unnatural to most English speakers, though academia requires it in this sentence.
Geographical and social customs further complicate the discussion. In New York City, “the city” refers to “lower Manhattan,” even when the speakers and listeners live and work inside the New York City limits. Philosophically, there is no easy definition of language, nor an easy answer to which dialects should be considered for MT.
Companies and coders also have to tackle the issue of language change. It’s easy to assume that globalization will diminish the number of local dialects, but some analyses of American English show that dialects are not converging, and some suggest that dialects are actually diverging more quickly. Most linguists agree that languages are changing more quickly due to the rise of social media. Even if we disregard these changes as slang and casual speech, young people nowadays can easily speak English that their English-speaking grandparents don’t understand.
Then if we decide to support everyday speech and/or local dialects, we return to the aforementioned problem of defining language.
MTs can only pull information from the internet — so how can they distinguish between intentionally funny phrases, mistakes, and true representations of colloquial or changing language? One way or another, the code is forced to participate in social and political commentary.
Fortunately, companies have been humble enough to admit their shortcomings. Google has sought human translators to help improve Google Translate. Facebook has added functions to rate and improve translations. The middle ground between using computers and using humans is recognizing that MTs are helpful and useful to professional human translators. For now, combining Neural Machine Translation with humans is standard.
As an extension of MT and computer linguistic competence, an app called Busuu plans to let people have conversations with Amazon’s Alexa to learn foreign languages in the future. Still, the CEO has admitted “You won’t become fluent in Spanish by using the Alexa bot as of today.” (“Of” in that quote was automatically flagged by grammar check.)
What does MT mean for human languages?
Google Translate currently supports 103 languages, allowing access to over 80% of the world’s population, but it is missing support for at least 98.3% of the world’s six to seven thousand languages. On average, one language dies every 2 weeks, and technological choices can help preserve those cultures and their knowledge or condemn them. Minority languages need to be protected, but it’s not even tiny minorities that have been ignored. Google Translate and most other MTs support neither Cantonese nor Hokkien (over 100 million speakers combined), the two Chinese languages furthest from standard Mandarin. Cantonese is the dominant language of American Chinatowns and Hokkien is also common, so MT support would be helpful.
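For anyone who wants to check the 98.3% figure, here is a quick sketch of the arithmetic. It uses only the numbers quoted above (103 supported languages, six to seven thousand languages in total); the function name is just for illustration.

```python
def unsupported_share(supported: int, total: int) -> float:
    """Fraction of languages that have no support, given the counts."""
    return 1 - supported / total


SUPPORTED_LANGUAGES = 103  # figure quoted in this post

# Check both ends of the "six to seven thousand" estimate.
for total_languages in (6000, 7000):
    share = unsupported_share(SUPPORTED_LANGUAGES, total_languages)
    print(f"{total_languages} languages: {share:.1%} unsupported")
# 6000 languages: 98.3% unsupported
# 7000 languages: 98.5% unsupported
```

The lower estimate gives 98.3% and the higher one gives 98.5%, which is why the figure is phrased as “at least.”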
It appears that there are a lot of technological and linguistic problems MT and artificial intelligence still need to overcome to provide human-like translations. These problems are not confined to this topic. Coders should consider how social, cultural, and other factors influence and are influenced by their code, especially for projects as large as MT. Additionally, coders should remain humble and recognize the limitations of technology.
Lastly, please use human translators to avoid making the same mistakes seen in the pictures in this blog.