What do neural nets, artificial intelligence, and deep learning have in common?
Two main things:
- They’re the underpinnings of contemporary machine (or “automatic”) translation, and…
- They sound like the names of sci-fi movie antagonists. (Read: they scare people.)
Mention machine translation in conversation, and you’re likely to elicit some form of cringe at the imminent possibility of the machine apocalypse.
Is there actually anything to be afraid of? Are our brains destined to endure eternal inferiority to constantly-improving digital neural nets? Are you going to lose your job to a multilingual machine?
The answer to the first question is certainly no, and to the other two, almost certainly no. Fears of machine translation are most often founded on misconceptions about what it is and how it works.
Let’s not kid ourselves: machine translation is a powerful tool that has already permitted millions of businesses to reach foreign online markets, and perhaps billions of individuals to communicate with others from all four corners of the globe.
Nevertheless, it’s an area that’s rife with myths and misunderstandings, due to its highly technical nature and somewhat disputed reputation.
Imagine you’re puttering around on Google Translate. You’ve probably already asked yourself…
- …how do Google Translate and other machine translators actually work?
- Are there any viable alternatives to Google Translate today? (And, while we’re at it, any…less-viable ones, that you might do better to avoid?)
- Have the robots won? Are machine translators going to replace human ones?
- …And, whether or not they’re on their way to take over the world, what’s the future of machine translation tools?
Read on for answers—some of them may surprise you.
How does machine translation actually work?
Part of machine translation’s (MT’s) controversial rep comes from the persistent misconception that automatic translators work word-by-word, or even sentence-by-sentence.
This was true of early, incredibly basic translation technologies (remember Yahoo’s Babel Fish?), which generally used rules-based machine translation (RBMT) systems.
The first major advances in automatic translation took the form of RBMT around the beginning of the 1970s. One of the major early actors in the domain was PROMT, the Russian translation software company that now operates Reverso, a pretty popular product in France and the U.S. (For the record, Reverso has long since moved past RBMT technology: it keeps up with the times by using more statistics-based and neural translation systems, which we’ll talk about further on.)
Rules-Based Machine Translation: The Granddaddy of MT Tech
RBMT has three major forms: direct, transfer-based, and interlingua translation.
While each method works a little bit differently (explained in pretty fine detail in this FreeCodeCamp article on the history of MT), the logic is similar: define a set of grammatical rules and a dictionary for each language pair, then apply these rules and direct word-to-word dictionary translations universally, regardless of the source text’s structure or context.
As you can probably imagine, this kind of MT was never particularly reliable. The example above shows that even when some basic grammatical rules are applied (for example, the recognition that “the” in English may become one of several articles in French, depending on the gender and number of the noun in question), the result of RBMT generally isn’t very accurate.
In just this example, the machine would not have picked up on the fact that in this context, the correct French verb to use for “know” would be “connaître”—not “savoir”; plus, “ne” and “pas” should go on either side of the verb, not both in front of it.
The output, “Je ne pas sais la réponse,” doesn’t actually mean anything in French. Yikes.
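To make this failure mode concrete, here is a minimal sketch of direct, word-for-word RBMT in Python. The tiny dictionary, the single article rule, the input sentence "I do not know the answer," and the function name `translate_direct` are all assumptions made for illustration; real RBMT systems relied on far larger dictionaries and rule sets.

```python
# Toy direct (word-for-word) rule-based translator, English -> French.
# Everything here is an illustrative assumption, not a real RBMT engine.

# Bilingual dictionary: each English word maps to exactly one French string.
DICTIONARY = {
    "i": "je",
    "do": "",          # auxiliary "do" has no French counterpart, so drop it
    "not": "ne pas",   # negation is looked up as one fixed unit
    "know": "sais",    # always "savoir", whatever the context
    "answer": "réponse",
}

# Grammatical gender of the French nouns this toy system knows about.
NOUN_GENDER = {"réponse": "f"}


def translate_direct(sentence: str) -> str:
    words = sentence.lower().rstrip(".").split()
    output = []
    for i, word in enumerate(words):
        if word == "the":
            # The one grammar rule: choose the article from the next noun's gender.
            next_noun = DICTIONARY.get(words[i + 1], "") if i + 1 < len(words) else ""
            output.append("la" if NOUN_GENDER.get(next_noun) == "f" else "le")
        else:
            # Unknown words are passed through untranslated.
            output.append(DICTIONARY.get(word, word))
    # Drop empty entries, join the rest, and capitalize the first letter.
    text = " ".join(w for w in output if w)
    return text[:1].upper() + text[1:]


print(translate_direct("I do not know the answer."))
```

Running this prints `Je ne pas sais la réponse`: every word is looked up and the article rule fires correctly, yet nothing in the rules moves "pas" behind the verb or reconsiders the verb choice.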
Even interlingua, supposedly the most complex form of RBMT—where the source text is “passed through” a made-up “universal language,” and the “universal language” text subsequently translated into the destination language—can’t account for most of the nuances we use in daily speech and writing.
The good news? No one uses RBMT anymore.
A lot has changed since the days of Yahoo’s Babel Fish. Even Google Translate has used, since its very beginnings, what is called a statistical machine translation (SMT) model.
Statistical Machine Translation: Crawl, compare, correct
A statistical machine translation system breaks down input sentences into bits. These “bits” may be words, phrases, or syntactical arrangements.
It then “crawls,” or uses as data, a giant library of translated texts (called a “corpus”) and finds all of the examples of the “bits” (words, phrases, grammatical structures, etc.) that the library texts contain.
FreeCodeCamp’s awesome history of machine translation explains in much greater detail the nuances of word-based, phrase-based, and syntax-based SMT, but the process generally looks something like what’s sampled in the next diagram.
The SMT translator finds matches of each bit of the input sentence (that is, the sentence you enter in the original language) in the text corpus, then assembles all the corpus’s various translations of the bit and, using prediction algorithms, narrows down which one appears the most frequently or seems best suited to the context.
Since SMT gets its translations from real-life examples that were originally translated by humans (that is, the texts in the corpus), the results read more naturally than those regurgitated by an RBMT system.
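The "find the matches, collect the translations, keep the most frequent one" step can be sketched in a few lines of Python. This is a simplified illustration only: the mini corpus is invented, the phrases are assumed to be aligned already, and the function names `build_phrase_table` and `translate_phrases` are hypothetical. Real SMT systems use probabilistic alignment and language models rather than raw counts.

```python
from collections import Counter, defaultdict

# A made-up, already-aligned mini "corpus" of (English phrase, French phrase) pairs.
ALIGNED_PHRASES = [
    ("i do not know", "je ne connais pas"),
    ("i do not know", "je ne connais pas"),
    ("i do not know", "je ne sais pas"),
    ("the answer", "la réponse"),
    ("the answer", "la réponse"),
    ("the answer", "la solution"),
]


def build_phrase_table(pairs):
    """Count how often each source phrase was translated each way."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def translate_phrases(phrases, table):
    """Pick the most frequent translation for each input phrase ("bit")."""
    output = []
    for phrase in phrases:
        candidates = table.get(phrase)
        if candidates:
            best, _count = candidates.most_common(1)[0]
            output.append(best)
        else:
            output.append(phrase)  # unseen phrase: leave it untranslated
    return " ".join(output)


table = build_phrase_table(ALIGNED_PHRASES)
print(translate_phrases(["i do not know", "the answer"], table))
```

With this invented corpus the sketch prints `je ne connais pas la réponse`, because each chosen phrase is the one human translators used most often in the sample data.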
But even SMT has started to become old news as its successor, neural machine translation (NMT), has taken to the forefront as an incoming technological standard.
Neural Machine Translation: Simulating the human brain
Without going into too many dirty details, NMT essentially takes SMT to the next level, and uses algorithms to “teach itself” how to recognize the most natural possible word and phrase combinations for each language pair that it’s fed.
The name “neural network” didn’t just appear from the ether: NMT systems are loosely modeled on human brains and the neurons that constitute them. They are built to correct themselves continually, and to improve based on human-fed examples.
So with a well-oiled neural-net translation machine, your translation result is very likely to closely resemble what a real speaker of the target language would say, since it builds on things that real speakers have already said or written.
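The idea of a system that corrects itself from examples can be shown with a deliberately tiny Python sketch. It trains a single-layer softmax model by gradient descent on three invented word pairs, which is far simpler than the deep sequence models behind real NMT; the data and the names `predict` and `train` are assumptions for illustration.

```python
import math

# Made-up training pairs: (English word, French word). Purely illustrative.
EXAMPLES = [("cat", "chat"), ("dog", "chien"), ("house", "maison")]

SOURCE = sorted({s for s, _ in EXAMPLES})
TARGET = sorted({t for _, t in EXAMPLES})

# One weight per (source word, target word) pair, all starting at zero.
weights = {s: {t: 0.0 for t in TARGET} for s in SOURCE}


def predict(source_word):
    """Return a probability for each target word (softmax over the weights)."""
    scores = weights[source_word]
    total = sum(math.exp(v) for v in scores.values())
    return {t: math.exp(v) / total for t, v in scores.items()}


def train(epochs=200, learning_rate=0.5):
    for _ in range(epochs):
        for source_word, correct in EXAMPLES:
            probs = predict(source_word)
            for t in TARGET:
                # Error = predicted probability minus what it should have been.
                error = probs[t] - (1.0 if t == correct else 0.0)
                # Nudge the weight in the direction that shrinks the error.
                weights[source_word][t] -= learning_rate * error


before = predict("cat")["chat"]   # untrained guess: one chance in three
train()
after = predict("cat")["chat"]    # much closer to 1 after training
print(round(before, 2), round(after, 2))
```

Nobody writes a rule saying "cat" means "chat" here. The model starts out guessing at random and moves toward the right answer only because each pass over the examples reduces its own error.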
Now that we’ve looked at a basic typology of machine translation tools, you might be wondering…
What are the major MT tools on the market today? How are they different from one another?
- Google Translate: When you think “automatic translation,” Google Translate is probably the first thing that pops into your head. While it hasn’t necessarily been on the market the longest out of all its MT competitors, Google Translate remains a force to be reckoned with.
You may remember that “La Bamba” video that Google circulated in 2015 to promote its new visual translation feature—which is, by all accounts, pretty incredible. Google Translate may sometimes flub on longer texts, but it’s the product of constantly-improving machine learning—and human collaboration, which enriches its SMT database with ever-more-reliable, up-to-date translations.
- Bing Translate/Microsoft: Like Google Translate, Bing Translate relies primarily on SMT methods; also like Google Translate, its developers are starting to pivot towards a neural network-based system.
- IBM: Alongside Google, IBM has turned out to be one of the frontrunners in the machine learning space-race. IBM’s artificial intelligence suite, Watson, includes a Language Translator tool that relies on neural net technology to deliver constantly-improving translations.
While Watson Language Translator can be integrated into a website, it doesn’t offer any functionality for managing your translations after you’ve executed them—so you’re stuck with what the machine delivers.
- PROMT: Okay, when we said before that “no one” still uses RBMT…that wasn’t necessarily true. PROMT does, at least on its primary site, www.online-translator.com.
But the Russia-based company and research facility also launched the Reverso tool, back in 1997, along with a team of French developers. Reverso has since evolved into a higher-performing neural network-based dictionary and translator.
- Yandex Translate: Another product of the Russian tech ecosystem, Yandex Translate originally used PROMT as its backend tool. Since its creation in 2009, however, Yandex Translate has adapted to the times.
- Linguee: A bit like Reverso, Linguee takes the form of a context-based dictionary. It’s not a “translator,” so to speak, as it isn’t intended to provide immediate translations for long texts; rather, it provides a range of possible word- or phrase-translations based on bilingual text corpuses.
Since Linguee technically isn’t a machine translation tool, it may not appear to belong on this list. But its founders, Gereon Frahling and Leonard Fink, have more recently launched DeepL, a true translator—which we’ll talk about now.
- DeepL: Frahling and Fink showed some precociousness in how they used existing texts—assembled after crawling literary, academic, judiciary and research databases of bilingual (human-translated) texts—to demonstrate translations in context with Linguee. They applied this methodology to their more recent product, DeepL—which has become, in a sense, a Linguee for translations that go beyond the single-word/phrase category.
Bonus: which one does the best job?
Two of the biggest advances in machine translation, as we saw earlier, were the introduction of statistical analysis based on large bilingual text corpuses, and the incorporation of neural network technology into translators.
As of right now, DeepL stands out from its competitors in that it feeds its machine-learning technology with one of the biggest text bases available in the translation world (thanks to its sister product, Linguee), and has a dev team entirely dedicated to improving its neural network.
Google’s parent company, Alphabet, is also investing quite a bit in its machine learning capabilities, and has access to a text corpus that is at least comparable to DeepL’s. Plus, Google Translate is available for far more language pairs than DeepL, at least for now.
Even so, a lot of professional translators and researchers in machine translation are putting their money on DeepL—and they’ve got some pretty good reasons to.
Weglot is among the professionals taking advantage of the most groundbreaking tech on the market: our job is to translate your website, using MT on the first go, and allow you to update and refine your translated site versions.
We use a variety of services—including DeepL, Google Translate, Microsoft/Bing Translate, and Yandex—but we constantly performance-test them (alongside other platforms, too), in all language pairs, to make sure that we’re delivering the most up-to-date, natural translations to our users.
Since all of our machine translation engines are neural-network based, you know the results are going to be high-quality—and better yet, they’re continually improving, given the self-teaching nature of deep learning tech.
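As a rough idea of what comparing engines on a language pair can involve, the Python sketch below scores each engine's output against a human reference translation and keeps the best one. This is not a description of Weglot's actual testing process: the word-overlap F1 score is a simplified stand-in for established metrics, and the engine names, outputs, and function names are placeholders invented for the example.

```python
from collections import Counter


def overlap_f1(candidate: str, reference: str) -> float:
    """Word-overlap F1 between a machine output and a human reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    shared = sum((cand & ref).values())  # words the two texts have in common
    if shared == 0:
        return 0.0
    precision = shared / sum(cand.values())
    recall = shared / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def best_engine(outputs: dict, reference: str) -> str:
    """Return the name of the engine whose output scores highest."""
    return max(outputs, key=lambda name: overlap_f1(outputs[name], reference))


# Placeholder engine names and invented outputs, for illustration only.
outputs = {
    "engine_a": "je ne pas sais la réponse",
    "engine_b": "je ne connais pas la réponse",
}
reference = "je ne connais pas la réponse"
print(best_engine(outputs, reference))
```

In practice a comparison like this would run over many sentences per language pair, and a score based on word overlap alone would be only one signal among several.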
Is MT going to replace human translators?
The short answer is, no.
The long answer is…kind of. But not completely.
The wonderful TextMaster article (linked above) explains why certain translation jobs will just never be performed as well by machines as by humans.
One example cited in that article, and often lamented on the web, is that of idioms: when you’re asked to “Say cheese!” in English, it takes a certain cultural awareness to understand that you’re supposed to smile for the camera, and just as much awareness to know that it definitely doesn’t translate to “Dites fromage !” in French.
There may come a day when machine translators can pick up on this particular idiom, and many others; but the variety of such expressions throughout the world’s 6,000-plus languages makes it unlikely that every single figure of speech will ever be machine-translatable into every single other language out there.
The most likely scenario, as envisioned by the pros at TextMaster, is that machine translation will make the human’s job infinitely easier—as machines generally do.
A really big translation job—say, an entire corporate website with tens or even hundreds of pages—is probably best carried out, at least on the first go, by a machine.
Afterwards, having a native-speaker eye look over the details is never a poor idea; but the current state of machine translation, especially with tools like DeepL and the ever-evolving Google Translate on the market, makes it possible to save time and human resources on such massive jobs without sacrificing quality.
The combination of human and machine linguistic powers is exactly what makes multilingual software like Weglot so appealing in today’s economy, where customers and clients expect light-speed service without losing out on quality.
And quality is evidently of first-order importance when it comes to translations: certain nuances of human speech and subtext, from tone to formality to brand-specific language, are simply better left in human hands. For elements like these, it may be a good option to have a real native speaker look over a machine-translated text.
What will MT look like in the future?
Even if you’re not particularly involved in the online-translation world, you’re still likely to benefit from its impending evolution. MT has some pretty cool horizons that may make multilingualism more accessible to just about anyone.
Some of MT’s horizons are even wearable.
Wearable tech has long been a tricky market to master: techwear products have to contend with the physical perils of being manhandled, the pressure to perform at light speed, and the sartorial tastes of their users.
Take Google Glass, for example: it was, from a product design standpoint, not a terrible product. It looked sleek (no chunkier than the first Apple Watches, for starters), performed fairly well, and had the pretty enormous advantage of the world’s third-most valuable brand backing it.
So why didn’t it take off as a consumer product? In the end, it comes down to the good old question of supply and demand: Google simply failed to articulate what problem Glass could solve for its users.
This is why projects such as Brooklyn-based Waverly Labs’ Pilot earpiece, and other wearable machine translation tech, might actually gain traction. Getting lost in translation is a real, relatable, and really aggravating problem—and it affects more people today than ever, as international travel and communication become the norm.
A few other material innovations have left their mark on the machine translation world recently—and, unsurprisingly, a bunch of them come from Google.
Their Tap to Translate function, released for all apps on all Android phones in 2016, gives mobile users the on-the-spot text translation capabilities that sites like Facebook have long since incorporated into their interfaces, making international communication (which, obviously, is pretty common online) ever easier.
And finally, we already mentioned Google’s visual translate feature, but it’s worth mentioning again: while far from perfect, it’s a powerful tool, and is essentially the first of its kind on the mass market—at least, the first that’s available for free.
So, is all MT heading in the direction of Pilot and Google? Will we, in 2025, all be wearing electronic Babel Fishes in our ears? It’s hard to know for sure. Suffice it to say that machine translation isn’t what it used to be: it’s a lot more accurate, and getting better every day.
Now that we’ve answered some of your questions about how it’s done, you may still be wondering: is machine-translating your website a reliable way to go multilingual?
Let’s be clear: machine translation isn’t perfect—and, in the end, neither are humans.
But just as human scientists make advances in their fields every day, human linguists continually perfect their craft, and human translators expand their own vocabularies and literary skills, machine translators are constantly getting more accurate. Ever-accelerating developments in responsive neural network technology ensure that this kind of growth isn’t likely to stop anytime soon.
So, the answer to this final question? Yes, MT is reliable and constantly becoming more so—but there’s no harm in double-checking your automatic translations with a human translator. In the end, you’ve got nothing to lose and everything to gain by going multilingual—especially if you run a business that falls into one of these six categories, are planning on internationalizing or localizing, or quite simply just want to increase your conversion rate.