Is Baidu NMT better than Google and why?

When I originally did my research on this topic I was thinking that creating such an article would be a piece of cake. I imagined it would just be something like running a few phrases in both engines and then seeing what the output was. Adding some explanations from our colleagues from Asia would seal the deal, right?

I should have known better from my experience with Asian languages already…

I ended up with 11 pages of examples for just a few pieces of text, so I decided to split it into a series of articles, rather than try to put everything together in a single piece.

Attention, reader! This is not science, just hands-on experience

This is where we have to say that we wanted to deliver real-life experience, not theoretical explanations of how NMT works. We have other articles that cover that topic separately. So my colleagues Cindy (China) and Semi (S. Korea) and I chose three types of text: single words/characters, short sentences, and longer excerpts from a novel.

Short phrases and expressions

We chose to translate the following Korean phrase and its Chinese equivalent into English with both Baidu and Google Translate:

Chinese phrase (source): 新冠疫情结束后,我想尽情地去海外旅行。

Google returns: “After the COVID-19 pandemic is over, I want to travel overseas to the fullest.”

Baidu returns: “After the end of COVID-19, I want to travel overseas.”

  • Chinese speakers usually say 新冠疫情 (“coronavirus pandemic”) for COVID-19. Baidu renders it as “COVID-19” and Google as “COVID-19 pandemic”; both are correct.
  • For “~结束后” (“when … is over”), Baidu gives “after the end of” and Google gives “after … is over”. Neither is incorrect, but neither reads naturally in this sentence structure.
  • Baidu drops “尽情地” (“as much as …”) entirely, while Google translates it as “to the fullest”, which can be awkward in this sentence.
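For readers who want to repeat this kind of side-by-side check on their own phrases, here is a minimal Python sketch that lists the word-level differences between two engine outputs. It uses only the standard-library difflib module. The function name word_diff is our own illustration and is not part of either engine's tooling. A plain word diff like this only shows where the outputs diverge; it cannot tell you which one is better.

```python
import difflib


def word_diff(a: str, b: str) -> list[tuple[str, str, str]]:
    """Return (operation, words_from_a, words_from_b) for every span that differs."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    return [
        (tag, " ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the spans where the two outputs disagree
    ]


# The two outputs quoted above for the first Chinese phrase.
google = "After the COVID-19 pandemic is over, I want to travel overseas to the fullest."
baidu = "After the end of COVID-19, I want to travel overseas."

for tag, g, b in word_diff(google, baidu):
    print(f"{tag:8} google={g!r} baidu={b!r}")
```

Splitting on whitespace keeps punctuation attached to words, which is crude but enough for a quick look. A human reviewer still has to judge naturalness, as we do in the comments above.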

For our second attempt, we translated another short phrase from Chinese into English.

Chinese phrase (source): 我的生活每天都充满着快乐。

The result in Baidu is “My life is full of happiness every day.”

The result in Google is “My life is full of joy every day.”

  • The sentences translated by Baidu and Google are very similar in structure.
  • The difference lies in “快乐” (“happiness”): Baidu translates it as “happiness”, while Google translates it as “joy”. Google’s translation is not incorrect, but it can be awkward in this sentence.

Korean phrase (source): 나는 코로나가 끝나면 마음껏 해외 여행을 다니고 싶다.

The result from Google is “I want to travel abroad as much as I want after Corona is over.”

The result from Baidu is “When the epidemic is over, I want to travel overseas as much as I can.”

  • Koreans refer to COVID-19 as “코로나” (Corona [korona]), which derives from “coronavirus”. Baidu translates it as “epidemic”, while Google keeps “Corona”, which is closer to what Koreans actually say. The word “epidemic” can apply to many outbreaks, not specifically COVID-19, so I would choose Google’s wording in this case.
  • Another difference between Baidu and Google Translate is the order of the English sentence. Baidu places “when the epidemic is over” first, while Google puts “I want to travel” up front. Baidu’s translation is not incorrect: placing the conditional clause first is typical of speakers of many Asian languages when they use English, and the same tendency shows up when Koreans speak or translate English. Korean places the main clause at the end of the sentence, whereas English tends to put it first, so Google’s sentence structure reads more naturally.
  • Both Baidu and Google translate “마음껏” as “as much as”. When I entered this word on its own, Baidu returned “heartily” and Google returned “to your heart’s content”. Neither is wrong, but both would be awkward in this sentence, so in context both engines handle “마음껏” nicely with “as much as”.

Longer excerpts from novels

Having seen how Chinese and Korean source phrases fare when translated into English, for the second part of our experiment we ran English literary text as the source, with the Asian languages as the target.

For English into Chinese, we chose an excerpt from Lewis Carroll’s Alice’s Adventures in Wonderland.

[Image: Baidu vs Google]

  • For the part “Alice … sitting by her sister on the bank”, Baidu gives “坐在她姐姐旁边的河岸上” (“Alice … sitting on the bank next to her sister”), which is incorrect, while Google gives “在岸边坐在姐姐身边” (“Alice … sitting beside her sister on the bank”). Google’s version is closer to the original, so it is the better translation here.
  • For the part “‘and what is the use of a book,’ thought Alice ‘without pictures or conversation?’”, however, Google gives “还有什么 ” 是书的用途,”爱丽丝想,“没有图片或对话?” (“‘what else’ is what books are for,” Alice thought, “No pictures or dialogue?”). Cindy says: “The translation is very confusing, and I could not understand the meaning of the sentence. Baidu’s translation, “爱丽丝想,没有图片和对话的书有什么用呢” (“Alice thought, what’s the use of a book without pictures and conversations?”), is more fitting and smooth, so it is easy to understand.”
  • For the part “…, whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies,…”, Google gives “是否值得费心起床采摘雏菊的乐趣,” (“whether it was worth the trouble of getting up and picking the daisies”). The translation of “making a daisy-chain” is missing and the sentence doesn’t read smoothly. Baidu translates it as “制作一条雏菊链的乐趣是否值得起床摘雏菊” (“whether the fun of making a daisy chain was worth getting up and picking daisies”), which is grammatically correct and sounds natural.

In this case, Baidu’s translation seems more useful than Google’s for English into Simplified Chinese, although neither engine wins every time: sometimes Google’s translation is better, and sometimes Baidu’s is more natural.

For translation into Korean, we chose a passage that opens with the epigraph “To live with ghosts requires solitude” (Anne Michaels, Fugitive Pieces).

[Image: Baidu vs Google]

  • Baidu translates “trail” as “오솔길”, meaning an actual narrow, unpaved path, while Google translates it as “흔적”, meaning a mark or a series of signs left behind, which suits the context better.
  • For the part “I’d finally arrived at the end of the trail, in the last place I expected to find him – not deep in the wilderness he was said to haunt, but in the dim lobby of an old hotel on the edge of a dusty desert town.”:
    the translations from both platforms are similar. However, I would say Google’s choice for “dim”, “어두컴컴한”, is a bit more natural and expressive than Baidu’s “어두운”, which means “dim” or “dark” and is plainer.
  • For the sentence “After hearing that I’d just missed him so many times, in so many bizarre locations, I’d begun to suspect that Caballo Blanco was nothing more than a fairy tale, a local Loch Ness monstruo dreamed up to spook the kids and fool gullible gringos.” Cindy: I would say Baidu’s translation is slightly better. Google didn’t even form a proper sentence or build grammatically correct phrases. Baidu’s translation is not perfect, but it forms a grammatically correct Korean sentence.
  • For the sentence “I didn’t know whether to hug her in relief or high-five her in triumph.”, I also prefer Baidu’s translation. Google produces grammatically wrong sentences in which the verb doesn’t agree with the subject, and the result clearly reads as machine translation. Baidu’s translation is not perfect, but its grammar is correct and it is easy to read.

The results of practice versus theory bring us back to reality

Our little experiment should help everyone draw their own conclusions, but a few things stand out.

  • Single words/characters in and out of context: this is quite typical of Chinese, for example. A word can change meaning depending on where it sits in the sentence, and can mean something completely different again when taken out of context. We noticed this problem on both platforms.
  • Both engines struggle with sentence structure, which in many cases differs from what we are used to in English. This is especially noticeable when translating from English into Chinese.
  • One of the biggest problems is that even the best algorithm needs data to perform. Ask MT experts about Asian languages and many will tell you that they don’t have enough data to train engines for languages like Chinese and Korean. And in this article we haven’t even discussed Japanese…

We need to point out that both engines are fine for personal use. At the professional level, though, where adequate translation quality for Chinese or Korean is required, neither seems to give you much of an advantage. Sometimes Baidu performs better, sometimes Google does, but overall the need for editing is obvious.

And while translations for most Latin-based languages reach near-human quality, this is not the case for Asian languages. I am aware that the MT hype might create the wrong expectations of any translation company, but let’s face it: while high-resource languages are reaching a very good level of translation quality, Asian languages still have a very long road ahead. If you do a bit of research and see how limited the real data for most Asian languages is, you can do the math yourself.

Just a hint for next time… sometimes it is easier for a translator to start from scratch than to post-edit a machine-translated text.