Sebastian's Pamphlets

If you've read my articles somewhere on the Internet, expect something different here.

MOVED TO SEBASTIANS-PAMPHLETS.COM


Monday, June 25, 2007

Playing with Google Translate (still beta)

I use translation tools quite often, so after reading "Google's Udi Manber - Search is a Hard Problem" I just had to look at Google Translate again.

Under Text and Web it offers the somewhat rough translations available from the toolbar and links on SERPs. Usually, I use that feature only with languages I don't speak to get an idea of the rough meaning, because the offered translation is, well, rough. Here's an example. Translating "Don't make a fool of yourself" to German gives "einen Dummkopf nicht von selbst bilden". That means "not forming a dullard of its own volition" but Google's reverse translation "a fool automatically do not educate" is even funnier.

Having at least a rudimentary command of the foreign language really helps when reading Google's automated translations. Quite often the translation is just not understandable without knowledge of the other language's grammar and peculiarities. For example, my French is a bit rusty, so translating Le Monde to English gives me understandable text I can read way faster than the original. Italian to English is another story (my Italian skills should be considered "just enough for tourists"): the front page of la Repubblica is, partly due to the summarizing language, hard to read in Google's English translation. Translated articles, on the other hand, are rather understandable.

By the way, the quality of translated news, technical writings or academic papers is much better than rough translations of everyday language, so you'd better not try to make sense of translated forum posts and stuff like that. That's probably caused by the lack of trusted translations of these sources, which are necessary to train Google's algos.

Google Translate fails miserably sometimes. Although Arabic-English is labelled "BETA", it cannot translate even a single word from the most important source of news in Arabic, Al Jazeera: it just delivers a copy of the Arabic home page. OK, that's a joke; all the Arabic text there is served as images. Translations of Al Jazeera's articles are terrific, way better than any automated translation from or to European languages I've ever seen. Comparing Google's translation of the Beijing Review to the English edition makes no sense due to sync issues, but the automated translation looks great, and even the headlines make sense (semantically, not in their meanings - but what do I know, I'm not a Stalinist commie killing and jailing dissidents who practice human rights like freedom of speech).


On the second tab Google translates search results, which is a neat way to research resources in other languages. You can submit a question in English; Google translates it on the fly to the other language, queries the search index with the translated search term and delivers a bilingual search result page, with English in the left column and the foreign language on the right. I don't like that the page titles are truncated, and the snippets are way too short to make sense in most cases. However, it is darn useful. Let's test how Google translates its own pamphlets:
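The flow described above (translate the query, search the foreign-language index, show both languages side by side) can be sketched in a few lines. This is only a toy illustration, not how Google implements it: the word tables `EN_TO_DE` and `DE_TO_EN`, the page list `GERMAN_PAGES` and both functions are made up for this example.

```python
# Toy sketch of cross-language search. Every name and data item is hypothetical.

# Hypothetical word-level tables standing in for a real translation engine.
EN_TO_DE = {"webmaster": "webmaster", "guidelines": "richtlinien"}
DE_TO_EN = {"richtlinien": "guidelines", "für": "for", "webmaster": "webmaster"}

# Hypothetical "index" of German page titles.
GERMAN_PAGES = [
    "Richtlinien für Webmaster",
    "Nachrichten aus Hamburg",
]


def translate(text, table):
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(table.get(word.lower(), word) for word in text.split())


def cross_language_search(query):
    """Translate the query, match it against foreign pages, translate hits back."""
    translated_query = translate(query, EN_TO_DE)
    terms = translated_query.lower().split()
    rows = []
    for title in GERMAN_PAGES:
        words = title.lower().split()
        if all(term in words for term in terms):
            # Bilingual row: English rendering on the left, original on the right.
            rows.append((translate(title, DE_TO_EN), title))
    return translated_query, rows


translated, rows = cross_language_search("Webmaster guidelines")
print(translated)  # webmaster richtlinien
for english, german in rows:
    print(english, "|", german)  # guidelines for webmaster | Richtlinien für Webmaster
```

Even this toy version shows why the result page reads roughly: the back-translation is only as good as the word tables behind it.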

A search in English for [Google Webmaster guidelines] on German pages delivers understandable results. The second search result, "Der Ankauf von Links mit der Absicht, die Rangfolge einer Website zu verbessern, ist ein Verstoß gegen die Richtlinien für Webmaster von Google", gets translated to "The purchase from left with the intention of improving the order of rank of a Website is an offence against the guidelines for Web master of Google". Here it comes straight from the horse's mouth: Google's very own Webmasters must not sell links on the left sidebar of pages on Google.com. I'm not a Webmaster at Google, so in my book that means I can remove the crappy nofollow from tons of links as long as I move them to the left sidebar. (Seriously, the German noun for "link" is "Verbindung" or "Verweis", which both have tons of other meanings besides "hyperlink", so everybody in Germany uses "Link" and the plural "Links", but "links" means "left" and Google's translator ignores capitalization as well as anglicisms. The German translation of "Google's guidelines for Webmasters" as "Richtlinien für Webmaster von Google" is rather unfortunate, by the way. It should read "Googles Richtlinien für Webmaster" because "Webmaster von Google" really means "Webmasters of Google", which is (in German) a synonym for "Google's [own] Webmasters".)
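The "Links" versus "links" mix-up is easy to reproduce with a toy lookup. The sketch below is an assumption-laden illustration, not Google's or anybody else's actual engine: the three-entry `LEXICON` and both functions are invented here, and a real system would also have to cope with words capitalized only because they start a sentence.

```python
# Toy lexicon, invented for this example. German nouns are capitalized,
# so "Links" (hyperlinks) and "links" (left) are different entries.
LEXICON = {"Links": "links", "links": "left", "von": "from"}


def translate_word(word, respect_case):
    """Look up one word, optionally trying the exact spelling first."""
    if respect_case and word in LEXICON:
        return LEXICON[word]
    # Case-blind fallback: this is where "Links" collapses into "left".
    return LEXICON.get(word.lower(), word)


def translate_phrase(phrase, respect_case):
    """Translate a phrase word by word with the toy lexicon."""
    return " ".join(translate_word(word, respect_case) for word in phrase.split())


print(translate_phrase("von Links", respect_case=False))  # from left
print(translate_phrase("von Links", respect_case=True))   # from links
```

The case-blind call reproduces the "purchase from left" effect, while checking the exact spelling first keeps the noun intact.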

An extended search like [Google quality guidelines hidden links] for all sorts of terms from the guidelines like "hidden text", "cloaking", "doorway page" (BTW why is the page type described as "doorway page" in reality a "hallway page", and why doesn't Google explain the characteristics of deceitful doorway pages, and why doesn't Google explain that most (not machine generated) doorway pages are perfectly legit landing pages?), "sneaky redirects" and many more did not deliver a single page from google.de on the first SERP. No wonder that German Internet marketers are the worst spammers on earth when Google doesn't tell them what particular techniques they should avoid. Hint for Riona: to improve findability consider adding these tags untranslated to all versions of the help system in foreign languages. Hint for Matt: please admit that not each and every doorway page is violating Google's guidelines. A well done and compelling doorway page just highlights a particular topic, hence from a Webmaster's as well as from a search engine's perspective that's perfectly legit "relevance bait" (I can resist calling it spider fodder because it really ain't that in particular).

Ok, back to the topic.


I really fell in love with the recently added third tab, Dictionary. This tool beats the pants off Babylon and other word translators when it comes to lookups of single words, but it lacks the other functionality those tools provide, namely the translation of phrases. And it's Web based, so (for example) a middle mouse click on a word or phrase in any application except my Web browser with Google's toolbar enabled doesn't show the translation. Actually, the quality of one-word lookups is terrific, and when you know how to search you get phrases too. Just play and get familiar with it; then, when you have at least a rudimentary understanding of the other language, you'll often get the desired results.

Well, not always. Submitting "schlagen" ("beat") in German-English mode when I search for a phrase like "beats the pants off something" leads to "outmatch" ("übertreffen, (aus dem Felde) schlagen") as the best match. In reverse (English-German) "outmatch" is translated to "übertreffen, (aus dem Felde) schlagen" without alternative or supplemental results, but "beat" has tons of German results, unfortunately without "beats the pants off something".

I admit that's unfair: according to the specs, the dictionary thingy is not able to translate phrases (yet). The one-word translations are awesome, I just couldn't resist maxing it out with my attempts to translate phrases. Hopefully Google renames "Dictionary" to "Words" and adds a tab "Phrases" soon.


2 Comments:

  • At Tuesday, July 03, 2007, Blogger Tony Ruscoe said…

    "By the way, the quality of translated news, technical writings or academic papers is much better than rough translations of everyday language, so you'd better not try to make sense of translated forum posts and stuff like that. That's probably caused by the lack of trusted translations of these sources, which are necessary to train Google's algos."

    It's worth noting that Google hasn't developed the technology behind any of their non-Beta language pairs; they simply use Systran.

    And the reason translations of technical writings and academic papers are more accurate is that Systran's rule-based translation engine obviously handles well-formed text, which follows these rules, better than text that's full of errors (like forum posts and comments, for example).

     
  • At Tuesday, July 03, 2007, Blogger Sebastian said…

    Tony, thanks for the insight from an expert. I thought Franz Och's team developed the technology.

     
