Te Taka Keegan hits the New York Times!

(New York Times, by MIGUEL HELFT) Te Taka Keegan, a university lecturer in New Zealand, is betting that Google can help him preserve the Maori language of his ancestors.

Mr. Keegan uses a tool called the Google Translator Toolkit to upload Maori translations of English texts to Google. Others can then use those translations in their work, increasing the quantity and quality of Maori translations that are available and creating incentives for children of Maori descent to learn the language.

“With this tool, we can actually uplift our language,” Mr. Keegan said. “For us, it is about saving our language from extinction. We are trying to help our culture survive.”

The Google Translator Toolkit may be good for the culture of the Maori people, an indigenous minority group in New Zealand. It’s also good for Google.

Data from the toolkit helps Google beef up its machine translation system, which I cover in an article in Tuesday’s Times.

Google’s machine translation system feeds on data, including the data that Mr. Keegan and others feed into the toolkit. If enough people use the service, Google will eventually have enough data to add Maori to the list of languages that Google can translate automatically. Google Translate, the company’s translation tool, now speaks 52 languages, more than any of the major machine translation systems in use. In a sign of Google’s ambitions, the company recently released the toolkit in 345 languages, from Abkhazian to Zulu.

“The toolkit is a goldmine for sucking data,” said Alon Lavie, a machine translation expert and associate research professor at the Language Technologies Institute at Carnegie Mellon University. “Google can use it to collect data for language pairs that there is very little data on.”

For now, the amount of data Google is getting through the toolkit pales in comparison with the massive amounts of text it can cull from the Web and other sources, like official government documents or its book scanning project, said Franz Och, a principal scientist at Google who leads the company’s machine translation team. But he said that will change over time.

“We hope the toolkit will be of significant usefulness at some point,” he said. “The data we get from the toolkit is very nice and well aligned,” he said, meaning that the side-by-side translations are especially useful to Google’s machine-learning algorithms.

University researchers in Wales are also using the Translator Toolkit to help increase the availability of text in the Welsh language, and Google can use the data from those efforts to improve its automatic translation into Welsh, one of the 52 languages its system can handle currently.

More background on how the Translator Toolkit can be used to help to preserve minority languages is available here.

Background on Google Maori (2.0)

Te Wananga o Aotearoa is currently working with TangataWhenua.com to update the current Google main search page as well as Picasa 3.0 and is using Google’s Toolkit to do it.

Picasa is a software application for organizing and editing digital photos, originally created by Idealab and owned by Google since 2004. “Picasa” is a blend of the name of Spanish painter Pablo Picasso, the phrase mi casa for “my house” and “pic” for pictures (personalized art).

Translating Picasa is a huge task, as it consists of over 15,000 words, but it is hoped that it will become a useful resource for whanau wanting to organise, edit and share their photos.
