What do the people want? Part 2: Machine translation in their language – Google or Apertium

Another technical issue that bothered many people at the Turkic Wikimedia Conference in Almaty is support for their languages in Google Translate. Though this is not directly related to Wikimedia, I was asked about it repeatedly by the participants, as well as by local journalists who interviewed me. Some people even referred to it as a “conspiracy”.

Tilek Mamutov, giving a talk about Google Translate

Luckily, one of the participants was Tilek Mamutov, a Google employee from Kyrgyzstan, and he delivered a whole talk about it. His main message was that there is no conspiracy, and that to support more languages Google mostly needs to process as many texts as possible in each language, ideally with a parallel translation. There are far fewer digital texts in languages like Kyrgyz and Bashkir than there are in German and Spanish, so supporting them is not yet possible.

However, there is hope: a group of volunteers in Kyrgyzstan is working on creating a database of digital translated texts with the specific goal of making it usable in Google Translate. WikiBilim, the Kazakh association that organized the conference, is working on a similar initiative, too.

For my part, I suggested a convenient way to gather texts in these languages: uploading literature in them to Wikisource. I also mentioned the existence of Apertium. Apertium is a Free machine translation engine, which can be adapted to any language. It was developed in Valencia, and the first languages that it supported are languages that are relevant for Spain: Spanish, Catalan, Basque, English and also the closely-related Esperanto. It translates between them quite well, and it supports a few other languages, too.

And it can support even more languages. Like Google Translate, it also needs as many digital texts as possible to actually start working, and it also needs dictionaries and tables of grammar rules, because it tries several methodologies for translation. Work has already begun on Turkish-Azeri and Turkish-Kyrgyz, and there are projects for Turkish-Chuvash and other language pairs. All these projects need people who can test them, contribute words to the dictionaries and check the grammar rules. So if you want to help complete a Free Turkish-Azeri machine translation system or to create an English-Kyrgyz translation system, contact the Apertium project.

To be continued…


Edit: A correction came from the Apertium developers: Apertium *doesn’t* need any texts, except for testing purposes. The more texts we have, the more we can test, of course, but above all, we need native speakers who understand the grammar of the languages they’re working on and can work with computational formalisms.
