Archive for May, 2012

Turkic Wikimedia Conference 2012, Almaty: Other Highlights and Summing Up

Other highlights

Of course, the Turkic Wikimedia Conference had many other highlights besides my talks and workshops. Jonas Öberg from Creative Commons delivered a keynote speech about the importance of letting people freely share their works, especially with regard to cultures that are not as well known as those of America or Western Europe, such as that of Kazakhstan. Basically, anybody who is curious about the culture of Kazakhstan will only be able to learn what is freely posted online. If it’s gathering dust in a library or locked behind a password on a pay-to-read website, nobody will read it.

Jonas Öberg. By: Ashina. License: CC-BY-SA 3.0.

The Wikimedian Daniel Mietchen, an advocate for Open Science, convincingly explained why opening up academic articles and experiments would not only make them cheaper, but also make them scientifically more sound.

Daniel Mietchen

Daniel also impressed lots of people with his Russian: apparently, he grew up in East Germany, where all children had to study Russian at school, and he was one of the few who actually bothered to learn it well. He said that at first he didn’t like being forced to learn a language that wasn’t useful to him, but when he had to read a book of prose, The Tales of the Late Ivan Petrovich Belkin, as homework, he found it very satisfying, even though it was very hard in the beginning.

Another highlight was a book about editing Wikipedia, given to me by one of its authors, Irada Alakbarova, a participant from Azerbaijan. It is similar in content and scope to the book written by the French Wikimedians Guillaume Paumier and Florence Devouard, but it’s impressive that Irada is not just an enthusiastic Wikimedian, but also a department head at the Information Technology Institute of the Azerbaijan Academy of Sciences, and that the book’s other author, Rasim Aliquliyev, is the Institute’s director. (In precise Azeri spelling their names are İradə Ələkbərova and Rasim Əliquliyev. The letter Ə is part of Azerbaijan’s Latin-based writing system, but it looks strange to many English readers.)

İradə Ələkbərova

Irada also told me that some time ago she gathered whatever information she could find about Wikipedia’s server configuration and used it as an example for teaching the configuration of high-performance websites. She was very happy when I told her that the Wikimedia server configuration had recently become even more transparent.

Summing up

I have participated in many conferences lately, and this one was unusually satisfying in many ways.

As usual, meeting the people was the best part. This refers both to the people from places like Bashkortostan and Sakha, with whom I had corresponded by email for many years while hardly imagining what they looked like, and to people whom I had not known before and who came from countries that I could hardly imagine ever visiting, like Kyrgyzstan and Turkmenistan. The international press mostly reports bad and weird news from these countries, but, as often happens, the image created by the media has little to do with the real people: I was stunned by the talent, the originality and the vigor that they demonstrated.

I was not the only one who felt that the conference was a great success, and we have already started throwing around ideas for the location of the next one. Bishkek, Ufa, Baku and Istanbul were suggested, and I would certainly be very happy to go to any of these cities or to meet these wonderful people elsewhere.

Most importantly, this conference left me and the other participants with a long list of exciting tasks.

What do the people want? Part 2: Machine translation in their language – Google or Apertium

Another technical issue that bothered many people at the Turkic Wikimedia Conference in Almaty is support for their language in Google Translate. Though this is not directly related to Wikimedia, I was asked about it repeatedly by the participants, as well as by the local journalists who interviewed me. Some people even referred to it as a “conspiracy”.

Tilek Mamutov, giving a talk about Google Translate

Luckily, one of the participants was Tilek Mamutov, a Google employee from Kyrgyzstan, and he delivered a whole talk about it. His main message was that there is no conspiracy: to support a language, Google mostly needs to process as many texts in that language as possible, ideally with parallel translations. There are far fewer digital texts in languages like Kyrgyz and Bashkir than in German and Spanish, so supporting them is not yet possible.

However, there is hope: a group of volunteers in Kyrgyzstan is building a database of digital translated texts with the specific goal of making it usable in Google Translate. WikiBilim, the Kazakh association that organized the conference, is working on a similar initiative, too.

For my part, I suggested a convenient way to gather texts in these languages: uploading literature in them to Wikisource. I also mentioned the existence of Apertium. Apertium is a Free machine translation engine that can be adapted to any language. It was developed in Valencia, and the first languages it supported were those relevant to Spain: Spanish, Catalan, Basque, English and also Esperanto, and it translates between them quite well. It supports a few other languages, too.

And it can support even more languages. Like Google Translate, it needs as many digital texts as possible to actually start working, and it also needs dictionaries and tables of grammar rules, because it uses several translation methodologies. Work has already begun on Turkish-Azeri and Turkish-Kyrgyz, and there are projects for Turkish-Chuvash and other language pairs. All these projects need people who can test them, contribute words to the dictionaries and check the grammar rules. So if you want to help complete a Free Turkish-Azeri machine translation system or to create an English-Kyrgyz one, contact the Apertium project.

To be continued…


Edit: A correction came from the Apertium developers: Apertium *doesn’t* need any texts, except for testing purposes. The more texts we have, the more we can test, of course, but above all, we need native speakers of languages who understand the grammar of the languages they’re working on and can work with computational formalisms.
