Always define the language and the direction of your HTML documents, part 01

I received this email from Safari Books Online:

Email in English from Safari Books, oriented like Hebrew.

The email is written in English, but notice how the text is aligned unusually to the right. Notice also that the punctuation marks appear at the wrong end of the sentence. I used Firefox developer tools to apply the correct direction, and saw it correctly:

The same email, with corrected left-to-right formatting using Firefox developer tools

This happens because I use GMail with the Hebrew interface. GMail has to guess the direction of the emails that I receive, because in plain text there’s no easy way to specify the direction (I hope to discuss it in a separate post soon). Usually GMail guesses correctly. Ironically, for HTML-formatted emails like this one, GMail often guesses incorrectly, even though in HTML, unlike in plain text, it’s quite easy to specify the direction by simply adding dir="ltr" to the root element of the email.

Unfortunately, a lot of HTML authors don’t bother to specify an explicit direction. Many are not even aware of this exotic dir attribute. Others think that because “ltr” is the default, they don’t have to specify it. They are wrong: as this email shows, when left-to-right HTML content is embedded in a right-to-left environment, the “rtl” definition propagates to the embedded content.
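Part of the reason the punctuation ends up at the wrong end is that, in Unicode’s bidirectional algorithm, letters carry a strong direction while punctuation marks do not, so punctuation follows the direction of the surrounding context. The following Python sketch uses the standard unicodedata module to print the bidirectional class of a few characters. The helper name strong_direction is made up for this illustration, and the sketch only inspects single characters; it does not implement the full algorithm.

```python
import unicodedata

# Bidirectional classes that carry a strong direction of their own.
STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}


def strong_direction(ch):
    """Return 'ltr' or 'rtl' for strongly directional characters, else None."""
    return STRONG.get(unicodedata.bidirectional(ch))


# A Latin letter, a Hebrew letter (alef), and two punctuation marks.
for ch in ["A", "\u05d0", ".", "!"]:
    print(repr(ch), unicodedata.bidirectional(ch), strong_direction(ch))
```

The letters report a direction of their own; the full stop and the exclamation mark do not, which is why they move when the surrounding direction changes.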

You could blame GMail, of course, but it’s much more practical to always define the direction of your HTML content, even if it’s the default. You can never know where your content will end up.
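As a minimal sketch of what this could look like when an HTML email is generated by a program, here is one way to do it with Python’s standard email package. The function name build_html_email, the subject and the sample text are assumptions made up for the example, and I can’t promise how any particular mail client will treat the result; the point is only that the root element states both the language and the direction explicitly.

```python
from email.message import EmailMessage


def build_html_email(body_html, lang="en", direction="ltr"):
    """Wrap an HTML fragment in a document whose root declares lang and dir."""
    html = (
        "<!DOCTYPE html>\n"
        f'<html lang="{lang}" dir="{direction}">\n'
        '<head><meta charset="utf-8"></head>\n'
        f"<body>{body_html}</body>\n"
        "</html>\n"
    )
    msg = EmailMessage()
    msg["Subject"] = "Example"  # hypothetical subject line
    msg.set_content("Plain-text fallback.")
    msg.add_alternative(html, subtype="html")
    return msg


message = build_html_email("<p>Your new books are ready!</p>")
html_part = message.get_body(preferencelist=("html",))
print(html_part.get_content())
```

Even though "ltr" is the default, writing it out means the content no longer depends on whatever direction the embedding page happens to use.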

P.S.: I read this post before publishing and suddenly realized that its style is quite similar to “Best Practices” books, such as Damian Conway’s classic “Perl Best Practices” – it tells you to do something that is not obviously needed, and explains why it is needed nevertheless. I like to acknowledge sources of inspiration. Thank you, Damian.

Turkic Wikimedia Conference 2012, Almaty: Other Highlights and Summing Up

Other highlights

Of course, the Turkic Wikimedia Conference had many other highlights besides my talks and workshops. Jonas Öberg from Creative Commons delivered a keynote speech about the importance of letting people freely share their works, especially with regard to cultures that are not as well known as the American or the Western European ones, such as that of Kazakhstan. Basically, anybody who is curious about the culture of Kazakhstan will only be able to learn what is freely posted online. If it’s gathering dust in the library or locked behind a password in a pay-to-read website, nobody will read it.

Jonas Öberg. By: Ashina. License: CC-BY-SA 3.0.

The Wikimedian Daniel Mietchen, who is an advocate for Open Science, convincingly explained why opening up academic articles and experiments will not just make them cheaper, but also more correct scientifically.

Daniel Mietchen

Daniel also impressed lots of people with his Russian speaking skills: Apparently, he grew up in East Germany, where all children had to study Russian in schools, and he was one of the few children who actually bothered to learn it well. He said that at first he didn’t like to be forced to learn a language that wasn’t useful to him, but when he had to read a book of prose – The Tales of the Late Ivan Petrovich Belkin – as homework, he found it very satisfying, even though it was very hard in the beginning.

Another highlight was a book about editing Wikipedia given to me by one of its authors, Irada Alakbarova, a participant from Azerbaijan. It is similar in content and scope to the book written by the French Wikimedians Guillaume Paumier and Florence Devouard, but it’s impressive that Irada is not just an enthusiastic Wikimedian, but also a department head in the Information Technology Institute of the Azerbaijan Academy of Sciences, and the book’s other author, Rasim Aliquliyev, is the Institute’s director. (In precise Azeri spelling their names are İradə Ələkbərova and Rasim Əliquliyev. The letter Ə is a part of Azerbaijan’s Latin-based writing system, but looks too weird to many English readers.)

İradə Ələkbərova

Irada also told me that some time ago she gathered all the information that she could about Wikipedia’s server configuration and used it as an example for teaching configuration of high-performance websites. She was very happy when I told her that the Wikimedia server configuration became even more transparent recently.

Summing up

I have participated in many conferences lately, and this one was unusually satisfying in many ways.

As usual, meeting the people was the best part. This refers both to the people from places like Bashkortostan and Sakha, with whom I communicated by email for many years, hardly imagining how they look, and also to people whom I had not known before and who came from countries that I could hardly imagine ever visiting, like Kyrgyzstan and Turkmenistan. The international press mostly reports bad and weird news from these countries, but as often happens, the image created by the media has little to do with the real people – I was stunned by the talent, the originality and the vigor that they demonstrated.

I was not the only one who felt that the conference was a great success, so we already started to throw around ideas for the location of another one. The names of Bishkek, Ufa, Baku and Istanbul were suggested, and I would certainly be very happy to go to any of these cities or to meet these wonderful people elsewhere.

Most importantly, this conference left me and the other participants with a long list of exciting tasks to do.

What do the people want? Part 2: Machine translation in their language – Google or Apertium

Another technical issue that bothered many people in the Turkic Wikimedia Conference in Almaty is support for their language in Google Translate. Though this is not directly related to Wikimedia, I was asked about this repeatedly by the participants, as well as by local journalists who interviewed me. Some people even referred to it as a “conspiracy”.


Tilek Mamutov, giving a talk about Google Translate

Luckily, one of the participants was Tilek Mamutov, a Google employee from Kyrgyzstan, and he delivered a whole talk about it. His main message was that there is no conspiracy, and that to support more languages Google mostly needs to process as many texts as possible in that language, if possible – with a parallel translation. There are far fewer digital texts in languages like Kyrgyz and Bashkir than there are in German and Spanish, so it is not yet possible.

However, there is hope: a group of volunteers in Kyrgyzstan is working on creating a database of digital translated texts with the specific goal of making it usable in Google Translate. WikiBilim, the Kazakh association that organized the conference, works on a similar initiative, too.

For my part, I suggested a convenient way to gather texts in these languages: to upload literature in them to Wikisource. I also mentioned the existence of Apertium. Apertium is a Free machine translation engine, which can be adapted to any language. It was developed in Valencia, and the first languages that it started to support are languages that are relevant for Spain: Spanish, Catalan, Basque, English and also the closely-related Esperanto, and it translates between them quite well. It supports a few other languages, too.

And it can support even more languages. Like Google Translate, it also needs as many digital texts as possible to actually start working, and it also needs dictionaries and tables of grammar rules, because it tries several methodologies for translation. Work has already begun for Turkish-Azeri and Turkish-Kyrgyz, and there are projects for Turkish-Chuvash and other language pairs. All these projects need people who can test them, contribute words to the dictionaries and check the grammar rules. So if you want to help complete a Free Turkish-Azeri machine translation system or to create an English-Kyrgyz translation system, contact the Apertium project.

To be continued…

Oh (edit): A correction came from Apertium developers: Apertium *doesn’t* need any texts, except for testing purposes. The more texts we have, the more we can test, of course, but above all, we need native speakers of languages who understand the grammar of the languages they’re working on and can work with computational formalisms.

What do the people want? Part 1: Internationalized templates

The technical issue that the participants of the Turkic Wikimedia Conference in Almaty asked me about more than anything else is migrating templates from bigger Wikipedias. MediaWiki Templates are one of the most important tools for writing Wikipedia articles more effectively and for making them informative, eye-pleasing and easier to read.

For the article writers, however, templates are a nightmare. Their syntax is horrible and unreadable; it’s hard to write them, hard to use them and hard to modify them. The fact that so many people nevertheless do it is quite astounding.

The guts of Template:Infobox in the English Wikipedia. This is the horrible, unreadable and unmaintainable code behind the nice boxes that you see on the sides of Wikipedia articles.

The English and the Russian Wikipedias have thousands of templates that would be just as useful in any language. Unfortunately, actually re-using them in other languages is very hard: each template must be manually copied and translated; templates that are nested in this template must be manually copied one-by-one recursively; and if the template in the original language was updated, it must be updated manually again. (The same problem pertains to MediaWiki gadgets, such as Twinkle and RefToolbar.)
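To give a feeling for how quickly the “one-by-one recursively” part grows, here is a rough Python sketch that collects everything a template depends on. It is not how MediaWiki or any existing tool does it: the sources dictionary, the template names in it and the function nested_templates are all made up for the illustration, and the regular expression is a deliberately crude approximation of wikitext that ignores namespaces, magic words and many other details.

```python
import re

# Matches {{Name}} or {{Name|...}}, but skips {{{parameter}}} and
# parser functions such as {{#if:...}}. A crude approximation only.
TRANSCLUSION = re.compile(r"(?<!\{)\{\{(?!\{)\s*([^{}|#:]+?)\s*(?=\||\}\})")


def nested_templates(name, sources):
    """Return every template that `name` uses, directly or indirectly."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for child in TRANSCLUSION.findall(sources.get(current, "")):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical template sources, standing in for pages on a wiki.
sources = {
    "Infobox": "{{Infobox/row|label={{{label|}}}}} {{#if:{{{title|}}}|{{Navbar}}}}",
    "Infobox/row": "{{Nowrap|{{{data}}}}}",
    "Navbar": "plain wikitext",
    "Nowrap": "plain wikitext",
}

print(sorted(nested_templates("Infobox", sources)))
```

Every name in the result is another page that has to be copied and translated by hand, and the whole exercise has to be repeated whenever the original changes.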

This problem is not new. MediaWiki developers are more or less aware of it, and over the years they have been trying to solve it in various ways, but so far this hasn’t actually happened. A partial solution may come from the Wikidata project, but it is just beginning. Also, some time soon the Lua programming language may become usable as the new template language that will gradually replace all those curly brackets. However, that will take time, too, and by itself it will only improve the readability of the syntax and maybe the performance, but it won’t provide an easy solution for internationalization.

All I could say at this point is that I’ll try to pass the word on and remind the developers of the importance of this issue.

To be continued…

Turkic Wikimedia Conference 2012, Almaty: Master Class, Kazakh in China and Developers’ Workshop

The “master class”

On the morning of the second day of the Turkic Wikimedia Conference 2012 I held a workshop. The participants called it a “master class” and I didn’t object :)

People sitting on benches. Amir Aharoni operating a notebook and a projector
Doing a "master class" in

In the master class I demonstrated how to translate Wikimedia software. People opened accounts and started translating MediaWiki and the Wikipedia Mobile app. During the master class several issues were raised. Some of them turned out to be technical issues for which I intend to find a solution soon.

Language support for Kazakh speakers in China

After the master class I had a relatively short, but really fantastic meeting with Akytbek, a Kazakh speaker from North-Western China. He told me that two million Chinese Kazakhs are well-connected to the Internet and that they vigorously use the Kazakh language online. (According to official Chinese data, there are 1.25 million Kazakhs in China, but whatever the number is, it’s a lot of people.) That is good, of course, but they do it only in the Arabic alphabet, and not in the Cyrillic one, which is used in Kazakhstan. He said that there is great potential for having many Chinese Kazakh contributors to Wikipedia, and that even though the Kazakh Wikipedia already supports the Arabic script, some improvements are needed to realize this potential.

People sitting together on benches and looking on a laptop computer
Working with Akytbek from China on Arabic script support for the Kazakh Wikipedia

I showed Akytbek our current language tools – the automatic script conversion, WebFonts and the Narayam typing tool, and we decided to work together to adapt them better for the needs of Chinese Kazakhs.

By the way, Akytbek didn’t speak any Russian and he knew little English, so another Kazakh speaker who knew Russian acted as an interpreter. This is yet another proof of the importance of never assuming anything about languages and people.

MediaWiki development workshop

According to the schedule, the same morning I was also supposed to hold a workshop for programmers that would introduce them to MediaWiki development. The workshop did not take place at its scheduled time – network problems spoiled the opportunity. However, as it is so important, we did not give up and held it later at the hotel where we were staying.

It was intense, and intensely good, too: Talented and experienced people from Turkmenistan, Kyrgyzstan, Bashkortostan and Kazakhstan sat and listened to me talking for two hours or so about MediaWiki configuration, special pages, i18n files, installation procedures, extensions, preferences, templates, bots, source control and so on. Because of the quality of the questions, I am sure that my presentation was understood. What made me really happy is that several people asked how they could contribute patches and new features.

To be continued…

Turkic Wikimedia Conference 2012, Almaty: Chapters and Walks in the Park

Wikimedia Chapters – how to organize Wikipedians better and do cool things

At the Turkic Wikimedia Conference 2012 my second talk was about local Wikimedia chapters. That is a somewhat surprising topic, because chapters are separate from the Foundation, which I came to represent, but apparently there was great demand for it among the participants: The organizers asked me to do this a few days before the conference, and in the opening mingling before the actual conference program started people from Azerbaijan, Turkey and other countries asked me about this. So it was clear to me that such a talk would have value and that it can contribute to the development of the local communities.

To make sure that people from Turkey would understand me, I wrote bilingual Russian and English slides. I explained what chapters are (and what they aren’t), what they do, and how they are funded. I also added a few colorful slides from a presentation about the chapters’ activities, which Lodewijk Gelauff made in Wikimania 2011 in Haifa (thank you so much, Lodewijk).

People in the audience asked whether it’s possible to have a local chapter in a country that already has a national chapter – something that is very relevant to Russia, which is the biggest country in the world and which has many regions with diverse cultures. I replied that it’s basically possible (see Wikimedia New York City), but should be discussed with the Foundation and the national chapter. People also asked about funding – how “non-profit” must a chapter be? Can it, for example, provide services that are related to the Wikimedia mission for a fee and use the income only to advance the same mission? I am not a lawyer, but it may be possible. It is also something that should be discussed with the Foundation, and it also depends on each country’s laws pertaining to non-profit organizations.

Walking in the park, talking about… software localization

The monument in the Panfilovtsy park in Almaty. It's very Soviet, but mostly in a good way. Photo by Roman Plischke, licensed under CC-BY-SA 3.0.

In the evening of the first day I had a walk in the Panfilovtsy park with several participants and had very interesting talks about Open Science and about software localization. I was very pleasantly surprised by the fact that people in Kyrgyzstan are so familiar with localization platforms like Pootle, Google Translator Toolkit and GlotPress, with the localization sites of Facebook and Twitter and even with localizing mobile phones.

I was less pleasantly surprised by the fact that the same people didn’t know anything about Wikimedia’s main localization website, which can do things that are very similar to the above-mentioned products, and in many cases can even do them better. This means that we have to work more to publicize it.

To be continued…

Turkic Wikimedia Conference 2012, Almaty – intro

The first Turkic Wikimedia Conference was held last weekend in Almaty, the largest city of Kazakhstan.

Turkic Wikimedia Conference 2012 logo by Batyr Hamzauly. The design of the "TWC" letters is based on Old Turkic runes. Licensed under CC-BY-SA 3.0.


“Turkic” refers to Turkic languages. The most prominent Turkic language, in terms of number of speakers and international awareness of its existence, is Turkish, the main language of Turkey. There are, however, many more such languages; most of them are spoken in Russia and other countries of the former Soviet Union, and a few are spoken in China, Afghanistan and other countries.

The first sign of this conference was given in Jimmy Wales’ closing speech of Wikimania 2011 in Haifa. By coincidence, in the same speech Jimmy’s pointer broke down, so I came up on the stage to push the button that moves the slides for him. At some point he asked me not to go too fast, and then he praised Rauan Kenzhekhanuly – the head of WikiBilim, a Kazakh association of people who contribute to Wikipedia, which expanded the Kazakh Wikipedia by many thousands of articles. He was so impressed by their activities that he promised his support for holding a regional Wikimedia conference, and now it has happened.

Even though Russian is not a Turkic language, it is the most common language for the majority of the conference participants. As I am one of the few people at the Wikimedia Foundation who speak it, I was invited there.

Why and how to write Wikipedia in your language

At the conference I delivered several talks. The first was one of the opening keynote speeches – “Why you should write Wikipedia in your language”. In the talk I repeated my usual thesis – writing content and developing software in your native language rather than in a major language is important not just because of nationalism, politics or ideology, but simply because many people don’t know major international languages and thus they cannot access information if it’s written only in a language they don’t know. Before this talk I was told that even in Kazakhstan, where most people know Russian, native Kazakh language speakers often find it easier and more natural to read in Kazakh, especially when it comes to textbooks in schools and universities, and this went along perfectly with what I tried to present.

In that talk I also mentioned practical things that can help people to write in their languages and to join the global Wikimedia community – our mailing lists and our language support tools.

To make it more entertaining and memorable, I said a few words in Hebrew to give the audience the feeling of bewilderment when encountering a foreign language, and asked people to stand up and sit down if they knew this or that language. Beyond having fun, this little game also had a practical purpose: I delivered most of this talk in Russian and I wanted to make sure that everybody understood me. People from Turkey and other non-Russian-speaking countries were present in the audience and even though there was simultaneous translation into English, I wasn’t completely sure that they understood me. People laughed and applauded, so I guess that it worked.

My answer to the question “will Wikipedia ever carry advertising” was “NO”. This also received thunderous applause.

To be continued…