Archive for the 'language' Category

Hugo Chávez Is Still Not Dead

There are articles about Chávez in Wikipedias in ninety-six languages. He’s still not dead according to thirteen of them:

  1. Cantonese (about the language) – FIXED
  2. Central Bikol (about the language) – FIXED
  3. Ido (about the language) – FIXED
  4. Ladino (about the language) – FIXED
  5. Min Nan (about the language)
  6. Ossetic (about the language) – FIXED
  7. Papiamento (about the language) – FIXED
  8. Samogitian (about the language) – FIXED
  9. Sicilian (about the language) – FIXED
  10. Somali (about the language)
  11. Upper Sorbian (about the language) – FIXED
  12. Võro (about the language) – FIXED
  13. Walloon (about the language) – FIXED

Looking at the different language Wikipedias often brings about other useful things. For example, Chávez’ death date was marked in the Manx Wikipedia, but the name of the month of March was spelled incorrectly, so I corrected it. In the Russian Wikipedia I noticed that the banner that invites people to Wikimania 2013 in Hong Kong is translated incorrectly, and I corrected it.

If you know one of the above languages, consider adding the death date of Hugo Chávez to the articles, and writing some other things there, too. Millions of people will appreciate your contribution.

Yakutsk 2012

When I was about five years old, I saw a map of the world on the wall of my Moscow home. I noticed that the USSR is very, very big. And that it has a lot of rivers, like Ob, Yenisey, and Lena. “Lena”, I thought, “How nice. Like a name of a girl.”

On the Lena river I saw a city called Yakutsk. The name sounded a bit funny to me, but I became curious about it somehow.

And last month I went there.


Yakutsk is the capital of the Sakha Republic, also known as Yakutia – the largest administrative region in the world that is not a country. The largest native ethnic group of Sakha, after which the republic is named, speak a Turkic language of the same name, although it is also frequently called “Yakut”. Even though I spent almost all of my Soviet life in Moscow, I was always very curious about all the other regions and languages of the USSR, so when I discovered Wikipedia, I devoted a lot of time to reading about them and to visiting Wikipedias in these languages, even though I cannot really read them.

A request to start a Wikipedia in Sakha was filed in 2006, and I was quick to support it. After a few months of preparations it was opened. It is now one of the relatively more active Wikipedias in languages of Russia – it has over 8,000 articles, and for a minority language, most speakers of which are bilingual in another major language, this is a good number.

I have kept constant and positive contact with Nikolai Pavlov – the founder and the unofficial leader of the Sakha Wikipedia – since the very start of this Wikipedia. It was great to give these people technical and organizational advice: how to write articles effectively, how to choose topics, how to organize meet-ups of Wikipedians. For a long time I dreamt of meeting them in person, but because Yakutsk is so far away from practically any other imaginable place, I didn’t think that it would ever happen. But in April 2012 I met Nikolai at the Turkic Wikimedia Conference in Almaty, Kazakhstan.

A few days after that conference Nikolai suggested that I submit a talk for an IT conference at the North-Eastern Federal University in Yakutsk. At first I thought that it wasn’t really relevant to me, but after reading the description, I decided to give it a try and wrote talk proposals about my favorite topics: MediaWiki and software localization. Somewhat surprisingly, the talks were accepted and I received an invitation to present at that conference.

With Nikolai Pavlov, also known as Halan Tul. The unofficial leader of the Sakha Wikipedia and the excellent organizer of my trip to Yakutsk.

I flew from Tel-Aviv to Moscow, and then six more hours from Moscow to Yakutsk. Yakutsk turned out to be a modern, bustling and developed city, but with interesting twists. Most notably, because it is in the permafrost area, all the houses are built on piles and all the pipelines are above ground. But actually this is just a small detail, because the general feeling was that it was a whole different country from the European part of Russia, to which I was used, and in a very good way.

I am standing on a new bridge being built

I was most pleasantly surprised by the liveliness of the Sakha language: practically all people there know Russian, but the Sakha speech is frequently heard on the streets, Sakha writing is frequently seen on advertising and store signs, and Sakha songs are played from many passing cars.

Myself standing in front of a classroom, speaking about MediaWiki

Speaking about MediaWiki in Yakutsk

The conference was very varied – with presenters from South Korea, China, Bulgaria, Switzerland and major Russian cities – Moscow, St. Petersburg and others. The topics were very varied, too, but the central topic was using computer technologies for education and human development, so I felt that my talks about Wikipedia and software localization were fitting.

I am standing holding a microphone in front of an audience in a university auditorium. Behind me - a screen with a GNU head, the logo of the Free Software Foundation.

Presenting my main plenary lecture about software localization. One of my main points is that Free Software, represented by the GNU head, is very easy to internationalize.

Besides participating in the conference itself, I also attended many meetings that Nikolai organized for me. It was fascinating to meet all these people.

Meeting the manager of Bichik, the national book publisher. On the wall - portraits of notable Sakha writers.

I spoke to the editor and the manager of the republic’s largest book publishing company – they told me that the local literature has great artistic value, but since fewer than half a million people speak this language, it’s hard to earn much profit from it and to develop it. They also complained that some authors – as well as some deceased authors’ families – are too strict about copyrights. I suggested that they try talking with authors and releasing some works under a Creative Commons license to see whether it gets them more exposure, and they promised to read Lawrence Lessig’s “Free Culture” book.

I am sitting in a classroom and speaking to a group of about ten people.

Meeting Yakutsk linguists and explaining to them how putting their works on Wikipedia will make them much more accessible to the whole world.

I also met with linguists from the university, who work on researching and documenting the Sakha language and other languages of the region, such as Evenki and Yukagir. I suggested that they use Wikimedia resources for storage and documentation of the works they gather, and they liked the idea; I am definitely going to follow up with them on that.

In the offices of Ykt.ru, with the manager of the company - and a Kanban board in the background.

Another great meeting I had was with local tech people – a community of proud local IT geeks, who had lots of ideas for promoting Wikipedias in regional languages, and also the management and the employees of the local Internet portal ykt.ru. Their offices look just like a building of a hi-tech company in Silicon Valley or in Israel – with cozy rooms and lounges, and a Kanban board. The people made an excellent impression on me, too: we had a very professional and engaging conversation about developing web applications and agile management methodologies.

I am sitting on a couch and the TV crew prepare my microphone for the interview

Preparing for an interview at NVK, the national TV station

I also spoke to several journalists and to the local TV and radio stations, inviting people to read Wikipedia in their own language and to contribute to it. I felt a bit like a celebrity, and well, I hope that it made somebody realize how effective the Internet can be in promoting local cultures and how proud people should be of their own languages.

One last comment is about Sakha literature, which I mentioned earlier. I return from almost all my trips abroad with a lot of books about the local languages and cultures. And I actually read them. It happened on this trip, too, except this time most of the books were given to me as gifts by all those very nice people that I met. Sakha prose and Olonkho poetry in Russian translation are simply wonderful. In all honesty. This is beautiful world-class literature and it deserves more exposure. If this little blog post made you curious about it, then that’s the most important thing that it could achieve.

(All photos were taken by Nikolai Pavlov, except the one in which he appears.)

The Longest Articles

In the Wikipedia in every language you can go to a page called “Special:LongPages” and see which articles are the longest in that language.

Some fun facts that I found by random browsing of that page in a few languages:

  • The longest article in the Polish Wikipedia is “Finnish grammar”. It’s 117 pages long in print – basically a book.
  • The longest article in the Telugu Wikipedia is “Adolf Hitler”.
  • The longest article in the Kannada Wikipedia is “History of the SLR camera”. The second longest is “Adolf Hitler”. Kannada is spoken in India near Telugu.
  • The longest article in the Italian Wikipedia is “List of serial killers by number of victims”.
  • The longest article in the Hindi Wikipedia is “History of Australia” – about 50 pages in print. The article “History of India” will take 5 pages in print.
  • The longest articles in Chinese, Japanese and Korean Wikipedias are related to video games.
  • Finally, the longest article in the English Wikipedia is “List of Advanced Dungeons & Dragons 2nd edition monsters”.
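For anyone who would rather compare languages with a script than by browsing, here is a minimal sketch. It assumes that the MediaWiki Action API exposes the same list through `list=querypage` with `qppage=Longpages`, and that each result row carries a `title` and a `value` holding the page size in bytes. The helper names and the sample response are made up for illustration, and the code only builds the URL and parses a response; it does not contact any server.

```python
from urllib.parse import urlencode


def longpages_url(lang, limit=5):
    """Build an Action API URL asking one Wikipedia for its longest pages."""
    params = {
        "action": "query",
        "list": "querypage",
        "qppage": "Longpages",  # assumed name of the Special:LongPages query page
        "qplimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


def longest_titles(response):
    """Extract (title, size in bytes) pairs from a decoded JSON response."""
    rows = response["query"]["querypage"]["results"]
    return [(row["title"], int(row["value"])) for row in rows]


# Made-up sample in the shape the API is assumed to return; not real data.
SAMPLE_RESPONSE = {
    "query": {
        "querypage": {
            "name": "Longpages",
            "results": [
                {"value": "500000", "ns": 0, "title": "Example article A"},
                {"value": "450000", "ns": 0, "title": "Example article B"},
            ],
        }
    }
}

print(longpages_url("pl"))
print(longest_titles(SAMPLE_RESPONSE))
```

Changing the language code in `longpages_url` is enough to point the same request at another Wikipedia.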

What do the people want? Part 2: Machine translation in their language – Google or Apertium

Another technical issue that bothered many people in the Turkic Wikimedia Conference in Almaty is support for their language in Google Translate. Though this is not directly related to Wikimedia, I was asked about this repeatedly by the participants, as well as by local journalists who interviewed me. Some people even referred to it as a “conspiracy”.

Tilek Mamutov, giving a talk about Google Translate

Luckily, one of the participants was Tilek Mamutov, a Google employee from Kyrgyzstan, and he delivered a whole talk about it. His main message was that there is no conspiracy, and that to support more languages Google mostly needs to process as many texts as possible in that language, if possible – with a parallel translation. There are far fewer digital texts in languages like Kyrgyz and Bashkir than there are in German and Spanish, so it is not yet possible.

However, there is hope: a group of volunteers in Kyrgyzstan is working on creating a database of digital translated texts with the specific goal of making it usable in Google Translate. WikiBilim, the Kazakh association that organized the conference, is working on a similar initiative, too.

For my part, I suggested a convenient way to gather texts in these languages: to upload literature in them to Wikisource. I also mentioned the existence of Apertium. Apertium is a Free machine translation engine, which can be adapted to any language. It was developed in Valencia, and the first languages that it started to support are languages that are relevant for Spain: Spanish, Catalan, Basque, English and also the closely-related Esperanto, and it translates between them quite well. It supports a few other languages, too.

And it can support even more languages. Like Google Translate, it also needs as many digital texts as possible to actually start working, and it also needs dictionaries and tables of grammar rules, because it tries several methodologies for translation. Work has already begun for Turkish-Azeri and Turkish-Kyrgyz, and there are projects for Turkish-Chuvash and other language pairs. All these projects need people who can test them, contribute words to the dictionaries and check the grammar rules. So if you want to help complete a Free Turkish-Azeri machine translation system or to create an English-Kyrgyz translation system, contact the Apertium project.

To be continued…


Oh (edit): A correction came from Apertium developers: Apertium *doesn’t* need any texts, except for testing purposes. The more texts we have, the more we can test, of course, but above all, we need native speakers of languages who understand the grammar of the languages they’re working on and can work with computational formalisms.

What do the people want? Part 1: Internationalized templates

The technical issue that the participants of the Turkic Wikimedia Conference in Almaty asked me about more than anything else is migrating templates from bigger Wikipedias. MediaWiki Templates are one of the most important tools for writing Wikipedia articles more effectively and for making them informative, eye-pleasing and easier to read.

For the article writers, however, templates are a nightmare. Their syntax is horrible and unreadable; it’s hard to write them, hard to use them and hard to modify them. The fact that so many people nevertheless do it is quite astounding.

The guts of Template:Infobox in the English Wikipedia. This is the horrible, unreadable and unmaintainable code behind the nice boxes that you see on the sides of Wikipedia articles.

The English and the Russian Wikipedias have thousands of templates that would be just as useful in any language. Unfortunately, actually re-using them in other languages is very hard: each template must be manually copied and translated; templates that are nested in this template must be manually copied one-by-one recursively; and if the template in the original language was updated, it must be updated manually again. (The same problem pertains to MediaWiki gadgets, such as Twinkle and RefToolbar.)
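To show why the recursive copying is so tedious by hand, here is a minimal sketch of collecting everything that one template depends on. It assumes a hypothetical in-memory `WIKI` dictionary that maps template names to their wikitext; the template bodies are invented, and the regular expression is a rough approximation of template syntax, not MediaWiki’s real parser. It skips `{{{parameters}}}` and `{{#parser functions}}`, but it would wrongly treat magic words as templates.

```python
import re

# "{{Name" followed by "|" or "}}". Triple-brace parameters and names that
# contain "#" (parser functions) are deliberately not matched.
TEMPLATE_RE = re.compile(r"(?<!\{)\{\{(?!\{)\s*([^#{}|]+?)\s*(?:\||\}\})")

# Hypothetical wiki content, invented for this illustration.
WIKI = {
    "Infobox person": "{{Infobox|title={{{name|}}}|below={{Navbar|Infobox person}}}}",
    "Infobox": "<table>{{#if:{{{title|}}}|{{Infobox/row|{{{title}}}}}}}</table>",
    "Infobox/row": "<tr><td>{{{1}}}</td></tr>",
    "Navbar": "[[Template:{{{1}}}|view]]",
}


def collect_templates(name, wiki, seen=None):
    """Return the set of templates that must be copied in order to reuse `name`."""
    if seen is None:
        seen = set()
    if name in seen or name not in wiki:
        return seen
    seen.add(name)
    # Every template transcluded inside this one has to be copied as well.
    for inner in TEMPLATE_RE.findall(wiki[name]):
        collect_templates(inner, wiki, seen)
    return seen


print(sorted(collect_templates("Infobox person", WIKI)))
```

Even in this toy example a single infobox drags three more templates along with it, and each of them would still have to be translated and kept up to date by hand.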

This problem is not new. MediaWiki developers are more or less aware of it, and over the years they have been trying to solve it in various ways, but so far this hasn’t actually happened. A partial solution may come from the Wikidata project, but it is just beginning. Also, some time soon the Lua programming language may become usable as the new template language that will gradually replace all those curly brackets. However, that will take time, too, and by itself it will only improve the readability of the syntax and maybe the performance, but it won’t provide an easy solution for internationalization.

All I could say at this point is that I’ll try to pass the word on and remind the developers of the importance of this issue.

To be continued…

Turkic Wikimedia Conference 2012, Almaty: Chapters and Walks in the Park

Wikimedia Chapters – how to organize Wikipedians better and do cool things

At the Turkic Wikimedia Conference 2012 my second talk was about local Wikimedia chapters. That is a somewhat surprising topic, because chapters are separate from the Foundation, which I came to represent, but apparently there was great demand for it among the participants: The organizers asked me to do this a few days before the conference, and in the opening mingling before the actual conference program started people from Azerbaijan, Turkey and other countries asked me about this. So it was clear to me that such a talk would have value and that it can contribute to the development of the local communities.

To make sure that people from Turkey would understand me, I wrote bilingual Russian and English slides. I explained what chapters are (and what they aren’t), what they do, and how they are funded. I also added a few colorful slides from a presentation about the chapters’ activities, which Lodewijk Gelauff made in Wikimania 2011 in Haifa (thank you so much, Lodewijk).

People in the audience asked whether it’s possible to have a local chapter in a country that already has a national chapter – something that is very relevant to Russia, which is the biggest country in the world and which has many regions with diverse cultures. I replied that it’s basically possible (see Wikimedia New York City), but should be discussed with the Foundation and the national chapter. People also asked about funding – how “non-profit” must a chapter be? Can it, for example, provide services that are related to the Wikimedia mission for a fee and use the income only to advance the same mission? I am not a lawyer, but it may be possible. It is also something that should be discussed with the Foundation, and it depends on each country’s laws on non-profit organizations.

Walking in the park, talking about… software localization

The monument in the Panfilovtsy park in Almaty. It's very Soviet, but mostly in a good way. Photo by Roman Plischke, licensed under CC-BY-SA 3.0.

In the evening of the first day I had a walk in the Panfilovtsy park with several participants and had very interesting talks about Open Science and about software localization. I was very pleasantly surprised by the fact that people in Kyrgyzstan are so familiar with localization platforms like Pootle, Google Translator Toolkit and GlotPress, with the localization sites of Facebook and Twitter, and even with localizing mobile phones.

I was less pleasantly surprised by the fact that the same people didn’t know anything about translatewiki.net, Wikimedia’s main localization website, which can do things that are very similar to the above-mentioned products, and in many cases it can even do it better. This means that we have to work more to publicize it.

To be continued…

Turkic Wikimedia Conference 2012, Almaty – intro

The first Turkic Wikimedia Conference was held last weekend in Almaty, the largest city of Kazakhstan.

Turkic Wikimedia Conference 2012 logo by Batyr Hamzauly. The design of the "TWC" letters is based on Old Turkic runes. Licensed under CC-BY-SA 3.0.

Turkic?

“Turkic” refers to Turkic languages. The most prominent Turkic language, in terms of number of speakers and international awareness of its existence, is Turkish, the main language of Turkey. There are, however, many more such languages; most of them are spoken in Russia and other countries of the former Soviet Union, and a few are spoken in China, Afghanistan and other countries.

The first sign of this conference was given in Jimmy Wales’ closing speech at Wikimania 2011 in Haifa. By coincidence, in the same speech Jimmy’s pointer broke down, so I came up on the stage to push the button that moves the slides for him. At some point he asked me not to go too fast, and then he praised Rauan Kenzhekhanuly – the head of WikiBilim, a Kazakh association of people who contribute to Wikipedia, which expanded the Kazakh Wikipedia by many thousands of articles. He was so impressed by their activities that he promised his support for holding a regional Wikimedia conference, and now it happened.

Even though Russian is not a Turkic language, it is the most common language for the majority of the conference participants. As I am one of the few people at the Wikimedia Foundation who speak it, I was invited there.

Why and how to write Wikipedia in your language

At the conference I delivered several talks. The first was one of the opening keynote speeches – “Why you should write Wikipedia in your language”. In the talk I repeated my usual thesis – writing content and developing software in your native language rather than in a major language is important not just because of nationalism, politics or ideology, but simply because many people don’t know major international languages and thus they cannot access information if it’s written only in a language they don’t know. Before this talk I was told that even in Kazakhstan, where most people know Russian, native Kazakh language speakers often find it easier and more natural to read in Kazakh, especially when it comes to textbooks in schools and universities, and this went along perfectly with what I tried to present.

In that talk I also mentioned practical things that can help people to write in their languages and to join the global Wikimedia community – our mailing lists and our language support tools.

To make it more entertaining and memorable, I said a few words in Hebrew to give the audience the feeling of bewilderment when encountering a foreign language, and told people to stand up and sit down if they know this or that language. Beyond having fun, this little game also had a practical purpose: I delivered most of this talk in Russian and I wanted to make sure that everybody understands me. People from Turkey and other non-Russian-speaking countries were present in the audience and even though there was simultaneous translation into English, I wasn’t completely sure that they understand me. People laughed and applauded, so I guess that it worked.

My answer to the question “will Wikipedia ever carry advertising” was “NO”. This also received thunderous applause.

To be continued…

Ones and O’s: The Advantages of Digital Texts in Wikisource

I’ve been asked what the advantages are of using Wikisource over simply uploading scanned books to a website. The people who asked me about this speak languages of India, but my replies apply to all languages.

First, what is Wikisource? It’s a sister project of Wikipedia, which hosts freely-licensed documents that were already published elsewhere. The English Wikisource, for example, hosts many books that passed into the public domain, such as Alice in Wonderland, the Sherlock Holmes stories and Gesenius’ Hebrew Grammar (my favorite pet project). It also hosts many other types of texts, for example speeches by US presidents from Washington to Obama, because according to American law they are all in the public domain.

And now to the main question: Why bother to type the texts letter-by-letter as digital texts rather than just scanning them? For languages written in the Latin, Cyrillic and some other scripts this question is less important, because for these scripts OCR technology makes the process half-automatic. It’s never fully automatic, because OCR output always has to be proofread, but it still makes the process easier and faster.

For the languages of India it is harder, because as far as i know there’s no OCR software for them, so they have to be typed letter-by-letter. This is very hard work. What is it good for?

In general, an image of a scanned page is a digital ghost: It is only partially useful to a human and it is almost completely useless to a computer. A computer’s heart only beats ones and O’s – it usually doesn’t care whether an image shows a kitten or a text of a poem.

It’s possible – and easy – to copy a digital text

It’s almost impossible to copy text from a scanned image. You can, of course, use some graphics editing software to cut the text and paste it as an image in your document, but that is very slow and the quality of the output will be bad. Why is it useful to copy text from a book that was already published? It’s very useful to people who write papers about literary works. This happens to all children who study literature in their native language in school and to university students and researchers in departments of language and literature. It is also useful if you want to quickly copy a quote from a book to an email, a status update on a social network or a Wikipedia article. Some people would think that copying from a book to a school paper is cheating, but it isn’t; copying another paper about a book may be cheating, but copying quotes from the original book to a paper you’re writing is usually OK and a digitized book just makes it easier and helps you concentrate on the paper.

Searching

In the previous point i mentioned copying text to an email from a book. It’s easy if you know what the book is and on which page the text appears. But it’s hard if you don’t know these things, and this happens very often. That’s where searching comes in, but searching works only if the text is digital – it’s very hard for the computer to understand whether an image shows a kitten or a scanned text of a poem, unless a human explains it. (OCR makes it only slightly easier.)

Linking

The letters “ht” in “http” and “html”, the names of the central technologies of the web, stand for “hypertext”. Hypertext is a text with links. A printed book only has references that point you to other pages, and then you have to turn pages back and forth. If they point to another book, you’ll have to go to the shelf, find it, and turn pages there. Digital texts can be very easily linked to one another, so you’ll just have to click a link to see where you are referred. This is very useful in scientific books and articles. It is rarely needed in poetry and stories, but it can be added to them too; for example, you can add a footnote that says: “Here the character quotes a line from a poem by Rabindranath Tagore” and link to the poem.

Bandwidth

This one is very simple: Scanned images of texts use much more bandwidth than digital texts. In these days of broadband it may not seem very important, but the gap between digital texts and images is really huge, and it may be especially costly, in time and in money, to people who don’t have access to broadband.
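To get a feeling for the size of that gap, here is a back-of-the-envelope sketch. All the numbers in it are my assumptions, not measurements: a page of 2,000 Devanagari characters, and an uncompressed black-and-white scan of 2480 by 3508 pixels, which is roughly an A4 page at 300 dpi. Real scans are usually compressed, which narrows the gap, while grayscale or color scans widen it.

```python
def utf8_size(text):
    """Size in bytes of a digital text stored as UTF-8."""
    return len(text.encode("utf-8"))


def raw_scan_size(width_px, height_px, bits_per_pixel=1):
    """Size in bytes of an uncompressed scan with the given pixel dimensions."""
    return width_px * height_px * bits_per_pixel // 8


# Assumption: a page holds about 2,000 characters. The letter is repeated only
# to get a realistic byte count; most Indic letters take 3 bytes in UTF-8.
page_text = "क" * 2000

text_bytes = utf8_size(page_text)
scan_bytes = raw_scan_size(2480, 3508)  # assumed: A4 at 300 dpi, 1 bit per pixel

print(text_bytes, "bytes as text")
print(scan_bytes, "bytes as an uncompressed scan")
print(round(scan_bytes / text_bytes), "times larger")
```

Under these assumptions the scanned page is about 180 times larger than the same page as text.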

Machine Translation

The above points are relatively easy to understand, but now it starts to get less obvious. Most modern machine translation engines, such as Google, Bing and Apertium, rely at least partly on pairs of translated texts. The more texts there are in a language, the better machine translation gets. There are many translated parallel texts in English, Spanish, Russian, German and French, so the machine translation for them works relatively well, but for languages with a smaller web presence it works very badly. It will take time until this influence is actually seen, but it has to begin somewhere.

Linguistic research and education

This is another non-obvious point: Digital texts are useful for linguists, who can analyze texts to find the frequency of words and to find n-grams. Put very simply, n-grams are sequences of words, and it can be assumed that words that frequently come in a sequence probably have some special meaning. Such things are directly useful only to linguists, but the work of linguists is later used by people who write textbooks for language learning. So, the better the digital texts in a language will be, the better textbooks the children who speak that language will get. (The link between advances in linguistic research and school language textbooks was found and described in at least one academic paper by an Israeli researcher.)
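Counting n-grams is easy once the text is digital. The following is a minimal sketch; the function names and the sample sentence are mine, and splitting on whitespace is a simplification that real linguistic tools replace with proper tokenization.

```python
from collections import Counter


def ngrams(words, n):
    """Return all runs of n consecutive words as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def most_common_ngrams(text, n=2, top=3):
    """Count the n-grams in a text and return the most frequent ones."""
    words = text.lower().split()
    return Counter(ngrams(words, n)).most_common(top)


sample = "the cat sat on the mat and the cat slept"
print(most_common_ngrams(sample, n=2, top=1))
```

With a whole library of digitized books instead of one sentence, the same counting shows which word sequences are typical for a language.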

Language tools

Big collections of digital texts in a language can be easily used to make better language software tools, such as spelling, grammar and style checkers.
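As a very rough illustration of the idea, here is a sketch of the simplest possible spelling checker: it learns a vocabulary from a corpus and flags words that it has never seen often enough. The function names, the threshold and the tiny corpus are assumptions of mine; real checkers also handle inflection, suggestions and much more.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation from each word."""
    words = (word.strip(PUNCTUATION) for word in text.lower().split())
    return [word for word in words if word]


def build_vocabulary(corpus, min_count=2):
    """Keep the words that appear at least min_count times in the corpus."""
    counts = Counter(tokenize(corpus))
    return {word for word, count in counts.items() if count >= min_count}


def unknown_words(text, vocabulary):
    """Return the words of a text that are missing from the vocabulary."""
    return [word for word in tokenize(text) if word not in vocabulary]


corpus = "The cat sat. The cat ran. The dog sat."
vocabulary = build_vocabulary(corpus)
print(unknown_words("The cat sta", vocabulary))
```

The bigger the corpus, the fewer correct words get flagged by mistake, which is exactly why large collections of proofread texts matter.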

OCR

And all this brings us back to the thing with which we began: OCR technology. More digital texts will help developers of OCR software to make it better, because they’ll be able to compare existing images of text with proofread digital texts and use the comparison for testing. This is a wonderful way in which non-developers help developers and vice-versa.
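One common way to make such a comparison is the character error rate: the number of single-character edits needed to turn the OCR output into the proofread text, divided by the length of the proofread text. This is a minimal sketch of that measure and not necessarily what any particular OCR project uses; the sample strings are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions between a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_output, proofread):
    """Share of characters the OCR got wrong, relative to the proofread text."""
    if not proofread:
        raise ValueError("the proofread text must not be empty")
    return edit_distance(ocr_output, proofread) / len(proofread)


proofread = "down the rabbit hole"
ocr_output = "down tbe rabbit ho1e"  # made-up OCR mistakes: "h" read as "b", "l" read as "1"
print(character_error_rate(ocr_output, proofread))
```

Every proofread page on Wikisource, paired with its scan, can serve as one more test case of this kind.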

So these are some of the advantages. The work is hard, but the advantages are really big, even if not immediately obvious.

If you have any more questions about Wikisource, please let me know.

In praise of Wiktionary

The Wikimedia Foundation manages the servers for several projects. Wikipedia gets almost all of the attention, and the others get almost none, even though at least some deserve a lot of it.

My personal favorite is Wikisource, a collection of freely-licensed texts that were already published elsewhere. It is similar to Project Gutenberg, but with somewhat different focus and style.

A multi-volume Latin dictionary (Egidio Forcellini: Totius Latinitatis Lexicon, 1858–87) on a table in the main reading room of the University Library of Graz. Picture taken and uploaded on 15 Dec 2005 by Dr. Marcus Gossler (license: CC-BY-SA). This is the illustration in the English Wiktionary entry "dictionary".

But there’s another project, which deserves more and more attention and praise as the years go by: Wiktionary. Even though i love printed and digital dictionaries, i never became a frequent editor of Wiktionary for two reasons. The first reason is software: MediaWiki runs Wikipedia and all the other Wikimedia projects. It is quite well suited for Wikipedia, which thrives with long encyclopedic articles sorted in a very liberal tree of categories. It’s much less suited for a dictionary, which requires a rather different model of storing, linking and sorting the entries. Some attempts were made to improve this, for example, the many templates and gadgets developed locally in the English Wiktionary and the OmegaWiki project. Both of them have nice ideas that go in the right direction, but still have many implementation problems.

The second reason is problematic methodology. It’s a hard problem to explain, but i’ll try: Writing a good dictionary is a lot harder than writing a good encyclopedia. When you are writing an encyclopedia, you can base your article on one or more reliable sources about the nature and the history of a certain subject. The limits of what needs to be described in an encyclopedic article, at least for important subjects and fairly well-known people, are generally easy to determine. Dictionary compilation works entirely differently: to make a good dictionary, the editor must possess a large and representative collection of texts in a given language, find all instances of a given word, sort them into groups and describe the usage of the given word. Such resources are very hard to find, and there are very few people who have the needed qualifications to use them well.

Despite these problems, i find myself using Wiktionary quite often. Here are a few things for which i actually use Wiktionary repeatedly and successfully:

  • English Internet acronyms: AFAICT, TTYL, IRL, FTW, AYBABTU. They often appear in emails and chat sessions, they are legitimate dictionary terms, and the Wiktionary definitions for them are usually accurate.
  • Catalan, Spanish and Italian verb conjugation tables: I am learning these languages, and i find the verb conjugation tables in Wiktionary complete and very easy to use. I have no reason to think that they have mistakes.
  • Studying Dutch. I studied Dutch for a couple of months a year ago. Unfortunately i couldn’t find the time to go on with it – i hope to come back to it! – but while i did it, i intentionally tried to use the Dutch Wiktionary to find words in the translation tasks that i got as homework. I found all the needed words easily and the explanations and the translations were clear and helpful. Of course, words in homework for beginners are probably simple, but then beginners are probably the most important and frequent users of dictionaries. In any case, the Dutch Wiktionary did the job very well.

Another advantage that Wiktionary has over paper dictionaries and other digital dictionaries is that it is very richly illustrated. Paper dictionaries usually have few illustrations, if any, because they want to save paper. Commercial digital dictionaries also have few illustrations because their publishers don’t want to pay a lot of money to photographers and designers. Wiktionary doesn’t have either of these problems: Wikipedia is very richly illustrated thanks to the enormous amount of images contributed by people and Wiktionary has direct and easy access to the Wikimedia Commons – the same repository of Free images, sounds and video that is used by Wikipedia. And of course, Wiktionary is not made of paper.

So there: Wiktionary may still not be as strong as Wikipedia in completeness and in popularity, but it definitely deserves attention. And the people who work on it despite the enormous difficulties deserve a lot of praise.

Mongol Bichig, or why Microsoft Internet Explorer is better than Firefox, Chrome and Opera

After writing this post I found out that Google Chrome, in fact, does support vertical Mongolian text.

The title of this post is designed to catch the eye. Microsoft Internet Explorer is not better than Firefox, Chrome and Opera – it’s worse than them in every imaginable regard.

Except one: the support for Mongol Bichig, the vertical Mongolian script.

Text in vertical Mongolian


Mongolian script is unique: its letters are connected, similarly to Arabic, and its lines are written vertically. About three million Mongols in the independent republic of Mongolia use this script mostly for historical purposes, and use the Cyrillic script in their daily life, but the classical vertical script is the regular script for nearly six million Mongols in China – that’s about twice as many people.

The only browser that is able to display the vertical Mongolian script is Microsoft Internet Explorer. I don’t really know why Microsoft bothered to do it; maybe because the government of the People’s Republic of China demanded it. If that is true, then i salute the government of the People’s Republic of China. And i definitely salute Microsoft. I don’t like Microsoft’s insistence on keeping their code proprietary, but pioneering the support for this script, or any other, is praiseworthy.

I am very sad that at this time i cannot recommend that my Mongolian friends use my favorite browser, Firefox, or other modern browsers such as Google Chrome and Opera. For all their modernity, speed, feature richness and standards compliance, they are useless to over six million people who want to read and write in the vertical Mongolian script. At most, these browsers can display the script horizontally and with some letters incorrectly rendered. This also means that the only useful operating system for these people is Microsoft Windows.

One explanation that i heard for not supporting the vertical Mongolian script is that the CSS writing modes standard is not completely defined. This is actually a good and even noble reason, but when the most basic ability to read a language is in question, experimental support is better than no support.

So, which modern free browser will be the first to support the Mongolian script? I guess that it will be Firefox, given its excellent track record in supporting Unicode, and that Google Chrome will follow it after three years or so. But if Chrome developers surprise me and get there first, i’ll be just as happy. In any case, i am waiting impatiently, along with more than six million Mongols.

* * *


A completely unrelated postscript, intentionally hidden here, feel free to stop reading now: This morning i woke up to find that my Planet Mozilla feed was filled with reactions to a post by Gervase Markham a.k.a. Gerv, in which he advocated keeping marriage defined as a union between a man and a woman, essentially opposing gay marriage. A lot of people were angry that anti-gay comments appeared in a Mozilla-related feed and a lot of people were angry that anything off-topic appeared there. Some people supported Gerv in different ways.

Gerv is a very well-known and very talented Mozilla programmer, and also a devout Christian. His blog is called “Hacking for Christ”. There’s nothing weird or wrong about it – there are many other excellent Christian hackers, like Perl’s Larry Wall and Jonathan Worthington and Mozilla’s Jonathan Kew. Gerv’s comment wasn’t particularly hateful; as it often goes, it focused on the legal side of things. Gerv is also an unusually charming person; i had the pleasure to meet him in Berlin.

All that said, i support gay marriage, i don’t support Gerv’s comment and i think that he shouldn’t have posted it that way. But once he did, hey – water under the bridge. I care much more about his contributions to Mozilla’s code than about his social, legal and religious opinions.

And the loveliest part of it all is that in one of the many comments to his post, i found a link to the play “8”, about the fight for recognizing gay marriage in California. On one hand, it’s a very well played PR stunt, with the highest league stars such as Brad Pitt, George Clooney, Martin Sheen, Jamie Lee Curtis, Kevin Bacon, Yeardley Smith, John C. Reilly and George Takei. On the other hand, it’s actually worth watching. If this is what came out of that poorly placed blog post, then i’m not complaining.


