Archive for the 'Free Software' Category

The Case for Localizing Names, part 2

My name is written Amir Elisha Aharoni in English. In Hebrew it’s אמיר אלישע אהרוני, in Russian it’s Амир Элиша Аарони, in Hindi it’s अमीर एलिशा अहरोनि. It could be written in hundreds of other languages in many different ways.

More importantly, if I fill a form in Hebrew, I should write my name in Hebrew and not in English or in any other language.

Based on this simple notion, I wrote a post a year ago in support of localizing people’s names. I basically suggested that it should be possible to have a person’s name written in more than one language in social networks, “from” and “to” fields in email, and in any other relevant place. Facebook allows doing this, but in a very rudimentary way; for example, the number of possible languages is very limited.

Today I am participating in the Open Source Language Summit in the Red Hat offices in Pune. Here we have, among many other talented and interesting people, two developers from the Mifos project, which creates Free software for microfinance. Mifos is being translated at translatewiki.net, a software translation site of which I am one of the developers.

Nayan Ambali, one of the Mifos developers, told me that they actually plan to implement a name localization feature in their software. This is not related to software localization, where a pre-defined set of strings is translated. It is something to be translated by the users of Mifos itself. The particular reason why Mifos needs such a feature comes from its nature as microfinance software: financial documents must be filled in the language of each country for legal purposes. Therefore, a Mifos user in the Indian state of Karnataka may need to have her name written in the software in English, Hindi, and Kannada – different languages, which are needed in different documents.

A simple sketch of database structure for storing names in multiple languages

Such a feature is quite simple to implement. In the backend this means that the name must be stored in a separate table that will hold names in different languages; see the sketch I made with Nayan above. On the frontend it will need a widget for adding names in different languages, similar to the one that Wikidata has; see the screenshot below.
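
To make the backend part more concrete, here is a minimal sketch in Python with SQLite. The table and column names (`person`, `person_name`, `lang`, `name`) and the helper functions are assumptions made up for illustration; they are not taken from Mifos, and a real implementation would probably look different.

```python
import sqlite3

# Hypothetical schema: one row per person, and one row per (person, language) name.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY
);
CREATE TABLE person_name (
    person_id INTEGER NOT NULL REFERENCES person(id),
    lang      TEXT    NOT NULL,  -- language code such as 'en', 'hi', 'kn'
    name      TEXT    NOT NULL,
    PRIMARY KEY (person_id, lang)
);
"""


def set_name(conn, person_id, lang, name):
    """Add or replace the name of a person in one language."""
    conn.execute(
        "INSERT OR REPLACE INTO person_name (person_id, lang, name) VALUES (?, ?, ?)",
        (person_id, lang, name),
    )


def get_name(conn, person_id, lang, fallback="en"):
    """Return the name in the requested language, or in the fallback language."""
    for code in (lang, fallback):
        row = conn.execute(
            "SELECT name FROM person_name WHERE person_id = ? AND lang = ?",
            (person_id, code),
        ).fetchone()
        if row is not None:
            return row[0]
    return None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO person (id) VALUES (1)")
set_name(conn, 1, "en", "Amir Elisha Aharoni")
set_name(conn, 1, "he", "אמיר אלישע אהרוני")
set_name(conn, 1, "ru", "Амир Элиша Аарони")
```

With this layout, adding a name in one more language is just one more row, and a document that must be filled in a particular language can ask for the name in that language and fall back to another one if it is missing.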

The name of Steven Spielberg in many languages in Wikidata, with an option to add more languages

Of course, there’s also the famous problem of falsehoods that programmers believe about names, but this would be a good first step that can provide a good example to other programs.

Broken right-to-left writing in the new GMail compose interface

Shalom.

Dear Google, this is a cry for help.

It seems that the new GMail compose interface overrides Firefox’s Ctrl-Shift-X shortcut, which switches the writing direction. It also overrides the right-click->Switch writing direction function; it simply doesn’t do anything.

I cannot do this in Google Chrome either, because of bug 91178 – There seems to be no way to set an input’s direction on Linux nor Chrome OS.

I can probably switch the direction by using rich text, but using rich text has its own issues, and I usually want to send my email in plain text.

Dear Google, please fix this. I tried the new compose interface several times and I complained about this problem in emails to my googler friends. Unfortunately this is still not fixed, and starting from today I can’t go back to the old compose interface.

I understand, of course, that GMail is a free service that doesn’t come with a warranty. Dear Google, I am asking you a favor. You did, in fact, contribute quite a lot to the development of support for right-to-left languages on the Web. I am only asking you to keep this support good.

Thank you.

P.S. Dear Google, please ask Google employees who speak right-to-left languages to use Google products in these languages, and to write email in these languages. Dog-fooding is the best testing. Thank you, again.

Look! I am Making All Things New

For the last couple of years I’ve been helping my parents to learn to use computers. Mostly very common and well-known things: GMail, Picasa, searching Google, reading news websites, talking on Skype, the Russian social network Odnoklassniki, and not much more than that.

One of the most curious things that I found in my experiences with them is that emails and popups about new features are completely unhelpful to them. They always call me when they get them and ask me what to do now. It is awkward, because basically the emails tell them what to do, but instead of reading them and learning, they are reading them aloud to me:

— “It says: ‘Now you can find your friends more easily by typing their names in the search box’—so what do I do now?”

— “I don’t know… When you want to find somebody, type their names in the search box maybe?”

I am not saying that my parents are stupid; they aren’t. I am saying that these emails are not helpful. They appear to arrive from the helpful people in Google or Odnoklassniki, but the fact is that every time it happens, my parents are confused.

This makes me wonder: Is the effectiveness of these emails and popups and callouts researched? What are they good for? I don’t find them useful, because I actually like to find out things by myself; that’s my idea of user-friendliness: if it’s not self-explanatory, it is not user-friendly. My parents don’t find them useful, because they ask me what they have to do. So is it useful for anybody?


PS 1: I know that Odnoklassniki is awful. They insisted.

PS 2: I know that Skype is not Free Software and that it doesn’t respect people’s privacy. Give me something properly Free that actually works. For what it’s worth, I did teach both of my parents to use Firefox and they hate other browsers, and on my mother’s laptop I installed Fedora, so except Skype, her online experience is almost completely Free.

A Relevant Tower of Babel

The Tower of Babel is frequently used as a symbol of foreign languages. For example, several language software packages are named after it, such as the Babylon electronic dictionary, MediaWiki’s Babel extension and the Babelfish translation service (itself named after the Babel fish from The Hitchhiker’s Guide).

In this post I shall use the Tower of Babel in a somewhat more relevant and specific way: this post is about multilingualism and about Babel itself.

This is how most people saw the Wikipedia article about the Tower of Babel until today:

The Tower of Babel article. Notice the pointless squares in the Akkadian name. They are called "tofu" in the jargon of internationalization programmers.

And this is how most people will see it from today:

And we have the name written in real Akkadian cuneiform!

Notice how the Akkadian name now appears as actual Akkadian cuneiform, and not as meaningless squares. Even if you, like most people, cannot actually read cuneiform, you probably understand that showing it this way is more correct, useful and educational.

This is possible thanks to the webfonts technology, which was enabled on the English Wikipedia today. It was already enabled in Wikipedias in some languages for many months, mostly in languages of India, which have severe problems with font support in the common operating systems, but now it’s available in the English Wikipedia, where it mostly serves to show parts of text that are written in exotic fonts.

The current iteration of the webfonts support in Wikipedia is part of a larger project: the Universal Language Selector (ULS). I am very proud to be one of its developers. My team in Wikimedia developed it over the last year or so, during which it underwent a rigorous process of design, testing with dozens of users from different countries, development, bug fixing and deployment. In addition to webfonts it provides an easy way to pick the user interface language, and to type in non-English languages (the latter feature is disabled by default in the English Wikipedia; to enable it, click the cog icon near “Languages” in the sidebar, then click “Input” and “Enable input tools”). In the future it will provide even more abilities, so stay tuned.

If you edit Wikipedia, or want to try editing it, one way in which you could help with the deployment of webfonts would be to make sure that all foreign strings in Wikipedia are marked with the appropriate HTML lang attribute; for example, that every Vietnamese string is marked as <span lang="vi" dir="ltr">. This will help the software apply the webfonts correctly, and in the future it will also help spelling and hyphenation software, etc.
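
As a small illustration of what such marking looks like when it is generated by a program, here is a sketch in Python. The function `mark_language` and the short `RTL_LANGS` set are hypothetical names invented for this example; they are not part of MediaWiki or of the ULS, and the list of right-to-left languages is deliberately incomplete.

```python
from html import escape

# Assumption: a tiny, hand-picked set of right-to-left language codes,
# for illustration only.
RTL_LANGS = {"ar", "fa", "he", "ur", "yi"}


def mark_language(text, lang):
    """Wrap text in a <span> with explicit lang and dir attributes."""
    direction = "rtl" if lang.split("-")[0] in RTL_LANGS else "ltr"
    return '<span lang="{}" dir="{}">{}</span>'.format(
        escape(lang, quote=True), direction, escape(text)
    )
```

For example, `mark_language("Tiếng Việt", "vi")` produces a span with `lang="vi"` and `dir="ltr"`, while a Hebrew string gets `dir="rtl"`.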

This wouldn’t be possible without the help of many, many people. The developers of Mozilla Firefox, Google Chrome, Safari, Microsoft Internet Explorer and Opera, who developed the support for webfonts in these browsers; The people in Wikimedia who designed and developed the ULS: Alolita Sharma, Arun Ganesh, Brandon Harris, Niklas Laxström, Pau Giner, Santhosh Thottingal and Siebrand Mazeland; The many volunteers who tested ULS and reported useful bugs; The people in Unicode, such as Michael Everson, who work hard to give a number to every letter in every imaginable alphabet and make massive online multilingualism possible; And last but not least, the talented and generous people who developed all those fonts for the different scripts and released them under Free licenses. I send you all my deep appreciation, as a developer and as a reader of Wikipedia.

Always define the language and the direction of your HTML documents, part 02: Backwards English

In part 01 of this series, I showed why it is important to always define the language and the direction of all HTML content and not rely on the defaults: The content may get embedded in a document with a different direction and be displayed incorrectly.

This issue is laughably easy to avoid: If you are writing the content, you are supposed to know in what language it is written, so if it’s English, just write <html lang="en" dir="ltr"> even though these seem to be the defaults. Nineteen or so characters that ensure your content is readable and not displayed backwards. Please do it always and tell all your friends to do it.

The problem is that you don’t only have to explicitly set the language and the direction, but, as silly as it sounds, you have to set them correctly, too. A more subtle, but nevertheless quite frequent and disruptive bug is displaying presumably, but not actually, translated content in a different direction. This happens quite frequently when a website supports the browser language detection feature, known as Accept-Language:

  1. The web server sees that the browser requests content in Hebrew.
  2. The web server sends a response with <html lang="he" dir="rtl">, but because the website is not actually translated, the text is shown in the fallback language, which is usually English.
  3. The user sees the content just like this numbered list, which I intentionally set to dir="rtl": with the numbers and the punctuation on the wrong side, and possibly invisible, because English is not a right-to-left language.

Of course, it can go even worse. Arrows can point the wrong way and buttons and images can overlap and hide each other, rendering the page not just hard to read, but totally unusable.

This bug is also an example of the Software Localization Paradox: It manifests itself when Accept-Language is not English, but most developers install English operating systems and don’t bother to change the preferred language settings in the browser, so they never see how this bug manifests itself. The site developers don’t bother to test for it either.

The solution, of course, is to set a different language and direction only if the site is actually translated, and not to pretend that it’s translated if it’s not.
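
Here is a minimal sketch of that rule in Python. The function names, the `translated` parameter and the short `RTL_LANGS` set are assumptions made up for this example; real sites will have their own way of knowing which translations exist, and real Accept-Language handling has more corner cases than this.

```python
# Assumption: an illustrative, incomplete set of right-to-left language codes.
RTL_LANGS = {"ar", "fa", "he", "ur", "yi"}


def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, tag.strip().lower()))
    # The sort is stable, so tags with equal quality keep their header order.
    weighted.sort(key=lambda item: item[0], reverse=True)
    return [tag for quality, tag in weighted if quality > 0]


def html_attributes(accept_language, translated, fallback="en"):
    """Pick lang and dir only from languages the site is really translated into."""
    for tag in parse_accept_language(accept_language):
        primary = tag.split("-")[0]
        if primary in translated:
            chosen = primary
            break
    else:
        # Nothing the browser asked for is available: use the fallback
        # language AND the fallback language's direction.
        chosen = fallback
    direction = "rtl" if chosen in RTL_LANGS else "ltr"
    return {"lang": chosen, "dir": direction}
```

The point is that the direction is derived from the language that is actually served, never directly from the language that the browser requested. A browser asking for Hebrew from a site that is translated only into English and French gets `lang="en"` and `dir="ltr"`.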

Here are two examples of such brokenness. Both sites are important and useful, but hard to use for people whose Accept-Language is Hebrew, Persian or Arabic.

Here’s how the Mozilla Developer Network website looks in fake Hebrew:

Mozilla Developer Network website, in English, but right-to-left

Notice how the full stops are on the left end and how the text overlaps the images in the tiles on the right-hand side. This is how it is supposed to look, more or less:

Mozilla Developer Network home page in English, left-to-right

I manually changed dir="rtl" to dir="ltr" using the element inspector from Firefox’s developer tools and I also had to tweak a CSS class to move the “mozilla” tab at the top.

The above troubles are reported as bug 816443 – lang and dir attributes must be used only if the page is actually translated.

After showing an example of a web development bug from a site for, ahem, web developers, here is an even funnier example: The home page of Unicode’s CLDR. That’s right: Unicode’s own website shows text with incorrect direction:

The Unicode CLDR website, in English but right-to-left

The only words translated here are “Contents” (תוכן) and “Search this site” (חיפוש באתר זה), which is not so useful. The rest is shown in English, and the direction is broken: Notice the strange alignment of the content and the schedule table. A few months ago that table was so broken that its content wasn’t visible at all, but that was probably patched.

Here’s how it is supposed to look:

The CLDR home page in English, appropriately left-to-right

I tried reporting the CLDR home page direction bug, but it was closed as “out-of-scope”: The CLDR developers say that the Google Sites infrastructure is to blame. This is frustrating, because as far as I know Google Sites doesn’t have a proper bug reporting system and all I can do is write a question about that direction problem in the Google Sites forum and hope that somebody notices it or poke my Googler friends.

One thing that I will not do is switch my Accept-Language to English. Whenever I can, I don’t just want to see the website correctly, but to try to help my neighbor: to see the possible problems that can affect other users who use a different language. Somebody has to break the Software Localization Paradox.

The Fateful March of 1998 – my #webstory

I first connected to the web in the summer of 1997. I bought a new computer with Windows 95 and Microsoft Internet Explorer 2. For about a week I thought that that’s how the web is supposed to look, but I kept seeing messages saying “Your browser doesn’t support frames” on a lot of sites. And then I found that there’s this thing called Microsoft Internet Explorer 3. I went to microsoft.com and downloaded it. It was the first piece of software that I downloaded. It was about 10 megabytes and took about an hour on my dial-up connection.

Most notably, Microsoft Internet Explorer 3 supported frames and animated GIFs. I loved animated GIFs! I guess that it makes me quite a hipster.

A cat in headphones dancing to house music.

House cat. Sorry, it’s an anachronism: this animated GIF is from the mid-2000s. 1997’s animated GIFs were quite different.

And then Microsoft Internet Explorer 4 came out. I thought: “Well, if the move from IE2 to IE3 made such a big difference, then I guess that I should try number 4, and it will be even cooler.” And I tried. And it was a disaster. The installation screwed up everything on my computer. I had no idea how to disable the dreaded Active Desktop, which it introduced. It didn’t work so well with my Hebrew version of Windows 95. So I did what a lot of people did very often back then: I formatted my hard drive and re-installed Windows.

And the question arose: which browser should I use? IE3 was stable, but I didn’t like that it was getting old. So I went to netscape.com to try that Netscape Navigator browser that I kept hearing everybody talk about.

And I loved it.

I loved its nifty toolbars and its bookmarks manager. I loved the crash reporting; it crashed quite often, actually, but I didn’t feel so bad about it, because Microsoft’s programs crashed often, too, and in case of Netscape I felt good about reporting these crashes. Netscape’s email program, Netscape Messenger, was truly outstanding. I especially loved the green dot, which marked messages as read and unread in one click. Most of all, it said very clearly something that I came to realize only years later: “I am a program that lets you browse the web as well as possible. I am not trying to do anything else.”

Fast forward to March 1998. Netscape made the big announcement that the development of its browser becomes an open source project code-named “Mozilla”. I started hearing about “open source”, “free software” and Linux shortly before that, but it was mostly in the context of crazy geek hobbyists. And then suddenly a big famous end-user product that I love becomes open source—that felt really cool.

I have followed Mozilla news ever since. I heard about Bugzilla before its first version was released. I liked Mozilla’s decision to redo the whole rendering based on standards, even though many people criticized it. The thing that annoyed me the most in Mozilla’s early years was the lack of proper right-to-left text support, which was present in Internet Explorer. That’s why I, sadly, used mostly IE, and even became a bit of an IE power user. But I waited eagerly for Mozilla to do it and tried every alpha release.

"Are you fed up with your browser? You're not alone. We want you to know that there's an alternative... Firefox." The logo of Firefox is drawn with names of people.

The famous New York Times ad.

I was thrilled about the announcement of Firefox, the first stable version of Mozilla’s browser. I gave $10 to the famous 2004 New York Times Firefox advertisement, and I still have the poster of that advertisement at home.

A long list of names, including Amir Elisha Aharoni

And there’s my name. Third line in the middle.

It always seemed natural to me that I follow Mozilla news so eagerly. I thought that everybody does it. I mean, how is it even possible to use the web in any way without being at least a bit curious about the technology that runs it?

And then in 2008 I wrote a little unimportant post in my Hebrew blog about a funny spelling correction. Tomer Cohen commented on it and suggested that I try the Hebrew spelling dictionary and Hebrew Firefox in general. And that’s how my big love story with software localization began.

I started sending corrections to the translation of Firefox’s interface. I started sending corrections to the Hebrew spelling dictionary. I got so curious about the way the spelling dictionary was built that I ended up doing a whole university degree in Hebrew Language. Really.

And in 2011 I started working in the Language Engineering team in the Wikimedia Foundation. I love it, and it probably wouldn’t have happened without my involvement with Mozilla. In the same year I also became a Mozilla Rep—a volunteer representative of Mozilla at conferences, blogs and forums.

Probably the most important thing that I learned from my Mozilla story is that loving the web and being curious about it is not something obvious. Most people just want something that works for checking weather, news, Facebook friends’ updates, homework help and kitten videos. And for the most part, that is perfectly fine. But people’s freedom to read reliable and complete news on any electronic device cannot actually be taken for granted. Neither can people’s freedom and privacy to share their thoughts in social networks. Mozilla is among the most important organizations that care for these things and it develops technologies that make them possible. Technologies that let you browse the web as well as possible and don’t try to do anything else.

We do it for one simple reason: We love the web.

Do you love it, too?

P.S. As I began writing this post, I realized that Microsoft’s Active Desktop was not so different from today’s devices, which are heavily based on web technologies: Firefox OS, Chrome OS and others. I can’t say that I love Microsoft, but as it often happens, it was quite pioneering with ideas, and not so good with their execution. Credit where credit’s due.

Yakutsk 2012

When I was about five years old, I saw a map of the world on the wall of my Moscow home. I noticed that the USSR is very, very big. And that it has a lot of rivers, like Ob, Yenisey, and Lena. “Lena”, I thought, “How nice. Like a name of a girl.”

On the Lena river I saw a city called Yakutsk. The name sounded a bit funny to me, but I became curious about it somehow.

And last month I went there.


Yakutsk is the capital of the Sakha Republic, also known as Yakutia – the largest administrative region in the world that is not a country. The largest native ethnic group of Sakha, after which the republic is named, speak a Turkic language of the same name, although it is also frequently called “Yakut”. Even though I spent almost all of my Soviet life in Moscow, I was always very curious about all the other regions and languages of the USSR, so when I discovered Wikipedia, I devoted a lot of time to reading about them and to visiting Wikipedias in these languages, even though I cannot really read them.

A request to start a Wikipedia in Sakha was filed in 2006, and I was quick to support it. After a few months of preparations it was opened. It is now one of the relatively more active Wikipedias in languages of Russia – it has over 8,000 articles, and for a minority language, most speakers of which are bilingual in another major language, this is a good number.

I kept constant and positive contact with Nikolai Pavlov – the founder and the unofficial leader of the Sakha Wikipedia – since the very start of this Wikipedia. It was great to give these people technical and organizational advice: how to write articles effectively, how to choose topics, how to organize meet-ups of Wikipedians. For a long time I dreamt of meeting them in person, but because Yakutsk is so far away from practically any other imaginable place, I didn’t think that it would ever happen. But in April 2012 I met Nikolai at the Turkic Wikimedia Conference in Almaty, Kazakhstan.

A few days after that conference Nikolai suggested that I submit a talk for an IT conference in the North-Eastern Federal University in Yakutsk. At first I thought that the conference wasn’t really relevant to me, but after reading the description, I decided to give it a try and wrote talk proposals about my favorite topics: MediaWiki and Software Localization. Somewhat surprisingly, the talks were accepted and I received an invitation to present at that conference.

With Nikolai Pavlov, also known as Halan Tul. The unofficial leader of the Sakha Wikipedia and the excellent organizer of my trip to Yakutsk.

I flew from Tel-Aviv to Moscow, and then six more hours from Moscow to Yakutsk. Yakutsk is apparently a modern, bustling and developed city, but with interesting twists. Most notably, because it is in the permafrost area, all the houses are built on piles and all the pipelines are above ground. But actually this is just a small detail, because the general feeling was that it was a whole different country from the European part of Russia, to which I was used, and in a very good way.

I am standing on a new bridge being built

I was most pleasantly surprised by the liveliness of the Sakha language: practically all people there know Russian, but the Sakha speech is frequently heard on the streets, Sakha writing is frequently seen on advertising and store signs, and Sakha songs are played from many passing cars.

Myself standing in front of a classroom, speaking about MediaWiki

Speaking about MediaWiki in Yakutsk

The conference was very varied – with presenters from South Korea, China, Bulgaria, Switzerland and major Russian cities – Moscow, St. Petersburg and others. The topics were very varied, too, but the central topic was using computer technologies for education and human development, so I felt that my talks about Wikipedia and software localization were fitting.

I am standing holding a microphone in front of an audience in a university auditorium. Behind me - a screen with a GNU head, the logo of the Free Software Foundation.

Presenting my main plenary lecture about software localization. One of my main points is that Free Software, represented by the GNU head, is very easy to internationalize.

Besides participating in the conference itself, I also attended many meetings that Nikolai organized for me. It was fascinating to meet all these people.

Meeting the manager of Bichik, the national book publisher. On the wall - portraits of notable Sakha writers.

I spoke to the editor and the manager of the republic’s largest book publishing company – they told me that the local literature has great artistic value, but since less than half a million people speak this language, it’s hard to earn a lot of profit from it and to develop it. They also complained that some authors – as well as some deceased authors’ families – are too harsh about copyrights. I suggested that they try talking with authors and releasing some works under a Creative Commons license to see whether it gets them more exposure, and they promised to read Lawrence Lessig’s “Free Culture” book.

I am sitting in a classroom and speaking to a group of about ten people.

Meeting Yakutsk linguists and explaining to them how putting their works on Wikipedia will make them much more accessible to the whole world.

I also met with linguists from the university, who work on researching and documenting the Sakha language and other languages of the region, such as Evenki and Yukagir. I suggested that they use Wikimedia resources for storage and documentation of the works they gather, and they liked the idea; I am definitely going to follow up with them on that.

In the offices of Ykt.ru, with the manager of the company - and a Kanban board in the background.

Another great meeting I had was with local tech people – a community of proud local IT geeks, who had lots of ideas for promoting Wikipedias in regional languages, and also the management and the employees of the local Internet portal ykt.ru. Their offices look just like a building of a hi-tech company in Silicon Valley or in Israel – with cozy rooms and lounges, and a Kanban board. The people made an excellent impression on me, too: we had a very professional and engaging conversation about developing web applications and agile management methodologies.

I am sitting on a couch and the TV crew prepare my microphone for the interview

Preparing for an interview at NVK, the national TV station

I also spoke to several journalists and to the local TV and radio stations, inviting people to read Wikipedia in their own language and to contribute to it. I felt a bit like a celebrity, and well, I hope that it made somebody realize how effective the Internet can be in promoting local cultures and how proud people should be of their own languages.

One last comment is about the Sakha literature, which I mentioned earlier. I return from almost all my trips abroad with a lot of books about the local languages and cultures. And I actually read them. It happened on this trip, too, except this time most of the books were given to me as gifts by all those very nice people that I met. Sakha prose and Olonkho poetry in Russian translation are simply wonderful. In all honesty. This is beautiful world-class literature and it deserves more exposure. If this little blog post made you curious about it, then it’s the most important thing that it could achieve.

(All photos were taken by Nikolai Pavlov, except the one in which he appears.)


