Archive for the 'Google' Category

The Case for Localizing Names

I often help my friends and family members open email accounts. Sometimes they are starting to use the Internet and sometimes they move from old email services (Yahoo, Walla!, ISP) to something modern (like it or not, GMail).

At some point they have to fill in their name, which will appear in the “from” field. And then I have to suggest that they write it in Latin characters, even though most of them speak languages that aren’t written in Latin characters – mostly Hebrew and Russian. Chances are that some day they will send an email to somebody who cannot read Russian or Hebrew, and Latin is relatively better known.

Only relatively, though. It may seem obvious to you that everybody knows the Latin script, but in fact, a lot of people are not comfortable with it at all. There are also other complications: lossy and inconsistent transliteration rules (is Amir אמיר or עמיר?), potential right-to-left rendering problems, and more. And of course, all people are happy to see their name in their language.

And people are also happy to see their friends’ names in their own language and not in a foreign or a neutral language. I have, for example, a lot of friends in India. Most of them write their names in English, but some write it in Marathi or in Malayalam. It’s certainly good for them, but in practice it’s much harder for me to find them this way, so English would be better – but Hebrew or Russian would be better yet.

Finally, there are a lot of people in the world who have more than one linguistic background. Mine are Russian, Hebrew and English, and I am really not such a special case. There are many millions of immigrants who have mixed backgrounds: Punjabi-Hindi-Urdu-English, Kurdish-Turkish-German, Kazakh-Russian-Norwegian, and others, and others and others. From each of these backgrounds they have friends, co-workers and family members, with whom they would love to communicate in the respective language. In each of these backgrounds they have friends who would want to find them using the name under which they know them there and using the appropriate language and writing system.

And sometimes people change their names, too. I did once, and so have many other people.

All this means that people’s names should be translatable, just like books, articles and software interfaces. Facebook and Google+ allow me to add a very limited number of names in foreign languages. Why wouldn’t they let me write my name in four, five, ten languages? This would make it easier for people who speak these languages to find me and to communicate with me. I would go even further and allow people who speak languages that I don’t know well to write my name as they hear it in their language and to add it to my details. Yet again, this would make me easier to find for even more people.

Some degree of automation is possible. A lot of names are, after all, repetitive, so social networks would be able to suggest to people with common names how their names would be written in other languages.

Wikipedia is actually quite good in this regard: Usually people have the same username across projects, and this username is not necessarily written in Latin letters, but people can customize the appearance of their signature in each project. I did it in a few languages, and people who speak those languages appreciate it.

I can only hope that social networks and email systems will allow as much flexibility as possible with this.

What do the people want? Part 2: Machine translation in their language – Google or Apertium

Another technical issue that bothered many people in the Turkic Wikimedia Conference in Almaty is support for their language in Google Translate. Though this is not directly related to Wikimedia, I was asked about this repeatedly by the participants, as well as by local journalists who interviewed me. Some people even referred to it as a “conspiracy”.

Tilek Mamutov, giving a talk about Google Translate

Luckily, one of the participants was Tilek Mamutov, a Google employee from Kyrgyzstan, and he delivered a whole talk about it. His main message was that there is no conspiracy, and that to support more languages Google mostly needs to process as many texts as possible in that language, if possible – with a parallel translation. There are far fewer digital texts in languages like Kyrgyz and Bashkir than there are in German and Spanish, so it is not yet possible.

However, there is hope: a group of volunteers in Kyrgyzstan is working on creating a database of digital translated texts with the specific goal of making it usable in Google Translate. WikiBilim, the Kazakh association that organized the conference, works on a similar initiative, too.

For my part, I suggested a convenient way to gather texts in these languages: to upload literature in them to Wikisource. I also mentioned the existence of Apertium. Apertium is a Free machine translation engine, which can be adapted to any language. It was developed in Valencia, and the first languages that it started to support are languages that are relevant for Spain: Spanish, Catalan, Basque, English and also the closely-related Esperanto, and it translates between them quite well. It supports a few other languages, too.

And it can support even more languages. Like Google Translate, it also needs as many digital texts as possible to actually start working, and it also needs dictionaries and tables of grammar rules, because it tries several methodologies for translation. Work has already begun for Turkish-Azeri and Turkish-Kyrgyz, and there are projects for Turkish-Chuvash and other language pairs. All these projects need people who can test them, contribute words to the dictionaries and check the grammar rules. So if you want to help complete a Free Turkish-Azeri machine translation system or to create an English-Kyrgyz translation system, contact the Apertium project.

To be continued…


Oh (edit): A correction came from Apertium developers: Apertium *doesn’t* need any texts, except for testing purposes. The more texts we have, the more we can test, of course, but above all, we need native speakers of languages who understand the grammar of the languages they’re working on and can work with computational formalisms.

Mongol Bichig, or why Microsoft Internet Explorer is better than Firefox, Chrome and Opera

After writing this post I found out that Google Chrome, in fact, does support vertical Mongolian text.

The title of this post is designed to catch the eye. Microsoft Internet Explorer is not better than Firefox, Chrome and Opera – it’s worse than them in every imaginable regard.

Except one: the support for Mongol Bichig, the vertical Mongolian script.

Text in vertical Mongolian

Mongolian script is unique: its letters are connected, similarly to Arabic, and its lines are written vertically. About three million Mongols in the independent republic of Mongolia use this script mostly for historical purposes, and use the Cyrillic script in their daily life, but the classical vertical script is the regular script for nearly six million Mongols in China – that’s about twice as many people.

The only browser that is able to display the vertical Mongolian script is Microsoft Internet Explorer. I don’t really know why Microsoft bothered to do it; maybe because the government of the People’s Republic of China demanded it. If that is true, then i salute the government of the People’s Republic of China. And i definitely salute Microsoft. I don’t like Microsoft’s insistence on keeping their code proprietary, but pioneering the support for this script, or any other, is praiseworthy.

I am very sad that at this time i cannot recommend that my Mongolian friends use my favorite browser, Firefox, or other modern browsers such as Google Chrome and Opera. For all their modernity, speed, feature richness and standards compliance, they are useless to over six million people who want to read and write in the vertical Mongolian script. At most, these browsers can display the script horizontally and with some letters incorrectly rendered. This also means that the only useful operating system for these people is Microsoft Windows.

One explanation that i heard for not supporting the vertical Mongolian script is that the CSS writing modes standard is not completely defined. This is actually a good and even noble reason, but when the most basic ability to read a language is in question, experimental support is better than no support.

So, which modern free browser will be the first to support the Mongolian script? I guess that it will be Firefox, given its excellent track record in supporting Unicode, and that Google Chrome will follow it after three years or so. But if Chrome developers surprise me and get there first, i’ll be just as happy. In any case, i am waiting impatiently, along with more than six million Mongols.

* * *


A completely unrelated postscript, intentionally hidden here, feel free to stop reading now: This morning i woke up to find that my Planet Mozilla feed was filled with reactions to a post by Gervase Markham a.k.a. Gerv, in which he advocated keeping marriage defined as a union between a man and a woman, essentially opposing gay marriage. A lot of people were angry that anti-gay comments appear in a Mozilla-related feed and a lot of people were angry that anything off-topic appears there. Some people supported Gerv in different ways.

Gerv is a very well-known and very talented Mozilla programmer, and also a devout Christian. His blog is called “Hacking for Christ”. There’s nothing weird or wrong about it – there are many other excellent Christian hackers, like Perl’s Larry Wall and Jonathan Worthington and Mozilla’s Jonathan Kew. Gerv’s comment wasn’t particularly hateful; as it often goes, it focused on the legal side of things. Gerv is also an unusually charming person; i had the pleasure to meet him in Berlin.

All that said, i support gay marriage, i don’t support Gerv’s comment and i think that he shouldn’t have posted it that way. But once he did, hey – water under the bridge. I care much more about his contributions to Mozilla’s code than about his social, legal and religious opinions.

And the loveliest part of it all is that in one of the many comments to his post, i found a link to the play “8”, about the fight for recognizing gay marriage in California. On one hand, it’s a very well played PR stunt, with the highest league stars such as Brad Pitt, George Clooney, Martin Sheen, Jamie Lee Curtis, Kevin Bacon, Yeardley Smith, John C. Reilly and George Takei. On the other hand, it’s actually worth watching. If this is what came out of that poorly placed blog post, then i’m not complaining.

Firefox Aurora – Mozilla’s biggest breakthrough since Firefox itself

This post encourages you to be a little more adventurous. Please try doing what it says, even if you don’t consider yourself a techie person.

The release of Firefox 4 in March 2011 brought many noticeable innovations in the browser itself, but there was another important innovation that was overlooked and misunderstood by many: A new procedure for testing and releasing new versions.

Before Firefox 4, the release schedule of the Firefox browser was inconsistent and versions were released “when they were ready”. Beta versions were released at rather random dates and quite frequently they were unstable. Nightly builds were appropriately called “Minefield” – they crashed so often that it was impossible to use them for daily web browsing activities.

The most significant breakthrough with regards to the testing of the Firefox browser came a year ago: Mozilla decided on a regular six-week release schedule and introduced the “release channels”: Nightly, Aurora, Beta and Release. The “Release” version is what most people download and use. “Beta” could be called a “Release candidate” – few, if any, changes are made to it before it becomes “Release”. Both “Aurora” and “Nightly” are updated daily and the differences between them are that “Nightly” has more experimental features that come right from the developers’ laptops and that “Aurora” is usually released with translations to all the languages that Firefox supports, while “Nightly” is mostly released in English.

Now here’s the most important part: I use Aurora and Nightly most of the time and my own experience is that both of them are actually very stable and can be used for daily browsing. It’s possible to install all the versions side-by-side on one machine and to have them use the same add-ons, preferences, history and bookmarks. This makes it possible for many testers to fully use them for whatever they need the browser for in their life without going back to the stable version. There certainly are surprises and bugs in functionality, but i have yet to encounter one that would make me give up. In comparison, in the old “Minefield” builds the browser would often crash before a tester would even notice these bugs, so it was not so useful for testing.

This change is huge. Looking back at the year of this release schedule, this may be the biggest breakthrough in the world of web browsers since the release of Firefox 1.0 in 2004. In case you forgot, before Firefox was called “Firefox”, it was just “Mozilla”; it was innovative, but too experimental for the casual user: it had a clunky user interface and it couldn’t open many websites, which were built with only Microsoft Internet Explorer in mind. Consequently, it was frequently laughed at. “Firefox” was an effort to take the great innovative thing that Mozilla was, clean it up and make it functional, shiny, inviting and easy to install and use. That effort was an earth-shaking success that revived competition and innovation in Internet technologies.

Aurora does to software testing what Firefox did to web browsing. It makes beta testing easy and fun for many people – it turns testing from a bug hunting game that only nerds want to play into a fun and unobtrusive thing that anybody can do without even noticing. And it is yet another thing that the Mozilla Foundation does to make the web better for everybody, with everybody’s participation.

A few words about Mozilla’s competitors: The Google Chrome team does something similar with what they call “Canary builds”. I use them to peek into the future of Chrome and i occasionally report bugs in them, but i find them much less stable than Firefox Nightly, so they aren’t as game-changing. Just as Minefield from Mozilla’s distant past, they crash too often to be useful as a daily web browser, so i keep going back to Firefox Aurora. Microsoft releases new versions of Microsoft Internet Explorer very rarely and installing future test versions is way too hard for most people, so it’s not even in the game. Opera is in the middle: It releases new versions of its browser quite frequently and offers beta builds for downloading, but it doesn’t have a public bug tracking system, so i cannot really participate in the development process.

To sum things up: Download Firefox Aurora and start using it as your daily browser and report bugs if you find any. You’ll see that it’s easier than you thought to make the Web better.

Keyboards, Firefox, Chrome and Privacy

I hardly ever used Google Chrome because of a bug that made the Ctrl-arrow keyboard shortcut work incorrectly in right-to-left languages. This shortcut makes the cursor jump a word to the left or to the right. In Hebrew and Arabic it would jump to the left when the right arrow was pressed. It works well in most other programs, but since Chrome doesn’t use the operating system’s text editing capabilities, it worked incorrectly there.

I write a lot of email, blog posts and Wikipedia articles and this keyboard shortcut is essential for me, so if it doesn’t work correctly in a program, i simply cannot use it and will use the competitor, in my case Firefox. Since i love Firefox anyway, it was not really a problem for me.

It took more than two years to do it, but this bug is more or less solved now and the fix will probably be released soon. I am now trying a preliminary version and the Ctrl-arrow shortcut seems to work correctly. However, as i expected, i quickly found other problems because of which i cannot use Google Chrome. Long story short, i cannot write Russian there. It’s not that it’s impossible – it’s just way too hard for me.

I could enable the Russian keyboard layout in my operating system, but it would be very hard to use for me. Keyboards sold in my country usually come with Latin and Hebrew letters printed on the keys and not Russian. It’s possible to buy a keyboard with Russian letters on it, and i did it once, but it didn’t help me much. You see, i write Russian several times a day, but less often than i write Hebrew or English, and the Russian layout is very different from the Latin layout, so i type in it very slowly even if i have the letters in front of my eyes.

Since 2006 my solution for this issue was the Transliterator add-on for Firefox, created by Alex Benenson (thank you so much, Alex). It was first called “ToCyrillic”, because it only helped with the Cyrillic alphabet, but later it was adapted to many other languages. It allows me to type Russian phonetically, so the Latin ‘b’ is automatically converted to Cyrillic ‘б’, ‘sh’ becomes ‘ш’ etc. It works everywhere in Firefox – websites’ input fields, the address bar, the dialog windows etc.
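The phonetic conversion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Transliterator add-on’s actual code: the mapping table is a tiny hypothetical subset, the function name `to_cyrillic` is made up, only lowercase letters are handled, and it converts a whole string at once instead of working as you type. It tries the longest Latin sequence first, so that ‘sh’ becomes ‘ш’ rather than ‘с’ followed by ‘х’.

```python
# Hypothetical, deliberately tiny mapping table (lowercase only).
# A real phonetic layout would cover the whole alphabet.
TRANSLIT = {
    "shch": "щ",
    "sh": "ш",
    "ch": "ч",
    "zh": "ж",
    "a": "а",
    "b": "б",
    "h": "х",
    "k": "к",
    "o": "о",
    "p": "п",
    "s": "с",
    "t": "т",
    "u": "у",
}

# Length of the longest Latin sequence in the table.
MAX_KEY = max(len(key) for key in TRANSLIT)


def to_cyrillic(text):
    """Convert Latin text to Cyrillic, preferring the longest match."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(MAX_KEY, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in TRANSLIT:
                out.append(TRANSLIT[chunk])
                i += size
                break
        else:
            # No mapping found: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_cyrillic("babushka"))
```

Characters that are not in the table, such as spaces, digits and punctuation, pass through unchanged, so mixed text stays readable.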

I couldn’t find anything like it for Chrome. It’s possible that i didn’t look well enough, but the add-ons i did find that claimed to do transliteration, phonetic typing or keyboard emulation either did something completely different or asked me to allow the add-on to access my data on all websites and my tabs and browsing activity. I don’t understand why such an add-on would need access to my data and browsing activity – it is only supposed to translate the characters i type into other characters and forget it.

It’s possible that the message that tells me about these privacy implications is over-zealous and the add-ons in question don’t actually breach my privacy, but it is still weird to see them, so i didn’t install them.

So there – i still have a strong reason not to move to Google Chrome. It’s not really Google’s fault. In fact, i could myself develop an extension that does something that i want – the source and the API are open and it’s probably not a lot of work. But why would i waste even a minute of my time doing such a thing if i already have Firefox and its Transliterator add-on that work perfectly well? You could say that Google Chrome is faster and uses less memory; it is not quite true in the first place, and even if it were true, i wouldn’t care about it, because being able to write the language i want is far more important than minor differences in performance.


As a side note, in some Google websites it’s possible to type in transliteration. However, it works only on these particular sites and needs the machine to be online, because it uses a web service to translate every word. That is weird software design and has rather unacceptable privacy implications.

Wikipedia already has phonetic typing support in Malayalam, Tamil and other languages and soon it is going to be deployed to other languages. It works in-place – it translates the text immediately in the browser letter by letter. Of course, it only works in one website; it would be better to help people to enable their native keyboard layouts rather than do it in only one website, but apparently doing it this way helps people start writing and searching immediately. More details on that soon.

MozCamp Berlin 2011, part 1

On November 12–13 i participated in MozCamp Berlin. (I’m writing this late-ish, because a day after that i went to India to participate in a Wikimedia conference and not one, but two hackathons. That was a crazy month.)


In the past i participated in small events of the Israeli Mozilla community, but this was my first major Mozilla-centric event.

MozCamp Berlin 2011 group photo. Notice the fox on the left and yours truly on the right.

The biggest thing that i take from this event is the understanding that i belong to this community of people who love the web. I never properly realized it earlier; i somehow thought that loving the web is a given. It is not.

Johnathan Nightingale, director of Firefox Engineering, repeated the phrase “we <3 the web” several times in his keynote speech. And this is the thing that makes the Mozilla community special.

Firefox is not the only good web browser. Opera and Google Chrome are reasonably good, too. Frankly, they are even better than Firefox in some features, though i find them less essential.

Firefox is not the only web browser that strives to implement web standards. Opera, Google Chrome and even recent versions of Microsoft Internet Explorer try to do that, too.

Firefox is not even the only web browser that is Free Software. So is Chromium.

But Firefox and the Mozilla community around it love the web. I don’t really have a solid way to explain it – it’s mostly a feeling. And with other browsers i just don’t have it. They help people surf the web, but they aren’t in the business of loving it.

And this is important, because the Internet is not just a piece of technical infrastructure that helps people communicate, do business and find information and entertainment. The Internet is a culture in itself – worthy of appreciation in itself and worthy of love in itself – and the Mozilla community is there to make it happen.

Some people would understand from this that Firefox is for the nerds who care about the technology more than they care about going out every once in a while. It isn’t. It’s not, in fact, just about a browser. It’s about the web – more and more Mozilla is not just developing a great browser, but also technologies and trends that affect all users of all browsers, rather than target markets. By using Firefox you get as close as you can to the cutting edge, not just of cool new features, but of openness and equality. Some people may find this ideology boring and pointless; i find it important, because without it the Internet would not be where it is today. Imagine an Internet in which the main sites you visit every day are not Facebook, Wikipedia, Google and your favorite blogs, but msn.com… and nothing but msn.com. Without Mozilla that’s how the Internet would probably look today. Without Mozilla something like this may well happen in the future.


Thanks a lot to William Quiviger, Pierros Papadeas, Greg Jost and all the other hard-working people who produced this great event.

More about it in the next couple of posts very soon.

Type O Negative, part 2

Since my previous and very negative post about Google+ i played with it a little more. Apparently, a lot of my misunderstanding was related to actual bugs in its interface – for example, people that i’m not supposed to follow appear in my stream. I guess that it’s understandable, given that the service is so young.

I do have something very nice to say about it – it has an excellent interface for reporting bugs. You simply click the problematic area on the screen, write a description and submit the report. It is very buggy on Firefox, but i can understand that, too, hoping that they will fix it. It does work well in Google Chrome, but i can’t really use it, because Chrome’s right-to-left editing support is very bad. The sad thing is that after the report is submitted i don’t have a way to know what happens to it. Public bug tracking is one of the most common, most appealing, and most overlooked features of Free Software. However, reporting bugs in Free Software projects is a relatively hard process – the interface of bug tracking software such as Bugzilla is intimidating and lots of people don’t even know that they can use it.

I hope that Free Software web frameworks such as MediaWiki (Wikipedia’s engine), WordPress and Drupal, will adopt a similar model for reporting bugs and combine it with the already excellent concept of public bug tracking. If that would be Google+’s contribution to the web, it would be enough to say that it doesn’t suck.

Type O Negative

About two hundred people added me to their Google+ circles, whatever that means. I think that i blocked about sixty of them. Most of them had names i didn’t recognize, but there were also a few that i did. If you think that you are my friend and you don’t understand why i blocked you, it doesn’t mean that i hate you – it just means that i don’t have a clue how Google+ works. Please feel free to re-add me if you want.

Until further notice, however, don’t expect me to actually use Google+. I don’t get this website. I’m probably too old. Besides, it has crappy support for writing from right to left.

I thank you in advance for not trying to convince me to change my mind about Google+ and for not trying to explain to me how it works. Google+ sucks.

Language teacher

If you search Google for “language teacher” (מורה ללשון) in Hebrew, the autocompletion suggests “language teacher killed herself” (מורה ללשון התאבדה). The word “teacher” is spelled the same for both genders, but the verb is feminine. I don’t know why it happens, because actually searching for it doesn’t yield anything significant.

In Israeli schools where Hebrew is the medium of teaching, “Language” is the class where the grammar of Hebrew is taught… badly.

Roth

miriamruth11-hp; copyright: Google; based on the original illustration by Ora Ayal

Today the logo appearing at the top of Google.co.il honors Miriam Roth, the author of the famous Hebrew children’s book “A Tale of Five Balloons”. She was born on the 16th of February in 1910.

The Google employee who uploaded the image made a mistake: the filename is “miriamruth”, but it should be “miriamroth”. That’s what happens when there’s no proper way to write the vowels: Her last name is written רות, which is how the Biblical name “Ruth”, still common in modern Israel, is written. But the German last name “Roth” is written the same way, because in Hebrew “u” and “o” are usually written using the same letter, Vav.

There is a way to differentiate the sounds: רוּת is “Ruth” and רוֹת is “Roth”. Notice the placement of the dot in relation to the letter in the middle. The sign for “u” is called shuruk, and the sign for “o” is called holam; i wrote the bulk of the articles about them in Wikipedia. Most people don’t type these signs; usually it’s fairly easy to guess the correct pronunciation, but people don’t use these signs even when it’s needed, as is the case with Ruth/Roth, because typing them on the standard Hebrew keyboard is very hard.
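For the technically curious, here is a small Python sketch of what the two spellings look like to a computer. It assumes the usual Unicode encoding, in which the dot is a separate combining character that follows the Vav: U+05BC (the dot in the middle, which makes the shuruk) or U+05B9 (holam). The names `RUTH`, `ROTH` and `strip_points` are made up for this illustration.

```python
import unicodedata

# Resh, Vav, the dot inside the Vav (shuruk), Tav.
RUTH = "\u05e8\u05d5\u05bc\u05ea"
# Resh, Vav, the dot above the Vav (holam), Tav.
ROTH = "\u05e8\u05d5\u05b9\u05ea"


def strip_points(word):
    """Remove combining marks (the vowel points) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )


# The pointed spellings differ by exactly one code point...
print([hex(ord(ch)) for ch in RUTH])
print([hex(ord(ch)) for ch in ROTH])

# ...but without the points both collapse into the same three letters.
print(strip_points(RUTH) == strip_points(ROTH))
```

This is the ambiguity behind the filename: text typed without the points carries no information about which vowel was meant, so software and people alike have to guess.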

For years this made me very angry, so i asked the Standards Institute of Israel to develop a new standard keyboard in which it will be easy to type these signs. I was successful at convincing the SII to do it. The work is now underway, and i actively participate in the monthly meetings, together with representatives from Hamakor – the Israeli association for free and open source software, Israel Internet Association, IBM, Microsoft, Apple, Google and other companies. I hope that the standard will be published in 2011; the technical implementation of the keyboard layout will take about ten minutes on each operating system, and shortly after that, i hope, it will be distributed to computers using some kind of an auto-update mechanism.

And then, i hope, we’ll start to see at least slightly richer Hebrew typography everywhere. I want it to happen, not just because it’s a nice tradition, but because this will simply make Hebrew easier to read – and will prevent silly mistakes, like pronouncing and writing “Ruth” instead of “Roth”.


See also: Maqaf.


