Archive for the 'Google' Category

Weird GMail Habit: Removing Control Characters

GMail has a weirdish feature that probably very few people except me know about. When using it with a Hebrew user interface, invisible control characters—LRM, RLM, RLE, LRE and the like—are added to some strings to make them appear correctly in a mixed-direction interface.

Most notably, they are added to email addresses. I sometimes want to copy these email addresses as text, and my mouse pointer picks the control characters as well. Of course, these control characters are by themselves invisible to humans, but very much visible to computers, and an email address with these characters is not correct, even if it appears to be the same to human eyes.

It already became a habit for me to carefully delete and manually restore the first and the last characters of an email address to make sure that the control characters are removed.

It would be better if GMail just used the <bdi> element or CSS bidi isolation. They are fairly well supported in modern browsers and provide better experience.

Advertisements

Serbian Spam

I always celebrate when I receive spam in a language in which I haven’t yet received spam. I just received spam in Serbian for the first time. It was in the Cyrillic alphabet; Serbian can also be written in Latin, and it is frequently done in Serbia, possibly even more frequently than in Cyrillic, even though the government prefers Cyrillic.

This makes me wonder: Is Serbian in Cyrillic popular and important enough for spamming in it, or did the silly spammer just use Google Translate to translate to Serbian and got the result in Cyrillic, because that’s what Google Translate does?

If you know Serbian, can you please tell me whether it looks real or machine-translated? Words like “5иеарс” and the spaces before the punctuation marks give me a strong suspicion that it’s machine translation, but I might be wrong.

Молим вас за попустљивост за нежељене природи овог писма , али је рођена из очаја и тренутног развоја . Молимо носе са мном . Моје име је сер Алекс Бењамин Хубертревизор Африке развојне банке открио постојећи налог за успавану 5иеарс .

Када сам открио да није било ни наставак ни исплате са овог рачуна на овог дугог периода и наши банкарских закона предвиђа да ће било неупотребљивим чине више од 5иеарс иду на банковни прихода као неостварен фонда .

Ја сам се распитивала за личне депонента и његове најближе , али нажалост ,депонент и његове најближе преминуо на путу до Сенегала за тајкун , а он је оставио иза себе нема тело за ову тврдњу само сам направио ову истрагу само да буде двоструко сигурни у ту чињеницу , а пошто сам био неуспешан у лоцирању родбину .

So, how does it look? And do you receive Serbian spam? Thanks.

Broken right-to-left writing in the new GMail compose interface

Shalom.

Dear Google, this is a cry for help.

It seems that the new GMail compose interface overrides Firefox’s Ctrl-Shift-X shortcut, which switches the writing direction. It also overrides the right-click->Switch writing direction function; it simply doesn’t do anything.

I cannot do this in Google Chrome either, because of bug 91178 – There seems to be no way to set an input’s direction on Linux nor Chrome OS.

I can probably switch the direction by using rich text, but using rich text has its own issues, and I usually want to send my email in plain text.

Dear Google, please fix this. I tried the new compose interface several times and I complained about this problem in emails to my googler friends. Unfortunately this is still not fixed, and starting from today I can’t go back to the old compose interface.

I understand, of course, that GMail is a free service that doesn’t come with a warranty. Dear Google, I am asking you a favor. You did, in fact, contribute quite a lot to the development of support for right-to-left languages on the Web. I am only asking you to keep this support good.

Thank you.

P.S. Dear Google, please ask Google employees who speak right-to-left languages to use Google products in these languages, and to write email in these languages. Dog-fooding is the best testing. Thank you, again.

Look! I am Making All Things New

For the last couple of years I’ve been helping my parents to learn to use computers. Mostly very common and well-known things: GMail, Picasa, seraching Google, reading news websites, talking on Skype, the Russian social network Odnoklassniki, and not much more than that.

One of the most curious things that I found in my experiences with them is that emails and popups about new features are completely unhelpful to them. They always call me when they get them and ask me what to do now. It is awkward, because basically the emails tell them what to do, but instead of reading them and learning, they are reading them aloud to me:

— “It says: ‘Now you can find your friends more easily by typing their names in the search box’—so what do I do now?”

— “I don’t know… When you want to find somebody, type their names in the search box maybe?”

I am not saying that my parents are stupid; they aren’t. I am saying that these emails are not helpful. They appear to arrive from the helpful people in Google or Odnoklassniki, but the fact is that every time it happens, my parents are confused.

This makes me wonder: Is the effectiveness of these emails and popups and callouts researched? What are they good for? I don’t find them useful, because I actually like to find out things by myself; that’s my idea of user-friendliness: if it’s not self-explanatory, it is not user-friendly. My parents don’t find them useful, because they ask me what do the have to do. So is it useful for anybody?


PS 1: I know that Odnoklassniki is awful. They insisted.

PS 2: I know that Skype is not Free Software and that it doesn’t respect people’s privacy. Give me something properly Free that actually works. For what it’s worth, I did teach both of my parents to use Firefox and they hate other browsers, and on my mother’s laptop I installed Fedora, so except Skype, her online experience is almost completely Free.

A Relevant Tower of Babel

The Tower of Babel is frequently used as a symbol of foreign languages. For example, several language software packages are named after it, such as the Babylon electronic dictionary, MediaWiki’s Babel extension and the Babelfish translation service (itself named after the Babel fish from The Hitchhiker’s Guide).

In this post I shall use the Tower of Babel in a somewhat more relevant and specific way: It will speak about multilingualism and about Babel itself.

This is how most people saw the Wikipedia article about the Tower of Babel until today:

The Tower of Babel article. Notice the pointless squares in the Akkadian name. They are called "tofu" in the jargon on internationalization programmers.

The tower of Babel. Notice the pointless squares in the Akkadian name. They are called “tofu” in the jargon on internationalization programmers.

And this is how most people will see it from today:

And we have the name written in real Akkadian cuneiform!

And we have the name written in real Akkadian cuneiform!

Notice how the Akkadian name now appears as actual Akkadian cuneiform, and not as meaningless squares. Even if you, like most people, cannot actually read cuneiform, you probably understand that showing it this way is more correct, useful and educational.

This is possible thanks to the webfonts technology, which was enabled on the English Wikipedia today. It was already enabled in Wikipedias in some languages for many months, mostly in languages of India, which have severe problems with font support in the common operating systems, but now it’s available in the English Wikipedia, where it mostly serves to show parts of text that are written in exotic fonts.

The current iteration of the webfonts support in Wikipedia is part of a larger project: the Universal Language Selector (ULS). I am very proud to be one of its developers. My team in Wikimedia developed it over the last year or so, during which it underwent a rigorous process of design, testing with dozens of users from different countries, development, bug fixing and deployment. In addition to webfonts it provides an easy way to pick the user interface language, and to type in non-English languages (the latter feature is disabled by default in the English Wikipedia; to enable it, click the cog icon near “Languages” in the sidebar, then click “Input” and “Enable input tools”). In the future it will provide even more abilities, so stay tuned.

If you edit Wikipedia, or want to try editing it, one way in which you could help with the deployment of webfonts would be to make sure that all foreign strings in Wikipedia are marked with the appropriate HTML lang attribute; for example, that every Vietnamese string is marked as <span lang=”vi” dir=”ltr”>. This will help the software apply the webfonts correctly, and in the future it will also help spelling and hyphenation software, etc.

This wouldn’t be possible without the help of many, many people. The developers of Mozilla Firefox, Google Chrome, Safari, Microsoft Internet Explorer and Opera, who developed the support for webfonts in these browsers; The people in Wikimedia who designed and developed the ULS: Alolita Sharma, Arun Ganesh, Brandon Harris, Niklas Laxström, Pau Giner, Santhosh Thottingal and Siebrand Mazeland; The many volunteers who tested ULS and reported useful bugs; The people in Unicode, such as Michael Everson, who work hard to give a number to every letter in every imaginable alphabet and make massive online multilingualism possible; And last but not least, the talented and generous people who developed all those fonts for the different scripts and released them under Free licenses. I send you all my deep appreciation, as a developer and as a reader of Wikipedia.

The Case for Localizing Names

I often help my friends and family members open email accounts. Sometimes they are starting to use the Internet and sometimes they move from old email services (Yahoo, Walla!, ISP) to something modern (like it or not, Gmail).

At some point they have to fill their name, which will appear in the “from” field. And then I have to suggest them to write it in Latin characters, even though most of them speak languages that aren’t written in Latin characters – mostly Hebrew and Russian. Chances are that some day they will send an email to somebody who cannot read Russian or Hebrew, and Latin is relatively better known.

Only relatively, though. It may seem obvious to you that everybody knows the Latin script, but in fact, a lot of people are not comfortable with it at all. There are also other complications: lossy and inconsistent transliteration rules (is Amir אמיר or עמיר?), potential right-to-left rendering problems, and more. And of course, all people are happy to see their name in their language.

And people are also happy to see their friends’ names in their own language and not in a foreign or a neutral language. I have, for example, a lot of friends in India. Most of them write their names in English, but some write it in Marathi or in Malayalam. It’s certainly good for them, but in practice it’s much harder for me to find them this way, so English would be better – but Hebrew or Russian would be better yet.

Finally, there are a lot of people in the world who have more than one linguistic background. Mine are Russian, Hebrew and English, and I am really not such a special case. There are many millions of immigrants who have mixed backgrounds: Punjabi-Hindi-Urdu-English, Kurdish-Turkish-German, Kazakh-Russian-Norwegian, and others, and others and others. From each of these backgrounds they have friends, co-workers and family members, with whom they would love to communicate in the respective language. In each of these backgrounds they have friends who would want to find them using the name under which they know them there and using the appropriate language and writing system.

And sometimes people change their names, too. I did it (twice!), and so have many other people.

All this means that people’s names should be translatable, just like books, articles and software interfaces. Facebook and Google+ allow me to add a very limited number of names in foreign languages. Why wouldn’t they let me write my name in four, five, ten languages? This would make it easier for people who speak these languages to find me and to communicate with me. I would go even further and allow people who speak languages that I don’t know well to write my name as their hear it in their language and to add it to my details. Yet again, this would make me easier to find to even more people.

Some degree of automation can be possible. A lot of names are, after all, repetitive, so social networks would be able to suggest people with common names how their name would be written in other languages.

Wikipedia is actually quite good in this regard: Usually people have the same username across projects, and this username is not necessarily written in Latin letters, but people can customize the appearance of their signature in each project. I did it in a few languages, and people who speak those languages appreciate it.

I can only hope that social networks and email systems will allow as much flexibility as possible with this.

What do the people want? Part 2: Machine translation in their language – Google or Apertium

Another technical issue that bothered many people in the Turkic Wikimedia Conference in Almaty is support for their language in Google Translate. Though this is not directly related to Wikimedia, I was asked about this repeatedly by the participants, as well as by local journalists who interviewed me. Some people even referred to it as a “conspiracy”.

X

Tilek Mamutov, giving a talk about Google Translate

Tilek Mamutov, giving a talk about Google Translate

Luckily, one of the participants was Tilek Mamutov, a Google employee from Kyrgyzstan, and he delivered a whole talk about it. His main message was that there is no conspiracy, and that to support more languages Google mostly needs to process as many texts as possible in that language, if possible – with a parallel translation. There are much less digital texts in languages like Kyrgyz and Bashkir than there are in German and Spanish, so it is not yet possible.

However, there is hope: a group of volunteers in Kyrgyzstan is working on creating a database of digital translated texts with the specific goal of making it usable in Google Translate. WikiBilim, the Kazakh association that organized the conference works on a similar initiative, too.

On my behalf, I suggested a convenient way to gather texts in these languages: to upload literature in them to Wikisource. I also mentioned the existence of Apertium. Apertium is a Free machine translation engine, which can be adapted to any language. It was developed in Valencia, and the first languages that it started to support are languages that are relevant for Spain: Spanish, Catalan, Basque, English and also the closely-related Esperanto, and it translates between them quite well. It supports a few other languages, too.

And it can support even more languages. Like Google Translate, it also needs as many digital texts as possible to actually start working, and it also It needs dictionaries and tables of grammar rules, because it tries several methodologies for translation. Work has already begun for Turkish-Azeri and Turkish-Kyrgyz, and there are projects for Turkish-Chuvash and other language pairs. All these projects need people who can test them, contribute words to the dictionaries and check the grammar rules. So if you want to help complete a Free Turkish-Azeri machine translation system or to create an English-Kyrgyz translation system, contact the Apertium project.

To be continued…


Oh (edit): A correction came from Apertium developers: Apertium *doesn’t* need any texts, except for testing purposes. The more texts we have, the more we can test, of course, but above all, we need native speakers of languages who understand the grammar of the languages they’re working on and can work with computational formalisms.


Archives