MozCamp Berlin 2011, part 2

Besides the general topic of Loving the Web, there was another important topic present in almost every time slot of MozCamp Berlin 2011, a topic that interests me more than anything else in software: localization. I attended most of the localization talks and gave one myself.

MozCamp Berlin 2011 WorldReady

  • Vito Smolej from Slovenia gave two important talks about Translation Memory, especially in OmegaT. Translation Memory is barely used in Mozilla localization projects, even though it could make things much more efficient, and Vito showed some ways in which it could be employed.
  • Jean-Bernard Marcon from France talked about the state of the BabelZilla site, which is used to translate Mozilla add-ons. Happily, i didn’t have to tell him that despite the impressive amount of localization that is done at that site, it is very problematic because of numerous technical issues – he said himself that he’s well aware of them and is going to replace the software completely Real Soon Now. I found it a little strange, however, that Jean-Bernard is happy about using the site for translating only Mozilla add-ons and doesn’t want to extend it to any other projects – say, Firefox itself. Oh well, as long as he maintains the add-ons site well, i’m happy.
  • Chris Hofmann and Jeff Beatty gave a great presentation about the present and the future of organizing localization groups and communicating about it. Frankly, it’s not all that i hoped to hear, but i’m really happy just to know that Mozilla, like Wikimedia, now has a guy whose job is to communicate about localization.

And i gave a talk that compares the localization of Mozilla and MediaWiki, the software behind Wikipedia. The slides are here. Many people who attended it said that it was bold of me to say these rather negative things about Mozilla. It is somewhat true – it is quite bold of me to use the first major Mozilla event i attend as a bully pulpit to promote my other project, but the talk was generally well-received. I believe that i succeeded at making my point: Both Mozilla and MediaWiki are leaders in the world of massively localized Free Software, and both projects have things to learn from each other – Mozilla can simplify its translation workflow and consider converging its currently sprawling tools and procedures, as is done in MediaWiki, and MediaWiki can learn a lot from Mozilla about building localization teams as communities of people and about quality control.

Finally, i was very glad to meet Dwayne Bailey and Alexandru Szasz – developers of Pootle and Narro, two localization tools used in the Mozilla world. Talking to them was very interesting and inspiring – they both understand well the importance of localization and the shortcomings of the current tools, including the ones that they are developing, and they are keen on fixing them. As a result of this excellent meeting i completed the translation of Pootle itself into Hebrew. And there is more to come.

MozCamp Berlin 2011, part 1

On November 12–13 i participated in MozCamp Berlin. (I’m writing this late-ish, because a day after that i went to India to participate in a Wikimedia conference and not one, but two hackathons. That was a crazy month.)


In the past i participated in small events of the Israeli Mozilla community, but this was my first major Mozilla-centric event.

MozCamp Berlin 2011 group photo. Notice the fox on the left and yours truly on the right.

The biggest thing that i take from this event is the understanding that i belong to this community of people who love the web. I never properly realized it earlier; i somehow thought that loving the web is a given. It is not.

Johnathan Nightingale, director of Firefox Engineering, repeated the phrase “we <3 the web” several times in his keynote speech. And this is the thing that makes the Mozilla community special.

Firefox is not the only good web browser. Opera and Google Chrome are reasonably good, too. Frankly, they are even better than Firefox in some features, though i find them less essential.

Firefox is not the only web browser that strives to implement web standards. Opera, Google Chrome and even recent versions of Microsoft Internet Explorer try to do that, too.

Firefox is not even the only web browser that is Free Software. So is Chromium.

But Firefox and the Mozilla community around it love the web. I don’t really have a solid way to explain it – it’s mostly a feeling. And with other browsers i just don’t have it. They help people surf the web, but they aren’t in the business of loving it.

And this is important, because the Internet is not just a piece of technical infrastructure that helps people communicate, do business and find information and entertainment. The Internet is a culture in itself – worthy of appreciation in itself and worthy of love in itself – and the Mozilla community is there to make it happen.

Some people would understand from this that Firefox is for the nerds who care about the technology more than they care about going out every once in a while. It isn’t. In fact, it’s not just about a browser. It’s about the web – more and more, Mozilla is not just developing a great browser, but also technologies and trends that affect all users of all browsers, rather than target markets. By using Firefox you get as close as you can to the cutting edge, not just of cool new features, but of openness and equality. Some people may find this ideology boring and pointless; i find it important, because without it the Internet would not be where it is today. Imagine an Internet in which the main sites you visit every day are not Facebook, Wikipedia, Google and your favorite blogs, but msn.com… and nothing but msn.com. Without Mozilla that’s how the Internet would probably look today. Without Mozilla something like this may well happen in the future.


Thanks a lot to William Quiviger, Pierros Papadeas, Greg Jost and all the other hard-working people who produced this great event.

More about it in the next couple of posts very soon.

The Software Localization Paradox

Wikimania in Haifa was great. Plenty of people wrote blog posts about it; the world doesn’t need yet another post about how great it was.

What the world does need is more blog posts about the great ideas that grew in the little hallway conversations there. One of the things that i discussed with many people at Wikimania is what i call The Software Localization Paradox. That’s an idea that has been bothering me for about a year. I tried to look for other people who wrote about it online and couldn’t find anything.

Like any other translation, software localization is best done by people who know well both the original language in which the software interface was written – usually English – and the target language. People who don’t know English strongly prefer to use software in a language they know. If the software is not available in their language, they will either not use it at all or will have to memorize lots of otherwise meaningless English strings and locations of buttons. People who do know English often prefer to use software in English even if it is available in their native language. The two most frequent explanations for that are that the translation is bad and that people who want to use computers should learn English anyway. The problem is that for various reasons lots of people will never learn English, even if it is mandatory in schools and useful for business. They will have to suffer the bad translations and will have no way to fix them.

I’ve been talking to people at Wikimania about this, especially people from India. (I also spoke to people from Thailand, Russia, Greece and other countries, but Indians were the biggest group.) All of them knew English and at least one language of India. The larger group of Indian Wikipedians to whom i spoke preferred English for most communication, especially online, even if they had computers and mobile phones that supported Indian languages; some of them even preferred to speak English at home with their families. They also preferred reading and writing articles in the English Wikipedia. The second, smaller, group preferred the local language. Most of these people also happened to be working on localizing software, such as MediaWiki and Firefox.

So this is the paradox – to fix localization bugs, someone must notice them, and to notice them, more people who know English must use localized software, but people who know English rarely use localized software. That’s why lately i’ve been evangelizing about it. Even people who know English well should use software in their language – not to boost their national pride, but to help the people who speak that language and don’t know English. They should use the software especially if it’s translated badly, because they are the only ones who can report bugs in the translation or fix the bugs themselves.

(A side note: Needless to say, Free Software is much more convenient for localization, because proprietary software companies are usually too hard to even approach about this matter; they only pay translators if they have a reason to believe that it will increase sales. This is another often overlooked advantage of Free Software.)

I am glad to say that i convinced most people to whom i spoke about it at Wikimania to at least try to use Firefox in their native language and taught them where to report bugs about it. I also challenged them to write at least one article in the Wikipedia in their own language, such as Hindi, Telugu or Kannada – as useful as the English Wikipedia is to the world, Telugu Wikipedia is much more useful for people who speak Telugu, but no English. I already saw some results.

I am now looking for ideas and verifiable data to develop this concept further. What are the best strategies to convince people that they should use localized software? For example: How economically viable is software localization? What is cheaper for an education department of a country – to translate software for schools or to teach all the students English? Or: How does the absence of localized software affect different geographical areas in Africa, India, the Middle East?

Any ideas about this are very welcome.

Type O Negative, part 2

Since my previous and very negative post about Google+, i have played with it a little more. Apparently, a lot of my misunderstanding was related to actual bugs in its interface – for example, people that i’m not supposed to follow appear in my stream. I guess that it’s understandable, given that the service is so young.

I do have something very nice to say about it – it has an excellent interface for reporting bugs. You simply click the problematic area on the screen, write a description and submit the report. It is very buggy on Firefox, but i can understand that, too, and i hope that they will fix it. It does work well in Google Chrome, but i can’t really use Chrome, because its right-to-left editing support is very bad. The sad thing is that after the report is submitted i don’t have a way to know what happens to it. Public bug tracking is one of the most common, most appealing, and most overlooked features of Free Software. However, reporting bugs in Free Software projects is a relatively hard process – the interface of bug tracking software such as Bugzilla is intimidating, and lots of people don’t even know that they can use it.

I hope that Free Software web frameworks such as MediaWiki (Wikipedia’s engine), WordPress and Drupal will adopt a similar model for reporting bugs and combine it with the already excellent concept of public bug tracking. If that turns out to be Google+’s contribution to the web, it will be enough to say that it doesn’t suck.

Palestinian geeks and RTL bugs

In the last few months i opened a bunch of MediaWiki bugs related to right-to-left writing. If you click the non-stricken-out numbers there, you’ll see my name on a few pages. Unfortunately i’m not yet much of a MediaWiki developer, but i’m quietly learning it at home.

This flood of right-to-left bugs was noticed. Mark Hershberger, Wikimedia’s bugmeister, wrote a blog post inviting developers who know RTL languages to fix the bugs. At the recent MediaWiki Hackathon 2011 in Berlin, which i attended as a member of the MediaWiki Language committee, i had the pleasure of meeting Mark and many other MediaWiki developers in person – they taught me MediaWiki hacking tricks and i taught them the basics of RTL language handling in computers.

MediaWiki Hackathon 2011 participants, Berlin. Photo: Tobias Schumann, CC-BY-SA-3.0-DE.

After the hackathon, Mark’s blog post was made available for translation on translatewiki.net, the software localization hub for MediaWiki, Wikipedia-related projects and other Free Software. It makes sense to translate it, especially into RTL languages. I translated it into Hebrew. It was also translated into Macedonian and Bulgarian; into Bosnian and two varieties of Serbian; into French, Danish and German; into Latin, Albanian, Dutch, Chinese and Japanese.

Do you notice any right-to-left languages except Hebrew here? No, me neither. After i poked a few people, parts of it were translated into Persian, Urdu and Khowar, a language of Pakistan. And not a single line of it has been translated into Arabic yet.

And i just don’t get it. It is a fact that there are Arab Free Software hackers on both sides of the Jordan, as well as in Egypt, Saudi Arabia, Syria and other countries. Judging by the tweets with the #palgeeks hashtag on Twitter, there are more startups in Ramallah than in Herzliya. There are Arab Wikipedia editors in Israel and the West Bank, not to mention the rest of the Arab world. There are a lot of translations of software messages into Arabic on the same website, translatewiki.net. But not of this blog post, which could bring more fixes to RTL bugs, which would in turn benefit all the people writing and reading in the Arabic alphabet – that’s hundreds of millions of people.

You could say: Why bother translating it from English into Arabic? After all, someone who has the skill to fix bugs in PHP code probably knows English. But the fact is that translating it into Hebrew was worth the few minutes i put into it, because it caused the Israeli MediaWiki developer Rotem Liss to fix one RTL bug. (Thank you, Rotem.) Just think what it may do if it is translated into Arabic, which is spoken by many, many more people.

So, dear #palgeeks and Arabic-speaking geeks in other countries! If any of you are reading this, please invest a few minutes to do the following:

  1. Go to translatewiki.net.
  2. If you don’t have an account: Create one by clicking “ادخل / أنشئ حسابا” or “Log in / create account” at the top. Then follow the instructions on the screen to request Translator permission.
  3. Go to the translation page of Mark Hershberger’s post.
  4. Start translating into Arabic.
  5. Copy the result to your own blog, publish it on Twitter, invite other Arab hackers to fix RTL bugs in MediaWiki.

Oh, and you are also cordially invited to Wikimania in Haifa and to the Hackathon that will take place during the two days before it, starting on the 2nd of August. It’s not about politics; it’s about improving Wikipedia’s support for your language. And you’ll also get to meet Wikipedians from all around the world, which is even more fun in real life than it sounds. Really. (If you need assistance with getting into Israel, please contact me privately.)

Firefox and its memory problem

A Slashdot story says: “If you’re like a lot of Firefox 4 users out there, you’ve probably noticed that Firefox has a serious memory problem — it uses more than it really should.”

No, i didn’t. I am what people would call a “power user” of web browsers, and i didn’t notice any memory problems in Firefox. At least not any memory problems that caused any other problems that i would notice. I have no reason to measure the memory usage of an application if it doesn’t have any other problems. Let it use whatever it wants as long as it functions properly otherwise.

And, thank God, there are a lot of Firefox users who are much less advanced than i am, and they certainly don’t give a damn about memory usage.

So no, this claim about “a lot of Firefox users” noticing serious memory problems is just plain wrong.


(Ahem, yes, i still read Slashdot.)

Megabytes

Today two programs asked me to upgrade them: Nokia PC Suite asked to be upgraded to Nokia OVI, which is supposed to be a newer program that does the same things, and Samsung KIES asked to be upgraded to a new version.

Both programs are used to connect the computer to the mobile phone of the respective manufacturer to copy images and music files, install games, synchronize contacts, update firmware and do some other things. Synchronizing contacts is useful every once in a while and so is updating firmware; the other things are either useless or can be done using the regular file manager.

Nevertheless, the Nokia program had to download 90 megabytes and the Samsung program had to download 75 megabytes. That’s a lot. That’s too much. And that’s for two programs that essentially do the same things, most of which are not even very useful.

Mobile phones suck.

Translating Wikipedia Interface Into Amharic

There is a Wikipedia in the Amharic language, but it is developing slowly. One of the reasons for this is that the interface of MediaWiki, the software that runs the Wikipedia website, is only partially translated into Amharic, so people who don’t know English can hardly use the website. Completing the translation of the interface will make the Amharic Wikipedia much more accessible to people who don’t know English. This is relevant not only to people who read and write Wikipedia online, but also to those who don’t have Internet access, because the Wikimedia Foundation and other organizations distribute offline copies of Wikipedia on CD-ROMs, in printed books and on other media.

Translation of Wikipedia’s interface is done by volunteers at the website translatewiki.net. I know this website well and i am willing to invest my time and teach any Amharic speaker who can translate software messages from English or from Hebrew. Practically no experience is needed – anyone who can use a web browser can do this, too, and i shall provide all the needed support, anywhere in Israel. Do you know anyone who would be able to do this? This can be a great chance to improve one’s skills in computer use, in Amharic and in English, and to help millions of Amharic speakers get access to one of the most important educational websites on the web.

If you know anyone who can help with it, please let me know.

Git glossary lacunae, part 1

If you work with linguistics, philology, texts or editing, you probably know what a “lacuna” is. If you don’t, then any dictionary will tell you that a lacuna is something that is supposed to be somewhere, but is missing; it is usually said of texts in which words or whole passages are missing for some reason.

I already wrote here about how much i hate the source code management system called Git (Git sucks 1, Git sucks 2). Actually, Git itself is probably a good piece of software, but learning it is terribly hard. I’ve been trying to do it for years and i still understand almost nothing. Learning Git is hard because every piece of documentation that discusses it is full of cryptic jargon. The solution to this problem is supposed to be in the man page called gitglossary, but it is very incomplete; in philologists’ jargon, it has lacunae.

I compiled a list of Git terms which i found hard to understand and which i could not find in gitglossary. At some point i thought that i would try to understand what these weird words mean myself and send patches with definitions to the maintainers of that file. Unfortunately, i am too busy to do that. The least i can do is to post that list here. If you are a Git expert, consider writing definitions for them and sending them as a patch to Git’s maintainers.

  • add
  • author
  • bisect
  • clone
  • committer
  • diff
  • grep
  • log
  • packed ref
  • remote (as a noun)
  • repo (it means “repository”, of course, but the glossary should mention this abbreviation)
  • reset
  • staging, staging area (a synonym for “index”, if i understand correctly; see the sketch after this list)
  • status
  • treeish
  • working copy (this one seems simple, but it’s not)
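
To show why “staging area” in particular deserves a proper glossary entry, here is a minimal sketch of the concept as i understand it. It assumes an existing Git repository; the file name and the commit message are made up:

    echo "first version" > notes.txt
    git add notes.txt        # copy the file's current state into the staging area
    echo "second version" > notes.txt
    git status               # notes.txt is listed both as staged and as modified:
                             # the staging area still holds the first version
    git commit -m "Save notes"   # commits the staged first version,
                                 # not the latest state of the working copy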

These words are defined in the glossary, but the definitions are unclear:

  • parent – I couldn’t understand a word of that definition.
  • reflog – The definition says that this thing “can tell you what the 3rd last revision in _this_ repository was”. It is unclear whether the number 3 here is just an example or whether it always refers to the 3rd last revision.
  • checkout – This should be defined very clearly and carefully, because the usage of this term in Git is quite different from its usage in other version control systems (see the sketch below). The current definition is unclear and circular: a checkout is “the action of updating all or part of the working tree with a tree object”; to understand it one needs to know what a “working tree” is – and it is defined as “the tree of actual checked out files”.
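
Here is a rough sketch of that difference, with made-up repository URLs and branch names; this is my understanding, not an official definition:

    # In Subversion, "checkout" means getting a working copy from a server:
    svn checkout https://example.org/svn/project/trunk project

    # The rough Git equivalent of that operation is "clone":
    git clone https://example.org/git/project.git

    # In Git, "checkout" instead updates the working tree from the
    # local repository, e.g. switching branches or restoring a file:
    git checkout some-branch
    git checkout -- README       # discard local changes to one file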

So, i’m sincerely sorry for only bringing up the problem without providing a solution. I hope that it’s better than just doing nothing.

(By the way, i would gladly post it as a bug in Git’s online bug tracking database… except that, for some strange reason, last time i checked the Git developers didn’t have one.)

Abuse in the Git Community and in the Wikipedia Community

Continuing the theme of my previous post about Git. You don’t need to know much about software development to read it.


I have a love/hate relationship with Git.

Git made its first major appearance on the Free Software community scene in style, with Linus Torvalds’ famous talk, a large part of which consisted of verbal abuse against Git’s main competitor, Subversion (a.k.a. SVN). Torvalds did it in a funny and cute way, so that was forgivable, and the serious technical part of his talk was very interesting and convincing, so i tried installing Git and using it. I didn’t quite understand what it is good for, though, and it wasn’t as popular then, so i forgot about it for some time.

A couple of years later i went to work at another company, which didn’t use any kind of source code management system at all. We would just email zipped source trees to each other. It was bogus, so i proposed installing some kind of an SCM, and my boss just told me that i could suggest any product i liked. Recalling the apparent coolness and sexiness of Git, i tried installing it locally and worked with it by myself for a few days, and it was mostly OK.

After some time i wanted to create a branch to test a new feature i wanted to develop. Branching is touted as Git’s strongest selling point; everybody keeps saying that it’s very easy and robust in Git. So i tried creating a branch. I worked on the branch for a few days, switching between the branch and the master every few hours when i worked on different things. After a few days i realized that the source tree had become completely screwed up because of that.
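
For readers who haven’t used Git, here is a minimal sketch of the kind of workflow i mean, with made-up branch and commit names; my actual commands back then may have differed:

    # Create a branch for the new feature and switch to it:
    git branch new-feature
    git checkout new-feature
    # ... edit files, then record the work on the branch:
    git add .
    git commit -m "Work on the new feature"
    # Switch back to the main line of work:
    git checkout master
    # Switching with uncommitted changes in the working copy is an
    # easy way to end up with a confusing source tree; running
    # "git status" before every switch helps.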

At that point i had to choose between searching for The Git Way of cleaning up the mess that was created by Git’s “robust” branching capabilities or simply rewriting the screwed-up files by hand. Searching for The Git Way would take an unknown amount of time reading through cryptic documentation; rewriting the files would take a known amount of time of some boring repetitive work. I chose the latter option, and after i finished the manual recovery, i recommended that my boss install SVN.


Fast forward to 2010. GitHub, a website for gratis hosting of Git repositories of Free Software projects, became very popular. I don’t quite understand why, given Git’s immense complexity, but that’s a fact. Every now and then i want to send a patch or a Hebrew translation to projects hosted on GitHub, and every time i have to suffer through Git’s cryptic commands to do it. It’s just never quite the same; every time there’s a different problem. Git’s error messages make me feel like i’m being punished for not knowing Git.

In one interesting project, for example, i got the source (“clone” in Git terms, “checkout” elsewhere) and read it. After a couple of days i wanted to start writing a patch, and i wanted to update my local copy before that, so i ran “git pull”, which is the command that is supposed to do the update. I started receiving messages about conflicts between the updated files and my local changes; the trouble is that i hadn’t made any local changes. After fighting with Git to resolve the conflicts for about ten minutes i gave up on that project. That’s just one example out of many.
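
In hindsight, a few commands might have helped me see what Git thought had changed; this is a general sketch, not a record of what i actually tried in that repository:

    git status       # list the files Git considers locally modified
    git diff         # show the differences Git sees in those files
    git stash        # set the unexpected changes aside...
    git pull         # ...update cleanly...
    git stash pop    # ...and bring the changes back if they were real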


I still didn’t want to give up on Git completely, though, mostly because i felt that i would want to contribute to projects on GitHub every now and then. So i tried to read Git’s documentation yet again and found gitglossary. You know, i’m a linguist, i love dictionaries and glossaries, and this glossary is actually pretty good, because it really helps with understanding the other parts of the documentation. There are a few missing words there, however, so i decided to contribute the missing definitions by finding them elsewhere and sending patches.

During the course of my searches i came upon The Git Wiki. Usually when i arrive at a wiki i open an account there; i have accounts on dozens of wikis, not counting the different languages of Wikipedia. The first thing i usually do after i open an account is to create a user page, which is usually called “profile” on other sites, and put a link to my blog on it, because that’s the easiest way to tell the world who i am.

When i did that on the Git Wiki, an administrator of that wiki deleted my user page, saying that it was “link spam”. That never happened to me on other wikis, so i sent him a message through his user talk page, which is the usual way of communicating between users of wikis, and asked him what was so bad about what i had done. He deleted that message and blocked my account, saying that i was abusing the user talk page for messaging and abusing him as an administrator. I didn’t want to give up, so i sent him an email, trying to explain my intentions. He replied by saying that it was impolite on my part not to say “Hi” at the beginning of my email.

Maybe Linus Torvalds gave a bad example after all.


Git is the single most frustrating piece of software i have ever encountered, and dedicated Git users are the most arrogant and patronizing bunch of people in the Free Software world. Not all of them, obviously; Jamie, who patiently replied to my rants the other day, is nice. But apart from him, nearly all of the Git people with whom i had contact were extremely unwelcoming, as if they were trying to protect their elite community from dumb outsiders. Indeed, getting into it is so hard that maybe those who succeeded in penetrating it became hardened by the struggle. Many, many times i just wanted to give up and stop using it for the sake of my mental health. Yet Git has this strange sexiness that keeps attracting me, and i don’t have an explanation for it. A kind of masochism, i guess.


This post is filed under “Wikipedia”, because accusations similar to the ones i am making here against Git and the community around it are frequently made against Wikipedia. For years hard-core Wikipedians, including myself, lived in denial, saying that it’s not so hard to join Wikipedia, write articles and become part of its community, but many people kept complaining that Wikipedia is very hard and unwelcoming, both technically and socially.

This wall of denial is starting to crumble. The Hebrew Wikipedia community tried to deal with this problem recently by inviting a group of academics to a meeting where prominent community members tried to explain to them in simple terms what Wikipedia is, what is good and what is bad about it, why Wikipedia wants academics to join it, and what the easiest ways for academics to join are. I am also trying to draft people to Wikipedia by proposing that they just send their contributions by email, thus bypassing the technical hurdles that the Wikipedia user interface poses.

These steps certainly do not solve all the problems, but at least we acknowledge that problems exist. Does the Git community acknowledge that it has such problems? I doubt it, but i’ll be very glad to be proven wrong.