People who translate MediaWiki and other pieces of software with which I am involved to languages of India, Philippines, African countries, and other places, often ask me: Should we translate to pure native neologisms, or reuse English words?
Linguistic purism is quite common in translation in general, and not only in software. I heard of a study that compared a corpus of texts written originally in Hebrew with a corpus of translated texts. The translated texts had considerably fewer loanwords. This may be surprising at first: how can translated texts have fewer loanwords?
But actually this makes sense: A translator is not creating a story, but re-telling a story that already exists. A good translator is supposed to understand the original story well first, and after understanding it, the translator is supposed to retell it in the target language. Many translators use the time that they don’t invest in creating the story itself to make the translation “purer” than the target language actually is.
A text that is originally written in Hebrew expresses how Hebrew-speaking people actually talk. Despite a history of creating many neologisms, some of which were successful, Hebrew speakers also use a lot of loanwords from English, Arabic, German, Russian, French, and other languages.
And that’s totally OK. Loanwords don’t “degrade” or “kill” a language, as some people say. Certainly not by themselves. English has many, many words from French, Latin, Norwegian, Greek, and other languages, and they didn’t kill it. Quite the contrary: English is one of the most successful languages in the world.
A good original writer creates verisimilitude, naturally or intentionally, by using actual living language. And actual living language has loan words. More in some languages, fewer in others, but it happens everywhere.
Software localization is a bit different from books, however. Books, both fiction and non-fiction, are art. At least to some degree, the language itself is an expressive tool there.
Software user interfaces are less about art and more about function. A piece of software is supposed to be usable and functional, and as easy and obvious to learn and use as possible. The less people need to learn it, the closer it is to perfection. And localized software is no different: it must, above all, be functional.
Everything else is secondary to usability. If the translation is beautiful, but the software can’t be used, the job is not done.
And this is the thing that is supposed to guide you when choosing whether to translate a term as a native word, possibly a neologism, or to transliterate a foreign term and make it a loanword: Will the user understand it and be able to use the software?
The choice is not as obvious as some people may think, however. Some people may think that loaning a word makes more sense because it’s already familiar, and this will make the software familiar.
But always ask yourself: Familiar to whom?
The translator, pretty much by definition, is supposed to know the source language, and to be able to use the software in the source language. Most often the source language is English. So most likely the translator is familiar with the source terminology.
But will the user be familiar with it?
The translated piece of software is supposed to be usable by people who aren’t familiar with that software in the source language, and, very importantly, by people who don’t know the source language at all.
So if you translate words like “log in”, “account”, “file”, “proxy”, “feed”, and so on, by simply transliterating them into the target language because they are “familiar” in this form, ask yourself: are they also familiar to people who don’t know English and who aren’t experienced with using computers?
Some Hebrew software localizers translate “proxy” as something like “intermediary server” (שרת מתווך), and some just write “proxy” in transliteration (פרוקסי). The rationale for “proxy” is usually this: “everyone knows what ‘proxy’ is, and no one knows what an intermediary server is”.
But is it really everyone? Or maybe it’s just you and your software developer friends?
To people who aren’t software developers, the function of “proxy” is pretty much as obscure as the function of “intermediary server”… or is it? Because the fully translated native term actually says something about what this thing does in familiar words.
Of course, if you are really sure that a foreign term is widely familiar to all people, then it’s OK to use, and often it’s better than using a “pure” neologism.
And that’s why I put “pure” in double quotes: The “purity” itself is not important. Functionality and usability are above all. Sometimes “purity” makes usability better. Sometimes it doesn’t. It’s not an exact science.
I’ll go even further: More often than many people would think, pondering the meaning and choosing a correct translation for a software user interface term may start fruitful conversations about the design of the original software and uncover usability flaws that affect everyone, including the people who use the software in English.
There are thousands of useful bug reports in MediaWiki and in other software projects in which I am involved that were filed by localizers who had translation difficulties. Many of these bugs were fixed, improving the experience for users in English and in all languages.
To sum up:
Purism shouldn’t be the most important goal, but it should be one of the optional tools that the translator uses.
Purism is neither absolutely bad nor absolutely good. It can be useful when used wisely in context, case by case.
Usability should be the most important goal of software localization.
Usability means usability for all, not just for your colleagues.
Localizers can improve the whole software in more ways than just translating strings.
As you probably already know, Wikipedia is a website. A website has two components: the content and the user interface. The content of Wikipedia is the articles, as well as various discussion and help pages. The user interface is the menus around the articles and the various screens that let editors edit the articles and communicate to each other.
Another thing that you probably already know is that Wikipedia is massively multilingual, so both the content and the user interface must be translated.
Translation of articles is a topic for another post. This post is about getting all the user interface translated to your language, and doing it as quickly, easily, and efficiently as possible.
The most important piece of software that powers Wikipedia and its sister projects is called MediaWiki. As of today, there are more than 3,800 messages to translate in MediaWiki, and the number grows frequently. “Messages” in the MediaWiki jargon are strings that are shown in the user interface. Every message can and should be translated.
In addition to core MediaWiki, Wikipedia also uses many MediaWiki extensions. Some of them are very important because they are frequently seen by a lot of readers and editors. For example, these are extensions for displaying citations and mathematical formulas, uploading files, receiving notifications, mobile browsing, different editing environments, etc. There are more than 5,000 messages to translate in the main extensions, and over 18,000 messages to translate if you want to have all the extensions translated, including the most technical ones. There are also the Wikipedia mobile apps and additional tools for making automated edits (bots) and monitoring vandalism, with several hundreds of messages each.
Translating all of it probably sounds like an impossibly enormous job. It indeed takes time and effort, but the good news are that there are languages into which all of this was translated completely, and it can also be completely translated into yours. You can do it. In this post I’ll show you how.
A personal story
In early 2011 I completed the translation of all the messages that are needed for Wikipedia and projects related to it into Hebrew. All. The total, complete, no-excuses, premium Wikipedia experience, in Hebrew. Every single part of the MediaWiki software, extensions and additional tools was translated to Hebrew. Since then, if you can read Hebrew, you don’t need to know a single English word to use it.
I didn’t do it alone, of course. There were plenty of other people who did this before I joined the effort, and plenty of others who helped along the way: Rotem Dan, Ofra Hod, Yaron Shahrabani, Rotem Liss, Or Shapiro, Shani Evenshtein, Dagesh Hazak, Guycn2 and Inkbug (I don’t know the real names of the last three), and many others. But back then in 2011 it was I who made a conscious effort to get to 100%. It took me quite a few weeks, but I made it.
However, the software that powers Wikipedia changes every single day. So the day after the translations statistics got to 100%, they went down to 99%, because new messages to translate were added. But there were just a few of them, and it took me only a few minutes to translate them and get back to 100%.
I’ve been doing this almost every day since then, keeping Hebrew at 100%. Sometimes it slips because I am traveling or because I am ill. It slipped for quite a few months in 2014 because my first child was born and a lot of new messages happened to be added at about the same time, but Hebrew got back to 100%. It happened again in 2018 for the same happy reason, and went back to 100% after a few months. And I keep doing this.
With the sincere hope that this will be useful for helping you translate the software that powers Wikipedia completely to your language, let me tell you how.
Preparation
First, let’s do some work to set you up.
If you haven’t already, create a translatewiki.net account at the translatewiki.net main page. First, select the languages you know by clicking the “Choose another language” button (if the language into which you want to translate doesn’t appear in the list, choose some other language you know, or contact me). After selecting your language, enter your account details. This account is separate from your Wikipedia account, so if you already have a Wikipedia account, you need to create a new one. It may be a good idea to give it the same username.
After creating the account you have to make several test translations to get full translator permissions. This may take a few hours. Everybody except vandals and spammers gets full translator permissions, so if for some reason you aren’t getting them or if it appears to take too much time, please contact me.
Make sure you know your ISO 639 language code. You can easily find it on Wikipedia.
Go to your preferences, to the Editing tab, and add languages that you know to Assistant languages. For example, if you speak one of the native languages of South America like Aymara (ay) or Quechua (qu), then you probably also know Spanish (es) or Portuguese (pt), and if you speak one of the languages of Indonesia like Javanese (jv) or Balinese (ban), then you probably also know Indonesian (id). When available, translations to these languages will be shown in addition to English.
The translatewiki.net website hosts many projects to translate beyond stuff related to Wikipedia. It hosts such respectable Free Software projects as OpenStreetMap, Etherpad, MathJax, Blockly, and others. Also, not all the MediaWiki extensions are used on Wikimedia projects. There are plenty of extensions, with thousands of translatable messages, that are not used by Wikimedia, but only on other sites, but they use translatewiki.net as the platform for translation of their user interface.
It would be nice to translate all of it, but because I don’t have time for that, I have to prioritize.
On my translatewiki.net user page I have a list of direct links to the translation interface of the projects that are the most important:
Core MediaWiki: the heart of it all
Extensions used by Wikimedia: the extensions on Wikipedia and related sites. This group is huge, and I prioritize it further; see below.
MediaWiki Action Api: the documentation of the API functions, mostly interesting to developers who build tools around Wikimedia projects
Wikipedia Android app
Wikipedia iOS app
Installer: MediaWiki’s installer, not used on Wikipedia because MediaWiki is already installed there, but useful for people who install their own instances of MediaWiki, in particular new developers
Intuition: a set of tools, like edit counters, statistics collectors, etc.
Pywikibot: a library for writing bots—scripts that make useful automatic edits to MediaWiki sites.
I usually don’t work on translating other projects unless all the above projects are 100% translated to Hebrew. I occasionally make an exception for OpenStreetMap or Etherpad, but only if there’s little to translate there and the untranslated MediaWiki-related projects are not very important.
Priorities, part 2
So how can you know what is important among more than 18,000 messages from the Wikimedia universe?
Start from MediaWiki most important messages. If your language is not at 100% in this list, it absolutely must be. This list is automatically created periodically by counting which 600 or so messages are actually shown most frequently to Wikipedia users. This list includes messages from MediaWiki core and a bunch of extensions, so when you’re done with it, you’ll see that the statistics for several groups improved by themselves.
Now, if the translation of MediaWiki core to your language is not yet at 18%, get it there. Why 18%? Because that’s the threshold for exporting your language to the source code. This is essential for making it possible to use your language in your Wikipedia (or Incubator). It will be quite easy to find short and simple messages to translate (of course, you still have to do it carefully and correctly).
Some technical notes
Have you read the general localization guide for Mediawiki? Read it again, and make sure you understand it. If you don’t, ask for help! The most important section, especially for new translators, is “Translation notes”.
A super-brief list of things that you should know:
Many messages use symbols such as ==, ===, [[]], {{}}, *, #, and so on. This is wiki syntax, also known as “wikitext” or “wiki markup”. It is recommended to become familiar with some wiki syntax by editing a few pages on another wiki site, such as Wikipedia, before translating MediaWiki messages at translatewiki.
“[[Special:Homepage]]” adds a link to the page “Special:Homepage”. “[[Special:Homepage|Homepage]]” adds a link to the page “Special:Homepage”, but it will be displayed as “Homepage”. In such cases, you are usually not supposed to translate the text before the | (pipe), but you should translate the text after it. For example, in Russian: “[[Special:Homepage|Домашняя страница]]”. When in doubt, check the documentation in the sidebar.
$1, $2, $3: These are known as parameters, placeholders, or variables. They are replaced in run time, usually by numbers of names. Copy them as they are, and put them in the right place in the sentence, where it is right for your language. Always check the documentation in the sidebar to understand with what will they be replaced.
If you see something like “$1 {{PLURAL:$1|page|pages}}” in a translatable message, this means that the word will be shown according to the value of the variable $1. Note that you must not change the “PLURAL:$1” part, but you must translate the “page|pages” part.
If you see something else in curly brackets, it’s probably a “magic word”. Check the documentation to understand it. You usually don’t translate the thing in the beginning, such as {{SITENAME, {{GENDER, etc., but you sometimes need to translate things towards the end. See the localization guide for full documentation!
Learn to use the project selector at the top of the translation interface. Projects are also known as “Message groups”. For example, each extension is a message group, and some larger extension, such as Visual Editor, are further divided into several smaller message groups. Using the selector is very simple: Just click “All” next to “Message group”, and use the search box to find the component that you want to translate, such as “Visual Editor” or “Wikibase”. Clicking on a message group will load the untranslated messages for that group.
The “Extensions used by Wikimedia” group is divided into several more subgroups. The important one is “Extensions used by Wikimedia – Main”, which includes the most commonly used extensions. Other subgroups are:
“Advanced”: extensions that are used only on some wikis, or are useful only to administrators and other advanced users. This should be the first subgroup you translate after you complete the “Main” subgroup.
“Fundraising”: extensions used for collecting donations for the Wikimedia Foundation.
“Legacy”: extensions that are still installed on Wikimedia sites, but are going to be removed. You can most likely skip this subgroup completely.
“Media” includes advanced tools for media files curating and uploading, especially on Wikimedia Commons.
“Technical”: this is mostly API documentation for various extensions, which is shown on the ApiHelp and ApiSandbox special pages. It is very useful for developers of gadgets, bots, and other software, but not necessary for other users. This group also includes several other very advanced extensions that are used only by a few people. You should translate these messages some day, but it’s OK to do it later.
“Upcoming”: these are extensions that are not yet widely installed on Wikimedia sites, but are going to be installed soon. Translating them is a pretty good idea, because they are usually very new, and may include some confusing messages. The earlier you report these confusing messages to the developers, the better!
“Wikivoyage”: extensions used only on Wikivoyage sites. Translate them if there is a Wikivoyage site in your language, or if you want to start one.
There is also a group called “EXIF Tags”. It’s an advanced part of core MediaWiki. It mostly includes advanced photography terminology, and it shows information about photographs on Wikimedia Commons. If you are not sure how to translate these messages, ask a professional photographer. In any case, it’s OK to do it later, after you completed more important components.
Getting things done, one by one
Once you have the most important MediaWiki messages 100% and at least 18% of MediaWiki core is translated to your language, where do you go next?
I have surprising advice.
You need to get everything to 100% eventually. There are several ways to get there. Your mileage may vary, but I’m going to suggest the way that worked for me: Complete the easiest piece that will get your language closer to 100%! For me this is an easy way to remove an item off my list and feel that I accomplished something.
But still, there are so many items at which you could start looking! So here’s my selection of components that are more user-visible and less technical. The list is ordered not by importance, but by the number of messages to translate (as of October 2020):
Vector: the default skin for desktop and laptop computers
Minerva Neue: the skin for mobile phones and tablets
Babel: for displaying boxes on user pages with information about the languages that the user knows
Thanks: the extension for sending “thank you” messages to other editors
Universal Language Selector: the extension that lets people easily select the language they need from a long list of languages (disclaimer: I am one of its developers)
jquery.uls: an internal component of Universal Language Selector that has to be translated separately (for technical reasons)
Cite: the extension that displays footnotes on Wikipedia
Math: the extension that displays math formulas in articles
Wikibase Client: the part of Wikidata that appears on Wikipedia, mostly for handling interlanguage links
ProofreadPage: the extension that makes it easy to digitize PDF and DjVu files on Wikisource (this is relevant only if there is a Wikisource site in your language, or if you plan to start one)
I put MediaWiki core last intentionally. It’s a very large message group, with over 3000 messages. It’s hard to get it completed quickly, and actually, some of its features are not seen very frequently by users who aren’t site administrators or very advanced editors. By all means, do complete it, try to do it as early as possible, and get your friends to help you, but it’s also OK if it takes some time.
Getting all the things done
OK, so if you translate all the items above, you’ll make Wikipedia in your language mostly usable for most readers and editors. But let’s go further.
Let’s go further not just for the sake of seeing pure 100% in the statistics everywhere. There’s more.
As I wrote above, the software changes every single day. So do the translatable messages. You need to get your language to 100% not just once; you need to keep doing it continuously.
Once you make the effort of getting to 100%, it will be much easier to keep it there. This means translating some things that are used rarely (but used nevertheless; otherwise they’d be removed). This means investing a few more days or weeks into translating-translating-translating.
You’ll be able to congratulate yourself not only upon the big accomplishment of getting everything to 100%, but also upon the accomplishments along the way.
One strategy to accomplish this is translating extension by extension. This means, going to your translatewiki.net language statistics: here’s an example with Albanian, but choose your own language. Click “expand” on MediaWiki, then again “expand” on “MediaWiki Extensions” (this may take a few seconds—there are lots of them!), then on “Extensions used by Wikimedia” and finally, on “Extensions used by Wikimedia – Main”. Similarly to what I described above, find the smaller extensions first and translate them. Once you’re done with all the Main extensions, do all the extensions used by Wikimedia. This strategy can work well if you have several people translating to your language, because it’s easy to divide work by topic. (Going to all extensions, beyond Extensions used by Wikimedia, helps users of these extensions, but doesn’t help Wikipedia very much.)
Another fun strategy is quiet and friendly competition with other languages. Open the statistics for Extensions Used by Wikimedia – Main and sort the table by the “Completion” column. Find your language. Now translate as many messages as needed to pass the language above you in the list. Then translate as many messages as needed to pass the next language above you in the list. Repeat until you get to 100%.
For example, here’s an excerpt from the statistics for today:
Let’s say that you are translating to Georgian. You only need to translate 37 messages to pass Marathi and go up a notch (2555 – 2519 + 1 = 37). Then 56 messages more to pass Hindi and go up one more notch (2518 – 2463 + 1 = 56). And so on.
Once you’re done, you will have translated over 5600 messages, but it’s much easier to do it in small steps.
Once you get to 100% in the main extensions, do the same with all the Extensions Used by Wikimedia. It’s way over 10,000 messages, but the same strategies work.
Good stuff to do along the way
Invite your friends! You don’t have to do it alone. Friends will help you work more quickly and find translations to difficult words.
Never assume that the English message is perfect. Never. Do what you can to improve the English messages. Developers are people just like you are. There are developers who know their code very well, but who are not the best writers. And though some messages are written by professional user experience designers, many are written by the developers themselves. Developers are developers; they are not necessarily very good writers or designers, and the messages that they write in English may not be perfect. Also, keep in mind that many, many MediaWiki developers are not native English speakers; a lot of them are from Russia, Netherlands, India, Spain, Germany, Norway, China, France and many other countries. English is foreign to them, and they may make mistakes.
So if anything is hard to translate, of if there are any other problems with the English messages to the translatewiki Support page. While you are there, use the opportunity to help other translators who are asking questions there, if you can.
Another good thing is to do your best to try using the software that you are translating. If there are thousands of messages that are not translated to your language, then chances are that it’s already deployed in Wikipedia and you can try it. Actually trying to use it will help you translate it better.
Whenever relevant, fix the documentation displayed near the translation area. Strange as it may sound, it is possible that you understand the message better than the developer who wrote it!
Before translating a component, review the messages that were already translated. To do this, click the “All” tab at the top of the translation area. It’s useful for learning the current terminology, and you can also improve them and make them more consistent.
After you gain some experience, create or improve a localization guide in your language. There are very few of them at the moment, and there should be more. Here’s the localization guide for French, for example. Create your own with the title “Localisation guidelines/xyz” where “xyz” is your language code.
As in Wikipedia itself, Be Bold.
OK, so I got to 100%, what now?
Well done and congratulations.
Now check the statistics for your language every day. I can’t emphasize enough how important it is to do this every day. If not every day, then as frequently as you can.
The way I do this is having a list of links on my translatewiki.net user page. I click them every day, and if there’s anything new to translate, I immediately translate it. Usually there are just a few new messages to translate; I didn’t measure precisely, but usually it’s fewer than 20. Quite often you won’t have to translate from scratch, but to update the translation of a message that changed in English, which is usually even faster.
But what if you suddenly see 200 new messages to translate or more? It happens occasionally. Maybe several times a year, when a major new feature is added or an existing feature is changed. Basically, handle it the same way you got to 100% before: step by step, part by part, day by day, week by week, notch by notch, and get back to 100%.
But you can also try to anticipate it. Follow the discussions about new features, check out new extensions that appear before they are added to the Extensions Used by Wikimedia group, consider translating them when you have a few spare minutes. At the worst case, they will never be used by Wikimedia, but they may be used by somebody else who speaks your language, and your translations will definitely feed the translation memory database that helps you and other people translate more efficiently and easily.
Consider also translating other useful projects: OpenStreetMap, Etherpad, Blockly, Encyclopedia of Life, etc. Up to you. The same techniques apply everywhere.
What do I get for doing all this work?
The knowledge that thanks to you, people who read in your language can use Wikipedia without having to learn English. Awesome, isn’t it? Some people call it “Good karma”. Also, the knowledge that you are responsible for creating and spreading the terminology in your language for one of the most important and popular websites in the world.
Oh, and you also get enormous experience with software localization, which is a rather useful and demanded job skill these days.
Is there any other way in which I can help?
Yes!
If you find this post useful, please translate it to other languages and publish it in your blog. No copyright restrictions, public domain (but it would be nice if you credit me and send me a link to your translation). Make any adaptations you need for your language. It took me years of experience to learn all of this, and it took me about four hours to write it. Translating it will take you much less than four hours, and it will help people be more efficient translators.
When we became parents in 2014, my wife searched for a mobile app to help manage baby care: feeding, sleep, diaper changing, and other activities.
After trying a few apps, she settled on Baby Daybook by Drilly Apps. It indeed turned out to be very convenient for parenting in the first few months, but here I wanted to praise its developers for doing localization right:
The app is translatable at OneSky. It’s one of many other localization sites existing today. It’s probably less famous than Crowdin and Transifex, but pretty fine functionally, and I have not particular complaints about it. (And of course, it’s a bit of a competitor of translatewiki.net, where I am one of the maintainers, but that’s OK—competition is healthy.)
Any volunteer can log in with a GitHub account and start translating to any language.
Translation review is available, but not required. Submitted translations go into the next released version whether reviewed or not. This is good, because bad translations are actually very rare, and many languages have very few dedicated translation maintainers, often just one, so demanding translation review is usually just harmful and unnecessary overhead.
The developers quickly answer my emails when I ask them for clarification about string meanings, and when I suggest changes in the English string. Recently I suggested changing “Thousands of happy moms use this app to track breastfeedings and sync data” to “Thousands of happy parents use this app to track breastfeedings and sync data”, and they immediately changed it.
App descriptions for the Google Play store are fully translatable. This may seem like an obvious thing to do, but almost all apps use a machine translation, which is almost always wrong, often embarrassingly so. I saw some very popular apps, used by millions of people, having Hebrew and Russian descriptions that are worse than useless. For a lot of apps it would be better to just leave the English description. Luckily, Baby Daybook does the right thing and allows translators write a complete description for every language.
Translators get a free pro account in the app, with extra features. (This worked for me the last time I checked, in 2014; I haven’t checked recently, but I have no reason to think that it changed.)
All of the above things are really easy and sensible, but for some reason most app developers don’t do this.
App developers, please learn from Drilly Apps how to do it right.
As you probably already know, Wikipedia is a website. A website has content—the articles; and it has user interface—the menus around the articles and the various screens that let editors edit the articles and communicate to each other.
Another thing that you probably already know is that Wikipedia is massively multilingual, so both the content and the user interface must be translated.
Translation of articles is a topic for another post. This post is about getting all of the user interface translated to your language, as quickly and efficiently as possible.
The most important piece of software that powers Wikipedia and its sister projects is called MediaWiki. As of today, there are 3,335 messages to translate in MediaWiki, and the number grows frequently. “Messages” in the MediaWiki jargon are strings that are shown in the user interface, and that can be translated. In addition to core MediaWiki, Wikipedia also has dozens of MediaWiki extensions installed, some of them very important—extensions for displaying citations and mathematical formulas, uploading files, receiving notifications, mobile browsing, different editing environments, etc. There are around 3,500 messages to translate in the main extensions, and over 10,000 messages to translate if you want to have all the extensions translated. There are also the Wikipedia mobile apps and additional tools for making automated edits (bots) and monitoring vandalism, with several hundreds of messages each.
Translating all of it probably sounds like an enormous job, and yes, it takes time, but it’s doable.
In February 2011 or so—sorry, I don’t remember the exact date—I completed the translation into Hebrew of all of the messages that are needed for Wikipedia and projects related to it. All. The total, complete, no-excuses, premium Wikipedia experience, in Hebrew. Every single part of the MediaWiki software, extensions and additional tools was translated to Hebrew, and if you were a Hebrew speaker, you didn’t need to know a single English word to use it.
I wasn’t the only one who did this of course. There were plenty of other people who did this before I joined the effort, and plenty of others who helped along the way: Rotem Dan, Ofra Hod, Yaron Shahrabani, Rotem Liss, Or Shapiro, Shani Evenshtein, Inkbug (whose real name I don’t know), and many others. But back then in 2011 it was I who made a conscious effort to get to 100%. It took me quite a few weeks, but I made it.
Of course, the software that powers Wikipedia changes every single day. So the day after the translations statistics got to 100%, they went down to 99%, because new messages to translate were added. But there were just a few of them, and it took me a few minutes to translate them and get back to 100%.
I’ve been doing this almost every day since then, keeping Hebrew at 100%. Sometimes it slips because I am traveling or I am ill. It slipped for quite a few months because in late 2014 I became a father, and a lot of new messages happened to be added at the same time, but Hebrew is back at 100% now. And I keep doing this.
With the sincere hope that this will be useful for translating the software behind Wikipedia to your language, let me tell you how.
Preparation
First, let’s do some work to set you up.
Get a translatewiki.net account if you haven’t already.
Make sure you know your language code.
Go to your preferences, to the Editing tab, and add languages that you know to Assistant languages. For example, if you speak one of the native languages of South America like Aymara (ay) or Quechua (qu), then you probably also know Spanish (es) or Portuguese (pt), and if you speak one of the languages of the former Soviet Union like Tatar (tt) or Azerbaijani (az), then you probably also know Russian (ru). When available, translations to these languages will be shown in addition to English.
The translatewiki.net website hosts many projects to translate beyond stuff related to Wikipedia. It hosts such respectable Free Software projects as OpenStreetMap, Etherpad, MathJax, Blockly, and others. Also, not all the MediaWiki extensions are used on Wikimedia projects; there are plenty of extensions, with thousands of translatable messages, that are not used by Wikimedia, but only on other sites, but they use translatewiki.net as the platform for translation of their user interface.
It would be nice to translate all of it, but because I don’t have time for that, I have to prioritize.
On my translatewiki.net user page I have a list of direct links to the translation interface of the projects that are the most important:
Core MediaWiki: the heart of it all
Extensions used by Wikimedia: the extensions on Wikipedia and related sites
MediaWiki Action Api: the documentation of the API functions, mostly interesting to developers who build tools around Wikimedia projects
Wikipedia Android app
Wikipedia iOS app
Installer: MediaWiki’s installer, not used in Wikipedia because MediaWiki is already installed there, but useful for people who install their own instances of MediaWiki, in particular new developers
Intuition: a set of different tools, like edit counters, statistics collectors, etc.
Pywikibot: a library for writing bots—scripts that make useful automatic edits to MediaWiki sites.
I usually don’t work on translating other projects unless all of the above projects are 100% translated to Hebrew. I occasionally make an exception for OpenStreetMap or Etherpad, but only if there’s little to translate there and the untranslated MediaWiki-related projects are not very important.
Priorities, part 2
So how can you know what is important among more than 15,000 messages from the Wikimedia universe?
Start from MediaWiki most important messages. If your language is not at 100% in this list, it absolutely must be. This list is automatically created periodically by counting which 600 or so messages are actually shown most frequently to Wikipedia users. This list includes messages from MediaWiki core and a bunch of extensions, so when you’re done with it, you’ll see that the statistics for several groups improved by themselves.
Now, if the translation of MediaWiki core to your language is not yet at 18%, get it there. Why 18%? Because that’s the threshold for exporting your language to the source code. This is essential for making it possible to use your language in your Wikipedia (or Incubator). It will be quite easy to find short and simple messages to translate (of course, you still have to do it carefully and correctly).
Getting Things Done, One by One
Once you have the most important MediaWiki messages 100% and at least 18% of MediaWiki core is translated to your language, where do you go next?
I have surprising advice.
You need to get everything to 100% eventually. There are several ways to get there. Your mileage may vary, but I’m going to suggest the way that worked for me: Complete the easiest piece that will get your language closer to 100%! For me this is an easy way to strike an item off my list and feel that I accomplished something.
But still, there are so many items at which you could start looking! So here’s my selection of components that are more user-visible and less technical, sorted not by importance, but by the number of messages to translate:
Cite: the extension that displays footnotes on Wikipedia
Babel: the extension that displays boxes on userpages with information about the languages that the user knows
Math: the extension that displays math formulas in articles
Thanks: the extension for sending “thank you” messages to other editors
Universal Language Selector: the extension that lets people select the language they need from a long list of languages (disclaimer: I am one of its developers)
jquery.uls: an internal component of Universal Language Selector that has to be translated separately for technical reasons
Wikibase Client: the part of Wikidata that appears on Wikipedia, mostly for handling interlanguage links
VisualEditor: the extension that allows Wikipedia articles to be edited in a WYSIWYG style
ProofreadPage: the extension that makes it easy to digitize PDF and DjVu files on Wikisource
I put MediaWiki core last intentionally. It’s a very large message group, with over 3000 messages. It’s hard to get it completed quickly, and to be honest, some of its features are not seen very frequently by users who aren’t site administrators or very advanced editors. By all means, do complete it, try to do it as early as possible, and get your friends to help you, but it’s also OK if it takes some time.
Getting All Things Done
OK, so if you translate all the items above, you’ll make Wikipedia in your language mostly usable for most readers and editors.
But let’s go further.
Let’s go further not just for the sake of seeing pure 100% in the statistics everywhere. There’s more.
As I wrote above, the software changes every single day. So do the translatable messages. You need to get your language to 100% not just once; you need to keep doing it continuously.
Once you make the effort of getting to 100%, it will be much easier to keep it there. This means translating some things that are used rarely (but used nevertheless; otherwise they’d be removed). This means investing a few more days or weeks into translating-translating-translating.
You’ll be able to congratulate yourself not only upon the big accomplishment of getting everything to 100%, but also upon the accomplishments along the way.
One strategy to accomplish this is translating extension by extension. This means, going to your translatewiki.net language statistics: here’s an example with Albanian, but choose your own language. Click “expand” on MediaWiki, then again “expand” on “MediaWiki Extensions”, then on “Extensions used by Wikimedia” and finally, on “Extensions used by Wikimedia – Main”. Similarly to what I described above, find the smaller extensions first and translate them. Once you’re done with all the Main extensions, do all the extensions used by Wikimedia. (Going to all extensions, beyond Extensions used by Wikimedia, helps users of these extensions, but doesn’t help Wikipedia very much.) This strategy can work well if you have several people translating to your language, because it’s easy to divide work by topic.
Another strategy is quiet and friendly competition with other languages. Open the statistics for Extensions Used by Wikimedia – Main and sort the table by the “Completion” column. Find your language. Now translate as many messages as needed to pass the language above you in the list. Then translate as many messages as needed to pass the next language above you in the list. Repeat until you get to 100%.
For example, here’s an excerpt from the statistics for today:
Let’s say that you are translating to Malay. You only need to translate eight messages to go up a notch (901 – 894 + 1). Then six messages more to go up another notch (894 – 888). And so on.
Once you’re done, you will have translated over 3,400 messages, but it’s much easier to do it in small steps.
Once you get to 100% in the main extensions, do the same with all the Extensions Used by Wikimedia. It’s over 10,000 messages, but the same strategies work.
Good Stuff to Do Along the Way
Never assume that the English message is perfect. Never. Do what you can to improve the English messages.
Developers are people just like you are. They may know their code very well, but they may not be the most brilliant writers. And though some messages are written by professional user experience designers, many are written by the developers themselves. Developers are developers; they are not necessarily very good writers or designers, and the messages that they write in English may not be perfect. Keep in mind that many, many MediaWiki developers are not native English speakers; a lot of them are from Russia, Netherlands, India, Spain, Germany, Norway, China, France and many other countries, and English is foreign to them, and they may make mistakes.
So report problems with the English messages to the translatewiki Support page. (Use the opportunity to help other translators who are asking questions there, if you can.)
Another good thing is to do your best to try running the software that you are translating. If there are thousands of messages that are not translated to your language, then chances are that it’s already deployed in Wikipedia and you can try it. Actually trying to use it will help you translate it better.
Whenever relevant, fix the documentation displayed near the translation area. Strange as it may sound, it is possible that you understand the message better than the developer who wrote it!
Before translating a component, review the messages that were already translated. To do this, click the “All” tab at the top of the translation area. It’s useful for learning the current terminology, and you can also improve them and make them more consistent.
After you gain some experience, create a localization guide in your language. There are very few of them at the moment, and there should be more. Here’s the localization guide for French, for example. Create your own with the title “Localisation guidelines/xyz” where “xyz” is your language code.
As in Wikipedia, Be Bold.
OK, So I Got to 100%, What Now?
Well done and congratulations.
Now check the statistics for your language every day. I can’t emphasize how important it is to do this every day.
The way I do this is having a list of links on my translatewiki.net user page. I click them every day, and if there’s anything new to translate, I immediately translate it. Usually there is just a small number of new messages to translate; I didn’t measure precisely, but usually it’s less than 20. Quite often you won’t have to translate from scratch, but to update the translation of a message that changed in English, which is usually even faster.
But what if you suddenly see 200 new messages to translate? It happens occasionally. Maybe several times a year, when a major new feature is added or an existing feature is changed.
Basically, handle it the same way you got to 100% before: step by step, part by part, day by day, week by week, notch by notch, and get back to 100%.
But you can also try to anticipate it. Follow the discussions about new features, check out new extensions that appear before they are added to the Extensions Used by Wikimedia group, consider translating them when you have a few spare minutes. At the worst case, they will never be used by Wikimedia, but they may be used by somebody else who speaks your language, and your translations will definitely feed the translation memory database that helps you and other people translate more efficiently and easily.
Consider also translating other useful projects: OpenStreetMap, Etherpad, Blockly, Encyclopedia of Life, etc. Up to you. The same techniques apply everywhere.
What Do I Get for Doing All This Work?
The knowledge that thanks to you people who read in your language can use Wikipedia without having to learn English. Awesome, isn’t it? Some people call it “Good karma”.
Oh, and enormous experience with software localization, which is a rather useful job skill these days.
Is There Any Other Way in Which I Can Help?
Yes!
If you find this post useful, please translate it to other languages and publish it in your blog. No copyright restrictions, public domain (but it would be nice if you credit me and send me a link to your translation). Make any adaptations you need for your language. It took me years of experience to learn all of this, and it took me about four hours to write it. Translating it will take you much less than four hours, and it will help people be more efficient translators.
Versions of this post were already published in the following languages:
In the last few years, I usually have some files of Israeli music with me when I leave my home, or my country – on my laptop or on my phone (ripped from CDs that I own, which is legit as far as my interpretation of copyright law goes).
And sometimes people from other countries are curious about it and ask me to copy some files for them. This is a copyright issue, but I justify it by the fact that they hardly have a chance to purchase it where they live, so they aren’t really hurting the relevant market. But there’s something bigger: a technical issue with the artist and song names.
Hebrew is written in the Hebrew alphabet. CDs have artist names and song titles in Hebrew, with English translations or transliterations added only occasionally. When I rip CDs, I give the files names in Hebrew letters. Most people around the world don’t know the Hebrew alphabet, so looking for a song they like using these files will be impossible for them. They would only be able to enjoy them if they don’t mind listening to everything in a shuffle. And though the newest phones are able to display Hebrew correctly, some devices that people have are still unable to do that.
I actually recall myself renaming files en masse to let friends from other countries listen to some Israeli music and now the artists’ names.
I’m not sure how to resolve this robustly, but much like with email and social networks and with legal forms, songs could use titles in different languages or scripts. Maybe MusicBrainz or Wikidata could add a structured property for transliterated song titles, and music files could be identified like that. Maybe each music track could have multiple fields for titles in different languages.
It’s good not just for international exchange between friends, but for marketing, too – some cultures only listen to music in English and maybe in their own language, but some are OK with listening to music in a lot of languages, because they are all equally foreign.
Long story short, song names must be more easily localizable than they are today.
In November I gave a talk about how we do localization in Wikimedia at a localization meetup in Tel-Aviv, kindly organized by Eyal Mrejen from Wix.
I presented translatewiki.net and UniversalLanguageSelector. I quickly and quite casually said that when you submit a translation at translatewiki, the translation will be deployed to the live Wikipedia sites in your language within a day or two, after one of translatewiki.net staff members will synchronize the translations database with the MediaWiki source code repository and a scheduled job will copy the new translation to the live site.
Yesterday I attended another of those localization meetups, in which Wix developers themselves presented what they call “Continuous Translation”, similarly to “Continuous Integration“, a popular software deployment methodology. Without going into deep details, “Continuous Translation” as described by Wix is pretty much the same thing as what we have been doing in the Wikimedia world: Translators’ work is separated from coding; all languages are stored in the same way; the translations are validated, merged and deployed as quickly and as automatically as possible. That’s how we’ve been doing it since 2009 or so, without bothering to give this methodology a name.
So in my talk I mentioned it quickly and casually, and the Wix developers did most of their talk about it.
I guess that Wix are doing it because it’s good for their business. Wikimedia is also doing it because it’s good for our business, although our business is not about money, but about making end users and volunteer translators happy. Wikimedia’s main goal is to make useful knowledge accessible to all of humanity, and knowledge is more accessible if our website’s user interface is fully translated; and since we have to rely on volunteers for translation, we have to make them happy by making their work as comfortable and rewarding as possible. Quick deployments is one of those things that provide this rewarding feeling.
Another presentation in yesterday’s meetup was by Orit Yehezkel, who showed how localization is done in Waze, a popular traffic-aware GPS navigator app. It is a commercial product that relies on advertisement for revenue, but for the actual functionality of mapping, reporting traffic and localization, it relies on a loyal community of volunteers. One thing that I especially loved in this presentation is Orit’s explanation of why it is better to get the translations from the volunteer community rather than from a commercial translation service: “Our users understand our product better than anybody else”.
I’ve been always saying the same thing about Wikimedia: Wikimedia projects editors are better than anybody else in understanding the internal lingo, the functionality, the processes and hence – the context of all the details of the interface and the right way to translate them.
My name is written Amir Elisha Aharoni in English. In Hebrew it’s אמיר אלישע אהרוני, in Russian it’s Амир Элиша Аарони, in Hindi it’s अमीर एलिशा अहरोनि. It could be written in hundreds of other languages in many different ways.
More importantly, if I fill a form in Hebrew, I should write my name in Hebrew and not in English or in any other language.
Based on this simple notion, I wrote a post a year ago in support of localizing people’s names. I basically suggested, that it should be possible to have a person’s name written in more than one language in social networks, “from” and “to” fields in email, and in any other relevant place. Facebook allows doing this, but in a very rudimentary way; for example, the number of possible languages is very limited.
Today I am participating in the Open Source Language Summit in the Red Hat offices in Pune. Here we have, among many other talented an interesting people, two developers from the Mifos project, which creates Free software for microfinance. Mifos is being translated in translatewiki.net, a software translation site of which I am one of the developers.
Nayan Ambali, one of the Mifos developers, told me that they actually plan to implement a name localization feature in their software. This is not related to software localization, where a pre-defined set of strings is translated. It is something to be translated by the users of Mifos itself. The particular reason why Mifos needs such a feature comes from its nature as microfinance software: financial documents must be filled in the language of each country for legal purposes. Therefore, a Mifos user in the Indian state of Karnataka may need to have her name written in the software in English, Hindi, and Kannada – different languages, which are needed in different documents.
A simple sketch of database structure for storing names in multiple languages
Such a feature is quite simple to implement. In the backend this means that the name must be stored in a separate table that will hold names in different languages; see the sketch I made with Nayan above. On the frontend it will need a widget for adding names in different languages, similar to the one that Wikidata has; see the screenshot below.
The name of Steven Spielberg in many languages in Wikidata, with an option to add more languages
In this post I shall use the Tower of Babel in a somewhat more relevant and specific way: It will speak about multilingualism and about Babel itself.
This is how most people saw the Wikipedia article about the Tower of Babel until today:
The tower of Babel. Notice the pointless squares in the Akkadian name. They are called “tofu” in the jargon on internationalization programmers.
And this is how most people will see it from today:
And we have the name written in real Akkadian cuneiform!
Notice how the Akkadian name now appears as actual Akkadian cuneiform, and not as meaningless squares. Even if you, like most people, cannot actually read cuneiform, you probably understand that showing it this way is more correct, useful and educational.
This is possible thanks to the webfonts technology, which was enabled on the English Wikipedia today. It was already enabled in Wikipedias in some languages for many months, mostly in languages of India, which have severe problems with font support in the common operating systems, but now it’s available in the English Wikipedia, where it mostly serves to show parts of text that are written in exotic fonts.
The current iteration of the webfonts support in Wikipedia is part of a larger project: the Universal Language Selector (ULS). I am very proud to be one of its developers. My team in Wikimedia developed it over the last year or so, during which it underwent a rigorous process of design, testing with dozens of users from different countries, development, bug fixing and deployment. In addition to webfonts it provides an easy way to pick the user interface language, and to type in non-English languages (the latter feature is disabled by default in the English Wikipedia; to enable it, click the cog icon near “Languages” in the sidebar, then click “Input” and “Enable input tools”). In the future it will provide even more abilities, so stay tuned.
If you edit Wikipedia, or want to try editing it, one way in which you could help with the deployment of webfonts would be to make sure that all foreign strings in Wikipedia are marked with the appropriate HTML lang attribute; for example, that every Vietnamese string is marked as <span lang=”vi” dir=”ltr”>. This will help the software apply the webfonts correctly, and in the future it will also help spelling and hyphenation software, etc.
This wouldn’t be possible without the help of many, many people. The developers of Mozilla Firefox, Google Chrome, Safari, Microsoft Internet Explorer and Opera, who developed the support for webfonts in these browsers; The people in Wikimedia who designed and developed the ULS: Alolita Sharma, Arun Ganesh, Brandon Harris, Niklas Laxström, Pau Giner, Santhosh Thottingal and Siebrand Mazeland; The many volunteers who tested ULS and reported useful bugs; The people in Unicode, such as Michael Everson, who work hard to give a number to every letter in every imaginable alphabet and make massive online multilingualism possible; And last but not least, the talented and generous people who developed all those fonts for the different scripts and released them under Free licenses. I send you all my deep appreciation, as a developer and as a reader of Wikipedia.
This issue is laughably easy to avoid: If you are writing the content, you are supposed to know in what language it is written, so if it’s English, just write <html lang=”en” dir=”ltr”> even though these seem to be the defaults. Nineteen or so characters that ensure your content is readable and not displayed backwards. Please do it always and tell all your friends to do it.
The problem is that you don’t only have to explicitly set the language and the direction, but, as silly as it sounds, you have to set them correctly, too. A more subtle, but nevertheless quite frequent and disruptive bug is displaying presumably, but not actually, translated content in a different direction. This happens quite frequently when a website supports the browser language detection feature, known as Accept-Language:
The web server sees that the browser requests content in Hebrew.
The web server sends a response with <html lang=”he” dir=”rtl”>, but because the website is not actually translated, the text is shown in the fallback language, which is usually English.
The user sees the content just like this numbered list, which I intentionally set to dir=”rtl”: with the numbers and the punctuation on the wrong side, and possibly invisible, because English is not a right-to-left language.
Of course, it can go even worse. Arrows can point the wrong way and buttons and images can overlap and hide each other, rendering the page not just hard to read, but totally unusable.
This bug is also an example of the Software Localization Paradox: It manifests itself when Accept-Language is not English, but most developers install English operating systems and don’t bother to change the preferred language settings in the browser, so they never see how this bug manifests itself. The site developers don’t bother to test for it either.
The solution, of course, is to set a different language and direction only if the site is actually translated, and not to pretend that it’s translated if it’s not.
Here are two examples of such brokenness. Both sites are important and useful, but hard to use for people whose Accept-Language is Hebrew, Persian or Arabic.
Mozilla Developer Network website, in English, but right-to-left
Notice how the full stops are on the left end and how the text overlaps the images in the tiles on the right-hand side. This is how it is supposed to look, more or less:
Mozilla Developer Network home page in English, left-to-right
I manually changed dir=”rtl” to dir=”ltr” using the element inspector from Firefox’s developer tools and I also had to tweak a CSS class to move the “mozilla” tab at the top.
After showing an example of a web development bug from a site for, ahem, web developers, here is an even funnier example: The home page of Unicode’s CLDR. That’s right: Unicode’s own website shows text with incorrect direction:
The Unicode CLDR website, in English but right-to-left
The only words translated here are “Contents” (תוכן) and “Search this site” (חיפוש באתר זה), which is not so useful. The rest is shown in English, and the direction is broken: Notice the strange alignment of the content and the schedule table. A few months ago that table was so broken that its content wasn’t visible at all, but that was probably patched.
Here’s how it is supposed to look:
The CLDR home page in English, appropriately left-to-right
I tried reporting the CLDR home page direction bug, but it was closed as “out-of-scope”: The CLDR developers say that the Google Sites infrastructure is to blame. This is frustrating, because as far as I know Google Sites doesn’t have a proper bug reporting system and all I can do is write a question about that direction problem in the Google Sites forum and hope that somebody notices it or poke my Googler friends.
One thing that I will not do is switch my Accept-Language to English. Whenever I can, I don’t just want to see the website correctly, but to try to help my neighbor: see the possible problems that can affect other users who use different language. Somebody has to break the Software Localization Paradox.
Because of some not-so-interesting technical reasons I ended up on the mailing list for reporting bugs in Wikipedia’s mobile app (please see disclaimer in the end).
Reading real Wikipedia readers’ reactions is fascinating.
A lot of the emails there are just empty. People just press the button to report a problem and don’t actually write anything at all.
Sometimes they are just slightly less than empty. For example, quite a lot of people write things like “When will you fix your stupid app already???!?!!”. This may seem pointless and unconstructive, but actually these people think that there is context to what they say, because they see complaints from other people at Google’s or Apple’s app store and they assume that the app’s maintainers are aware of them. Some people also threaten to give the app a low rating in the app store; it’s not really wrong, but it’s not very helpful either.
A lot of the emails are about connectivity problems in Android 2.2.2 and about screen rotation problems on iPad. The developers are aware of both issues and are working on them.
And a whole lot of reports suggest fixes in content, rather than technical problems. Some of them are pointless, for example “The facts on this web sight is wrong and i want it changed to the corrected statement”. It never occurred to that person that it would be helpful to say what information is wrong or what should be written there (it can also be a troll). And some people do make useful suggestions. For example, one person reported that Obama didn’t write “How the Grinch Stole Christmas“. The report was correct: somebody indeed vandalized the article about the children’s book and wrote that its author is Obama. It was an easy fix, so I just fixed it myself and replied, thanking the person for the report and saying that in the future she can fix it herself by pressing the “edit” button.
If I see that fixing the problem will take more than a minute, I just reply with “you can fix it yourself”. This does make me think that a more robust way of telling people that they can fix the problems themselves is needed.
All these issues aside, there is something truly wonderful about this app: People write these emails in their language without caring at all about who will read them. Reporting a bug in Bugzilla is hard for many reasons, one of which is certainly the language. But the app gives the user a completely localized experience, so the users don’t think twice before sending a bug report in their language.
And this is a good thing. Some People from Some Companies told me explicitly that they give up on processing reports from too many people in too many languages; not Wikimedia. Wikimedia may acknowledge that it’s hard, Wikimedia won’t commit to replying to each email, but Wikimedia wouldn’t just shut it down and ignore it completely, either. We would rather think about more efficient ways to get volunteers to reply to people efficiently or to help people fix the issues themselves – that’s what the whole “wiki” idea is about in the first place.
(Important disclaimer: I am involved with this mailing list as a volunteer. It has nothing to do with the paid work that I do for the Wikimedia Foundation. I do not officially represent the Foundation in any actions that I take with regard to that mailing list.)