Archive for the 'Wikipedia' Category

Wikipedia, a Jamaican Jew, and Yak Shaving

For me, writing in Wikipedia is very often a story, within a story, within a story.

I am a member of the Language committee, which examines and approves the creation of editions of Wikipedia in new languages.

Recently we approved the new edition in the Jamaican language, an English-based creole commonly heard in reggae. Books have been published in it, and “the usual suspects” have been translated into it: The New Testament, Alice’s Adventures in Wonderland, The Little Prince, and now Wikipedia.

Since the draft “incubator” Wikipedia in this language met the requirements for creating a full-fledged new domain, I supported the domain’s creation. My work as a language committee member could have ended there (and I’m a volunteer in that role to begin with), but I nonetheless decided to shave a yak.


Normal people, when they need a sweater, buy one in a store. I consider shaving a yak.

Some time after a Wikipedia in a new language is created, all the draft articles from the incubator are imported. When that is completed, I go over the list of imported articles and check whether any of them aren’t linked to their counterparts in other languages. For some topics this is easy, by guessing the name of the topic or by looking at the images; for others it’s hard. With an English-based creole it’s, of course, very easy.

And that’s how the Jamaican Wikipedia ended up with only one article that doesn’t have a version in any other language: Aizak Mendiz Belisario.

It was easy enough to understand that this was a Jewish artist who lived in Jamaica in the 19th century. He was already mentioned a couple of times in the English Wikipedia, but there was no article about him. So I thought: Jamaican is similar enough to English that I can understand most of the article, and the artist seems notable enough for an encyclopedia, because he was one of the pioneers of art in Jamaica and because an anthology about him was published recently. And, of course, I am on the team that develops Content Translation, a translation tool for Wikipedia articles. So I decided to translate it into English.

As soon as I started the translation process, I noticed a bug. So I filed it, and because it was so easy to fix, I just fixed it.

Then I started actually translating the article. Along the way I learned about the John Canoe festival and added another spelling variant to the English article about it; I verified that the book about the artist had actually been published (you know, hoaxes happen); and I googled for more information about the artist, hoping to improve the English article further.


Normal people could just say “Fine, that language looks legit, let’s start a Wikipedia in it”. But I actually had to read all the articles in it, and then write a new one, improve another one, fix a bug, and write a blog post about all of it.

So here you go: Isaac Mendes Belisario, in English.

There is a story like this one behind every one of the millions and millions of articles in Wikipedia in all of its languages.

Amir Aharoni’s Quasi-Pro Tips for Translating the Software That Powers Wikipedia

As you probably already know, Wikipedia is a website. A website has content (the articles) and a user interface (the menus around the articles and the various screens that let editors edit the articles and communicate with each other).

Another thing that you probably already know is that Wikipedia is massively multilingual, so both the content and the user interface must be translated.

Translation of articles is a topic for another post. This post is about getting all of the user interface translated to your language, as quickly and efficiently as possible.

The most important piece of software that powers Wikipedia and its sister projects is called MediaWiki. As of today, there are 3,335 messages to translate in MediaWiki, and the number grows frequently. “Messages” in the MediaWiki jargon are strings that are shown in the user interface and that can be translated. In addition to core MediaWiki, Wikipedia also has dozens of MediaWiki extensions installed, some of them very important: extensions for displaying citations and mathematical formulas, uploading files, receiving notifications, mobile browsing, different editing environments, etc. There are around 3,500 messages to translate in the main extensions, and over 10,000 messages to translate if you want to have all the extensions translated. There are also the Wikipedia mobile apps and additional tools for making automated edits (bots) and monitoring vandalism, with several hundred messages each.

Translating all of it probably sounds like an enormous job, and yes, it takes time, but it’s doable.

In February 2011 or so—sorry, I don’t remember the exact date—I completed the translation into Hebrew of all of the messages that are needed for Wikipedia and projects related to it. All. The total, complete, no-excuses, premium Wikipedia experience, in Hebrew. Every single part of the MediaWiki software, extensions and additional tools was translated to Hebrew, and if you were a Hebrew speaker, you didn’t need to know a single English word to use it.

I wasn’t the only one who did this, of course. There were plenty of other people who did this before I joined the effort, and plenty of others who helped along the way: Rotem Dan, Ofra Hod, Yaron Shahrabani, Rotem Liss, Or Shapiro, Shani Evenshtein, Inkbug (whose real name I don’t know), and many others. But back then in 2011 it was I who made a conscious effort to get to 100%. It took me quite a few weeks, but I made it.

Of course, the software that powers Wikipedia changes every single day. So the day after the translation statistics got to 100%, they went down to 99%, because new messages to translate were added. But there were just a few of them, and it took me a few minutes to translate them and get back to 100%.

I’ve been doing this almost every day since then, keeping Hebrew at 100%. Sometimes it slips because I am traveling or I am ill. It slipped for quite a few months because in late 2014 I became a father, and a lot of new messages happened to be added at the same time, but Hebrew is back at 100% now. And I keep doing this.

With the sincere hope that this will be useful for translating the software behind Wikipedia to your language, let me tell you how.

Preparation

First, let’s do some work to set you up.

  • Get a translatewiki.net account if you haven’t already.
  • Make sure you know your language code.
  • Go to your preferences, to the Editing tab, and add languages that you know to Assistant languages. For example, if you speak one of the native languages of South America like Aymara (ay) or Quechua (qu), then you probably also know Spanish (es) or Portuguese (pt), and if you speak one of the languages of the former Soviet Union like Tatar (tt) or Azerbaijani (az), then you probably also know Russian (ru). When available, translations into these languages will be shown in addition to English.
  • Familiarize yourself with the Support page and with the general localization guidelines for MediaWiki.
  • Add yourself to the portal for your language. The page name is Portal:Xyz, where Xyz is your language code.

Priorities, part 1

The translatewiki.net website hosts many projects to translate beyond stuff related to Wikipedia. It hosts such respectable Free Software projects as OpenStreetMap, Etherpad, MathJax, Blockly, and others. Also, not all the MediaWiki extensions are used on Wikimedia projects: there are plenty of extensions, with thousands of translatable messages, that are used only on other sites and not by Wikimedia, but that still use translatewiki.net as the platform for translating their user interface.

It would be nice to translate all of it, but because I don’t have time for that, I have to prioritize.

On my translatewiki.net user page I have a list of direct links to the translation interface of the projects that are the most important:

  • Core MediaWiki: the heart of it all
  • Extensions used by Wikimedia: the extensions on Wikipedia and related sites
  • MediaWiki Action Api: the documentation of the API functions, mostly interesting to developers who build tools around Wikimedia projects
  • Wikipedia Android app
  • Wikipedia iOS app
  • Installer: MediaWiki’s installer, not used in Wikipedia because MediaWiki is already installed there, but useful for people who install their own instances of MediaWiki, in particular new developers
  • Intuition: a set of different tools, like edit counters, statistics collectors, etc.
  • Pywikibot: a library for writing bots—scripts that make useful automatic edits to MediaWiki sites.

I usually don’t work on translating other projects unless all of the above projects are 100% translated to Hebrew. I occasionally make an exception for OpenStreetMap or Etherpad, but only if there’s little to translate there and the untranslated MediaWiki-related projects are not very important.

Priorities, part 2

So how can you know what is important among more than 15,000 messages from the Wikimedia universe?

Start with the MediaWiki most important messages list. If your language is not at 100% in this list, it absolutely must be. This list is generated automatically and periodically by counting which 600 or so messages are shown most frequently to Wikipedia users. It includes messages from MediaWiki core and a bunch of extensions, so when you’re done with it, you’ll see that the statistics for several groups improved by themselves.
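The post doesn’t describe how the list is built beyond “counting which messages are shown most frequently”. As a rough illustration only, here is a top-N frequency count over a hypothetical log of displayed message keys. The log format and the function name are assumptions made for this sketch, not how the real list is generated.

```python
from collections import Counter


def most_important_messages(view_log, n=600):
    """Return the n message keys that were displayed most often.

    view_log is a HYPOTHETICAL input: an iterable with one message key
    per time that message was shown to a reader.
    """
    counts = Counter(view_log)
    # most_common() sorts by count, highest first
    return [key for key, _ in counts.most_common(n)]


log = ["edit", "search", "edit", "history", "edit", "search"]
print(most_important_messages(log, n=2))
```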

Now, if the translation of MediaWiki core to your language is not yet at 18%, get it there. Why 18%? Because that’s the threshold for exporting your language to the source code. This is essential for making it possible to use your language in your Wikipedia (or Incubator). It will be quite easy to find short and simple messages to translate (of course, you still have to do it carefully and correctly).
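To see how many messages the 18% threshold means in practice, here is a small calculation. It assumes that “at 18%” means translated messages divided by the total number of messages in the core group is at least 0.18, and it uses the 3,335 figure mentioned above; the real statistics may count things slightly differently (for example, outdated translations).

```python
def messages_needed_for_export(total, translated, threshold_percent=18):
    """How many more messages to translate to reach threshold_percent of total.

    ASSUMPTION: the threshold is a plain translated/total ratio.
    Integer ceiling division avoids floating-point rounding surprises.
    """
    required = (threshold_percent * total + 99) // 100  # ceil(total * pct / 100)
    return max(0, required - translated)


print(messages_needed_for_export(3335, 0))    # starting from nothing
print(messages_needed_for_export(3335, 500))  # already partway there
```

With 3,335 messages, 18% works out to 601 whole messages, so a language with no core translations at all needs 601, and one with 500 needs another 101.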

Getting Things Done, One by One

Once the most important MediaWiki messages are at 100% and at least 18% of MediaWiki core is translated into your language, where do you go next?

I have a surprising piece of advice.

You need to get everything to 100% eventually. There are several ways to get there. Your mileage may vary, but I’m going to suggest the way that worked for me: Complete the easiest piece that will get your language closer to 100%! For me this is an easy way to strike an item off my list and feel that I accomplished something.

But still, there are so many places where you could start! So here’s my selection of components that are more user-visible and less technical, sorted not by importance, but by the number of messages to translate:

  • Cite: the extension that displays footnotes on Wikipedia
  • Babel: the extension that displays boxes on userpages with information about the languages that the user knows
  • Math: the extension that displays math formulas in articles
  • Thanks: the extension for sending “thank you” messages to other editors
  • Universal Language Selector: the extension that lets people select the language they need from a long list of languages (disclaimer: I am one of its developers)
    • jquery.uls: an internal component of Universal Language Selector that has to be translated separately for technical reasons
  • Wikibase Client: the part of Wikidata that appears on Wikipedia, mostly for handling interlanguage links
  • VisualEditor: the extension that allows Wikipedia articles to be edited in a WYSIWYG style
  • ProofreadPage: the extension that makes it easy to digitize PDF and DjVu files on Wikisource
  • Wikibase Lib: additional messages for Wikidata
  • Echo: the extension that shows notifications about messages and events (the red numbers at the top of Wikipedia)
  • MobileFrontend: the extension that adapts MediaWiki to mobile phones
  • WikiEditor: the toolbar for the classic wiki syntax editor
  • ContentTranslation: the extension that helps translate articles between languages (disclaimer: I am one of its developers)
  • Wikipedia Android mobile app
  • Wikipedia iOS mobile app
  • UploadWizard: the extension that helps people upload files to Wikimedia Commons comfortably
  • Flow: the extension that is starting to make talk pages more comfortable to use
  • Wikibase Repo: the extension that powers the Wikidata website
  • Translate: the extension that powers translatewiki.net itself (disclaimer: I am one of its developers)
  • MediaWiki core: the base MediaWiki software itself!

I put MediaWiki core last intentionally. It’s a very large message group, with over 3000 messages. It’s hard to get it completed quickly, and to be honest, some of its features are not seen very frequently by users who aren’t site administrators or very advanced editors. By all means, do complete it, try to do it as early as possible, and get your friends to help you, but it’s also OK if it takes some time.

Getting All Things Done

OK, so if you translate all the items above, you’ll make Wikipedia in your language mostly usable for most readers and editors.

But let’s go further.

Let’s go further not just for the sake of seeing pure 100% in the statistics everywhere. There’s more.

As I wrote above, the software changes every single day. So do the translatable messages. You need to get your language to 100% not just once; you need to keep doing it continuously.

Once you make the effort of getting to 100%, it will be much easier to keep it there. This means translating some things that are used rarely (but used nevertheless; otherwise they’d be removed). This means investing a few more days or weeks into translating-translating-translating.

You’ll be able to congratulate yourself not only upon the big accomplishment of getting everything to 100%, but also upon the accomplishments along the way.

One strategy to accomplish this is translating extension by extension. This means going to your translatewiki.net language statistics (here’s an example with Albanian, but choose your own language). Click “expand” on MediaWiki, then again “expand” on “MediaWiki Extensions”, then on “Extensions used by Wikimedia” and finally, on “Extensions used by Wikimedia – Main”. Similarly to what I described above, find the smaller extensions first and translate them. Once you’re done with all the Main extensions, do all the extensions used by Wikimedia. (Going to all extensions, beyond Extensions used by Wikimedia, helps users of these extensions, but doesn’t help Wikipedia very much.) This strategy can work well if you have several people translating into your language, because it’s easy to divide work by topic.
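The “smaller extensions first” rule is just a sort by the amount of remaining work. A minimal sketch, assuming you have copied the number of untranslated messages per extension from the statistics page into a dictionary; the numbers below are made up for illustration.

```python
def work_order(groups):
    """Order message groups so the ones closest to completion come first.

    groups maps a group name to its number of untranslated messages.
    Groups that are already complete (0 remaining) are skipped.
    """
    pending = {name: left for name, left in groups.items() if left > 0}
    # Sort by remaining messages; the name breaks ties so the order is stable
    return sorted(pending, key=lambda name: (pending[name], name))


# HYPOTHETICAL counts, not real statistics
print(work_order({"Cite": 12, "Thanks": 3, "Babel": 0, "Echo": 40}))
```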

Another strategy is quiet and friendly competition with other languages. Open the statistics for Extensions Used by Wikimedia – Main and sort the table by the “Completion” column. Find your language. Now translate as many messages as needed to pass the language above you in the list. Then translate as many messages as needed to pass the next language above you in the list. Repeat until you get to 100%.

For example, here’s an excerpt from the statistics for today:

[Image: MediaWiki translation stats example]

Let’s say that you are translating to Malay. You only need to translate eight messages to go up a notch (901 – 894 + 1). Then six messages more to go up another notch (894 – 888). And so on.
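The notch-by-notch arithmetic can be written as a small loop. This sketch assumes the figures in the excerpt are counts of untranslated messages (that reading is what makes both steps in the example come out right: after the first eight messages Malay has 893 left, and 893 – 888 + 1 = 6). The function name is hypothetical.

```python
def steps_to_climb(mine, above):
    """Messages to translate to pass each language ranked above yours.

    mine  -- your number of untranslated messages
    above -- untranslated counts of the languages above you, nearest first
    Returns how many messages each successive "notch" takes.
    """
    steps = []
    for theirs in above:
        if mine < theirs:
            continue  # already ahead of this language
        needed = mine - theirs + 1  # end up with one fewer untranslated than them
        steps.append(needed)
        mine -= needed
    return steps


print(steps_to_climb(901, [894, 888]))
```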

Once you’re done, you will have translated over 3,400 messages, but it’s much easier to do it in small steps.

Once you get to 100% in the main extensions, do the same with all the Extensions Used by Wikimedia. It’s over 10,000 messages, but the same strategies work.

Good Stuff to Do Along the Way

Never assume that the English message is perfect. Never. Do what you can to improve the English messages.

Developers are people just like you are. They may know their code very well, but they are not necessarily brilliant writers or designers. And though some messages are written by professional user experience designers, many are written by the developers themselves, and the English messages they write may not be perfect. Keep in mind that many, many MediaWiki developers are not native English speakers; a lot of them are from Russia, the Netherlands, India, Spain, Germany, Norway, China, France and many other countries, and since English is a foreign language for them, they may make mistakes.

So report problems with the English messages to the translatewiki Support page. (Use the opportunity to help other translators who are asking questions there, if you can.)

Another good thing is to do your best to try running the software that you are translating. If a component has thousands of messages that are not yet translated into your language, chances are that it’s already deployed on Wikipedia, so you can try it there. Actually using it will help you translate it better.

Whenever relevant, fix the documentation displayed near the translation area. Strange as it may sound, it is possible that you understand the message better than the developer who wrote it!

Before translating a component, review the messages that were already translated. To do this, click the “All” tab at the top of the translation area. It’s useful for learning the current terminology, and you can also improve the existing translations and make them more consistent.

After you gain some experience, create a localization guide in your language. There are very few of them at the moment, and there should be more. Here’s the localization guide for French, for example. Create your own with the title “Localisation guidelines/xyz” where “xyz” is your language code.

As in Wikipedia, Be Bold.

OK, So I Got to 100%, What Now?

Well done and congratulations.

Now check the statistics for your language every day. I can’t emphasize enough how important it is to do this every day.

The way I do this is by keeping a list of links on my translatewiki.net user page. I click them every day, and if there’s anything new to translate, I immediately translate it. Usually there is just a small number of new messages to translate; I didn’t measure precisely, but usually it’s fewer than 20. Quite often you won’t have to translate from scratch, only update the translation of a message that changed in English, which is usually even faster.
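Clicking the links by hand works well. For anyone who prefers a script, here is a sketch of the same daily check over per-group status records. Where the numbers come from (the statistics pages, or an API if your setup offers one) is left open; the record type and field names are hypothetical. The sketch separates brand-new messages from translations that became outdated because the English text changed, since the second kind is usually quicker to fix.

```python
from typing import NamedTuple


class GroupStatus(NamedTuple):
    """HYPOTHETICAL per-group record; not an official data format."""
    group: str
    missing: int   # messages with no translation yet
    outdated: int  # translations to update because the English message changed


def daily_todo(statuses):
    """List groups that slipped below 100%, smallest amount of work first."""
    todo = [s for s in statuses if s.missing or s.outdated]
    return sorted(todo, key=lambda s: s.missing + s.outdated)


today = [GroupStatus("core", 0, 0), GroupStatus("Echo", 2, 1), GroupStatus("Cite", 0, 1)]
for s in daily_todo(today):
    print(f"{s.group}: {s.missing} new, {s.outdated} outdated")
```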

But what if you suddenly see 200 new messages to translate? It happens occasionally. Maybe several times a year, when a major new feature is added or an existing feature is changed.

Basically, handle it the same way you got to 100% before: step by step, part by part, day by day, week by week, notch by notch, and get back to 100%.

But you can also try to anticipate it. Follow the discussions about new features, check out new extensions that appear before they are added to the Extensions Used by Wikimedia group, and consider translating them when you have a few spare minutes. In the worst case, they will never be used by Wikimedia, but they may be used by somebody else who speaks your language, and your translations will definitely feed the translation memory database that helps you and other people translate more efficiently and easily.

Consider also translating other useful projects: OpenStreetMap, Etherpad, Blockly, Encyclopedia of Life, etc. Up to you. The same techniques apply everywhere.

What Do I Get for Doing All This Work?

The knowledge that, thanks to you, people who read in your language can use Wikipedia without having to learn English. Awesome, isn’t it? Some people call it “Good karma”.

Oh, and enormous experience with software localization, which is a rather useful job skill these days.

Is There Any Other Way in Which I Can Help?

Yes!

If you find this post useful, please translate it into other languages and publish it on your blog. No copyright restrictions, public domain (but it would be nice if you credit me and send me a link to your translation). Make any adaptations you need for your language. It took me years of experience to learn all of this, and it took me about four hours to write it. Translating it will take you much less than four hours, and it will help people be more efficient translators.

Continuous Translation and Rewarding Volunteers

In November I gave a talk about how we do localization in Wikimedia at a localization meetup in Tel-Aviv, kindly organized by Eyal Mrejen from Wix.

I presented translatewiki.net and UniversalLanguageSelector. I quickly and quite casually said that when you submit a translation at translatewiki.net, it will be deployed to the live Wikipedia sites in your language within a day or two: one of the translatewiki.net staff members synchronizes the translation database with the MediaWiki source code repository, and then a scheduled job copies the new translation to the live site.

Yesterday I attended another of those localization meetups, in which Wix developers themselves presented what they call “Continuous Translation”, by analogy with “Continuous Integration”, a popular software development practice. Without going into too much detail, “Continuous Translation” as described by Wix is pretty much the same thing as what we have been doing in the Wikimedia world: translators’ work is separated from coding; all languages are stored in the same way; and the translations are validated, merged and deployed as quickly and as automatically as possible. That’s how we’ve been doing it since 2009 or so, without bothering to give this methodology a name.
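“Validated” covers several kinds of checks. One simple, concrete example: MediaWiki messages use numbered parameters such as $1 and $2, and a translation that drops one of them will show broken text. Below is a simplified sketch of such a check; it is an illustration, not translatewiki.net’s actual validator, which checks more than this.

```python
import re

# Numbered parameters such as $1, $2, ... $10 in MediaWiki messages
PARAM = re.compile(r"\$(\d+)")


def missing_parameters(source, translation):
    """Return parameters present in the English source but absent from the translation."""
    wanted = set(PARAM.findall(source))
    present = set(PARAM.findall(translation))
    return ["$" + n for n in sorted(wanted - present, key=int)]


def is_valid(source, translation):
    """A translation passes this (simplified) check if it keeps every parameter."""
    return not missing_parameters(source, translation)


print(missing_parameters("$1 edited the page $2", "$2 page edited"))
```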

So in my talk I mentioned it quickly and casually, and the Wix developers did most of their talk about it.

I guess that Wix are doing it because it’s good for their business. Wikimedia is also doing it because it’s good for our business, although our business is not about money, but about making end users and volunteer translators happy. Wikimedia’s main goal is to make useful knowledge accessible to all of humanity, and knowledge is more accessible if our website’s user interface is fully translated; and since we have to rely on volunteers for translation, we have to make them happy by making their work as comfortable and rewarding as possible. Quick deployment is one of the things that provide this rewarding feeling.

Another presentation in yesterday’s meetup was by Orit Yehezkel, who showed how localization is done in Waze, a popular traffic-aware GPS navigator app. It is a commercial product that relies on advertising for revenue, but for the actual functionality of mapping, reporting traffic and localization, it relies on a loyal community of volunteers. One thing that I especially loved in this presentation is Orit’s explanation of why it is better to get the translations from the volunteer community rather than from a commercial translation service: “Our users understand our product better than anybody else”.

I’ve always been saying the same thing about Wikimedia: Wikimedia project editors are better than anybody else at understanding the internal lingo, the functionality, the processes and hence the context of all the details of the interface and the right way to translate them.

Link Wikipedia Articles in Different Languages

OK THIS IS AWESOME, and “awesome” is not a word that I use lightly.

As a gift for the second birthday of the Wikidata project, nice people at Google created a tool that helps people link articles in different languages that are not linked yet. They prepared a list with thousands of pairs of articles in different languages that are supposed to be about the same subject according to their automatic guesswork. The tool only suggests such pairs: a human editor must check whether they actually match and, if they do, link them.

There were thirty-six such article pairs for Hebrew–English. About four of them were unrelated, and I fixed the linking between the rest of them. Some of them required manual intervention, because there were interfering links to unrelated subjects. Some simple cases took me just a few seconds, and a few complicated ones took a few minutes.

I also tried doing the same for Russian–English, but there are over a thousand article pairs there, so I only did a few. I also did a few for Catalan and Greek, and I finished all ten pairs for Bengali, even though I don’t actually know Greek or Bengali. I just used a bit of healthy intuition and Google Translate, and I’m pretty sure that I did it well.

You can help!

Here are my suggested instructions for doing this.

Preparation:

  1. Log in to mediawiki.org. The same account is used for the tool.
  2. Now go to the tool’s site. Click Login, and allow the tool to use your mediawiki.org account.
  3. Go to settings, and choose your pair of languages.
  4. Go to “Check by list” and you’ll see a list of article pairs. If there are no suggested article pairs for the language pair you selected, go back to step 3 and choose some other languages. As I wrote above, from my experience, you don’t need to know a language thoroughly to perform this useful work ;)

Now click a link to a pair of articles that looks reasonable. Articles in both languages will open side by side.

  1. If the articles are definitely not about the exact same subject, click “No” in the list and find another pair.
  2. If the articles are about the same subject and one of them doesn’t have any interlanguage links, click “Add links” in the interlanguage area. In the box that opens, write the name of the other language in the first field and the title of the article in the second field, and then click the “Link with page” button. A list of articles in other languages will be shown. If it looks reasonable, click “Confirm”, and then “Close dialog and reload page”. That’s it, the pages are linked! Click “Yes” in the list in the linking tool and proceed to another article pair.
  3. If the articles are about the same subject, but both of them appear to have links to other languages, it’s possible that explicit interlanguage links are written in the source code of the articles. To resolve this, do the following:
    1. Open both articles for editing in source mode.
    2. Scroll all the way down and find whether they have explicit interlanguage links.
    3. If these are correct links to articles about the same subjects in other languages, go to those articles, and link them using Wikidata. Note that it often happens in such cases that these are links to redirects, so the actual current title may be different.
    4. If these are links to articles about other subjects, even if they are related, remove those links. For example, if the article in Bengali is about an island, and the article in Dutch is about a city on that island, remove the link: these subjects are distinct enough. Ditto if the article in English is about an American human rights organization and the article in French is about a French human rights organization.
    5. If you were able to remove all the explicit links from the source, go back to point 2 above and link the articles using Wikidata.
    6. If it’s too complicated to remove these links for any reason, feel free to go to another article, but it would be nice to leave a note about this on the articles’ talk pages so that other editors would clean this up some time.
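Finding the explicit interlanguage links in step 3 is done by eye. If you want a helper for long articles, the sketch below scans wikitext for links of the form [[code:Title]] whose prefix is a language code. The set of codes is a hypothetical subset (a real wiki recognizes hundreds), and the pattern ignores edge cases such as interwiki prefixes that are not languages, so treat it as an aid, not a replacement for checking.

```python
import re

# HYPOTHETICAL subset of language codes, for illustration only
LANGUAGE_CODES = {"en", "he", "ru", "fr", "nl", "bn", "ca", "el", "jam"}

# [[code:Title]] with a lowercase prefix; [[:fr:...]] (an inline link) does not match
LINK = re.compile(r"\[\[([a-z][a-z-]*):([^\]|]+)\]\]")


def explicit_interlanguage_links(wikitext):
    """Return (language code, title) pairs for old-style interlanguage links."""
    return [(code, title.strip())
            for code, title in LINK.findall(wikitext)
            if code in LANGUAGE_CODES]


text = "Some text.\n[[Category:Islands]]\n[[fr:Île]]\n[[nl:Stad]]\n[[:en:Island]]"
print(explicit_interlanguage_links(text))
```

Category links and other namespaces are skipped because their prefixes are not in the language set.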

That’s it. It may get a tad complicated for some cases, but if you ask me, it’s a lot of fun.

Where to read about the Elections in India?

There is an election process going on in India, which is frequently called “the world’s largest democracy” and an “upcoming world power”. Both descriptions are quite true, so elections in such a country should be pretty important, shouldn’t they?

Because of my work I have a lot of Facebook friends in India, and they frequently write about it: mostly in English, and sometimes in their own languages, such as Hindi, Kannada and Malayalam. Even when it’s in English, however, I hardly understand anything, because it comes from people who are immersed in Indian culture.

It is similar with Indian English-language news sites, such as The Times of India: The language is English, but to me it feels like information overload, and there are too many words that are known to Indians, but not to me.

With English-language news sites outside of India, such as CNN, BBC and The Guardian it’s the opposite: they give too little attention to this topic. I already know pretty much everything that they have to say: a huge number of people are voting, Narendra Modi from the BJP is likely to become the new prime minister and the Congress party is likely to become weaker.

Russian and Hebrew sites hardly mention it at all.

What’s left? Wikipedia, of course. Though far from perfect, the English Wikipedia page Indian general election, 2014 gives a good summary of the topic for people who are not Indians. It links terms that are not known to foreigners, such as “Lok Sabha” and “UPA” to their Wikipedia articles, so learning about them requires just one click. When they are mentioned in The Times of India, I have to open Wikipedia and read about them, so why not do it in Wikipedia directly?

This also happens to be the first Google result for “india elections”. And if you go to the page “Elections in India” in Wikipedia, a note at the top conveniently sends you directly to the page about the ongoing election process. Compare this to the Britannica website: searching it for “india elections” yields results that are hardly useful. There’s hardly anything about elections in India in general, let alone about the current one.

One thing that I didn’t like is the usage of characteristic Indian words such as “lakh” and “crore”, which mean, respectively, “a hundred thousand” and “ten million”. I replaced most of their occurrences in the article with the usual international numbers, and I think that I found a calculation mistake on the way.
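The conversion itself is a single multiplication. Here is a small helper, using Decimal so that values like 2.5 lakh don’t pick up floating-point noise. The function name is hypothetical, and the amounts are examples, not figures from the article.

```python
from decimal import Decimal

MULTIPLIERS = {
    "lakh": Decimal(100_000),      # one hundred thousand
    "crore": Decimal(10_000_000),  # ten million (= 100 lakh)
}


def to_international(amount, unit):
    """Convert an amount in lakh or crore to a plain number.

    Pass the amount as a string (e.g. "2.5") to keep it exact.
    """
    return Decimal(amount) * MULTIPLIERS[unit]


print(f"{to_international('2.5', 'lakh'):,.0f}")
print(f"{to_international('1.5', 'crore'):,.0f}")
```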

So while Wikipedia is, again, far from perfect, its “wisdom of the crowds” system works surprisingly well time after time.

WikiAcademy Kosovo 2014 or, “This Israeli geek made a joke about hobbits. You won’t believe what happened next.”

In late February 2014 I attended WikiAcademy in Kosovo.

What do most people know about Kosovo? That it’s a place somewhere… um… they kinda heard about in the news some time ago.


What is it, actually? It’s a partially recognized country, which was formerly part of Serbia and Yugoslavia. It is mostly populated by Albanians, with small minorities of Serbs, Turks and others.

The ethnic difference between Kosovo and the rest of Serbia caused many tensions. In the 1990s and 2000s the area experienced a lot of violence. NATO and much of Europe supported Kosovo’s independence, Serbia and Russia objected, and after the Kosovo war the region emerged as a de facto independent state. Some countries recognized it and some didn’t.

Sadly, as happens very often, what most people hear about such places is a lot of news about violence and very few stories about anything else: history, culture, architecture, language, music. I definitely care a lot about these positive things, and not much about the wars.


I flew via Istanbul, and the agent at the boarding gate didn’t quite know what to do with my passport: she looked for a visa and when she couldn’t find one, she just asked me whether I needed one. I said that I didn’t and she let me on the plane. The passport control officer on arrival also didn’t know whether Israelis need a visa and had to check a table. I guess that not many Israelis come there, which is a shame, really.

Right from the airport my hosts took me to a different event: a BarCamp in Prizren. Prizren is absolutely beautiful. The Byrek—what we call “Burekas” in Israel—is delicious, the beer is fantastic, the streets are beautiful and the buildings are magnificent. Magnificent not like in France, but like in… Kosovo. In the Balkans. Down to Earth and human.

Prizren, Stone bridge and Sinan Pasha Mosque. Tobias Klenze, CC-BY-SA 3.0

Prizren, Stone bridge and Sinan Pasha Mosque. Tobias Klenze, CC-BY-SA 3.0

In the BarCamp there were three talks: two in Albanian, which I sadly don’t know, about… some Open Source projects. The third one was mine, about that-website-that-we-all-know-and-love. I used just one “slide”: xkcd’s famous protester, which, to my surprise, a lot of people in the audience didn’t recognize. I invited people to contribute, of course, and I enjoyed answering a question about how concepts such as “love” can be referenced and fact-checked. The bar in which the event was held is called “Hobbiton”, and it is appropriately adorned with multiple Tolkien-themed posters. The Albanian Wikipedia, however, didn’t have an article about Hobbits, which I mentioned in my talk, in the hope that it would be written. Was it? Stay tuned.


My second and third days in Kosovo were dedicated to WikiAcademy itself, first in the town of Gjakova and then in the capital, Prishtina.

So what is the WikiAcademy in Kosovo? It’s an event organized by IPKO Foundation, a local organization that promotes modern telecommunications in Kosovo. It is different from what we call “WikiAcademy” in Israel, which is more like an academic conference with talks; in Kosovo it’s more like a Wikipedia editing workshop for newcomers, but a very large one. The 2014 WikiAcademy was the second edition of this event, and it was held over three weekends in two cities, Gjakova and the capital Prishtina, with participants from several more cities. Over two hundred people participated.

The organizers, some of whom are experienced Wikipedians themselves, prepared the event very well. The logistics were great—working wifi, tasty food and comfortable transportation—but more importantly, the participants were very well-prepared for their task of writing Wikipedia articles: they received clear topics and instructions about writing with correct encyclopedic style and citing sources.

The articles were written in the English Wikipedia, and all of them were about cities of Kosovo: their architecture and monuments, education, events, festivals, culture, history and nature. Most people have probably never heard of cities with unusual names such as Peja (a.k.a. Peć), Ferizaj (a.k.a. Uroševac) and Štrpce. Well, now they are not just mentioned by name in the English Wikipedia: there are several detailed articles about different topics related to each of them.

I’ll reiterate this: it was fantastic to see people sitting with reference books and encyclopedias so that they could cite sources. This is so often the biggest challenge in Wikipedia editing workshops, and the organizers prepared the participants very well. It was also great that everybody knew which articles they were working on.

My role was to give three talks, about Wikipedia’s encyclopedic writing style, about good practices for talk pages, and about translating articles, and other than that—to help people write, cite sources correctly, insert images, and make sure that they don’t violate any policies. It was challenging and tiring, but oh so fun. Hackathons and Wikipedia meet-ups are possibly the only kinds of events where I’m continuously so energized and talkative.

WikiAcademy Prishtina. Katie Chan, CC-BY-SA 4.0.

I also did my best to showcase Wikimedia’s newest software: VisualEditor, which the newbies just loved, and the Content Translation prototype.

During my talk about translation, I created, as a demo, two articles in Albanian: Hobbit (as promised above!), and Haifa, the city that hosted Wikimania 2011.


After I came back, the event continued. I was one of the judges who chose the best articles and photos for awarding prizes. The big, and well-deserved, winner was Historical monuments in Prishtina, although many others were wonderful: Flaka e Janarit, Rugova Mountains, Health Care in Kosovo, Water in Prishtina. The awards ceremony was held a few days after that.

There was also a bit of a dark side to the contest: because most of the writers were newbies, and none were native English speakers, there were many small, innocent mistakes in spelling, referencing and writing style, which the English Wikipedia editors took very seriously. Some articles were even proposed for deletion, although all (or almost all) were kept. This, again, raises the well-known dilemma: it’s important to keep Wikipedia’s standards high, but it’s just as important to remain nice in the process and not “bite the newcomers”.


My thanks go out to the excellent organizers: Arianit Dobroshi, Gent Thaçi, Abetare Gojani, Rineta Hoxha, Altin Ukshini, Lis Balaj and many others. I enjoyed every minute, and learned a lot.


This post mostly uses Albanian spellings of place names, simply because that’s what I saw during my visit. Don’t consider this post authoritative with regard to the preferred English spellings. Wikipedia may use different spellings, and it isn’t always consistent about them.

A Relevant Tower of Babel

The Tower of Babel is frequently used as a symbol of foreign languages. For example, several language software packages are named after it, such as the Babylon electronic dictionary, MediaWiki’s Babel extension and the Babelfish translation service (itself named after the Babel fish from The Hitchhiker’s Guide).

In this post I shall use the Tower of Babel in a somewhat more relevant and specific way: It will speak about multilingualism and about Babel itself.

This is how most people saw the Wikipedia article about the Tower of Babel until today:

The Tower of Babel article. Notice the pointless squares in the Akkadian name. They are called “tofu” in the jargon of internationalization programmers.

And this is how most people will see it from today:

And we have the name written in real Akkadian cuneiform!

Notice how the Akkadian name now appears as actual Akkadian cuneiform, and not as meaningless squares. Even if you, like most people, cannot actually read cuneiform, you probably understand that showing it this way is more correct, useful and educational.

This is possible thanks to webfonts technology, which was enabled on the English Wikipedia today. It had already been enabled for many months in Wikipedias in some other languages, mostly languages of India, which have severe problems with font support in the common operating systems. Now it’s available in the English Wikipedia too, where it mostly serves to show parts of text that are written in exotic scripts.

The current iteration of the webfonts support in Wikipedia is part of a larger project: the Universal Language Selector (ULS). I am very proud to be one of its developers. My team in Wikimedia developed it over the last year or so, during which it underwent a rigorous process of design, testing with dozens of users from different countries, development, bug fixing and deployment. In addition to webfonts it provides an easy way to pick the user interface language, and to type in non-English languages (the latter feature is disabled by default in the English Wikipedia; to enable it, click the cog icon near “Languages” in the sidebar, then click “Input” and “Enable input tools”). In the future it will provide even more abilities, so stay tuned.

If you edit Wikipedia, or want to try editing it, one way in which you could help with the deployment of webfonts would be to make sure that all foreign strings in Wikipedia are marked with the appropriate HTML lang attribute; for example, that every Vietnamese string is marked as <span lang="vi" dir="ltr">. This will help the software apply the webfonts correctly, and in the future it will also help spelling and hyphenation software, etc.
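If you prepare such text with a script or a bot rather than by hand, the markup is simple to generate. Here is a minimal sketch in Python. The helper name mark_lang is hypothetical (it is not part of MediaWiki or the ULS); it only shows how a string can be wrapped in a span that carries lang and dir attributes, with the text HTML-escaped so that characters like < don't break the page.

```python
import html

def mark_lang(text: str, lang: str, direction: str = "ltr") -> str:
    """Wrap a foreign-language string in a <span> with lang and dir attributes.

    Hypothetical helper for illustration only; not a MediaWiki or ULS API.
    `lang` is expected to be a language code such as "vi" or "he".
    """
    # HTML's dir attribute accepts only these three values.
    if direction not in ("ltr", "rtl", "auto"):
        raise ValueError(f"unsupported direction: {direction!r}")
    # Escape both the attribute value and the text content.
    return (
        f'<span lang="{html.escape(lang)}" dir="{direction}">'
        f"{html.escape(text)}</span>"
    )

# Usage: a Vietnamese greeting (left-to-right) and a Hebrew word (right-to-left).
print(mark_lang("Xin chào", "vi"))
print(mark_lang("שלום", "he", "rtl"))
```

Marking the direction explicitly, and not just the language, matters for right-to-left scripts such as Hebrew or Arabic that are embedded in English text.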

This wouldn’t be possible without the help of many, many people. The developers of Mozilla Firefox, Google Chrome, Safari, Microsoft Internet Explorer and Opera, who developed the support for webfonts in these browsers; The people in Wikimedia who designed and developed the ULS: Alolita Sharma, Arun Ganesh, Brandon Harris, Niklas Laxström, Pau Giner, Santhosh Thottingal and Siebrand Mazeland; The many volunteers who tested ULS and reported useful bugs; The people in Unicode, such as Michael Everson, who work hard to give a number to every letter in every imaginable alphabet and make massive online multilingualism possible; And last but not least, the talented and generous people who developed all those fonts for the different scripts and released them under Free licenses. I send you all my deep appreciation, as a developer and as a reader of Wikipedia.

