Archive for the 'Hebrew' Category

Guess Which Software the Only Hebrew TLD Runs

There are already several TLDs in the Arabic script for Arab countries. There are no TLDs in the Hebrew script yet, although one will probably soon be created for Israel.

There is, however, a test TLD in Hebrew: “טעסט”. (That’s the word “test” written in Hebrew characters according to Yiddish spelling rules.)

And there’s even an actual working domain in it: http://דוגמה.טעסט. That can be translated as “example.test”. The TLD “טעסט” now appears to the left of “דוגמה”, which is the name, because Hebrew is written right-to-left.

And what happens if you use your browser to go to that domain? It redirects you to http://דוגמה.טעסט/עמוד_ראשי. That string at the end (or in the middle, if you will) is the standard Hebrew title of a MediaWiki main page, which you can also see on the Hebrew Wikipedia. The hypothesis that MediaWiki is installed there is further supported by a Google site search on the same domain: http://www.google.com/search?q=site:דוגמה.טעסט. Something in the installation is probably broken, because the pages appear blank, but the page titles can only mean one thing: MediaWiki is, or was, being used to test a Hebrew domain name.
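Under the hood, the Hebrew letters never reach DNS as they are: each dot-separated label is converted to an ASCII form that starts with "xn--". A minimal sketch of that conversion, assuming that the `idna` codec in Python's standard library (which follows the older IDNA 2003 rules) is close enough to what a browser does:

```python
# The Hebrew test domain, written with escapes to avoid bidi display confusion.
name = "\u05d3\u05d5\u05d2\u05de\u05d4"  # dalet, vav, gimel, mem, he ("example")
tld = "\u05d8\u05e2\u05e1\u05d8"         # tet, ayin, samekh, tet ("test")
domain = name + "." + tld

# Assumption: the built-in "idna" codec (IDNA 2003) is good enough for illustration.
ascii_form = domain.encode("idna")       # bytes, one "xn--" label per part
round_trip = ascii_form.decode("idna")   # back to the Hebrew spelling

print(ascii_form)
print(round_trip == domain)
```

The ASCII form is what a resolver actually looks up; the Hebrew spelling is only what the address bar shows.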

(This post is based on information from Tomer Cohen of Mozilla Israel.)

Facebook, give me my RLM back, please

Facebook doesn’t allow typing LRM and RLM characters in the status field. These are the Unicode characters “Left-to-right mark” and “Right-to-left mark”. People who type in right-to-left languages such as Arabic, Persian, Urdu or Hebrew need these characters to make their status updates appear properly aligned. If i try to type either of these characters, it is deleted when i save the message. There is no reason to do this. Facebook engineers, please allow your users to use these characters. Thank you.
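For anyone who wants to check what these two characters are, here is a small sketch using Python's standard `unicodedata` module. It only inspects the characters; it says nothing about how Facebook handles them.

```python
import unicodedata

LRM = "\u200e"  # left-to-right mark
RLM = "\u200f"  # right-to-left mark

for ch in (LRM, RLM):
    # name() is the official Unicode name, bidirectional() is the bidi class
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))

# The marks are invisible, but they are real characters in the string.
status = "abc" + RLM
print(len(status))
```

Both are invisible formatting characters with a strong direction, which is why they can fix the alignment of a line without adding anything a reader can see.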

Language teacher

If you search Google for “language teacher” (מורה ללשון) in Hebrew, the autocompletion suggests “language teacher killed herself” (מורה ללשון התאבדה). The word “teacher” is spelled the same for both genders, but the verb is feminine. I don’t know why it happens, because actually searching for it doesn’t yield anything significant.

In Israeli schools where Hebrew is the medium of teaching, “Language” is the class where the grammar of Hebrew is taught… badly.

Roth


miriamruth11-hp; copyright: Google; based on the original illustration by Ora Ayal

Today the logo appearing at the top of Google.co.il honors Miriam Roth, the author of the famous Hebrew children’s book “A Tale of Five Balloons”. She was born on the 16th of February in 1910.

The Google employee who uploaded the image made a mistake: the filename is “miriamruth”, but it should be “miriamroth”. That’s what happens when there’s no proper way to write the vowels: her last name is written רות, which is how the Biblical name “Ruth”, still common in modern Israel, is written. But the German last name “Roth” is written the same way, because in Hebrew “u” and “o” are usually written using the same letter, Vav.

There is a way to differentiate the sounds: רוּת is “Ruth” and רוֹת is “Roth”. Notice the placement of the dot in relation to the letter in the middle. The sign for “u” is called shuruk, and the sign for “o” is called holam; i wrote the bulk of the articles about them in Wikipedia. Most people don’t type these signs; usually it’s fairly easy to guess the correct pronunciation, but people don’t use these signs even when they are needed, as is the case with Ruth/Roth, because typing them on the standard Hebrew keyboard is very hard.
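To see that the two spellings differ by exactly one character, here is a short sketch that lists the Unicode code points of both words. Note that Unicode has no separate "shuruk" character: the dot inside the vav is encoded with the same code point as the dagesh.

```python
import unicodedata

ruth = "\u05e8\u05d5\u05bc\u05ea"  # resh, vav, point inside the vav, tav
roth = "\u05e8\u05d5\u05b9\u05ea"  # resh, vav, point above the vav, tav

def describe(word: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in the word."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in word]

for label, word in (("Ruth", ruth), ("Roth", roth)):
    print(label, describe(word))

# Positions at which the two spellings disagree
differing = [i for i, (a, b) in enumerate(zip(ruth, roth)) if a != b]
print(differing)
```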

For years this made me very angry, so i asked the Standards Institution of Israel to develop a new standard keyboard on which it will be easy to type these signs. I was successful at convincing the SII to do it. The work is now underway, and i actively participate in the monthly meetings, together with representatives from Hamakor – the Israeli association for free and open source software, Israel Internet Association, IBM, Microsoft, Apple, Google and other companies. I hope that the standard will be published in 2011; the technical implementation of the keyboard layout will take about ten minutes on each operating system, and shortly after that, i hope, it will be distributed to computers using some kind of an auto-update mechanism.

And then, i hope, we’ll start to see at least slightly richer Hebrew typography everywhere. I want it to happen, not just because it’s a nice tradition, but because this will simply make Hebrew easier to read – and will prevent silly mistakes, like pronouncing and writing “Ruth” instead of “Roth”.


See also: Maqaf.

Unbearable Lightness

I was invited to the 10th anniversary celebration of the Catalan Wikipedia in Perpignan. Perpignan is a city in France, but from the Catalan point of view, it’s in Northern Catalonia – a rather large territory, also known as Roussillon, that was a part of Catalonia, but passed under French rule in 1659. Catalan is still spoken by many people there; how many exactly – i’ll have to see. I hope that it’s spoken by many people for a purely practical reason – my Catalan is much better than my French.

The Catalan Wikipedia is one of the first two Wikipedias created after the English one. The English Wikipedia was created on the 15th of January 2001; German and Catalan were created on the 16th of March 2001. Catalans love to point out that although their Wikipedia was created a few minutes after the German one, it was the first to have an actual article.

Since the Catalan Wikipedia is the oldest and the largest version of Wikipedia in a language which isn’t official in any big country (sorry, Andorra), the people behind it want to share their experiences promoting their language with speakers of other regional and minorized languages. This will be discussed at the event. More details on that later.


Direct El-Al flight from Tel-Aviv to Barcelona – 582 USD. Alitalia via Rome, 2-hour wait for connection – 460 USD. Czech Airlines (ČSA) via Prague, 11-hour wait for connection – 367 USD. Guess which one i picked. ČSA, of course – i pay less and i get to spend a day in Prague! Sorry, El-Al.

If you call the Czech Airlines office in Tel-Aviv, you can choose one of the following languages, in that order: English, Russian, German, Czech, French, Spanish, Italian. No Hebrew or Arabic. Other than that, however, the service is excellent. I spoke in Russian with the service people and they were very polite, helpful and efficient. They were Czech; they spoke Russian with a slight accent, but it was completely correct and easy to understand. I’ll have to wait for the flight itself to see how it is, but so far my impression is very good.


P.S. Typing the word “Czech” is surprisingly hard.

Componenta

Israeli programmers use many words of English origin when they speak Hebrew. (Many of them prefer to write only in English instead of Hebrew, which is a separate issue.)

When they use these English words, they tend to adapt them to Hebrew pronunciation. Some adaptations are simple, for example “router” is pronounced with an Israeli, rather than English [r] sound (some people – not necessarily purists! – use the Hebrew word נַתָּב [natav] for that). “SQL” is rarely pronounced as “sequel” – usually it’s “ess cue el”, and the same goes for MySQL.

But some are harder to explain. For example, “component” is often pronounced [kompoˈnenta]. I heard it in several companies and i don’t quite understand why. Note the [a] at the end and the stress, too: in English it’s supposed to be something in the area of [kʌmˈpoʊnənt] – on the second syllable, not the third. I have never heard an Israeli programmer pronounce it with correct stress when speaking in English – i always hear it as [ˈkomponənt] – with stress on the first syllable and with [o] in the first two syllables.

The only languages available on Google Translate in which this word is anywhere near [kompoˈnenta] are Serbian (компонента), German (Komponente), Romanian (componentă) and Spanish and Italian (componente). It may have something to do with them, but the solution is probably more complicated. Does anyone have any idea?

Number in the Middle

I didn’t measure it, but i probably search Google in English more often than in Hebrew. Under the result link there’s a short summary of the page. Very frequently the first thing that is written in this summary is a date. Google forces right-to-left too strongly on all of the page, so the first number of the date goes to the other end of the summary:

Google search results - right to left


The result is that very, very often i see things like “at most restaurants in 21 Lima and Cusco” and “What if 26 you buy a shite gun”, which don’t make sense.

These are the results in complete left-to-right display:

Google search results - left to right


Dear Google, please fix this bug. It has been annoying me for a long time.

Japanese, Germans and Israelis of the world

Through i-iter i came upon this interesting post: Tamil, Kannada and the middle path. Tamil and Kannada are two important languages spoken in the south of India and their speakers are quite proud of their identity.

The article complains that not enough is being done for the linguistic normalization of non-Hindi languages in India. It was very interesting to read it and, being Israeli, i was surprised to see the compliments to “Japanese, Germans and Israelis of the world who aren’t wasting time tom-toming about antiquity, beauty or originality, but are instead investing their time, money and energy in using their languages for almost all known purposes”.

I was curious – why did they choose these three? Why not Russians and French, who use their languages for everything because many of them openly consider them to be better than all the others? Why not Catalans, whose language is in a political situation which is much more similar to that of Tamil and Kannada?

And why Israelis? Sure, we use Hebrew a lot; Hebrew Wikipedia, for example, is our pride. But i don’t think that we use Hebrew enough. For example, a lot of people (not all) write email in English. They write email in English even if they don’t know English well. They write email in English even though practically all the technical problems with encoding and bi-directionality were solved years ago. And they write email in English even if the email is about a topic for which Hebrew is perfectly suitable: one could argue that English is more convenient for writing about software or physics, but quite a lot of people write email in English just to tell recent family news or to make an appointment.

I used to do that, too, but i made a conscious decision to stop writing email in English unless it is absolutely necessary. I tell all my friends about it. Some of them are indifferent and some of them – especially those in the software industry – say that Israel should have adopted English and not Hebrew as its language. Shame on them. Students think that i know English well, so they often ask me what is the most polite way to make an appointment with their professors in English, and i always tell them: “If your professor can read Hebrew, just write the email in Hebrew!”

Of course, there’s also the matter of university papers. In physics, for example, even though Hebrew is used in the classroom, it is taken for granted that papers at M.A.-level and higher are written only in English. The need for an English version is understandable, because on a world scale very few people would be able to read a paper in Hebrew, but i would imagine that it’s much better to write the paper in Hebrew and translate it. Yes, it would take time and probably money, but it is nevertheless useful and not just for the honor of the Hebrew language: it would actually advance science and education, because this way people would express themselves in their own language and think about physics instead of thinking about English.

Finally, there’s Facebook. For some reason many Israelis still use Facebook with the English interface – again, even though they don’t know English well, and even though they never read or write anything in English there. The translation of Facebook into Hebrew is terrible, and what’s especially frustrating is that i would gladly fix it, but i can’t, because the interface for submitting translation corrections is absolutely unusable. I nevertheless use Facebook in Hebrew, because it solves the bi-directionality problems – for example, the notorious problem with the punctuation marks appearing at the wrong end of the sentence. There was a newspaper report saying that Facebook influences Israeli children so much that they got used to writing the question mark at the beginning of the sentence – and that’s how they submit their homework! Some Israelis develop weird tricks to make the punctuation appear on the correct side of the sentence, for example by adding a letter after the period – compare “אתה בא לכדורגל בערב?י” and “אתה בא לכדורגל בערב?” – notice the placement of the question mark and the redundant letter in the first sentence. But they could simply switch to Hebrew. (And one day i will write an email to Facebook offices and tell them that they really should improve the translation.)
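The redundant-letter trick works because the added letter is a strongly right-to-left character placed after the punctuation mark, so the mark ends up between two right-to-left characters. The invisible right-to-left mark (U+200F) has the same bidi class, so in principle it does the same job without the visible clutter. A minimal sketch, assuming that the site does not strip the character; the function name is made up for this illustration:

```python
import unicodedata

RLM = "\u200f"  # right-to-left mark, invisible

def anchor_trailing_punctuation(sentence: str) -> str:
    """Append RLM if the sentence ends in a character without strong direction.

    Hypothetical helper for illustration only.
    """
    if sentence and unicodedata.bidirectional(sentence[-1]) not in ("L", "R", "AL"):
        return sentence + RLM
    return sentence

question = "\u05d0\u05ea\u05d4 \u05d1\u05d0?"  # two Hebrew words and a question mark
fixed = anchor_trailing_punctuation(question)
print(len(question), len(fixed))
```

A sentence that already ends in a Hebrew letter is returned unchanged, because its last character has a strong direction of its own.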

It’s quite pleasing to see that speakers of Kannada look up to us, but it doesn’t mean that we have already done all we could to normalize Hebrew.

(And why am i writing this in English? Because i started writing it as a comment for that blog and it grew into a post by itself.)

Baqlawa

I helped an Arab student who does not know Hebrew well and who is not computer-savvy to find a book in the Mount Scopus library.

The book was in Arabic and she did not know how to search in Arabic in “Aleph”, the library’s search system. On the library computer it was possible to type in Arabic, but the letters were not printed on the keys, so i took out my laptop and opened the Arabic keyboard map. We sat together, and slowly typed the Arabic names (apparently the al- article shouldn’t be typed). In the end we found the book.

That was yesterday. Today she brought me Baqlawa in class.

… But they say there is a war between us.

Box

A friend wrote me an email in Hebrew with a technical question about Google Box.

Google Box? That’s a Google service that i haven’t heard about. I heard about I’m feeling lucky, Site search, GMail, Maps, Product Search, Scholar, Buzz, Books…

Oh, Books. (If you’re into general linguistics, you may call it “scanning the paradigmatic axis in slow motion”.)

Hebrew has several spelling standards, none of which is actually used consistently by the general public. The root cause of the confusion is that the Hebrew alphabet only has consonant letters and the vowels are marked by a set of separate signs called “vowel points” or “niqqud” (also spelled nikud etc.; transliteration of Hebrew is also very inconsistent in actual practice). The vowel points are rarely written. It doesn’t mean, however, that the vowels aren’t written at all. Some consonant letters are used as vowels, albeit in a rather peculiar way.

For the sake of simplicity i’ll just say that in the most common type of spelling the vowels /u/ and /o/ are both spelled with the letter vav (also called waw). The same letter also marks the consonant /v/, but more often it is a vowel. With the help of rather wondrous intuition most Israelis, when reading, understand whether it is /u/, /o/ or /v/ according to the word, without giving it much thought. It becomes problematic when foreign words need to be transliterated into Hebrew: The English words “box” and “books” are transliterated as, more or less, “bwqs”.

It is possible to distinguish between the two by using a point from the niqqud system: A point inside the letter vav means that it is to be read as /u/, and a point above it means that it is to be read as /o/. There is no way to type it on the common Hebrew keyboard, however. Or, more precisely, there is a way, but the key combination is very tricky and there’s no drawing on the keyboard that hints at it, so most Israelis don’t know that it is possible. And when you don’t know that it’s possible, it’s as good as impossible.
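On the computer the point is simply one more character that follows the vav. A small sketch of how the same four-letter skeleton becomes “box” or “books” depending on that one character; the helper names are made up for this illustration, and only the vav is pointed, which is all that is needed to tell the two words apart:

```python
import unicodedata

VAV = "\u05d5"
HOLAM = "\u05b9"   # point above the vav: /o/
DAGESH = "\u05bc"  # point inside the vav (shuruk): /u/

def point_first_vav(word: str, mark: str) -> str:
    """Insert a vowel point right after the first vav in the word."""
    i = word.index(VAV)
    return word[: i + 1] + mark + word[i + 1 :]

def strip_points(word: str) -> str:
    """Drop all combining marks, which is what unpointed spelling amounts to."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")

skeleton = "\u05d1\u05d5\u05e7\u05e1"  # bet, vav, qof, samekh
box = point_first_vav(skeleton, HOLAM)
books = point_first_vav(skeleton, DAGESH)

print(box != books)
print(strip_points(box) == strip_points(books))
```

Without the points the two words collapse into the same string, which is exactly the “Google Box” confusion.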

I’d like to change that. I’d like to make the vowel points available to the general public using computers, so that not only professional book editors but everyone will be able to use them. So i’m working with the Standards Institution of Israel to revise the standard keyboard.

Until i’m done, try reading the Wikipedia articles Holam and Kubutz and Shuruk.


