Marriage in Dictionaries

The definition of marriage is the hottest topic in US news lately.

My favorite place for looking up definitions of English words is, unsurprisingly, the Merriam-Webster dictionary.

And indeed, the editors of M-W’s website noticed the public interest in the definition of marriage, and here’s what they had to write about it:

The word became the subject of renewed scrutiny as the Supreme Court heard arguments in cases seeking to overturn California’s ban on gay marriage and the federal government’s Defense of Marriage Act.

Marriage has become a controversial definition, although its original sense – “the state of being united to a person of the opposite sex” – has not changed.

However, because the word is used in phrases such as “same-sex marriage” and “gay marriage” (by proponents and opponents alike), a second definition – “the state of being united to a person of the same sex in a relationship like that of a traditional marriage” – was added to the dictionary to provide an accurate picture of the word’s current use.

I recently read Herbert Morton’s excellent book The Story of Webster’s Third: Philip Gove’s Controversial Dictionary and Its Critics. It’s excellent because it’s very well written and because it could be a handbook on how to make dictionaries in general: how to balance scientific linguistic precision with usefulness to the general public.

Sadly, this remark about the definition of marriage is a departure from the principles of excellence that guided the editors of Webster’s Third. If the sentence says “same-sex marriage”, then “same-sex” means, literally, “same-sex”; there’s no need to say “the state of being united to a person of the same sex”.

Why not just say that “marriage” is “the state of being united to a person”? Maybe “legally united”, or “religiously united”. Or “united in a family”. It neatly avoids the political problems around sex and gender and all that, and is correct linguistically.

The official dictionary of the Catalan language already did it:

Comparison of two versions of a dictionary definition in the Catalan language.

The Institute of Catalan Studies, which publishes the dictionary, also publishes a list of updates in each edition. In this image you can see how the definition of marriage changed from “a legal union of a man and a woman” to “a legitimate union of two people who promise each other a common life, established through certain rituals or legal formalities”. The last usage example also says: “In some countries the legislation provides for marriage between two persons of the same sex”.

And well, yes, before you ask: of course there is a political background. Catalonia was one of the first jurisdictions that made same-sex marriage equal to different-sex marriage. But from the purely linguistic point of view the newer definition, which doesn’t mention a man and a woman, is perfectly correct. And saying that the definition of “marriage” is different in “marriage” and in “same-sex marriage” is not correct. Simple, really.

Differences Between Things

The search box in Wikipedia suggests auto-completion when you start typing. For example, if you type “je” in the English Wikipedia search box, you’ll get the suggestions “Jews”, “Jewish”, “Jerusalem”, “Jesus”. (Jews kick ass!)

Jews Kick Ass. Henry Winkler, Albert Einstein, Sammy Davis Jr., Jesus, William Shatner, Bob Dylan

If you search for “differences between”, you’ll get this list:

auto-suggestions at Wikipedia for "differences between"

The top spot belongs to “Differences between editions of Dungeons & Dragons” and that shouldn’t be surprising: the article “List of Advanced Dungeons & Dragons 2nd edition monsters” only recently lost its first place in the list of the longest English Wikipedia articles by number of bytes to “2011 ITF Men’s Circuit” (it’s something in tennis).

Out of ten suggestions, six are related to languages. American and British English are considered one language, but everybody admits that it varies in pronunciation, spelling, vocabulary and many other respects, and lots of people love to bicker about the spelling of “meter” and “aluminum”. Bosnian, Croatian and Serbian are one language that has different names for reasons that are more political than linguistic. Something similar can probably be said about Malaysian and Indonesian, Norwegian Bokmål and Standard Danish, and Scottish Gaelic and Irish, but i know very little about these pairs.

Spanish and Portuguese are related, but definitely separate and mostly mutually unintelligible languages. It’s been said that it is easier for Portuguese speakers to understand Spanish speakers than the other way around, which is interesting, but it doesn’t really justify an encyclopedic article, as in the other cases. In fact, i am somewhat surprised that “Differences between Brazilian and European Portuguese dialects” is not in the list, given the huge number of arguments about it in the Portuguese – sorry, Lusophone – Wikipedia.

“Butterflies and moths” is probably the most serious article in this list, but that’s probably because i’m not a biologist.

And the last two articles are about movies (James Bond – movies vs. novels) and religion (Codex Sinaiticus vs. Vaticanus), which is also very Wikipedia, the encyclopedia about which someone said that it has more stamp collectors than good writers. (Citation needed; I can’t find the original quote.)

Unbearable Lightness

I was invited to the 10th anniversary celebration of the Catalan Wikipedia in Perpignan. Perpignan is a city in France, but from the Catalan point of view, it’s in Northern Catalonia – a rather large territory, also known as Roussillon, that was a part of Catalonia, but passed under French rule in 1659. Catalan is still spoken by many people there; how many exactly – i’ll have to see. I hope that it’s spoken by many people for a purely practical reason – my Catalan is much better than my French.

The Catalan Wikipedia is one of the first two Wikipedias created after the English one. The English Wikipedia was created on the 15th of January 2001; German and Catalan were created on the 16th of March 2001. Catalans love to say that although their Wikipedia was created a few minutes after the German one, it was the first one to have an actual article.

Since the Catalan Wikipedia is the oldest and the largest version of Wikipedia in a language which isn’t official in any big country (sorry, Andorra), the people behind it want to share their experiences promoting their language with speakers of other regional and minorized languages, and this will be discussed at the event. More details on that later.

Direct El-Al flight from Tel-Aviv to Barcelona – 582 USD. Alitalia via Rome, 2 hours wait for connection – 460 USD. Czech Airlines (ČSA) via Prague, 11 hours wait for connection – 367 USD. Guess which one i picked. ČSA, of course – i pay less and i get to spend a day in Prague! Sorry, El-Al.

If you call the Czech Airlines office in Tel-Aviv, you can choose one of the following languages, in that order: English, Russian, German, Czech, French, Spanish, Italian. No Hebrew or Arabic. Apart from that, however, the service is excellent. I spoke in Russian with the service people and they were very polite, helpful and efficient. They were Czech; they spoke Russian with a slight accent, but it was completely correct and easy to understand. I’ll have to wait for the flight itself to see how it is, but so far my impression is very good.

P.S. Typing the word “Czech” is surprisingly hard.

Who is Albert Sánchez Piñol?

Who is Albert Sánchez Piñol? Let’s look at Wikipedias in different languages, translated into English, ordered by the English name of the language:

Basque: Albert Sánchez Piñol is a Catalan writer and anthropologist.

Catalan: Albert Sánchez Piñol is a Catalan anthropologist and writer who wrote the known works “The Cold Skin” (2002) and “Pandora in Congo” (2005).

Dutch: Albert Sánchez Piñol is a Spanish anthropologist and employee of the Center for African Studies of the University of Barcelona. (The rest of the article describes his work in the field of anthropology. The last sentence says that he writes in Catalan.)

English: Albert Sánchez Piñol (Catalan pronunciation: [əɫˈβɛrt ˈsantʃeθ piˈɲɔɫ]) is a Catalan Spanish author and anthropologist writing in the Catalan language.

German: Albert Sánchez Piñol is a Spanish anthropologist and writer. (Catalan is not mentioned in the article, but the article is included in the category “Literature (Catalan)”).

Italian: Albert Sánchez Piñol is a Spanish writer and anthropologist. (The fact that “The Cold Skin” was written in Catalan is mentioned towards the end.)

Norwegian: Albert Sánchez Piñol is a Spanish author and social anthropologist, writing in Catalan.

Polish: Albert Sánchez Piñol, a Spanish writer, a prosaist writing in the Catalan language. By education he is an anthropologist.

Russian: Albert Sánchez Piñol – a Catalan anthropologist and writer.

Spanish: Albert Sánchez Piñol is a Spanish writer and anthropologist. His literary work is written in Catalan.

(All articles say that he was born in Barcelona in 1965. Only English has an IPA transcription of the name, although it’s probably wrong.)

Japanese, Germans and Israelis of the world

Through i-iter i came upon this interesting post: Tamil, Kannada and the middle path. Tamil and Kannada are two important languages spoken in the south of India and their speakers are quite proud of their identity.

The article complains that not enough is being done for the linguistic normalization of non-Hindi languages in India. It was very interesting to read it and, being Israeli, i was surprised to see the compliments to “Japanese, Germans and Israelis of the world who aren’t wasting time tom-toming about antiquity, beauty or originality, but are instead investing their time, money and energy in using their languages for almost all known purposes”.

I was curious – why did they choose these three? Why not Russians and French, who use their languages for everything because many of them openly consider them to be better than all the others? Why not Catalans, whose language is in a political situation which is much more similar to that of Tamil and Kannada?

And why Israelis? Sure, we use Hebrew a lot; Hebrew Wikipedia, for example, is our pride. But i don’t think that we use Hebrew enough. For example, a lot of people (not all) write email in English. They write email in English even if they don’t know English well. They write email in English even though practically all the technical problems with encoding and bi-directionality were solved years ago. And they write email in English even if the email is about a topic for which Hebrew is perfectly suitable: one could argue that English is more convenient for writing about software or physics, but quite a lot of people write email in English just to tell recent family news or to make an appointment.

I used to do that, too, but i made a conscious decision to stop writing email in English unless it is absolutely necessary. I tell all my friends about it. Some of them are indifferent and some of them – especially those in the software industry – say that Israel should have adopted English and not Hebrew as its language. Shame on them. Students think that i know English well, so they often ask me what is the most polite way to make an appointment with their professors in English, and i always tell them: “If your professor can read Hebrew, just write the email in Hebrew!”

Of course, there’s also the matter of university papers. In physics, for example, even though Hebrew is used in the classroom, it is taken for granted that papers at M.A.-level and higher are written only in English. The need for an English version is understandable, because on a world scale very few people would be able to read a paper in Hebrew, but i would imagine that it’s much better to write the paper in Hebrew and translate it. Yes, it would take time and probably money, but it is nevertheless useful and not just for the honor of the Hebrew language: it would actually advance science and education, because this way people would express themselves in their own language and think about physics instead of thinking about English.

Finally, there’s Facebook. For some reason many Israelis still use Facebook with the English interface – again, even though they don’t know English well, and even though they never read or write anything in English there. The translation of Facebook into Hebrew is terrible, and what’s especially frustrating is that i would gladly fix it, but i can’t, because the interface for submitting translation corrections is absolutely unusable. I nevertheless use Facebook in Hebrew, because it solves the bi-directionality problems – for example, the notorious problem with the punctuation marks appearing at the wrong end of the sentence. There was a newspaper report saying that Facebook influences Israeli children so much that they got used to writing the question mark at the beginning of the sentence – and that’s how they submit their homework! Some Israelis develop weird tricks to make the punctuation appear on the correct side of the sentence, for example by adding a letter after the period – compare “אתה בא לכדורגל בערב?י” and “אתה בא לכדורגל בערב?” – notice the placement of the question mark and the redundant letter in the first sentence. But they could simply switch to Hebrew. (And one day i will write an email to Facebook offices and tell them that they really should improve the translation.)
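Why the redundant-letter trick works can be sketched in a few lines of Python. This is only an illustration under assumptions, not a description of what Facebook does: it assumes the text is displayed by something that follows the Unicode bidirectional algorithm inside a left-to-right interface. There the question mark is a direction-neutral character, so at the end of a Hebrew sentence it has no strong right-to-left character after it and falls back to the left-to-right direction of the page. The extra Hebrew letter supplies that strong character. The sketch swaps in a different technique from the one described above: instead of a visible letter it appends the invisible RIGHT-TO-LEFT MARK (U+200F). The helper name `anchor_trailing_punctuation` is made up for this example.

```python
import unicodedata

# RIGHT-TO-LEFT MARK: an invisible character with strong right-to-left direction.
RLM = "\u200f"

# Bidirectional classes of strongly right-to-left characters (Hebrew is "R").
STRONG_RTL = ("R", "AL")


def bidi_classes(text):
    """Return the Unicode bidirectional class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def anchor_trailing_punctuation(text):
    """Append RLM when right-to-left text ends in a character with no strong direction.

    Hypothetical helper: it plays the role of the redundant Hebrew letter,
    but without adding anything visible to the sentence.
    """
    if not text:
        return text
    has_rtl = any(unicodedata.bidirectional(ch) in STRONG_RTL for ch in text)
    last_class = unicodedata.bidirectional(text[-1])
    if has_rtl and last_class not in STRONG_RTL and last_class != "L":
        return text + RLM
    return text


question = "אתה בא לכדורגל בערב?"
fixed = anchor_trailing_punctuation(question)
print(bidi_classes(question)[-3:])  # classes of the last three characters
print(len(fixed) - len(question))   # number of characters added
```

Whether a particular website keeps the invisible mark when it stores the text is a separate question; switching the interface to Hebrew remains the simpler fix.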

It’s quite pleasing to see that speakers of Kannada look up to us, but it doesn’t mean that we already did all we could to normalize Hebrew.

(And why am i writing this in English? Because i started writing it as a comment for that blog and it grew into a post by itself.)


1998: I was working on the final project in the programming course. We were a team of seven people. Thanks to my famous Microsoft Word prowess i was in charge of writing the documents that were part of the project, but the other team members also had to update them and it was quite troublesome. So i told my friend El’ad an idea i had: “How nice would it be if i could collaborate with my team members – if we could write the same document simultaneously. It would be a nice startup!” El’ad told me that it seemed rather useless to him.

Some time later El’ad told me about his own idea for a startup: “Let’s say that you have some files on your computer, for example music or images, and these files may be interesting to other people on the web that you don’t even know, and you want to share them and help people find them…”

To which i replied: “Who on Earth would want to do such a thing? That’s what websites and FTP are for.”

A few months later all the websites were buzzing about Napster’s fucking up the music business and El’ad told me that they implemented that idea of his.

2007: I went to Catalonia for a week and didn’t go online for all that time. When i came back, all the websites were buzzing about Radiohead’s fucking up the music business further with “In Rainbows”.

2009: I haven’t used the web since Thursday morning. Today i went online and every website was buzzing about Google Wave.

Google Wave is a combination of a word processor, an email program and an instant messaging program that is written in HTML. We’ll have to wait and see whether this will fuck up Microsoft’s business model, but the important parts for me are that it has a very cool spell checker, and more importantly – that it allows several people to edit the same document simultaneously.

Take a look at the video: Google Wave Developer Preview at Google I/O 2009. At 00:35 you’ll see exactly the thing i envisioned in 1998. It even has Hebrew there.

So, El’ad, you can say that you had your revenge on me. But i’m still quite proud – i envisioned an idea that took many more years to implement.

Fighting Antisemitism

I helped two nice Italian tourists find their way in Jerusalem today. They knew English, but how could i miss an opportunity to practice my Italian? I had barely touched any Italian for two years, so i spoke slowly, but managed to say complete sentences and didn’t mix in any Catalan words. They were pleasantly surprised, of course, and said that my Italian pronunciation was correct.

Now there’s a little less antisemitism in the world. Not just because of my Italian skills, but also because the bus they needed to take arrived quickly, which, for Israel, is a miracle. So, Egged: Fight antisemitism, improve the Israeli bus services!

Obnoxious Firefox Licensing

Mozilla Firefox comes in many localized versions for many different languages, which is a good thing.

Mozilla Firefox has built-in spell-checking, which is also a good thing.

So, for example, if you download the installer for English (US) or for Lithuanian and install it and go write an email in GMail or edit a Wikipedia article in one of these languages, you’ll immediately see your spelling errors. This makes perfect sense.

But if you download an installer localized for English (UK), Catalan or Hebrew, you won’t see your spelling errors. The Firefox binary has spell-checking capabilities, but the installer doesn’t include the actual dictionary. Firefox-compatible dictionaries for these languages exist, and they are licensed as Free Software (GPL or LGPL), and you can add them manually after installing (right-click -> Languages -> Add Dictionaries), but here comes the ridiculous part: The guys behind Firefox refuse to include those dictionaries in the installer. The reason, apparently, is that to be included in the installer, the dictionary must be 300% compatible with Firefox’s license, because Firefox is tri-licensed as GPL/LGPL/MPL, and a dictionary that is GPL-only is not good enough.

It is hard enough to convince people to install Firefox in the first place; convincing them to install additional dictionaries, plug-ins, add-ons etc. tends to frustrate them even more. Contrary to the belief which is popular among Firefox power users, most people are not add-on junkies and don’t right-click everywhere. So, even though Firefox users in London, Barcelona and Jerusalem can see Firefox menus in their respective languages, they have dead-weight spell-checking code on their hard drives, because they didn’t get a spelling dictionary in the installation, and many of them don’t even know that a Firefox-compatible spelling dictionary for their language exists.

Is this obnoxious licensing requirement really required? Isn’t Free Software licensing supposed to make distributing software easier?

When i told my wife Hadar about it, she said that it is as ridiculous as the stuff i tell her about DRM.

YouTube may be a competitor to Wikipedia as one of the most massively multilingual sites on the web.

Many people who comment there don’t seem to care that English is the lingua franca of the web. They just write in Russian, Portuguese, Indonesian, Catalan and Croatian and it creates a soup of languages. And that is a Very Good Thing. It makes languages visible and promotes tolerance. Variety and tolerance are mighty good.