Aharoni in Unicode, ya mama

Treacle tarts for great justice

Archive for the ‘linguistics’ Category

How do you look up words in a Hebrew dictionary?

Posted by aharoni on 2009-10-07

For this post i would like to get as many comments as possible. If you are more comfortable reading or writing in Russian or in Hebrew, please see the versions of this post in those languages.

What is difficult for you?

Is it difficult to find the root of the word? (This is relevant mostly for verbs, but in some dictionaries also for nouns.) How do you prefer to search for verbs – by the root, by the infinitive, by the past (perfect) tense, by the present (participle) tense?

Is it hard for you to separate the prefixes (conjunctions, prepositions) and the suffixes (tense, possession)?

Do you have any trouble reading Hebrew with or without vowel points (niqqud)? Do you need transcription in easy-to-read Latin characters or in IPA?

Do you understand abbreviations such as vt, n.pr.m., adv., impv., זו”נ‎, פעו”י‎, מ”ג‎, נ”ר? Do you notice them at all? Do they bother you in any way?

Do you remember any words that were particularly hard to find? Words or expressions for which you had to open several dictionaries? Words that you couldn’t find at all, anywhere?

Do you have any particular problems with the usage of the letters א‎, ו‎, י for vowels? If you can’t find the word תוכנה, do you know that you should try searching for תכנה? Is there a dictionary that you prefer, because it has a system for the usage of these letters that you like?

Do you have a preferred dictionary in general or a dictionary that you don’t like? Why? I am talking about mono- and bi-lingual ones, and about printed and electronic: Even-Shoshan, Ben-Yehuda, Gur, Ariel, BDB, Rav-Millim, Alkalai, Sapir, Ha-hove, Morfix etc.

These questions may seem a bit generic, but i am curious mostly about the aspect of using the dictionary and not general language difficulties.

Please write whatever comes to your mind, even if you think that it is embarrassing or too simple. Feel free to answer anonymously or to email me at amir.aharoni@mail.huji.ac.il.

Many, many thanks in advance.

Posted in language, lexicography, linguistics | Leave a Comment »

Mtmihim

Posted by aharoni on 2009-08-27

Google’s PR people Mtmihim me.

Every few days, Google tells the world about the wonderful free translation tools it offers, but the problem is that they completely Mafnim, because they can not translate one sentence correctly between each pair of languages.

Why do they think the public will buy the lie that stupid?


The above was automatically translated from Hebrew. What i actually meant to write was this:

I’m puzzled by Google’s public relations people.

Every couple of days Google tells the world about the wonderful and free translation tools that it offers, but the problem is that they completely suck, as they aren’t able to translate a single sentence correctly between any pair of languages.

Why do they think that the public will buy this stupid lie?


And i’ve got to admit that this particular translation is not that terrible, but i already made the original somewhat synthetic. Any real-world translation is completely useless.

Posted in Google, linguistics | 2 Comments »

Mistyping

Posted by aharoni on 2009-02-05

I saw a Hebrew speaker typing the word “mistypiping” in an email. She meant to type “mistyping”. Unintended contextual humor.

I told her that “typo” is the usual English word. “Mistyping” exists: it appears in Merriam-Webster’s list of words with the mis- prefix, and the Oxford English Dictionary says that it has existed since 1977. But it is obviously rare.

She eventually wrote “typo”, but wasn’t too happy about it. She said that it was the first time she had seen the word “typo”, and that it would have been much harder for her to understand it if she had received it in an email.

If you love Esperanto, you must be really happy now to be reading this, as this is exactly how Esperanto works, or at least is supposed to work: as few roots as possible and as much regularity in prefixes and suffixes as possible.

Posted in English, Esperanto, Hebrew, lexicography, linguistics | Leave a Comment »

Eloquence

Posted by aharoni on 2009-01-11

“The era of Bushisms is now coming to an end, and word watchers worldwide will have a hard time substituting Barack Obama’s precise intonations and eloquence for W’s unique linguistic constructions.” (Paul JJ Payack, GLM.)

Posted in linguistics | 1 Comment »

Gender Studies

Posted by aharoni on 2008-12-09

In most Hebrew language courses a significant majority of students are female. The only exception is the course “Medieval Hebrew: Piyyut and Spanish Poetry”, in which 70% of the students are male. Calling this course “the hardest” wouldn’t be very objective, but it is safe to say that the Even-Shoshan Dictionary is not very useful for understanding the texts that we read there.

In the Linguistics courses i took, the ratio of male to female students was pretty much even. The same goes for “Spanish for beginners”.

However, in the “Advanced Portuguese” course all students are male.

(Hi, Jane.)

Posted in Hebrew, Portuguese, Spanish, gender, linguistics, university | Leave a Comment »

Collegiate

Posted by aharoni on 2008-03-24

My number one favorite dictionary of all time is Merriam-Webster’s Collegiate. I often read it for fun. I don’t think that there is a single page in it that i have never opened. Its consistency is uncompromising. Its completeness and attention to detail are remarkable. Its definitions are so precise that you could take nearly any word in an English text, replace it with the definition from Merriam-Webster’s Collegiate, and the meaning of the text would remain virtually intact.

But! Apparently i didn’t know how to pronounce the word “Collegiate” correctly. I thought that it was \ˈkä-lē-jət\, like “college” – \ˈkä-lij\. Apparently it’s \kə-ˈlē-jət\.

Damn it.


(N.B. In this entry i used M-W’s own pronunciation symbols and not IPA.)

Posted in linguistics | 2 Comments »

Punctuation With a Smiley

Posted by aharoni on 2007-10-28

… And Artemy Lebedev again. He finally raised an important question – how to write punctuation marks near the smiley emoticon?

His proposal:

  • A smiley is separated from a word with a space – i agree.
  • If there’s a punctuation mark after the smiley, it is not separated from the smiley by a space – i don’t agree. I think that “Are you ghey or what? :-)” looks better than “Are you ghey or what :-)?”
  • A period never comes after a smiley – i totally agree :-)
  • A smiley can be combined with a closing parenthesis (in the case that it was opened somewhere :-) – i agree. (Although i used to write like this in the past :) )

People who don’t think that this question is important shouldn’t be allowed to use email.

Posted in Internet, linguistics | 1 Comment »

I to twist only began to teach English

Posted by aharoni on 2007-08-31

Vegetarian spam again. I just can’t stand the beauty of it and i have to share. More importantly: Please! Anyone! Spam experts! There must be software that creates such texts for spamming – what is it called? Where do i get it? The grammar it uses is wrong English, but there seems to be some kind of logic behind it – i want to learn it!

Will there be time to give the pair of answers?recently glanced in the guest book and such saw here gibberish: phentermine fioricet cheapest fioricet and news fioricet phentermine fioricet or buy fioricet online canada fioricet buy whether there are here people which will be able to render support on this not easy business I ask to forgive me for the e construction of suggestions is faithful not always, I to twist only began to teach English, by nationality to twist I Georgian, understand as in schools of my country teach this wonderful from languages :( all while on, time, think in my absence will get new reports
Ps. I found some references which to me interesting steel buy fioricet online canada fioricet buy and searched through google on demand : fioricet buy fioricet effects,? here this site with the heap of Linkov.
what is ?

Posted in English, linguistics | 2 Comments »

here these themes which I fallen back on

Posted by aharoni on 2007-08-09

I’m sorry, i know i promised not to post vegetarian spam, but this one is kinda special.

Send my way in a correct river-bedif who it likes answer want to associate, knock in icq write in email only, it is yet better to discuss in opened right here  here these themes which I fallen back on: pills fioricet news fioricet and news fioricet phentermine fioricet or fioricet fioricet buy whether there are here people which will be able to render support on this not easy business I ask to forgive me for slang, I to twist only began to teach English, by nationality to twist I Romanian, understand as in schools of my country teach this wonderful from languages :( not forget to write your answers here
Ps. I found some references which to me interesting steel generic fioricet fioricet and searched through google on demand : buy fioricet online usa cheap fioricet,? here this site with the heap of Linkov.
what is ?

Isn’t it beautiful? It’s even kinda … relevant?

And here’s another small one for dessert:

Are you aware that no credit can be just as bad as bad credit?

Posted in linguistics, spam | Leave a Comment »

Signify

Posted by aharoni on 2007-08-08

Art. Lebedev did it again: Короче. The title means “Shorter”.

You don’t need to know Russian to understand what he says there. The road sign in the first picture says:

DRIVER! FASTEN YOUR SEATBELTS AND TURN ON THE DRIVING BEAM OF THE HEADLAMPS

The second picture says:

— Lights and seat belt!

Lebedev doesn’t just say that road signs should be shorter. He emphasizes the use of proper typography, which is not just nice, but practical too. In books the dash introduces direct speech, so when the driver sees it, he feels that someone is actually speaking to him, which makes him want to do something in response. Proper use of capital and small letters instead of all-capitals makes the sign more easily readable, which is crucially important, ‘cuz you don’t want to make driving harder.

Lebedev doesn’t say much about the exclamation mark, but as a linguist i’d like to add that it is there because it has to be there, because a sentence that starts with a dash just has to end with something. It’s similar to the -es in the sentence “He goes to the bar”: textbooks say that the -es means “third person singular”, but in fact the He is the sign of “third person singular”, and the -es is there simply because the sentence “He go to the bar” would not be considered proper English by most people.

In the USA almost all road signs are just written in English in very short and standard sentences: “SPEED LIMIT”, “STOP”, “FOOD”. It’s not as beautiful as Lebedev’s proposal, but i do think that it is rather practical, because the driver doesn’t need to learn a hundred or so pictograms, as is the case in most countries. It has one drawback: The driver has to know English.

Posted in Russian, design, linguistics, transport | 2 Comments »

The Future

Posted by aharoni on 2007-07-23

Where is the computing world going? I’m not talking just about Free Software, but about the whole industry. Even Microsoft is in trouble here.

What more can we do with computers? What will computers do five years from now that they can’t do today?

Writing documents and university papers can’t get much better than MS-Office, OpenOffice, TeX and DocBook. Each of them caters rather well to its respective market (except for some interoperability issues, which are really rather minor if you put the bizness bullshit aside).

Music, Movies, Animation? You can’t improve this field much more in the home market, and the high-end market of professional artists and studios is rather narrow. (Although ideas expressed in Lessig’s Free Culture can make it wider …)

Business v1.0 software – databases, billing, CRM, ERP? It is a market of reliability, not innovation.

Websites, communications and social networks? True innovation in that area hit a glass wall long ago, if you ask me. Some websites make up nicer AJAX tricks, but that’s about it.

So i thought that the really innovative thing that can be useful on a major scale may lie in the field of Linguistics (disclaimer: I am studying for a B.A. in Linguistics). Speech recognition, text-to-speech and automated translation – all of them are related to Linguistics; none of them can be done right without proper scientific Linguistic preparation.

Microsoft puts “improved” speech recognition into every version of MS-Office, but it is very far from doing it right. Xerox and IBM tried something in their respective (and respected) research labs, but it didn’t see the light of day (at least yet). Google are rumored to be doing something with statistics-based automated translation.

But no-one has anything finalized.

The first one who does it right will rule the whole market for years to come. Of the current players, Google seems to have the best chances to succeed, but it can also be a startup company created by an anonymous undergraduate Liberal Arts student in India, Nigeria or Ukraine. Or Israel?

(Originally published in Bug #1.)

Posted in Free Software, Internet, Microsoft, design, language, linguistics, making the world a better place, society, software, university | 2 Comments »

Mother Tongue

Posted by aharoni on 2007-06-21

HLA says on my new Hebrew blog: “It must be noted that it is much more fun for me to read in Hebrew.”

I’m glad to optimize for fun (PDF file).

But it must be noted, that it is not easier for me to read in Hebrew than it is in English. And it is not easier for me to read in Russian than in Hebrew or in English. I can read these three pretty much equally well. And it’s not necessarily good.

I hardly have a mother tongue.

Russian is probably still the best shot if i have to name my mother tongue. When i made my contribution to the English Speech Accent Archive (requires QuickTime for audio), i was classified as a Russian speaker; it was academic, but rather artificial. When i speak Hebrew i sometimes make funny mistakes, Russianisms; being a linguist i become aware of them, but a moment too late. The most common such mistake must be saying phrases such as “We went with my friend to a movie.” It usually means “I went with a friend to a movie” – two people. In Russian it is perfectly correct to say it – Мы ходили с другом в кино, but in Hebrew and English it is weird. Occasionally i say “да, я, но” instead of “yes, i, but”.

But then i also have occasional Hebraisms slipping into my Russian and English speech and Anglicisms slipping into Hebrew and Russian.

When i read texts about politics and news in Russian, it feels different. I can say that it feels more lively and expressive, but i can’t say that it’s easier.

So i hardly have any mother tongue.

Which is probably not that good.

Posted in English, Hebrew, Russian, blogging, language, linguistics | 3 Comments »

Reality – Alone Alone

Posted by aharoni on 2007-05-16

Assaf Amdursky – Alone Alone

Together with Karni Postel

Assaf Amdursky is a successful Israeli singer. For the last few months he has been doing an intimate solo acoustic tour, which he calls לבד לבד – Alone Alone. For the linguistic-minded readers: Yes – it’s a use of reduplication in Hebrew.

Karni Postel (or maybe Fostel – i’m not sure) is mostly a cello player and also a singer. She is very talented. I once dreamt that we’re making love in a desert.

If you can read Russian, see something very similar at Art. Lebedev’s Idioteque.

Posted in Hebrew, Russian, consumership, linguistics, photo, reality | Leave a Comment »

Hard is not Impossible

Posted by aharoni on 2007-05-10

I gave a talk at a Jerusalem Perl Mongers meeting.

I showed a little script that i wrote two years ago when i wrote a paper about Lithuanian grammar. The script searches a text for uses of the nearly extinct Illative case. In the paper i tried to understand why authors still use it even though there are other ways to describe motion in Lithuanian. The paper was surprisingly well-received back then. (See also discussion about it in Debesėlis forum.)

Anyway, the script is not too clever in any special way. Scanning a text for words with illative endings is almost the same as scanning a log on a server for errors. (I once heard a co-worker using the phrase “perling through the logs”. He said that it’s his own invention!) Except that human language is much more complicated.
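The original Perl script is not shown here, so what follows is only a rough sketch of the same idea in Python, not the script itself. The list of endings is an assumption made for illustration (commonly cited singular and plural illative endings), and the function name is hypothetical. Like grepping a log for "error", it matches on surface form alone, so it will also catch words that merely happen to end the same way.

```python
import re

# Assumed illative endings, for illustration only; a real script would need
# a checked list and some way to filter out false positives.
ILLATIVE_ENDINGS = (
    "uosna", "osna", "ėsna", "ysna", "usna",  # plural (assumed)
    "an", "on", "ėn", "in", "un",             # singular (assumed)
)

# A "word" here is any run of letters; Python 3 patterns are Unicode-aware,
# so precomposed Lithuanian letters such as š, ė and į are included.
WORD = re.compile(r"[^\W\d_]+")


def find_illative_candidates(text):
    """Return the words in text that end in one of the assumed endings."""
    candidates = []
    for match in WORD.finditer(text):
        word = match.group(0)
        lowered = word.lower()
        # Require some stem before the ending, so a bare "an" is not reported.
        if any(lowered.endswith(e) and len(lowered) > len(e)
               for e in ILLATIVE_ENDINGS):
            candidates.append(word)
    return candidates


if __name__ == "__main__":
    sample = "Jis eina miškan, o ji grįžta namo."
    print(find_illative_candidates(sample))
```

Everything this returns is only a candidate; deciding whether a given hit is really an illative form is the part where human language stops resembling a server log.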

Now i say that clever analysis of a text in a natural human language is not impossible. I am thinking of a unified way to translate grammar books and dictionaries into a machine-readable representation – to describe English, Lithuanian and Hebrew in the same way. I tried to search for other projects that did something like that, but found only theories without implementations or implementations which were too language-specific (Hebrew-only, English-only, etc.) After thinking about it for a few weeks i also started to truly understand a lot of concepts that i have been taught in the five years of studying for a Linguistics degree.

With modern tools such as Google, Perl 6 grammars, Semantic Web-related technologies, and some older ones, such as Prolog, building such a system seems technically feasible, just a question of man-years. At the Perl Mongers one of the participants also drew my attention to WordNet, which is mostly English-specific, but seems promising.

A lot of my friends want to establish a start-up company. If i ever do that, that’s what my company will do.

Posted in Perl, linguistics | Leave a Comment »

Vodka

Posted by aharoni on 2006-07-18

This just in: Vodka goes very well with ice cream. Makes sense – it’s like a five dollar milkshake.

I’m writing a paper about languages of three peoples that have a long historical argument about the very hard question, “Who invented vodka?” I’m talking about Russian, Belarusian and Lithuanian. I may throw Polish in at some stage, too.

Posted in Belarusian, Russian, alcohol, linguistics | Leave a Comment »