I gave a talk at a Jerusalem Perl Mongers meeting.
I showed a little script that i wrote two years ago, when i was writing a paper about Lithuanian grammar. The script searches a text for uses of the nearly extinct Illative case. In the paper i tried to understand why authors still use it even though there are other ways to describe motion in Lithuanian. The paper was surprisingly well received back then. (See also the discussion about it in the Debesėlis forum.)
Anyway, the script is not especially clever. Scanning a text for words with illative endings is almost the same as scanning a log on a server for errors. (I once heard a co-worker use the phrase “perling through the logs”. He said that it’s his own invention!) Except that human language is much more complicated.
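To give an idea of what such a scan looks like, here is a minimal sketch. It is written in Python and is not the script from the talk. The list of endings is my own rough assumption about illative suffixes and is certainly incomplete, and the function name `find_illative_candidates` is made up for this example. Matching on endings alone also produces false positives, which is exactly where human language gets more complicated than a log file.

```python
import re

# Assumed, incomplete list of illative endings (plural first, then singular).
# This is an illustration only, not the list used in the original script.
ILLATIVE_ENDINGS = ("uosna", "osna", "ysna", "ėsna", "an", "on", "ėn", "in", "un")

# Runs of letters only; in Python 3 this includes letters with diacritics.
WORD_RE = re.compile(r"[^\W\d_]+")


def find_illative_candidates(text):
    """Return (line_number, word) pairs for words ending in a listed suffix."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for word in WORD_RE.findall(line):
            lowered = word.lower()
            # Require at least one letter of stem before the ending.
            if any(lowered.endswith(e) and len(lowered) > len(e)
                   for e in ILLATIVE_ENDINGS):
                hits.append((line_number, word))
    return hits


if __name__ == "__main__":
    sample = "Jis eina miškan.\nVaikai grįžta namo.\nPaukščiai skrenda laukuosna."
    for line_number, word in find_illative_candidates(sample):
        print(line_number, word)
```

Every hit is only a candidate: a real analysis would still have to check each word against a dictionary or look at it by hand.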
Now i say that clever analysis of a text in a natural human language is not impossible. I am thinking of a unified way to translate grammar books and dictionaries into a machine-readable representation, describing English, Lithuanian and Hebrew in the same way. I tried to search for other projects that did something like that, but found only theories without implementations, or implementations that were too language-specific (Hebrew-only, English-only, etc.). After thinking about it for a few weeks i also started to truly understand a lot of concepts that i had been taught in the five years of studying for a Linguistics degree.
With modern tools such as Google, Perl 6 grammars and Semantic Web-related technologies, and some older ones such as Prolog, building such a system seems technically feasible; it is just a question of man-years. At the Perl Mongers meeting one of the participants also drew my attention to WordNet, which is mostly English-specific, but seems promising.
A lot of my friends want to establish a start-up company. If i ever do that, that’s what my company will do.