Part III. Language

In this section, Perl demonstrates what makes it the language of choice for manipulating language, with fifteen articles covering everything from state-of-the-art research in natural language processing and speech synthesis to practical problems like formatting text and matching names.

Natural language processing, the task of getting a computer to understand human language, is one of those fields that seems easy at first but is actually fraught with difficulties. NLP textbooks often demonstrate the perversity of English with sentences like “Colorless green ideas sleep furiously,” which is grammatical but nonsensical; “The horse raced past the barn fell,” which seems ungrammatical but isn’t; and “Time flies like an arrow,” which is perfectly good English but has four competing interpretations.

The section begins with two articles about programs that converse: John Nolan’s article on a bot that dispenses psychiatric advice, and Kevin Lenzo’s article on the purl bot, which helps out Perl novices on Internet Relay Chat. The ever-prodigious Kevin follows up with another of the research areas that he pursues at Carnegie Mellon: open source speech synthesis in Perl. Next, Prof. Damian Conway shows you how to format text automatically with Text::Autoformat, which manipulates the indentation, quoting, bulleting, and margins of text.

Linguist Sean Burke has six articles in this book—more than anyone else—all of them about language in one form or another. The next two articles, on music and Braille, demonstrate “little languages” constructed for a specific purpose.

NLP hacker Dan Brian follows with two articles on using Perl to give your computer programs an understanding of English. The first article is about his Lingua::Wordnet module, which lets your programs use relationships between words: which are synonyms and antonyms, which are subsets and supersets, and so on. Dan’s second article is about the Lingua::LinkParser module, which provides a Perl interface to the most popular natural language system available. Prof. Khurshid Ahmad and Duncan White follow with an article on using morphology (the structure of words) to begin with a word (e.g., “compute”) and generate related words from it (“computes,” “computing,” “computationally,” and so on).

Next up is Brian Lalonde, who dissects the tricky problem of matching variations on human names. This is not as easy as it sounds: “Bill Gates” is the same person as “William Gates III,” and someone named “Peggy” can also go by “Margaret.” Simple regular expressions won’t suffice; you need a little intelligence to reliably match names.

Sean Burke returns to help you ready your programs for the 5.7 billion people who don’t speak English as a first language, with articles on localization and internationalization. He follows with an article on simulating typos, comparing the standard QWERTY keyboard to the Dvorak keyboard, with a brief excursion into Dutch, Italian, and Tibetan. Dave Cross continues with an article on how to correct typos in subroutine names. Even if you don’t mistype subroutine names frequently, every Perl coder should be aware of the AUTOLOAD trick he uses to intercept calls to nonexistent subroutines. Finally, Tuomas Lukka concludes the section with a description of how he learned Japanese via his program, which automatically translates Japanese into English as he surfs the Web.
