Chapter 3. Morphology – Getting Our Feet Wet

Morphology may be defined as the study of how words are composed from morphemes. A morpheme is the smallest unit of language that carries meaning. In this chapter, we will discuss stemming and lemmatization, stemmers and lemmatizers for non-English languages, developing a morphological analyzer and a morphological generator using machine learning tools, search engines, and related concepts.

In brief, this chapter will include the following topics:

  • Introducing morphology
  • Understanding stemmer
  • Understanding lemmatization
  • Developing a stemmer for non-English languages
  • Morphological analyzer
  • Morphological generator
  • Search engine

Introducing morphology

Morphology may be defined as the study of how tokens are formed from morphemes. A morpheme is the basic unit of language that carries meaning. There are two types of morpheme: stems and affixes (suffixes, prefixes, infixes, and circumfixes).

Stems are also referred to as free morphemes, since they can exist on their own without any affixes. Affixes are referred to as bound morphemes, since they cannot exist in a free form and always occur together with free morphemes. Consider the word unbelievable. Here, believe is the stem, or free morpheme; it can exist on its own. The morphemes un and able are affixes, or bound morphemes; they cannot exist in a free form, but only together with a stem.

There are three kinds of language, namely isolating languages, agglutinative languages, and inflecting languages. Morphology plays a different role in each of them. Isolating languages are those in which words are merely free morphemes and do not carry any tense (past, present, or future) or number (singular or plural) information. Mandarin Chinese is an example of an isolating language. Agglutinative languages are those in which morphemes are joined together, each typically contributing one piece of meaning, to convey compound information. Turkish is an example of an agglutinative language. Inflecting languages are those in which words can be broken down into simpler units, but these units often express several grammatical meanings at once. Latin is an example of an inflecting language.

Morphological processes are of the following types: inflection, derivation, semiaffixes and combining forms, and cliticization. Inflection means transforming a word into a form that represents person, number, tense, gender, case, aspect, or mood. Here, the syntactic category of the token remains the same. In derivation, the syntactic category of the word can change as well; for example, the verb believe becomes the adjective believable. Semiaffixes are bound morphemes that exhibit word-like qualities, for example, noteworthy, antisocial, anticlockwise, and so on.
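The split of unbelievable into a stem and its affixes can be sketched in a few lines of Python. This is only a toy illustration and not a real morphological analyzer: the affix lists, the stem lexicon, and the segment function are hypothetical names introduced here, and the rule that restores a dropped final e (believe + able gives believable) is a simplifying assumption.

```python
# Toy affix inventories (hypothetical and far from complete).
PREFIXES = ("un", "anti", "re")
SUFFIXES = ("able", "ness", "ing")


def segment(word, stems):
    """Split a word into (prefix, stem, suffix) using a lexicon of known stems.

    Returns None when no combination of the toy affixes yields a known stem.
    """
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            core = rest[:len(rest) - len(suffix)] if suffix else rest
            # Assumption: a stem-final 'e' may have been dropped before the suffix.
            candidates = (core, core + "e") if suffix else (core,)
            for candidate in candidates:
                if candidate in stems:
                    return prefix, candidate, suffix
    return None


stems = {"believe"}  # free morphemes we accept as stems
print(segment("unbelievable", stems))
print(segment("believe", stems))
```

The first call separates the bound morphemes un and able from the free morpheme believe, while the second shows that the stem can stand on its own with empty affixes. A lexicon lookup is used here because blindly stripping affixes would also cut apart words that merely happen to begin or end with the same letters.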
