Affix tagging

The AffixTagger class is another ContextTagger subclass, but this time the context is either the prefix or the suffix of a word. This means the AffixTagger class can learn tags based on fixed-length substrings taken from the beginning or the end of a word.
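To make the ContextTagger idea concrete, here is a minimal pure-Python sketch of what "learning tags from a suffix context" means: count which tag each suffix carries in the training data and keep the most frequent one. The function names and toy training data are hypothetical, and this is a simplification, not NLTK's implementation (NLTK's ContextTagger also applies a cutoff and skips contexts its backoff tagger already handles, and AffixTagger enforces a minimum word length that is ignored here).

```python
from collections import Counter, defaultdict


def suffix_context(word, length=3):
    """Return the last `length` characters of `word` (a suffix context)."""
    return word[-length:]


def train_affix_table(tagged_sents, length=3):
    """Map each suffix to the tag it most often carries in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[suffix_context(word, length)][tag] += 1
    # Keep only the single most frequent tag per suffix.
    return {affix: tags.most_common(1)[0][0] for affix, tags in counts.items()}


# Hypothetical toy training data in the (word, tag) format used by NLTK corpora.
train = [
    [("running", "VBG"), ("happily", "RB")],
    [("jumping", "VBG"), ("busily", "RB")],
    [("nothing", "NN")],
]
table = train_affix_table(train)
# table == {"ing": "VBG", "ily": "RB"}  ("ing" is VBG twice, NN once)

print(table.get(suffix_context("singing")))  # VBG: unseen word, known suffix
print(table.get(suffix_context("windy")))    # None: suffix "ndy" never seen
```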

How to do it...

The default arguments for the AffixTagger class specify three-character suffixes and require words to be at least five characters long. If a word is shorter than five characters, then None is returned as the tag.
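The five-character minimum follows from the defaults: a suffix length of 3 plus a default min_stem_length of 2 (covered later in this recipe). The following sketch, with hypothetical names, reproduces that rule in plain Python to show which words in a sentence would get a context at all under the defaults.

```python
# Defaults as described in this recipe (constant names are illustrative).
AFFIX_LENGTH = -3     # negative value => learn suffixes
MIN_STEM_LENGTH = 2

min_word_length = MIN_STEM_LENGTH + abs(AFFIX_LENGTH)  # 2 + 3 = 5


def default_context(word):
    """Return the three-character suffix, or None if the word is too short."""
    if len(word) < min_word_length:
        return None
    return word[AFFIX_LENGTH:]


words = ["The", "cat", "jumped", "over", "fences"]
contexts = [default_context(w) for w in words]
print(contexts)  # [None, None, 'ped', None, 'ces']
```

Only "jumped" and "fences" are long enough, so the other three words can receive no tag from an AffixTagger with default arguments unless a backoff tagger fills them in.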

>>> from nltk.tag import AffixTagger
>>> tagger = AffixTagger(train_sents)
>>> tagger.evaluate(test_sents)
0.27558817181092166

So, it does OK by itself with the default arguments. Let's try it with three-character prefixes instead.

>>> prefix_tagger = AffixTagger(train_sents, affix_length=3)
>>> prefix_tagger.evaluate(test_sents)
0.23587308439456076

To learn on two-character suffixes, the code will look like this:

>>> suffix_tagger = AffixTagger(train_sents, affix_length=-2)
>>> suffix_tagger.evaluate(test_sents)
0.31940427368875457

How it works...

A positive value for affix_length means that the AffixTagger class will learn word prefixes, essentially word[:affix_length]. If affix_length is negative, then suffixes are learned using word[affix_length:].
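The slicing rule can be checked directly in Python. This small helper (a hypothetical name, not part of NLTK) applies word[:affix_length] for positive values and word[affix_length:] for negative ones, using the same affix lengths as the examples in this recipe.

```python
def affix(word, affix_length):
    """Prefix for a positive affix_length, suffix for a negative one."""
    if affix_length > 0:
        return word[:affix_length]
    return word[affix_length:]


word = "tagging"
print(affix(word, 3))   # tag  (three-character prefix)
print(affix(word, 2))   # ta   (two-character prefix)
print(affix(word, -2))  # ng   (two-character suffix)
print(affix(word, -3))  # ing  (three-character suffix, the default)
```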

There's more...

You can combine multiple affix taggers in a backoff chain if you want to learn on affixes of several different lengths. Here's an example of four AffixTagger instances learning on two- and three-character prefixes and suffixes:

>>> pre3_tagger = AffixTagger(train_sents, affix_length=3)
>>> pre3_tagger.evaluate(test_sents)
0.23587308439456076
>>> pre2_tagger = AffixTagger(train_sents, affix_length=2, backoff=pre3_tagger)
>>> pre2_tagger.evaluate(test_sents)
0.29786315562270665
>>> suf2_tagger = AffixTagger(train_sents, affix_length=-2, backoff=pre2_tagger)
>>> suf2_tagger.evaluate(test_sents)
0.32467083962875026
>>> suf3_tagger = AffixTagger(train_sents, affix_length=-3, backoff=suf2_tagger)
>>> suf3_tagger.evaluate(test_sents)
0.3590761925318368

As you can see, the accuracy goes up each time.
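Why does chaining help? Each tagger answers only for the affixes it learned and hands everything else to its backoff. The sketch below uses a hypothetical, heavily simplified ToyAffixTagger class (not NLTK's AffixTagger: no cutoff, no pruning of contexts, and a made-up tag_word() method) built in the same order as the chain above, so you can trace which tagger ends up answering for each word.

```python
from collections import Counter, defaultdict


class ToyAffixTagger:
    """Hypothetical stand-in for AffixTagger, used only to illustrate backoff."""

    def __init__(self, tagged_sents, affix_length, min_stem_length=2, backoff=None):
        self.affix_length = affix_length
        self.min_word_length = min_stem_length + abs(affix_length)
        self.backoff = backoff
        counts = defaultdict(Counter)
        for sent in tagged_sents:
            for word, tag in sent:
                ctx = self.context(word)
                if ctx is not None:
                    counts[ctx][tag] += 1
        # Most frequent tag for each learned affix.
        self.table = {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

    def context(self, word):
        """Prefix or suffix of the word, or None if the word is too short."""
        if len(word) < self.min_word_length:
            return None
        if self.affix_length > 0:
            return word[:self.affix_length]
        return word[self.affix_length:]

    def tag_word(self, word):
        """Look up the affix; defer to the backoff tagger if it is unknown."""
        tag = self.table.get(self.context(word))
        if tag is None and self.backoff is not None:
            return self.backoff.tag_word(word)
        return tag


# Hypothetical toy training data.
train = [[("walking", "VBG"), ("quickly", "RB"), ("unhappy", "JJ")]]

# Same ordering as the chain above: pre3 <- pre2 <- suf2 <- suf3.
pre3_tagger = ToyAffixTagger(train, affix_length=3)
pre2_tagger = ToyAffixTagger(train, affix_length=2, backoff=pre3_tagger)
suf2_tagger = ToyAffixTagger(train, affix_length=-2, backoff=pre2_tagger)
suf3_tagger = ToyAffixTagger(train, affix_length=-3, backoff=suf2_tagger)

print(suf3_tagger.tag_word("talking"))  # VBG, answered by suf3 ("ing")
print(suf3_tagger.tag_word("sadly"))    # RB, suf3 misses "dly", suf2 knows "ly"
print(suf3_tagger.tag_word("undone"))   # JJ, both suffix taggers miss, pre2 knows "un"
print(suf3_tagger.tag_word("cat"))      # None, too short for every tagger
```

A word that no tagger in the chain can handle still comes back as None, which is why a chain like this is normally given a final backoff such as a default tagger.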

Note

The ordering in the previous block of code is not the best, nor is it the worst. I'll leave it to you to explore the possibilities and discover the best ordering of affix_length values in an AffixTagger backoff chain.

Working with min_stem_length

The AffixTagger class also takes a min_stem_length keyword argument, with a default value of 2. If the word length is less than min_stem_length plus the absolute value of affix_length, then None is returned by the context() method. Increasing min_stem_length forces the AffixTagger class to learn only on longer words, while decreasing min_stem_length allows it to learn on shorter words. Of course, if min_stem_length is decreased far enough (for example, to 0), then for the shortest words that still qualify, the affix covers the entire word, and AffixTagger essentially acts like a UnigramTagger class for those words.
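The effect of min_stem_length can be tabulated with a short sketch that re-implements the length check described above in plain Python (the function name is hypothetical). It computes the default three-character suffix context for a few words under three different min_stem_length values.

```python
def context(word, affix_length=-3, min_stem_length=2):
    """Affix of `word`, or None if len(word) < min_stem_length + abs(affix_length)."""
    if len(word) < min_stem_length + abs(affix_length):
        return None
    return word[:affix_length] if affix_length > 0 else word[affix_length:]


words = ["dog", "cats", "walks", "walking"]
results = {msl: [context(w, min_stem_length=msl) for w in words] for msl in (0, 2, 4)}

for msl, contexts in results.items():
    print(msl, contexts)
# 0 ['dog', 'ats', 'lks', 'ing']   <- "dog" is used whole, like a unigram
# 2 [None, None, 'lks', 'ing']     <- the default
# 4 [None, None, None, 'ing']
```

With min_stem_length=0, the three-letter word "dog" becomes its own context, which is exactly the unigram-like behavior described above; raising the value to 4 leaves only the seven-letter word eligible.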
