Tagging proper names

Using the included names corpus, we can create a simple tagger for tagging names as proper nouns.

How to do it...

The NamesTagger class is a subclass of SequentialBackoffTagger, as it's probably only useful near the end of a backoff chain. At initialization, we create a set of all names in the names corpus, lowercasing each name to make lookup easier. Then, we implement the choose_tag() method, which simply checks whether the lowercased current word is in the name_set set. If it is, we return the NNP tag (which is the tag for proper nouns). If it isn't, we return None, so the next tagger in the chain can tag the word. The following code can be found in taggers.py:

from nltk.tag import SequentialBackoffTagger
from nltk.corpus import names

class NamesTagger(SequentialBackoffTagger):
  def __init__(self, *args, **kwargs):
    SequentialBackoffTagger.__init__(self, *args, **kwargs)
    # Lowercase every name once so lookups are case-insensitive.
    self.name_set = set([n.lower() for n in names.words()])

  def choose_tag(self, tokens, index, history):
    word = tokens[index]

    if word.lower() in self.name_set:
      return 'NNP'
    else:
      # Returning None lets the next tagger in the chain tag the word.
      return None

How it works...

The NamesTagger class does one thing: a case-insensitive lookup of each word in the set of names. Usage is equally simple:

>>> from taggers import NamesTagger
>>> nt = NamesTagger()
>>> nt.tag(['Jacob'])
[('Jacob', 'NNP')]

It's probably best to place the NamesTagger class right before a DefaultTagger class, so that it sits near the end of a backoff chain. It could go elsewhere in the chain, though, since it's unlikely to mis-tag a word.
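
To make the backoff behavior concrete, here is a minimal sketch in plain Python. It is a simplified stand-in for the idea, not NLTK's actual implementation: SAMPLE_NAMES is a small hypothetical set used in place of the names corpus, and names_choose_tag(), default_choose_tag(), and tag_with_backoff() are illustrative helpers that do not exist in NLTK.

```python
# Simplified stand-in for a backoff chain; these are not NLTK classes.
# SAMPLE_NAMES is a hypothetical substitute for names.words().
SAMPLE_NAMES = {"jacob", "alice"}


def names_choose_tag(tokens, index, history):
    """Return 'NNP' for a known name, or None to defer to the next tagger."""
    if tokens[index].lower() in SAMPLE_NAMES:
        return 'NNP'
    return None


def default_choose_tag(tokens, index, history):
    """Mimic a default tagger: always answer with the same tag."""
    return 'NN'


def tag_with_backoff(tokens, chain):
    """Tag each token with the first non-None answer from the chain."""
    tags = []
    for index in range(len(tokens)):
        tag = None
        for choose_tag in chain:
            # The tags assigned so far play the role of the history argument.
            tag = choose_tag(tokens, index, tags)
            if tag is not None:
                break
        tags.append(tag)
    return list(zip(tokens, tags))


result = tag_with_backoff(['Jacob', 'likes', 'ALICE'],
                          [names_choose_tag, default_choose_tag])
print(result)
# [('Jacob', 'NNP'), ('likes', 'NN'), ('ALICE', 'NNP')]
```

Because the name lookup returns None for anything it does not recognize, every non-name falls through to the default function. Without a fallback at the end of the chain, those words would be left with a tag of None.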

See also

The Combining taggers with backoff tagging recipe goes over the details of using the SequentialBackoffTagger subclasses.
