Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 4. Part-of-speech Tagging

In this chapter, we will cover the following recipes:

Default tagging
Training a unigram part-of-speech tagger
Combining taggers with backoff tagging
Training and combining ngram taggers
Creating a model of likely word tags
Tagging with regular expressions
Affix tagging
Training a Brill tagger
Training the TnT tagger
Using WordNet for tagging
Tagging proper names
Classifier-based tagging
Training a tagger with NLTK-Trainer

Introduction

Part-of-speech tagging is the process of converting a sentence, in the form of a list of words, into a list of tuples, where each tuple is of the form (word, tag). The tag is a part-of-speech tag, and signifies whether the word is a noun, adjective, verb, and so on.

Part-of-speech tagging is a necessary step before chunking, which is covered in Chapter 5, Extracting Chunks. Without the part-of-speech tags, a chunker cannot know how to extract phrases from a sentence. But with part-of-speech tags, you can tell a chunker how to identify phrases based on tag patterns.

You can also use part-of-speech tags for grammar analysis and word sense disambiguation. For example, the word duck could refer to a bird, or it could be a verb indicating a downward motion. Computers cannot know the difference without additional information, such as part-of-speech tags. For more on word sense disambiguation, refer to the URL https://en.wikipedia.org/wiki/Word_sense_disambiguation.

Most of the taggers we'll cover are trainable. They use a list of tagged sentences as their training data, such as what you get from the tagged_sents() method of a TaggedCorpusReader class (see the Creating a part-of-speech tagged word corpus recipe in Chapter 3, Creating Custom Corpora, for more details). With these training sentences, the tagger generates an internal model that will tell it how to tag a word. Other taggers use external data sources or match word patterns to choose a tag for a word.

All taggers in NLTK are in the nltk.tag package and inherit from the TaggerI base class. TaggerI requires all subclasses to implement a tag() method, which takes a list of words as input and returns a list of tagged words as output. TaggerI also provides an evaluate() method for evaluating the accuracy of the tagger (covered at the end of the Default tagging recipe). Many taggers can also be combined into a backoff chain, so that if one tagger cannot tag a word, the next tagger is used, and so on.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 4. Part-of-speech Tagging

Create new playlist

Sign In

Sign Up

Chapter 4. Part-of-speech Tagging

Introduction

Table of Contents for
4. Part-of-speech Tagging