Determining the word types

This is what part-of-speech tagging, or POS tagging, is all about. A POS tagger analyzes a sentence and tags each word with its part of speech, for example, whether the word "book" is a noun ("this is a good book") or a verb ("could you please book the flight?").

You might have already guessed that NLTK will play its role in this area as well. And indeed, it comes readily packaged with all sorts of parsers and taggers. The POS tagger we will use, nltk.pos_tag(), is actually a full-blown classifier trained using manually annotated sentences from the Penn Treebank Project. It takes a list of word tokens as input and outputs a list of tuples, where each tuple contains a token from the original sentence and its part-of-speech tag:

>>> import nltk
>>> nltk.pos_tag(nltk.word_tokenize("This is a good book."))
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('book', 
'NN'), ('.', '.')]
>>> nltk.pos_tag(nltk.word_tokenize("Could you please book the flight?"))
[('Could', 'MD'), ('you', 'PRP'), ('please', 'VB'), ('book', 'NN'), 
('the', 'DT'), ('flight', 'NN'), ('?', '.')]

The POS tag abbreviations are taken from the Penn Treebank (adapted from http://www.anc.org/OANC/penn.html):

POS tag	Description	Example
`CC`	coordinating conjunction	or
`CD`	cardinal number	2, second
`DT`	Determiner	the
`EX`	existential there	there are
`FW`	foreign word	kindergarten
`IN`	preposition/subordinating conjunction	on, of, like
`JJ`	Adjective	cool
`JJR`	adjective, comparative	cooler
`JJS`	adjective, superlative	coolest
`LS`	list marker	1)
`MD`	Modal	could, will
`NN`	noun, singular or mass	book
`NNS`	noun plural	books
`NNP`	proper noun, singular	Sean
`NNPS`	proper noun, plural	Vikings
`PDT`	Predeterminer	both the boys
`POS`	possessive ending	friend's
`PRP`	personal pronoun	I, he, it
`PRP$`	possessive pronoun	my, his
`RB`	Adverb	however, usually, naturally, here, good
`RBR`	adverb, comparative	better
`RBS`	adverb, superlative	best
`RP`	Particle	give up
`TO`	To	to go, to him
`UH`	Interjection	uhhuhhuhh
`VB`	verb, base form	take
`VBD`	verb, past tense	took
`VBG`	verb, gerund/present participle	taking
`VBN`	verb, past participle	taken
`VBP`	verb, singular present, non-3rd person	take
`VBZ`	verb, 3rd person sing. present	takes
`WDT`	wh-determiner	which
`WP`	wh-pronoun	who, what
`WP$`	possessive wh-pronoun	whose
`WRB`	wh-adverb	where, when

With these tags, it is pretty easy to filter the desired tags from the output of pos_tag(). We simply have to count all words whose tags start with NN for nouns, VB for verbs, JJ for adjectives, and RB for adverbs.
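
The counting step described above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the function name count_word_types and the PREFIX_TO_TYPE mapping are hypothetical names introduced here, and the input is the tagged output already shown earlier, so the sketch runs without calling the tagger itself.

```python
from collections import Counter

# Hypothetical mapping (an assumption for this sketch) from the first two
# characters of a Penn Treebank tag to a coarse word type.
PREFIX_TO_TYPE = {
    "NN": "noun",       # NN, NNS, NNP, NNPS
    "VB": "verb",       # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "adjective",  # JJ, JJR, JJS
    "RB": "adverb",     # RB, RBR, RBS
}


def count_word_types(tagged):
    """Count nouns, verbs, adjectives, and adverbs in (token, tag) tuples."""
    counts = Counter()
    for _token, tag in tagged:
        word_type = PREFIX_TO_TYPE.get(tag[:2])
        if word_type is not None:
            counts[word_type] += 1
    return counts


# Tagged sentence copied from the pos_tag() output shown earlier.
tagged = [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'),
          ('book', 'NN'), ('.', '.')]
counts = count_word_types(tagged)
print(dict(counts))
```

In practice, the list passed to the function would presumably come straight from nltk.pos_tag(nltk.word_tokenize(text)). Tags that match none of the four prefixes, such as DT or the punctuation tag, are simply ignored.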
