This is what part-of-speech tagging, or POS tagging, is all about. A POS tagger analyzes a sentence and tags each word with its part of speech. For example, it decides whether the word "book" is a noun ("this is a good book") or a verb ("could you please book the flight?").
You might have already guessed that NLTK will play its role in this area as well. And indeed, it comes readily packaged with all sorts of parsers and taggers. The POS tagger we will use, nltk.pos_tag(), is actually a full-blown classifier trained using manually annotated sentences from the Penn Treebank Project. It takes a list of word tokens as input and outputs a list of tuples, where each tuple contains a token from the original sentence and its part-of-speech tag:
>>> import nltk
>>> nltk.pos_tag(nltk.word_tokenize("This is a good book."))
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('book',
'NN'), ('.', '.')]
>>> nltk.pos_tag(nltk.word_tokenize("Could you please book the flight?"))
[('Could', 'MD'), ('you', 'PRP'), ('please', 'VB'), ('book', 'NN'),
('the', 'DT'), ('flight', 'NN'), ('?', '.')]
The POS tag abbreviations are taken from the Penn Treebank (adapted from http://www.anc.org/OANC/penn.html):
POS tag | Description | Example
CC | coordinating conjunction | or
CD | cardinal number | 2, second
DT | determiner | the
EX | existential there | there are
FW | foreign word | kindergarten
IN | preposition/subordinating conjunction | on, of, like
JJ | adjective | cool
JJR | adjective, comparative | cooler
JJS | adjective, superlative | coolest
LS | list marker | 1)
MD | modal | could, will
NN | noun, singular or mass | book
NNS | noun, plural | books
NNP | proper noun, singular | Sean
NNPS | proper noun, plural | Vikings
PDT | predeterminer | both the boys
POS | possessive ending | friend's
PRP | personal pronoun | I, he, it
PRP$ | possessive pronoun | my, his
RB | adverb | however, usually, naturally, here, good
RBR | adverb, comparative | better
RBS | adverb, superlative | best
RP | particle | give up
TO | to | to go, to him
UH | interjection | uhhuhhuhh
VB | verb, base form | take
VBD | verb, past tense | took
VBG | verb, gerund/present participle | taking
VBN | verb, past participle | taken
VBP | verb, singular present, non-3rd person | take
VBZ | verb, 3rd person singular present | takes
WDT | wh-determiner | which
WP | wh-pronoun | who, what
WP$ | possessive wh-pronoun | whose
WRB | wh-adverb | where, when
With these tags, it is pretty easy to pick out the word classes we want from the output of pos_tag(). We simply have to count all words whose tags start with NN for nouns, VB for verbs, JJ for adjectives, and RB for adverbs.
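The prefix-based counting described above could be sketched as follows. This is only a minimal illustration: the names TAG_PREFIXES and count_word_classes are hypothetical helpers introduced here, not part of NLTK. So that the sketch runs without downloading any tagger models, it works on the tagged output of the first example rather than calling nltk.pos_tag() itself.

```python
# Hypothetical mapping from coarse word class to the Penn Treebank tag prefix
# that identifies it (for example, NN also matches NNS, NNP, and NNPS).
TAG_PREFIXES = {
    "nouns": "NN",
    "verbs": "VB",
    "adjectives": "JJ",
    "adverbs": "RB",
}


def count_word_classes(tagged_tokens):
    """Count nouns, verbs, adjectives, and adverbs in (token, tag) pairs."""
    counts = dict.fromkeys(TAG_PREFIXES, 0)
    for _token, tag in tagged_tokens:
        for name, prefix in TAG_PREFIXES.items():
            if tag.startswith(prefix):
                counts[name] += 1
                break  # the four prefixes are disjoint, so stop at the first hit
    return counts


# Tagger output copied from the "This is a good book." example.
tagged = [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'),
          ('book', 'NN'), ('.', '.')]

print(count_word_classes(tagged))
# {'nouns': 1, 'verbs': 1, 'adjectives': 1, 'adverbs': 0}
```

In practice, the list returned by nltk.pos_tag(nltk.word_tokenize(text)) can be passed in directly, since it has the same shape as the sample above. Tags such as WRB are not counted as adverbs here because the check is on the start of the tag, which matches the rule stated in the text.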