Determining the word types

This is what part-of-speech tagging, or POS tagging, is all about. A POS tagger analyzes a sentence and tags each word with its part of speech, for example, whether the word book is a noun (this is a good book) or a verb (could you please book the flight?).

You might have already guessed that NLTK will play its role in this area as well. And indeed, it comes readily packaged with all sorts of parsers and taggers. The POS tagger we will use, nltk.pos_tag(), is actually a full-blown classifier trained using manually annotated sentences from the Penn Treebank Project. It takes a list of word tokens as input and outputs a list of tuples, where each tuple contains a token from the original sentence and its part-of-speech tag:

>>> import nltk
>>> nltk.pos_tag(nltk.word_tokenize("This is a good book."))
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('book',
'NN'), ('.', '.')]
>>> nltk.pos_tag(nltk.word_tokenize("Could you please book the flight?"))
[('Could', 'MD'), ('you', 'PRP'), ('please', 'VB'), ('book', 'NN'),
('the', 'DT'), ('flight', 'NN'), ('?', '.')]

The POS tag abbreviations are taken from the Penn Treebank (adapted from http://www.anc.org/OANC/penn.html):

POS tag  Description                            Example
CC       coordinating conjunction               or
CD       cardinal number                        2, second
DT       determiner                             the
EX       existential there                      there are
FW       foreign word                           kindergarten
IN       preposition/subordinating conjunction  on, of, like
JJ       adjective                              cool
JJR      adjective, comparative                 cooler
JJS      adjective, superlative                 coolest
LS       list marker                            1)
MD       modal                                  could, will
NN       noun, singular or mass                 book
NNS      noun, plural                           books
NNP      proper noun, singular                  Sean
NNPS     proper noun, plural                    Vikings
PDT      predeterminer                          both the boys
POS      possessive ending                      friend's
PRP      personal pronoun                       I, he, it
PRP$     possessive pronoun                     my, his
RB       adverb                                 however, usually, naturally, here, good
RBR      adverb, comparative                    better
RBS      adverb, superlative                    best
RP       particle                               give up
TO       to                                     to go, to him
UH       interjection                           uhhuhhuhh
VB       verb, base form                        take
VBD      verb, past tense                       took
VBG      verb, gerund/present participle        taking
VBN      verb, past participle                  taken
VBP      verb, non-3rd person sing. present     take
VBZ      verb, 3rd person sing. present         takes
WDT      wh-determiner                          which
WP       wh-pronoun                             who, what
WP$      possessive wh-pronoun                  whose
WRB      wh-adverb                              where, when

With these tags, it is pretty easy to pick out the desired words from the output of pos_tag(). We simply have to count all words whose tags start with NN for nouns, VB for verbs, JJ for adjectives, and RB for adverbs.
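
The counting step just described could look roughly like the following. This is only a minimal sketch: the helper name count_word_classes and the PREFIX_TO_CLASS mapping are made up for illustration and are not part of NLTK. To keep the example independent of any installed tagger model, it works on a list of (word, tag) tuples copied from the first pos_tag() example above.

```python
from collections import Counter

# Tag prefixes for the four word classes we care about (hypothetical mapping).
PREFIX_TO_CLASS = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}


def count_word_classes(tagged):
    """Count nouns, verbs, adjectives, and adverbs in a list of (word, tag) tuples."""
    counts = Counter()
    for _word, tag in tagged:
        for prefix, word_class in PREFIX_TO_CLASS.items():
            if tag.startswith(prefix):
                counts[word_class] += 1
                break  # a tag matches at most one of the four prefixes
    return counts


# Tagged tokens copied from the output for "This is a good book."
tagged = [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'),
          ('good', 'JJ'), ('book', 'NN'), ('.', '.')]
counts = count_word_classes(tagged)
print(counts["noun"], counts["verb"], counts["adjective"], counts["adverb"])
```

Because the function only looks at tag prefixes, NNS, NNP, and NNPS all count as nouns, and every VB* variant counts as a verb. With a working NLTK installation, the result of nltk.pos_tag(nltk.word_tokenize(text)) can be passed in directly.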
