Word tagging

Tagging, or POS tagging, associates each word (or token) with its part-of-speech tag (POS tag). After tagging, you know which tokens are verbs, adjectives, nouns, and so on, and where they occur in the sentence. Here, too, NLTK makes this complex operation very easy:

In: import nltk
print(nltk.pos_tag(nltk_tokens))

Out: [('The', 'DT'), ('coolest', 'NN'), ('job', 'NN'), ('in', 'IN'),
('the', 'DT'), ('next', 'JJ'), ('10', 'CD'), ('years', 'NNS'),
('will', 'MD'), ('be', 'VB'), ('statisticians', 'NNS'), ('.', '.'),
('People', 'NNS'), ('think', 'VBP'), ('I', 'PRP'), ("'m", 'VBP'),
('joking', 'VBG'), (',', ','), ('but', 'CC'), ('who', 'WP'),
('would', 'MD'), ("'ve", 'VB'), ('guessed', 'VBN'), ('that', 'IN'),
('computer', 'NN'), ('engineers', 'NNS'), ('would', 'MD'),
("'ve", 'VB'), ('been', 'VBN'), ('the', 'DT'), ('coolest', 'NN'),
('job', 'NN'), ('of', 'IN'), ('the', 'DT'), ('1990s', 'CD'),
('?', '.')]
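
Because the result is a plain list of (token, tag) tuples, it can be processed with ordinary Python tools. The following is a minimal sketch, not part of the original listing: the `tagged` list is copied by hand from the first sentence of the output above (in practice it would be the value returned by `nltk.pos_tag`), and the names `tag_counts` and `plural_nouns` are chosen here for illustration only.

```python
from collections import Counter

# (token, tag) pairs copied from the first sentence of the output above.
# In practice this would be the list returned by nltk.pos_tag(nltk_tokens).
tagged = [('The', 'DT'), ('coolest', 'NN'), ('job', 'NN'), ('in', 'IN'),
          ('the', 'DT'), ('next', 'JJ'), ('10', 'CD'), ('years', 'NNS'),
          ('will', 'MD'), ('be', 'VB'), ('statisticians', 'NNS'), ('.', '.')]

# Count how often each tag occurs in the sentence.
tag_counts = Counter(tag for _, tag in tagged)

# Collect every token that carries a given tag, here plural nouns (NNS).
plural_nouns = [token for token, tag in tagged if tag == 'NNS']

print(tag_counts)
print(plural_nouns)
```

The same pattern works for any tag: change the comparison to `'JJ'` to list adjectives, for instance.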

Reading the tags that NLTK returns, you can see that the token The is a determiner (DT), coolest and job are tagged as nouns (NN), in is a preposition (IN), and so on. The tags are quite fine-grained; a verb, for example, can receive one of six tags, as follows:

  • Take: VB (verb, base form)
  • Took: VBD (verb, past tense)
  • Taking: VBG (verb, gerund or present participle)
  • Taken: VBN (verb, past participle)
  • Take: VBP (verb, non-third-person singular present tense)
  • Takes: VBZ (verb, third-person singular present tense)
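
Since all six verb tags share the VB prefix, the verbs in a tagged sentence can be picked out and grouped by form with a simple prefix check. The sketch below is an illustration under that assumption: the `tagged` list is copied by hand from the second sentence of the output above, and `verbs_by_tag` is a name introduced here, not something NLTK provides.

```python
from collections import defaultdict

# (token, tag) pairs copied from part of the second sentence of the output.
tagged = [('People', 'NNS'), ('think', 'VBP'), ('I', 'PRP'), ("'m", 'VBP'),
          ('joking', 'VBG'), (',', ','), ('but', 'CC'), ('who', 'WP'),
          ('would', 'MD'), ("'ve", 'VB'), ('guessed', 'VBN')]

# Group tokens by verb tag. Modal verbs such as "would" are tagged MD,
# so the 'VB' prefix check deliberately leaves them out.
verbs_by_tag = defaultdict(list)
for token, tag in tagged:
    if tag.startswith('VB'):
        verbs_by_tag[tag].append(token)

for tag, tokens in verbs_by_tag.items():
    print(tag, tokens)
```

Note that the modal would does not appear in the result; if modals should count as verbs for your purposes, the check has to include the MD tag as well.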

If you need a more detailed view of the sentence, you may want to use the parse tree tagger to understand its syntactic structure. This operation is rarely used in data science, since it is best suited to analyzing sentences one at a time.
