Word tagging

Tagging, or POS tagging, associates each word (or token) with its part-of-speech tag (POS tag). After tagging, you know which tokens are verbs, adjectives, nouns, and so on, and where they sit in the sentence. Here too, NLTK makes this complex operation very easy:

In: import nltk
    # nltk_tokens is the list of tokens produced by the earlier tokenization step
    print(nltk.pos_tag(nltk_tokens))

Out: [('The', 'DT'), ('coolest', 'NN'), ('job', 'NN'), ('in', 'IN'), 
      ('the', 'DT'), ('next', 'JJ'), ('10', 'CD'), ('years', 'NNS'), 
      ('will', 'MD'), ('be', 'VB'), ('statisticians', 'NNS'), ('.', '.'), 
      ('People', 'NNS'), ('think', 'VBP'), ('I', 'PRP'), ("'m", 'VBP'), 
      ('joking', 'VBG'), (',', ','), ('but', 'CC'), ('who', 'WP'), 
      ('would', 'MD'), ("'ve", 'VB'), ('guessed', 'VBN'), ('that', 'IN'), 
      ('computer', 'NN'), ('engineers', 'NNS'), ('would', 'MD'), 
      ("'ve", 'VB'),  ('been', 'VBN'), ('the', 'DT'), ('coolest', 'NN'), 
      ('job', 'NN'), ('of', 'IN'), ('the', 'DT'), ('1990s', 'CD'), 
      ('?', '.')]

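Reading NLTK's output, which uses the Penn Treebank tag set, you can see that the token The is a determiner (DT), coolest and job are tagged as nouns (NN), in is a preposition (IN, a tag that also covers subordinating conjunctions), and so on. The tagger is statistical, so individual tags can be wrong: coolest is really a superlative adjective here. The tag set is quite detailed; for verbs alone, there are six possible tags, as follows: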
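As a quick illustration of working with this kind of output, the following minimal sketch counts how often each tag occurs. It assumes a variable named tagged (a name chosen here for illustration) that holds the first sentence's (token, tag) pairs, copied by hand from the output above so that the snippet runs without calling the tagger:

```python
from collections import Counter

# (token, tag) pairs copied from the first sentence of the output above;
# in practice this list would come from nltk.pos_tag(nltk_tokens)
tagged = [('The', 'DT'), ('coolest', 'NN'), ('job', 'NN'), ('in', 'IN'),
          ('the', 'DT'), ('next', 'JJ'), ('10', 'CD'), ('years', 'NNS'),
          ('will', 'MD'), ('be', 'VB'), ('statisticians', 'NNS'), ('.', '.')]

# Count the tags, ignoring the tokens themselves
tag_counts = Counter(tag for _, tag in tagged)

for tag, count in tag_counts.most_common():
    print(tag, count)
```

Because pos_tag returns a plain list of tuples, standard Python tools such as Counter are enough for this kind of summary.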

Take: VB (verb, base form)
Took: VBD (verb, past tense)
Taking: VBG (verb, gerund or present participle)
Taken: VBN (verb, past participle)
Take: VBP (verb, present tense, not third-person singular)
Takes: VBZ (verb, third-person singular present tense)
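Since all six verb tags share the VB prefix, one simple way to pull the verbs out of a tagged sentence is to filter on that prefix. The sketch below is only an illustration: the variable names tagged and verbs are made up here, and the pairs are copied by hand from part of the second sentence in the output above.

```python
# (token, tag) pairs copied from part of the second sentence of the output above
tagged = [('People', 'NNS'), ('think', 'VBP'), ('I', 'PRP'), ("'m", 'VBP'),
          ('joking', 'VBG'), (',', ','), ('but', 'CC'), ('who', 'WP'),
          ('would', 'MD'), ("'ve", 'VB'), ('guessed', 'VBN')]

# Keep every token whose tag starts with VB (VB, VBD, VBG, VBN, VBP, VBZ)
verbs = [(token, tag) for token, tag in tagged if tag.startswith('VB')]

print(verbs)
```

Note that modal verbs such as would carry the separate MD tag, so a filter on the VB prefix leaves them out.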

If you need a more detailed view of the sentence, you may want to use a parser to build its parse tree and understand its syntactic structure. This operation is rarely used in data science, since it is best suited to sentence-by-sentence analysis rather than to processing large collections of text.
