Your Turn

Here are the answers to the questions posed in the preceding sections:

  • Can we remove stop words before POS tagging?

    No. If we remove the stop words, we lose context, and some POS taggers (pre-trained models in particular) use the surrounding words as features when predicting the tag of a given word.
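
    To make the loss of context concrete, here is a minimal sketch in plain Python. It does not call any real tagger: the context_features helper and the tiny STOP_WORDS set are hypothetical, and only illustrate the kind of neighbouring-word features a context-based tagger might rely on.

```python
# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"i", "to", "a", "the", "will"}

def context_features(tokens, index):
    """Return the neighbouring words a context-based tagger might inspect."""
    prev_word = tokens[index - 1] if index > 0 else "<START>"
    next_word = tokens[index + 1] if index < len(tokens) - 1 else "<END>"
    return {"word": tokens[index], "prev": prev_word, "next": next_word}

sentence = "I want to book a flight".split()
filtered = [w for w in sentence if w.lower() not in STOP_WORDS]

# Features for "book" before and after stop-word removal.
full_features = context_features(sentence, sentence.index("book"))
filtered_features = context_features(filtered, filtered.index("book"))

print(full_features)      # "book" sits between "to" and "a"
print(filtered_features)  # "book" now sits between "want" and "flight"
```

    In the full sentence, the preceding "to" is a strong hint that "book" is a verb. After filtering, that hint is gone, which is why stop words should be removed after tagging rather than before.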

  • How can we get all the verbs in the sentence?

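    We can get all the verbs in a sentence s by tagging it with pos_tag and keeping every word whose tag starts with VB. This covers all six Penn Treebank verb tags (VB, VBD, VBG, VBN, VBP, and VBZ):

    >>>import nltk
    >>>from nltk import word_tokenize
    >>>tagged = nltk.pos_tag(word_tokenize(s))
    >>>allverbs = [word for word, pos in tagged if pos.startswith('VB')]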
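
    The Penn Treebank tagset has six verb tags (VB, VBD, VBG, VBN, VBP, and VBZ), and all of them start with VB, so a prefix test catches forms that a short explicit list can miss. The sketch below runs on hand-written (word, tag) pairs that stand in for tagger output, so it needs no model download; the words_with_tag_prefix helper is a name made up for this example.

```python
# Hand-written (word, tag) pairs standing in for real tagger output.
tagged = [
    ("She", "PRP"), ("has", "VBZ"), ("been", "VBN"), ("writing", "VBG"),
    ("code", "NN"), ("and", "CC"), ("runs", "VBZ"), ("tests", "NNS"),
]

def words_with_tag_prefix(tagged_tokens, prefix):
    """Keep the words whose tag starts with the given prefix."""
    return [
        word for word, tag in tagged_tokens
        if tag is not None and tag.startswith(prefix)
    ]

all_verbs = words_with_tag_prefix(tagged, "VB")
print(all_verbs)  # ['has', 'been', 'writing', 'runs']
```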
    
  • Can you modify the code of the hybrid tagger in the N-gram tagger section to work with the Regex tagger? Does that improve performance?

    Yes. We can modify the code of the hybrid tagger in the N-gram tagger section to work with the Regex tagger:

    >>>from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger
    >>>unigram_tagger = UnigramTagger(train_data, backoff=regexp_tagger)
    >>>print(unigram_tagger.evaluate(test_data))
    >>>bigram_tagger = BigramTagger(train_data, backoff=unigram_tagger)
    >>>print(bigram_tagger.evaluate(test_data))
    >>>trigram_tagger = TrigramTagger(train_data, backoff=bigram_tagger)
    >>>print(trigram_tagger.evaluate(test_data))
    0.857122212053
    0.866708415627
    0.863914446746
    

    Yes, performance improves: for words the n-gram taggers have not seen, the final backoff now applies some basic pattern-based rules instead of always predicting the most frequent tag.
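
    The effect of the backoff choice can be sketched without NLTK or a corpus. The code below is a simplified illustration of the idea, not NLTK's implementation: the toy training and test data, the PATTERNS list, and all function names are invented for this example, and the accuracies it prints are unrelated to the figures above.

```python
import re
from collections import Counter, defaultdict

# Toy hand-tagged data, invented for illustration.
train = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"),
         ("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]
test = [("the", "DT"), ("dog", "NN"), ("jumped", "VBD"),
        ("barking", "VBG"), ("42", "CD")]

# Pattern rules in the spirit of a regex tagger; the first match wins.
PATTERNS = [
    (re.compile(r"\d+(\.\d+)?$"), "CD"),  # numbers
    (re.compile(r".*ing$"), "VBG"),       # gerunds
    (re.compile(r".*ed$"), "VBD"),        # simple past
]

def train_unigram(pairs):
    """Map each seen word to its most frequent tag."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def default_backoff(word):
    """Always predict the most frequent tag."""
    return "NN"

def regex_backoff(word):
    """Try each pattern in order; fall back to NN if none matches."""
    for pattern, tag in PATTERNS:
        if pattern.match(word):
            return tag
    return "NN"

def accuracy(model, backoff, gold):
    """Tag with the unigram table, sending unseen words to the backoff."""
    correct = sum(1 for word, tag in gold
                  if (model.get(word) or backoff(word)) == tag)
    return correct / float(len(gold))

model = train_unigram(train)
with_default = accuracy(model, default_backoff, test)
with_regex = accuracy(model, regex_backoff, test)
print(with_default, with_regex)
```

    On this toy test set, the three unseen words ("jumped", "barking", and "42") are all mistagged as NN by the default backoff, while the pattern rules recover each of them.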

  • Can you write a tagger that tags Date and Money expressions?

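    Yes. The following RegexpTagger tags tokens made of two digits, two digits, and four digits separated by /, ., or - as DATE, and tokens starting with a dollar sign as MONEY. Tokens that match neither pattern are tagged None:

    >>>from nltk.tag import RegexpTagger
    >>>date_regex = RegexpTagger([(r'(\d{2})[/.-](\d{2})[/.-](\d{4})$', 'DATE'), (r'\$', 'MONEY')])
    >>>test_tokens = "I will be flying on sat 10-02-2014 with around 10M $ ".split()
    >>>print(date_regex.tag(test_tokens))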
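
    To check the two patterns without installing NLTK, the same matching logic can be sketched with the standard re module. This is an approximation of how a regex tagger behaves (each token is matched from its start, the first matching rule wins, and unmatched tokens get None); the RULES list and tag_tokens function are names made up for this example.

```python
import re

# (pattern, tag) pairs tried in order; the first match wins.
RULES = [
    (re.compile(r"\d{2}[/.-]\d{2}[/.-]\d{4}$"), "DATE"),
    (re.compile(r"\$"), "MONEY"),
]

def tag_tokens(tokens):
    """Tag each token with the first rule that matches from its start."""
    tagged = []
    for token in tokens:
        tag = None
        for pattern, label in RULES:
            if pattern.match(token):
                tag = label
                break
        tagged.append((token, tag))
    return tagged

test_tokens = "I will be flying on sat 10-02-2014 with around 10M $ ".split()
result = tag_tokens(test_tokens)

# Show only the tokens that received a tag.
print([pair for pair in result if pair[1] is not None])
# [('10-02-2014', 'DATE'), ('$', 'MONEY')]
```

    Note that "10M" stays untagged: the MONEY rule only fires on tokens that begin with a dollar sign, so covering amounts written this way would need an additional rule.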
    

Note

The last two questions haven't been answered.

Many different rules are possible, depending on what the reader observes in the data, so there is no right or wrong answer here.

Can you build a word cloud similar to the one in Chapter 1, Introduction to Natural Language Processing, this time using only nouns and verbs?
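
One possible starting point, offered as a sketch rather than a full answer: a word cloud is driven by word frequencies, so the first step is to count only the words tagged as nouns or verbs. The example below assumes Penn Treebank tags and uses hand-written (word, tag) pairs in place of real tagger output; the noun_verb_frequencies helper is a hypothetical name.

```python
from collections import Counter

# Hand-written (word, tag) pairs standing in for real tagger output.
tagged = [
    ("dogs", "NNS"), ("chase", "VBP"), ("cats", "NNS"), ("and", "CC"),
    ("cats", "NNS"), ("chase", "VBP"), ("mice", "NNS"), ("quickly", "RB"),
]

def noun_verb_frequencies(tagged_tokens):
    """Count lowercased words whose tag starts with NN (nouns) or VB (verbs)."""
    keep = ("NN", "VB")
    return Counter(
        word.lower() for word, tag in tagged_tokens
        if tag and tag.startswith(keep)
    )

freqs = noun_verb_frequencies(tagged)
print(freqs)
```

The resulting counts can then be passed to whichever word cloud tool was used in Chapter 1.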

