Here are the answers to the questions posed in the above sections:
No. If we remove the stop words, we lose the surrounding context, and some POS taggers (pre-trained models) use the context of a word as a feature when assigning its tag.
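To see why the context matters, here is a small self-contained sketch, written for Python 3 with a toy hand-tagged corpus invented for illustration: an NLTK BigramTagger uses the previous tag to decide whether book is a verb or a noun, and once the stop words are stripped that context is gone and the tagger can no longer decide.

```python
from nltk.tag import BigramTagger

# Toy hand-tagged sentences (hypothetical data): 'book' is a verb
# after 'to' and a noun after 'the'.
train = [
    [('i', 'PRP'), ('want', 'VBP'), ('to', 'TO'),
     ('book', 'VB'), ('a', 'DT'), ('flight', 'NN')],
    [('she', 'PRP'), ('read', 'VBD'), ('the', 'DT'), ('book', 'NN')],
]
bigram_tagger = BigramTagger(train)

# With the stop words present, both readings of 'book' are resolved.
print(bigram_tagger.tag('i want to book a flight'.split()))
print(bigram_tagger.tag('she read the book'.split()))

# Strip the stop words and the bigram contexts are destroyed:
# every token, including 'book', now gets no tag at all (None).
print(bigram_tagger.tag(['want', 'book', 'flight']))
```

Here the tagger answers correctly only because the tagged sequences reproduce the training contexts; the point is that deleting function words changes those contexts, not that a bigram tagger generalizes from two sentences.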
We can get all the verbs in the sentence by using pos_tag and filtering on the verb tags:
>>>tagged = nltk.pos_tag(word_tokenize(s))
>>>allverbs = [word for word, pos in tagged if pos in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']]
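Since every verb tag in the Penn Treebank tagset starts with VB, a prefix check is a compact way to catch all six verb forms. The tagged list below is hand-written for illustration rather than produced by pos_tag:

```python
# Hand-tagged tokens standing in for nltk.pos_tag output (illustrative).
tagged = [('She', 'PRP'), ('has', 'VBZ'), ('been', 'VBN'),
          ('running', 'VBG'), ('daily', 'RB')]

# Any Penn Treebank verb tag (VB, VBD, VBG, VBN, VBP, VBZ) starts with 'VB'.
allverbs = [word for word, pos in tagged if pos.startswith('VB')]
print(allverbs)  # ['has', 'been', 'running']
```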
Yes. We can modify the code of the hybrid tagger in the N-gram tagger section to use the Regex tagger as a backoff:
>>>unigram_tagger = UnigramTagger(train_data, backoff=regexp_tagger)
>>>print unigram_tagger.evaluate(test_data)
0.857122212053
>>>bigram_tagger = BigramTagger(train_data, backoff=unigram_tagger)
>>>print bigram_tagger.evaluate(test_data)
0.866708415627
>>>trigram_tagger = TrigramTagger(train_data, backoff=bigram_tagger)
>>>print trigram_tagger.evaluate(test_data)
0.863914446746
The performance improves because the final backoff now applies some basic pattern-based rules instead of simply predicting the most frequent tag.
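As a minimal, self-contained sketch of the same backoff chain, written for Python 3: the tiny training corpus and the suffix patterns below are invented for illustration and stand in for the book's train_data and regexp_tagger.

```python
from nltk.tag import RegexpTagger, UnigramTagger, BigramTagger

# Pattern-based fallback: suffix rules first, then NN as a last resort.
regexp_tagger = RegexpTagger([
    (r'.*ing$', 'VBG'),   # gerunds
    (r'.*ed$', 'VBD'),    # simple past
    (r'.*', 'NN'),        # default: noun
])

# A tiny hand-tagged corpus (hypothetical) standing in for train_data.
train = [[('the', 'DT'), ('dog', 'NN'), ('barked', 'VBD')]]

# Each tagger falls back to the previous one for unseen words/contexts.
unigram_tagger = UnigramTagger(train, backoff=regexp_tagger)
bigram_tagger = BigramTagger(train, backoff=unigram_tagger)

print(bigram_tagger.tag('the dog was running'.split()))
# [('the', 'DT'), ('dog', 'NN'), ('was', 'NN'), ('running', 'VBG')]
```

Note how 'running' never appears in the training data, yet the chain still tags it VBG because the request falls through to the regex rules.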
Yes, we can write a tagger that tags Date and Money expressions. Following is the code:
>>>date_regex = RegexpTagger([(r'(\d{2})[/.-](\d{2})[/.-](\d{4})$', 'DATE'), (r'\$', 'MONEY')])
>>>test_tokens = "I will be flying on sat 10-02-2014 with around 10M $ ".split()
>>>print date_regex.tag(test_tokens)
Can you try a word cloud similar to the one we built in Chapter 1, Introduction to Natural Language Processing, with only nouns and verbs this time?
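As a starting point, the filtering step for such a cloud might look like the following sketch: hand-tagged tokens are used in place of pos_tag output, and the actual rendering with a word-cloud library is left out.

```python
from collections import Counter

# Hand-tagged tokens standing in for nltk.pos_tag output (illustrative).
tagged = [('the', 'DT'), ('cat', 'NN'), ('chased', 'VBD'),
          ('the', 'DT'), ('cat', 'NN'), ('and', 'CC'), ('ran', 'VBD')]

# Keep only nouns (NN*) and verbs (VB*); their frequencies would
# drive the word sizes in the cloud.
freq = Counter(word for word, pos in tagged
               if pos.startswith('NN') or pos.startswith('VB'))
print(freq.most_common())
```

The resulting frequency table is exactly what a word-cloud generator consumes, so the only change from Chapter 1 is this POS-based filter.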
References:
https://github.com/japerk/nltk-trainer
http://en.wikipedia.org/wiki/Part-of-speech_tagging
http://en.wikipedia.org/wiki/Named-entity_recognition
http://www.inf.ed.ac.uk/teaching/courses/icl/nltk/tagging.pdf