Here are the answers to the questions posed in the above sections:
No. If we remove the stop words, we lose the surrounding context, and some POS taggers (pre-trained models) use the context of a word as a feature when assigning its tag.
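To see why the context matters, here is a small self-contained sketch, written for Python 3 with a toy hand-tagged corpus invented for illustration: an NLTK BigramTagger uses the previous tag to decide whether book is a verb or a noun, and once the stop words are stripped that context is gone and the tagger can no longer decide.

```python
from nltk.tag import BigramTagger

# Toy hand-tagged sentences (hypothetical data): 'book' is a verb
# after 'to' and a noun after 'the'.
train = [
    [('i', 'PRP'), ('want', 'VBP'), ('to', 'TO'),
     ('book', 'VB'), ('a', 'DT'), ('flight', 'NN')],
    [('she', 'PRP'), ('read', 'VBD'), ('the', 'DT'), ('book', 'NN')],
]
bigram_tagger = BigramTagger(train)

# With the stop words present, both readings of 'book' are resolved.
print(bigram_tagger.tag('i want to book a flight'.split()))
print(bigram_tagger.tag('she read the book'.split()))

# Strip the stop words and the bigram contexts are destroyed:
# every token, including 'book', now gets no tag at all (None).
print(bigram_tagger.tag(['want', 'book', 'flight']))
```

Here the tagger answers correctly only because the tagged sequences reproduce the training contexts; the point is that deleting function words changes those contexts, not that a bigram tagger generalizes from two sentences.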
We can get all the verbs in the sentence by using pos_tag and filtering on the verb tags:
>>>tagged = nltk.pos_tag(word_tokenize(s))
>>>allverbs = [word for word, pos in tagged if pos in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']]
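Since every verb tag in the Penn Treebank tagset starts with VB, a prefix check is a compact way to catch all six verb forms. The tagged list below is hand-written for illustration rather than produced by pos_tag:

```python
# Hand-tagged tokens standing in for nltk.pos_tag output (illustrative).
tagged = [('She', 'PRP'), ('has', 'VBZ'), ('been', 'VBN'),
          ('running', 'VBG'), ('daily', 'RB')]

# Any Penn Treebank verb tag (VB, VBD, VBG, VBN, VBP, VBZ) starts with 'VB'.
allverbs = [word for word, pos in tagged if pos.startswith('VB')]
print(allverbs)  # ['has', 'been', 'running']
```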
Yes. We can modify the code of the hybrid tagger in the N-gram tagger section to use the Regex tagger as a backoff:
>>>unigram_tagger = UnigramTagger(train_data, backoff=regexp_tagger)
>>>print unigram_tagger.evaluate(test_data)
0.857122212053
>>>bigram_tagger = BigramTagger(train_data, backoff=unigram_tagger)
>>>print bigram_tagger.evaluate(test_data)
0.866708415627
>>>trigram_tagger = TrigramTagger(train_data, backoff=bigram_tagger)
>>>print trigram_tagger.evaluate(test_data)
0.863914446746
The performance improves because the final backoff now applies some basic pattern-based rules instead of simply predicting the most frequent tag.
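As a minimal, self-contained sketch of the same backoff chain, written for Python 3: the tiny training corpus and the suffix patterns below are invented for illustration and stand in for the book's train_data and regexp_tagger.

```python
from nltk.tag import RegexpTagger, UnigramTagger, BigramTagger

# Pattern-based fallback: suffix rules first, then NN as a last resort.
regexp_tagger = RegexpTagger([
    (r'.*ing$', 'VBG'),   # gerunds
    (r'.*ed$', 'VBD'),    # simple past
    (r'.*', 'NN'),        # default: noun
])

# A tiny hand-tagged corpus (hypothetical) standing in for train_data.
train = [[('the', 'DT'), ('dog', 'NN'), ('barked', 'VBD')]]

# Each tagger falls back to the previous one for unseen words/contexts.
unigram_tagger = UnigramTagger(train, backoff=regexp_tagger)
bigram_tagger = BigramTagger(train, backoff=unigram_tagger)

print(bigram_tagger.tag('the dog was running'.split()))
# [('the', 'DT'), ('dog', 'NN'), ('was', 'NN'), ('running', 'VBG')]
```

Note how 'running' never appears in the training data, yet the chain still tags it VBG because the request falls through to the regex rules.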
Yes, we can write a tagger that tags Date and Money expressions. Following is the code:
>>>date_regex = RegexpTagger([(r'(\d{2})[/.-](\d{2})[/.-](\d{4})$', 'DATE'), (r'\$', 'MONEY')])
>>>test_tokens = "I will be flying on sat 10-02-2014 with around 10M $ ".split()
>>>print date_regex.tag(test_tokens)
Can you try a word cloud similar to the one we built in Chapter 1, Introduction to Natural Language Processing, with only nouns and verbs this time?
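As a starting point, the filtering step for such a cloud might look like the following sketch: hand-tagged tokens are used in place of pos_tag output, and the actual rendering with a word-cloud library is left out.

```python
from collections import Counter

# Hand-tagged tokens standing in for nltk.pos_tag output (illustrative).
tagged = [('the', 'DT'), ('cat', 'NN'), ('chased', 'VBD'),
          ('the', 'DT'), ('cat', 'NN'), ('and', 'CC'), ('ran', 'VBD')]

# Keep only nouns (NN*) and verbs (VB*); their frequencies would
# drive the word sizes in the cloud.
freq = Counter(word for word, pos in tagged
               if pos.startswith('NN') or pos.startswith('VB'))
print(freq.most_common())
```

The resulting frequency table is exactly what a word-cloud generator consumes, so the only change from Chapter 1 is this POS-based filter.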
References:
https://github.com/japerk/nltk-trainer
http://en.wikipedia.org/wiki/Part-of-speech_tagging
http://en.wikipedia.org/wiki/Named-entity_recognition
http://www.inf.ed.ac.uk/teaching/courses/icl/nltk/tagging.pdf