Stemming

One thing is still missing. We count similar words in different variants as different words. Post 2, for instance, contains imaging and images. It makes sense to count them together, since they refer to the same concept.

We need a function that reduces words to their specific word stem. scikit-learn does not contain a stemmer by default. The Natural Language Toolkit (NLTK), a free software toolkit that we can download, provides a stemmer that we can easily plug into CountVectorizer.
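One way this plugging-in might look is sketched below. It assumes NLTK's SnowballStemmer for English and a small subclass of CountVectorizer that stems each token after the usual tokenization. The class name StemmedCountVectorizer and the two example posts are hypothetical, chosen only for illustration.

```python
import nltk.stem
from sklearn.feature_extraction.text import CountVectorizer

# Assumption: the Snowball stemmer for English; it needs no extra NLTK data download.
english_stemmer = nltk.stem.SnowballStemmer("english")


class StemmedCountVectorizer(CountVectorizer):
    """Hypothetical helper: a CountVectorizer that stems every token."""

    def build_analyzer(self):
        # Reuse the parent's lowercasing and tokenization, then stem each token.
        analyzer = super().build_analyzer()
        return lambda doc: [english_stemmer.stem(word) for word in analyzer(doc)]


# Hypothetical posts that use different variants of the same word.
posts = [
    "Imaging databases can get huge.",
    "Most images are stored in databases.",
]

vectorizer = StemmedCountVectorizer(min_df=1)
counts = vectorizer.fit_transform(posts)

# Both variants now map to a single shared feature.
shared_stem = english_stemmer.stem("imaging")
print(shared_stem == english_stemmer.stem("images"))
print(sorted(vectorizer.vocabulary_))
```

Because the stemming happens inside the analyzer, the rest of the vectorizing workflow (fit_transform, transform) stays unchanged, and imaging and images are counted under one feature.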
