NLTK is a simple pip install nltk away.
To check whether your installation was successful, open a Python interpreter and type:
>>> import nltk
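If you would rather run the check from a script than from the interpreter, the following minimal sketch uses only the standard library. The helper name is_installed is hypothetical and not part of NLTK; it simply asks Python whether a package with the given name can be found, without importing it.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    # find_spec returns None when no importable package with that name exists.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Prints True if the pip installation of nltk succeeded.
    print("nltk installed:", is_installed("nltk"))
```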
You will find a very nice tutorial on NLTK in the book Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins, published by Packt Publishing.
To play around a little bit with a stemmer, you can visit the web page http://text-processing.com/demo/stem/.
NLTK comes with different stemmers. This is necessary, because every language has a different set of rules for stemming. For English, we can take SnowballStemmer:
>>> import nltk.stem
>>> s = nltk.stem.SnowballStemmer('english')
>>> s.stem("graphics")
'graphic'
>>> s.stem("imaging")
'imag'
>>> s.stem("image")
'imag'
>>> s.stem("imagination")
'imagin'
>>> s.stem("imagine")
'imagin'
The stemming does not necessarily have to result in valid English words.
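Because the stemming rules depend on the language, it can help to see which languages the Snowball stemmer covers in your installed NLTK version. The sketch below reads the languages attribute of SnowballStemmer; the variable names supported and english_stemmer are chosen here for illustration only.

```python
from nltk.stem import SnowballStemmer

# Languages for which this NLTK version ships a Snowball stemmer.
supported = sorted(SnowballStemmer.languages)
print(supported)

# Pick the stemmer by language name, as in the session above.
english_stemmer = SnowballStemmer("english")
print(english_stemmer.stem("graphics"))
```

The exact list depends on the NLTK version you have installed, so treat the printed output as informational.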
The stemmer also works with verbs:
>>> s.stem("buys")
'buy'
>>> s.stem("buying")
'buy'
It works most of the time, but not always. An irregular form such as "bought" is left unchanged instead of being reduced to "buy":
>>> s.stem("bought")
'bought'
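To see what these results mean in practice, the following sketch groups words by their stem, reusing the words from the sessions above. The function name group_by_stem is hypothetical and not part of NLTK; it only illustrates that words sharing a stem end up in the same bucket, while an irregular form such as "bought" stays on its own.

```python
from collections import defaultdict

import nltk.stem

stemmer = nltk.stem.SnowballStemmer('english')


def group_by_stem(words):
    """Map each stem to the list of input words that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


groups = group_by_stem(
    ["imaging", "image", "imagination", "imagine", "buys", "buying", "bought"]
)
for stem, members in groups.items():
    print(stem, members)
```

Here "buys" and "buying" fall into the same group, whereas "bought" forms a group of its own.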