Before getting started with the examples, we will set up the system with NLTK and the other Python libraries it depends on. NLTK can be installed with the pip installer, along with an optional installation of numpy, as follows:
sudo pip install -U nltk
sudo pip install -U numpy
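To confirm that the installation succeeded, a quick sanity check is to import both libraries and print their versions:

```python
# Verify that NLTK and numpy were installed correctly
import nltk
import numpy

print("NLTK version:", nltk.__version__)
print("NumPy version:", numpy.__version__)
```

If either import raises an `ImportError`, re-run the corresponding pip command above.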
The NLTK corpora and various modules can be installed by using the built-in NLTK downloader from the Python interactive shell or a Jupyter Notebook, as follows:
import nltk
nltk.download()
The preceding command will open the NLTK Downloader window. Select the packages or collections that are required:
As shown in the preceding screenshot, specific collections, text corpora, NLTK models, or packages can be selected and installed. Navigate to stopwords and install it for future use. The following is a list of modules that are required for this chapter's examples:
| No | Package Name | Description |
| --- | --- | --- |
| 1 | brown | Brown text corpus |
| 2 | gutenberg | Gutenberg text corpus |
| 3 | maxent_ne_chunker | Maximum entropy model for named entity chunking |
| 4 | movie_reviews | Movie review sentiment polarity data |
| 5 | product_reviews_1 | Basic product reviews corpus |
| 6 | punkt | Sentence and word tokenizer models |
| 7 | treebank | Penn Treebank dataset sample |
| 8 | twitter_samples | Twitter messages sample |
| 9 | universal_tagset | Universal POS tag mapping |
| 10 | webtext | Web text corpus |
| 11 | wordnet | WordNet corpus |
| 12 | words | Word list |
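If you prefer to skip the downloader GUI, `nltk.download()` also accepts a package identifier, so the table's packages can be fetched from a script. The sketch below collects the identifiers listed above (plus stopwords, as recommended earlier); the names match NLTK's data index at the time of writing, but verify them against your NLTK version:

```python
# Package identifiers from the table above, plus 'stopwords'
CHAPTER_PACKAGES = [
    "brown", "gutenberg", "maxent_ne_chunker", "movie_reviews",
    "product_reviews_1", "punkt", "treebank", "twitter_samples",
    "universal_tagset", "webtext", "wordnet", "words", "stopwords",
]


def download_chapter_packages():
    """Fetch every corpus and model needed for this chapter's examples."""
    # Imported lazily so the package list can be inspected even on a
    # machine where NLTK is not yet installed
    import nltk

    for pkg in CHAPTER_PACKAGES:
        nltk.download(pkg)  # skipped if the package is already up to date
```

Calling `download_chapter_packages()` once places the data in the default NLTK data directory, after which the chapter's examples can load the corpora without further setup.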