Installing NLTK and its modules

Before working through the examples, set up the system with NLTK and the Python libraries it depends on. NLTK can be installed with the pip installer; numpy is an optional installation:

sudo pip install -U nltk
sudo pip install -U numpy
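To confirm that the installation succeeded, the libraries can be looked up from Python. The following is a minimal sketch that uses only the standard library; the helper name is_installed is hypothetical and not part of NLTK:

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    # find_spec returns None when the module cannot be found
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two libraries installed above
for name in ("nltk", "numpy"):
    status = "installed" if is_installed(name) else "missing"
    print(name + ": " + status)
```

If either library is reported as missing, rerun the corresponding pip command before continuing.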

The NLTK corpora and modules can then be installed with the NLTK downloader, which is run from the Python interactive shell or a Jupyter Notebook as follows:

import nltk
nltk.download()

The preceding command opens the NLTK Downloader, as follows. Select the packages or collections that are required:

As shown in the preceding screenshot, specific collections, text corpora, NLTK models, or packages can be selected and installed. Navigate to stopwords and install it for future use. The following modules are required for this chapter's examples:

No   Package Name        Description
1    brown               Brown text corpus
2    gutenberg           Gutenberg text corpus
3    maxent_ne_chunker   Module for text chunking
4    movie_reviews       Movie review sentiment polarity data
5    product_reviews_1   Basic product reviews corpus
6    punkt               Word and sentence tokenizer modules
7    treebank            Penn Treebank dataset sample
8    twitter_samples     Twitter messages sample
9    universal_tagset    Universal POS tag mapping
10   webtext             Web text corpus
11   wordnet             WordNet corpus
12   words               Word list
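The packages listed above can also be fetched without the graphical downloader by passing each identifier to nltk.download(). The following is a minimal sketch under a few assumptions: the helper name download_required and its downloader parameter are hypothetical, the chunker is assumed to be published under the identifier maxent_ne_chunker, and stopwords is added to the list because it was mentioned earlier for future use:

```python
# Identifiers of the packages used in this chapter, plus stopwords
REQUIRED_PACKAGES = [
    "brown",
    "gutenberg",
    "maxent_ne_chunker",  # assumed NLTK identifier for the chunker
    "movie_reviews",
    "product_reviews_1",
    "punkt",
    "treebank",
    "twitter_samples",
    "universal_tagset",
    "webtext",
    "wordnet",
    "words",
    "stopwords",  # mentioned in the text for future use
]


def download_required(packages=None, downloader=None):
    """Download each package and return the names that failed."""
    if packages is None:
        packages = REQUIRED_PACKAGES
    if downloader is None:
        # Imported here so the list can be inspected without NLTK installed
        import nltk
        downloader = nltk.download
    failed = []
    for name in packages:
        # nltk.download returns True on success and False otherwise
        if not downloader(name, quiet=True):
            failed.append(name)
    return failed
```

Calling download_required() with no arguments needs network access and returns an empty list when every package was installed. Packages that are already up to date are not downloaded again.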
