The third classifier we will cover is the MaxentClassifier class, also known as a conditional exponential classifier or logistic regression classifier. The maximum entropy classifier converts labeled feature sets to vectors using an encoding. The encoded vectors are then used to calculate a weight for each feature, and these weights are combined to determine the most likely label for a feature set. For more details on the math behind this, see

The MaxentClassifier class requires the NumPy package. This is because the feature encodings use NumPy arrays. You can find installation details at the following link:


Training a MaxentClassifier class can be quite memory hungry, so you may want to quit all your other programs while training, just to be safe.

We will use the same train_feats and test_feats variables from the movie_reviews corpus that we constructed before, and call the MaxentClassifier.train() class method. Like the DecisionTreeClassifier class, MaxentClassifier.train() has its own specific parameters that I have tweaked to speed up training. These parameters will be explained in more detail later:

>>> from nltk.classify import MaxentClassifier
>>> me_classifier = MaxentClassifier.train(train_feats, trace=0, max_iter=1, min_lldelta=0.5)
>>> accuracy(me_classifier, test_feats)

This classifier has such a low accuracy because I set the parameters so that it cannot learn a more accurate model. I did this because of the time required to train a suitable model using the default iis algorithm. A better algorithm is gis, which can be used like this:

>>> me_classifier = MaxentClassifier.train(train_feats, algorithm='gis', trace=0, max_iter=10, min_lldelta=0.5)
>>> accuracy(me_classifier, test_feats)

The gis algorithm is a bit faster and generally more accurate than the default iis algorithm, and can be allowed to run for up to 10 iterations in a reasonable amount of time. Both iis and gis will be explained in more detail in the next section.


If training is taking a long time, you can usually cut it off manually by hitting Ctrl + C. This should stop the current iteration and still return a classifier based on whatever state the model is in.
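The interrupt behavior described above follows a common pattern: the training loop catches KeyboardInterrupt and returns whatever weights it has reached so far. The following is a minimal sketch of that pattern, not NLTK's actual code. The names train_with_interrupt and make_flaky_step are hypothetical, and the "training" is a made-up update that simply nudges each weight.

```python
def train_with_interrupt(update_step, weights, max_iter=10):
    """Run up to max_iter updates; on Ctrl+C, stop and keep the current weights."""
    completed = 0
    try:
        for _ in range(max_iter):
            weights = update_step(weights)
            completed += 1
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt. The interrupted iteration is
        # discarded, but the weights from the last completed one survive.
        print("Training stopped: keyboard interrupt")
    return weights, completed


def make_flaky_step(fail_on_call):
    """Hypothetical update step that simulates Ctrl+C on the given call."""
    calls = {"n": 0}

    def step(weights):
        calls["n"] += 1
        if calls["n"] == fail_on_call:
            raise KeyboardInterrupt
        return [w + 0.1 for w in weights]

    return step


# Simulate the user pressing Ctrl+C during the third iteration.
weights, completed = train_with_interrupt(make_flaky_step(3), [0.0, 0.0])
print(completed, weights)
```

The caller still gets a usable result after the interrupt, which is why cutting off a long training run does not throw away the work already done.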

Like the previous classifiers, MaxentClassifier inherits from ClassifierI, as shown in the following diagram:

Depending on the algorithm, MaxentClassifier.train() calls one of the training functions in the nltk.classify.maxent module. The default algorithm is iis, and the function used is train_maxent_classifier_with_iis(). The other algorithm that's included is gis, which uses the train_maxent_classifier_with_gis() function. GIS stands for Generalized Iterative Scaling, while IIS stands for Improved Iterative Scaling. The only difference between these two algorithms that really matters is that gis is much faster than iis.

If megam is installed and you specify the megam algorithm, then train_maxent_classifier_with_megam() is used (megam is covered in more detail in the next section).


Previous versions of NLTK provided additional algorithms if SciPy was installed. These algorithms have been removed, but many other algorithms can be used in conjunction with scikit-learn, which we will cover in the next recipe, Training scikit-learn classifiers.

The basic idea behind the maximum entropy model is to build some probability distributions that fit the observed data and then choose whichever probability distribution has the highest entropy. The gis and iis algorithms do so by iteratively improving the weights used to classify features. This is where the max_iter and min_lldelta parameters come into play.
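To make the "highest entropy" idea concrete, here is a small sketch that compares a few hand-picked candidate distributions over three labels. The constraint (the first label has probability 0.5) and the candidate values are invented for illustration; a real maximum entropy model searches over weights rather than a fixed list of candidates.

```python
import math


def entropy(dist):
    """Shannon entropy, in bits, of a list of probabilities that sum to 1."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


# Hypothetical constraint: the observed data fixes the first label at 0.5.
# Every candidate satisfies it; they differ in how they split the rest.
candidates = [
    [0.5, 0.25, 0.25],
    [0.5, 0.40, 0.10],
    [0.5, 0.49, 0.01],
]

for dist in candidates:
    print(dist, round(entropy(dist), 3))

# The candidate that assumes the least beyond the constraint wins.
best = max(candidates, key=entropy)
print("highest entropy:", best)
```

The distribution that spreads the remaining probability evenly has the highest entropy, because it makes no assumptions beyond what the constraint requires.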

The max_iter parameter specifies the maximum number of iterations used to update the weights. More iterations will generally improve accuracy, but only up to a point. Eventually, the changes from one iteration to the next hit a plateau, and further iterations add nothing.

The min_lldelta parameter specifies the minimum change in the log likelihood required to continue iteratively improving the weights. Before beginning training iterations, an instance of nltk.classify.util.CutoffChecker is created. When its check() method is called, it uses functions such as nltk.classify.util.log_likelihood() to decide whether the cutoff limits have been reached. The log likelihood is the log (using math.log()) of the average label probability of the training data (which is the log of the average likelihood of a label). As the log likelihood increases, the model improves. But it too will reach a plateau where further increases are so small that there is no point in continuing. The min_lldelta parameter lets you control how much each iteration must increase the log likelihood for training to continue; once the improvement from an iteration no longer exceeds that amount, the iterations stop.
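The interaction between max_iter and min_lldelta can be sketched in a few lines of plain Python. This is a simplified stand-in for the cutoff logic, not the CutoffChecker implementation itself. The function iterations_run and the probabilities in history are made up for illustration; each inner list holds the probability a hypothetical model assigns to the correct label of each training instance after one iteration.

```python
import math


def log_likelihood(correct_label_probs):
    """Log of the average probability assigned to the correct labels."""
    return math.log(sum(correct_label_probs) / len(correct_label_probs))


def iterations_run(prob_history, max_iter, min_lldelta):
    """Return how many iterations run before a cutoff is reached."""
    previous_ll = None
    for iteration, probs in enumerate(prob_history, start=1):
        if iteration >= max_iter:
            return iteration  # iteration budget used up
        ll = log_likelihood(probs)
        if previous_ll is not None and ll - previous_ll <= min_lldelta:
            return iteration  # improvement too small to be worth continuing
        previous_ll = ll
    return len(prob_history)


# Made-up training trace: the model improves quickly, then plateaus.
history = [
    [0.50, 0.50],
    [0.60, 0.70],
    [0.65, 0.70],
    [0.651, 0.70],
]

print(iterations_run(history, max_iter=10, min_lldelta=0.01))
print(iterations_run(history, max_iter=10, min_lldelta=0.5))
print(iterations_run(history, max_iter=1, min_lldelta=0.01))
```

A small min_lldelta lets training continue until the plateau, a large one stops it after the first modest gain, and a low max_iter overrides both.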

As with the NaiveBayesClassifier class, you can see the most informative features by calling the show_most_informative_features() method:

>>> me_classifier.show_most_informative_features(n=4)
-0.740 worst==True and label is 'pos'
0.740 worst==True and label is 'neg'
0.715 bad==True and label is 'neg'
-0.715 bad==True and label is 'pos'

The numbers shown are the weights for each feature. This tells us that the word worst is negatively weighted towards the pos label, and positively weighted towards the neg label. In other words, if the word worst is found in the feature set, then there's a strong possibility that the text should be classified neg.
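To see how such weights could turn into a label decision, the following sketch sums the weights that apply to each label and normalizes the result into probabilities. It is an illustration only, not the classifier's internal code: the function label_probabilities is hypothetical, the weight table contains just the four values shown in the output above, and the exponent base of 2 is an assumption about how the weights are scaled.

```python
# Weights copied from the show_most_informative_features() output,
# keyed by (feature name, feature value, label).
weights = {
    ("worst", True, "pos"): -0.740,
    ("worst", True, "neg"): 0.740,
    ("bad", True, "neg"): 0.715,
    ("bad", True, "pos"): -0.715,
}


def label_probabilities(featureset, labels=("pos", "neg"), base=2.0):
    """Sum the weights that fire for each label, then normalize.

    base=2.0 is an assumption; the ranking of the labels is the same
    for any base greater than 1.
    """
    scores = {
        label: sum(
            weights.get((name, value, label), 0.0)
            for name, value in featureset.items()
        )
        for label in labels
    }
    unnormalized = {label: base ** score for label, score in scores.items()}
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}


# "great" has no entry in this toy weight table, so it contributes nothing.
probs = label_probabilities({"worst": True, "great": True})
print(probs)
```

Because worst pushes the neg score up and the pos score down by the same amount, the neg label ends up with the larger share of the probability.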

Megam algorithm

If you have installed the megam package, then you can use the megam algorithm. It's faster than the included algorithms and much more accurate, but it can also be difficult to install. Installation instructions and information can be found at the following link:

The nltk.classify.megam.config_megam() function can be used to specify where the megam executable is found. Alternatively, if megam can be found in the standard executable paths, NLTK will configure it automatically:

>>> me_classifier = MaxentClassifier.train(train_feats, algorithm='megam', trace=0, max_iter=10)
[Found megam: /usr/local/bin/megam]
>>> accuracy(me_classifier, test_feats)

