The third classifier we will cover is the MaxentClassifier class, also known as a conditional exponential classifier or logistic regression classifier. The maximum entropy classifier converts labeled feature sets to vectors using an encoding. The encoded vectors are then used to calculate weights for each feature, which can be combined to determine the most likely label for a feature set. For more details on the math behind this, see https://en.wikipedia.org/wiki/Maximum_entropy_classifier.
The MaxentClassifier class requires the NumPy package, because the feature encodings use NumPy arrays. You can find installation details at the following link:
https://numpy.org/install/
We will use the same train_feats and test_feats variables from the movie_reviews corpus that we constructed before, and call the MaxentClassifier.train() class method. Like the DecisionTreeClassifier class, MaxentClassifier.train() has its own specific parameters that I have tweaked to speed up training. These parameters will be explained in more detail later:
>>> from nltk.classify import MaxentClassifier
>>> me_classifier = MaxentClassifier.train(train_feats, trace=0, max_iter=1, min_lldelta=0.5)
>>> accuracy(me_classifier, test_feats)
0.5
The reason this classifier has such a low accuracy is that I set the parameters such that it is unable to learn a more accurate model, due to the time required to train a suitable model using the default iis algorithm. A better algorithm is gis, which can be trained like this:
>>> me_classifier = MaxentClassifier.train(train_feats, algorithm='gis', trace=0, max_iter=10, min_lldelta=0.5)
>>> accuracy(me_classifier, test_feats)
0.722
The gis algorithm is a bit faster and generally more accurate than the default iis algorithm, and can be allowed to run for up to 10 iterations in a reasonable amount of time. Both iis and gis will be explained in more detail in the next section.
Like the previous classifiers, MaxentClassifier inherits from ClassifierI, as shown in the following diagram:
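You can also verify this inheritance relationship directly in the interpreter:
>>> from nltk.classify import ClassifierI, MaxentClassifier
>>> issubclass(MaxentClassifier, ClassifierI)
True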
Depending on the algorithm, MaxentClassifier.train() calls one of the training functions in the nltk.classify.maxent module. The default algorithm is iis, and the function used is train_maxent_classifier_with_iis(). The other included algorithm is gis, which uses the train_maxent_classifier_with_gis() function. GIS stands for Generalized Iterative Scaling, while IIS stands for Improved Iterative Scaling. The only difference between these two algorithms that really matters is that gis is much faster than iis.
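If you want to skip the dispatch in MaxentClassifier.train(), these functions can also be called directly. A quick sketch using the same train_feats (the cutoff keyword arguments such as max_iter pass straight through):
>>> from nltk.classify.maxent import train_maxent_classifier_with_gis
>>> me_gis = train_maxent_classifier_with_gis(train_feats, trace=0, max_iter=10)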
If megam is installed and you specify the megam algorithm, then train_maxent_classifier_with_megam() is used (megam is covered in more detail in the next section).
The basic idea behind the maximum entropy model is to consider the probability distributions that fit the observed data, and then choose whichever distribution has the highest entropy. The gis and iis algorithms do so by iteratively improving the weights used to classify features. This is where the max_iter and min_lldelta parameters come into play.
The max_iter parameter specifies the maximum number of iterations to go through and update the weights. More iterations will generally improve accuracy, but only up to a point. Eventually, the changes from one iteration to the next will hit a plateau, and further iterations are useless.
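One way to see this plateau for yourself is to train with increasing values of max_iter and compare accuracy. Your exact numbers will depend on your features, but the gains should shrink with each step:
>>> for iters in [1, 3, 5, 10]:
...     clf = MaxentClassifier.train(train_feats, algorithm='gis', trace=0, max_iter=iters)
...     print(iters, accuracy(clf, test_feats))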
The min_lldelta parameter specifies the minimum change in the log likelihood required to continue iteratively improving the weights. Before beginning training iterations, an instance of nltk.classify.util.CutoffChecker is created. When its check() method is called, it uses functions such as nltk.classify.util.log_likelihood() to decide whether the cutoff limits have been reached. The log likelihood is the log (using math.log()) of the average label probability of the training data. As the log likelihood increases, the model improves, but it too will reach a plateau, where further increases are so small that there is no point in continuing. Specifying min_lldelta allows you to control how much each iteration must increase the log likelihood for training to continue.
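You can compute this value yourself with the same helper function that CutoffChecker uses. It returns a negative float that moves toward zero as the model improves:
>>> from nltk.classify.util import log_likelihood
>>> ll = log_likelihood(me_classifier, train_feats)  # negative; closer to 0.0 is better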
Like the NaiveBayesClassifier class, you can see the most informative features by calling the show_most_informative_features() method:
>>> me_classifier.show_most_informative_features(n=4)
  -0.740 worst==True and label is 'pos'
   0.740 worst==True and label is 'neg'
   0.715 bad==True and label is 'neg'
  -0.715 bad==True and label is 'pos'
The numbers shown are the weights for each feature. This tells us that the word worst is negatively weighted towards the pos label, and positively weighted towards the neg label. In other words, if the word worst is found in the feature set, then there is a strong possibility that the text should be classified as neg.
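Given the weights above, you can watch this happen with the prob_classify() method. With only worst==True firing, the neg label should come out on top, though the exact probabilities will vary with your trained model:
>>> probs = me_classifier.prob_classify({'worst': True})
>>> probs.max()
'neg'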
If you have installed the megam package, then you can use the megam algorithm. It is faster than the included algorithms and much more accurate, but it can also be difficult to install. Installation instructions and information can be found at the following link:
http://www.umiacs.umd.edu/~hal/megam/
The nltk.classify.megam.config_megam() function can be used to specify where the megam executable is found. Or, if megam can be found in the standard executable paths, NLTK will configure it automatically:
>>> me_classifier = MaxentClassifier.train(train_feats, algorithm='megam', trace=0, max_iter=10)
[Found megam: /usr/local/bin/megam]
>>> accuracy(me_classifier, test_feats)
0.86799999999999999
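If the executable lives somewhere non-standard, you can point NLTK at it explicitly before training; the path below is just an example, so adjust it for your system:
>>> from nltk.classify.megam import config_megam
>>> config_megam('/usr/local/bin/megam')  # example path; adjust for your system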
The Bag of words feature extraction and Training a Naive Bayes classifier recipes in this chapter show how to construct the training and testing features from the movie_reviews corpus. The next recipe shows how to train even more accurate classifiers with scikit-learn. After that, we will cover how and why to evaluate a classifier using precision and recall instead of accuracy, in the Measuring precision and recall of a classifier recipe.