To run our app, we will need to execute the main routine (in chapter6.py). It loads the data, trains the classifier, evaluates its performance, and visualizes the result.
But first, we need to import all the relevant modules and set up a main function:
import numpy as np
import matplotlib.pyplot as plt

from datasets import gtsrb
from classifiers import MultiClassSVM

def main():
Then, the goal is to compare classification performance across settings and feature extraction methods. This includes running the task with both classification strategies, one-vs-all and one-vs-one, as well as preprocessing the data with a list of different feature extraction approaches:
strategies = ['one-vs-one', 'one-vs-all']
features = [None, 'gray', 'rgb', 'hsv', 'surf', 'hog']
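As an aside, the two strategies differ in how many binary SVMs they train under the hood: one-vs-all trains one classifier per class, whereas one-vs-one trains one per pair of classes. A quick back-of-the-envelope sketch (the class count of 10 is only an illustrative assumption; the real number is derived from the data below):

num_classes = 10  # illustrative assumption; the real count comes from the data

# one-vs-all: one binary SVM per class
print(num_classes)                           # 10

# one-vs-one: one binary SVM per unordered pair of classes
print(num_classes * (num_classes - 1) // 2)  # 45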
For each of these settings, we need to collect three performance metrics—accuracy, precision, and recall:
accuracy = np.zeros((2, len(features)))
precision = np.zeros((2, len(features)))
recall = np.zeros((2, len(features)))
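Each of these matrices has one row per classification strategy and one column per feature setting, so that, for example, accuracy[0, 2] will end up holding the accuracy of the one-vs-one strategy on rgb features.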
A nested for loop will run the classifier with all of these settings and populate the performance metric matrices. The outer loop is over all elements in the features vector:
for f in xrange(len(features)):
    (X_train, y_train), (X_test, y_test) = gtsrb.load_data(
        "datasets/gtsrb_training",
        feature=features[f],
        test_split=0.2)
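Here, test_split=0.2 reserves 20 percent of the samples for testing. gtsrb.load_data is our own helper from the datasets module, but conceptually the split boils down to something like the following sketch (not the actual implementation; X and y stand for the loaded samples and labels as NumPy arrays):

def split_data(X, y, test_split=0.2):
    # shuffle the sample indices, then hold out the requested fraction
    idx = np.random.permutation(len(X))
    n_test = int(len(X) * test_split)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])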
Before passing the training data (X_train, y_train) and test data (X_test, y_test) to the classifiers, we want to make sure that they are in the format that the classifier expects; that is, each data sample is stored in a row of X_train or X_test, with the columns corresponding to feature values:
    X_train = np.squeeze(np.array(X_train)).astype(np.float32)
    y_train = np.array(y_train)
    X_test = np.squeeze(np.array(X_test)).astype(np.float32)
    y_test = np.array(y_test)
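A quick sanity check at this point can save debugging time later. The exact numbers depend on the chosen feature, but the layout should always be samples-in-rows:

print(X_train.shape)  # (n_train_samples, n_features)
print(X_test.shape)   # (n_test_samples, n_features)
assert X_train.shape[1] == X_test.shape[1]  # same feature dimensionality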
We also need to know the number of class labels (since we did not load the complete GTSRB dataset). This can be achieved by concatenating y_train and y_test and extracting all unique labels in the combined list:
    labels = np.unique(np.hstack((y_train, y_test)))
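To see why this works, consider a toy example: np.hstack concatenates the two label vectors, and np.unique returns the sorted distinct values:

a = np.array([0, 1, 1, 3])
b = np.array([1, 2])
np.unique(np.hstack((a, b)))  # array([0, 1, 2, 3])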
Next, the inner loop iterates over all the elements in the strategies vector, which currently includes the two strategies, one-vs-all and one-vs-one:
for s in xrange(len(strategies)):
Then we instantiate the MultiClassSVM class with the correct number of unique labels and the corresponding classification strategy:

MCS = MultiClassSVM(len(labels), strategies[s])
Now we are ready to apply the ensemble classifier to the training data and extract the three performance metrics after training:

MCS.fit(X_train, y_train)
(accuracy[s,f], precision[s,f], recall[s,f]) = MCS.evaluate(X_test, y_test)
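The evaluate method is implemented in the classifiers module. As a rough idea of what such a method computes, here is a minimal sketch of accuracy together with macro-averaged precision and recall (an illustrative reimplementation, not the actual code in the classifiers module):

def evaluate_sketch(y_true, y_pred, labels):
    acc = np.mean(y_true == y_pred)
    precisions, recalls = [], []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        precisions.append(tp / float(tp + fp) if tp + fp > 0 else 0.0)
        recalls.append(tp / float(tp + fn) if tp + fn > 0 else 0.0)
    return acc, np.mean(precisions), np.mean(recalls)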
This ends the nested for loop. All that is left to do is to visualize the result. For this, we choose matplotlib's bar plot functionality. The goal is to show the three performance metrics (accuracy, precision, and recall) for all combinations of extracted features and classification strategies. We will use one subplot per classification strategy, and arrange the corresponding data in a grouped bar plot:
fig, ax = plt.subplots(2)
for s in xrange(len(strategies)):
    x = np.arange(len(features))
    ax[s].bar(x - 0.2, accuracy[s, :], width=0.2, color='b',
              hatch='/', align='center')
    ax[s].bar(x, precision[s, :], width=0.2, color='r',
              hatch='', align='center')
    ax[s].bar(x + 0.2, recall[s, :], width=0.2, color='g',
              hatch='x', align='center')
For the sake of visibility, the y-axis is restricted to the relevant range, leaving some headroom above 1.0 for the legend:
ax[s].axis([-0.5, len(features) + 0.5, 0, 1.5])
Finally, we add some plot decorations:
ax[s].legend(('Accuracy', 'Precision', 'Recall'), loc=2, ncol=3,
             mode="expand")
ax[s].set_xticks(np.arange(len(features)))
ax[s].set_xticklabels(features)
ax[s].set_title(strategies[s])
Now the data is ready to be plotted!
plt.show()
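Assuming chapter6.py follows the usual convention, the script ends by invoking main through the standard entry-point guard:

if __name__ == '__main__':
    main()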
The resulting plot contains a lot of information, so let's break it down step by step. For example, compare the result for None with the result for rgb. These two settings were identical, except that the samples under rgb were normalized. The difference in performance is evident, especially for the one-vs-all strategy.