Predicting the authenticity of banknotes

In this section, we will study the problem of predicting whether a particular banknote is genuine or forged. The banknote authentication dataset is publicly hosted online. Its creators took specimens of both genuine and forged banknotes and photographed them with an industrial camera. The resulting grayscale images were processed with a wavelet transform, a type of time-frequency transformation. Three statistics of the transformed image (its variance, skewness, and kurtosis), together with the entropy of the image, make up the four features for this binary classification task.

Column name    Definition
waveletVar     Variance of the wavelet-transformed image
waveletSkew    Skewness of the wavelet-transformed image
waveletCurt    Kurtosis of the wavelet-transformed image
entropy        Entropy of the image
class          Authenticity (0 means genuine and 1 means forged)
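The three wavelet features are standard moment statistics. As a rough illustration, the R sketch below computes textbook versions of them for a vector of coefficients, along with the Shannon entropy of a grayscale intensity histogram. The helpers wavelet_summary() and image_entropy() and the toy inputs are hypothetical. The dataset's creators may have defined or scaled these quantities differently; the negative entropy threshold in the tree later in this section shows that their entropy feature is not this plain discrete version, which can never be negative.

```r
# Textbook moment-based summaries of a numeric vector (e.g. wavelet
# coefficients). Assumed definitions for illustration only; the dataset
# may use different definitions or scaling.
wavelet_summary <- function(coefs) {
  d  <- coefs - mean(coefs)
  m2 <- mean(d^2)                      # population variance (2nd central moment)
  c(variance = m2,
    skewness = mean(d^3) / m2^1.5,     # 3rd standardized moment
    kurtosis = mean(d^4) / m2^2 - 3)   # excess kurtosis (0 for a normal)
}

# Shannon entropy (in bits) of an image's intensity histogram.
# Assumes integer pixel values in 0..(levels - 1).
image_entropy <- function(pixels, levels = 256) {
  counts <- tabulate(pixels + 1, nbins = levels)  # histogram of intensities
  p <- counts[counts > 0] / length(pixels)        # drop empty bins
  -sum(p * log2(p))
}

# Toy inputs, made up for illustration (not taken from the dataset)
wavelet_summary(c(-2, -1, 0, 1, 2))
image_entropy(c(0, 0, 255, 255))
```

An image split evenly between two intensities has an entropy of exactly 1 bit, while a single-intensity image has an entropy of 0.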

First, we will split our 1,372 observations into training and test sets:

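> library(caret)
> # bnote holds the five columns above; C5.0() requires a factor outcome
> bnote$class <- factor(bnote$class)
> set.seed(266)
> bnote_sampling_vector <- createDataPartition(bnote$class, p = 0.80, list = FALSE)
> bnote_train <- bnote[bnote_sampling_vector, ]
> bnote_test <- bnote[-bnote_sampling_vector, ]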
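Because bnote$class is a factor, createDataPartition() samples within each class level, so the training and test sets keep roughly the same class balance as the full data. The base R sketch below illustrates that idea with a hypothetical stratified_split() helper on made-up labels. It is not caret's implementation, and rounding each class's share up with ceiling() is a choice made for this sketch.

```r
# Stratified split sketch: sample a fraction p of the row indices
# separately within each class, then combine them.
# stratified_split() is a hypothetical helper, not part of caret.
stratified_split <- function(y, p = 0.8) {
  idx_by_class <- split(seq_along(y), y)            # row indices per class
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_train <- ceiling(length(idx) * p)             # rows kept from this class
    idx[sample.int(length(idx), n_train)]           # safe even for 1-row classes
  }), use.names = FALSE)
  sort(train_idx)
}

set.seed(1)
y <- factor(c(rep("0", 50), rep("1", 30)))  # toy labels: 62.5% class "0"
train_idx <- stratified_split(y)
table(y[train_idx])    # 40 of class "0", 24 of class "1"
table(y[-train_idx])   # 10 of class "0", 6 of class "1"
```

Both subsets keep the 62.5/37.5 class split of the toy labels, which is the point of sampling within classes rather than over all rows at once.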

Next, we will introduce the C50 R package, which contains an implementation of the C5.0 algorithm for classification. Like the modeling functions we have used before, its C5.0() function takes a formula and a data frame as its minimum required input, and we can again use the summary() function to examine the resulting model. Rather than reproduce the entire summary output, we'll focus on just the tree that is built:

> library(C50)
> bnote_tree <- C5.0(class ~ ., data = bnote_train)
> summary(bnote_tree)
waveletVar > 0.75896:
:...waveletCurt > -1.9702: 0 (342)
:   waveletCurt <= -1.9702:
:   :...waveletSkew > 4.9228: 0 (128)
:       waveletSkew <= 4.9228:
:       :...waveletVar <= 3.4776: 1 (34)
:           waveletVar > 3.4776: 0 (2)
waveletVar <= 0.75896:
:...waveletSkew > 5.1401:
    :...waveletVar <= -3.3604: 1 (31)
    :   waveletVar > -3.3604: 0 (93/1)
    waveletSkew <= 5.1401:
    :...waveletVar > 0.30081:
        :...waveletCurt <= 0.35273: 1 (25)
        :   waveletCurt > 0.35273:
        :   :...entropy <= 0.71808: 0 (24)
        :       entropy > 0.71808: 1 (3)
        waveletVar <= 0.30081:
        :...waveletCurt <= 3.0423: 1 (241)
            waveletCurt > 3.0423:
            :...waveletSkew > -1.8624: 0 (21/1)
                waveletSkew <= -1.8624:
                :...waveletVar <= -0.69572: 1 (146)
                    waveletVar > -0.69572:
                    :...entropy <= -0.73535: 0 (2)
                        entropy > -0.73535: 1 (6)
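Each path from the root to a leaf in this output is a chain of threshold tests, so the whole tree can be read as nested if/else rules. The hypothetical classify_bnote() function below is a hand transcription of this particular printed tree for a single banknote. The thresholds printed by summary() may be rounded, so for real predictions, predict() on the fitted model remains the reliable route.

```r
# Hand transcription of the printed C5.0 tree (hypothetical helper).
# Returns the predicted class label as "0" (genuine) or "1" (forged).
classify_bnote <- function(waveletVar, waveletSkew, waveletCurt, entropy) {
  if (waveletVar > 0.75896) {
    if (waveletCurt > -1.9702) return("0")   # leaf (342)
    if (waveletSkew > 4.9228) return("0")    # leaf (128)
    if (waveletVar <= 3.4776) return("1")    # leaf (34)
    return("0")                              # leaf (2)
  }
  if (waveletSkew > 5.1401) {
    if (waveletVar <= -3.3604) return("1")   # leaf (31)
    return("0")                              # leaf (93/1)
  }
  if (waveletVar > 0.30081) {
    if (waveletCurt <= 0.35273) return("1")  # leaf (25)
    if (entropy <= 0.71808) return("0")      # leaf (24)
    return("1")                              # leaf (3)
  }
  if (waveletCurt <= 3.0423) return("1")     # leaf (241)
  if (waveletSkew > -1.8624) return("0")     # leaf (21/1)
  if (waveletVar <= -0.69572) return("1")    # leaf (146)
  if (entropy <= -0.73535) return("0")       # leaf (2)
  "1"                                        # leaf (6)
}

classify_bnote(waveletVar = 2, waveletSkew = 0, waveletCurt = 0, entropy = 0)
```

This call follows the very first branch (waveletVar > 0.75896, then waveletCurt > -1.9702) and returns "0", that is, genuine.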

As we can see, it is perfectly acceptable to use the same feature more than once in the tree to make a new split; waveletVar, for example, is split on at five different thresholds. The number in parentheses to the right of each leaf node is the number of training observations assigned to that leaf. Where a second number follows a slash, as in (93/1), it counts how many of those observations the leaf misclassifies. The vast majority of the leaf nodes in the tree are pure nodes, meaning that all of the observations assigned to them belong to a single class.
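Purity can be put into numbers with entropy. C5.0, like its predecessor C4.5, is generally described as choosing splits by the information gain ratio, an entropy-based score under which a pure node has zero entropy. The sketch below implements the textbook C4.5-style gain ratio for a single binary threshold split on toy data. The helpers node_entropy() and gain_ratio() are hypothetical, and the C50 package's internal code includes refinements not shown here.

```r
# Shannon entropy (bits) of the class labels in a node; 0 for a pure node
node_entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Textbook gain ratio for the binary split x <= threshold.
# Assumes the threshold leaves at least one observation on each side.
gain_ratio <- function(x, y, threshold) {
  left <- x <= threshold
  w <- c(mean(left), mean(!left))              # share of rows in each branch
  child_entropy <- w[1] * node_entropy(y[left]) + w[2] * node_entropy(y[!left])
  info_gain <- node_entropy(y) - child_entropy # reduction in entropy
  split_info <- -sum(w * log2(w))              # penalizes unbalanced splits
  info_gain / split_info
}

x <- 1:6
y <- rep(c("a", "b"), each = 3)  # toy labels
node_entropy(c("a", "a", "a"))   # pure node
gain_ratio(x, y, threshold = 3)  # separates the classes perfectly
gain_ratio(x, y, threshold = 2)  # leaves one "a" mixed in with the "b"s
```

The threshold of 3 produces two pure children and scores a gain ratio of 1, while the threshold of 2 leaves an impure right-hand child and scores only 0.5, so a gain-ratio learner would prefer the first split.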

Only two leaf nodes, (93/1) and (21/1), each contain a single observation from the minority class for that node, so we can infer that the model makes just two mistakes on the 1,098 observations in our training data. To see whether our model has overfitted the data or whether it really can generalize well, we'll test it on our test set:

> bnote_predictions <- predict(bnote_tree, bnote_test)
> mean(bnote_test$class == bnote_predictions)
[1] 0.9890511
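The accuracy printed above can be turned back into counts, and the leaf counts from the tree give the training accuracy for comparison. This short R sketch relies on the fact that the 14 leaf counts sum to the size of the training set (1,098 observations), which leaves 274 of the 1,372 observations for the test set.

```r
# Leaf sizes and leaf errors read off the printed tree, in printed order
leaf_n      <- c(342, 128, 34, 2, 31, 93, 25, 24, 3, 241, 21, 146, 2, 6)
leaf_errors <- c(  0,   0,  0, 0,  0,  1,  0,  0, 0,   0,  1,   0, 0, 0)

n_train <- sum(leaf_n)                             # 1098 training observations
train_accuracy <- 1 - sum(leaf_errors) / n_train   # 2 training mistakes

n_test <- 1372 - n_train                           # 274 test observations
test_correct <- round(0.9890511 * n_test)          # 271 classified correctly

c(train = train_accuracy, test = test_correct / n_test)
```

Going from about 99.8 percent accuracy on the training data to about 98.9 percent on the test data is a drop of only three test misclassifications, which is consistent with a model that generalizes well rather than one that has memorized its training data.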

The test accuracy is near perfect, a rare sight and the last time in this chapter that we'll be finished so easily! As a final note, C5.0() also has a costs parameter, which takes a matrix of costs associated with the possible errors and is useful for dealing with asymmetric error costs.

