The code for obtaining the loss, accuracy, and confusion matrix for the test data is as follows:
# Loss and accuracy
model %>% evaluate(testx, testy)
$loss
[1] 4.437256
$acc
[1] 0.768
# Confusion matrix
pred <- model %>% predict_classes(testx)
tab <- table(Predicted = pred, Actual = data$test$y[1:2000,])
tab
         Actual
Predicted   0   1   2   3   4   5   6   7   8   9
        0 158   1  12   0   5   1   6   2  15   0
        1   3 142   0   2   0   2   3   1   9   2
        2   2   0 139   8   6   3   6   0   0   0
        3   0   0   3  86   5  13   6   1   0   0
        4   4   0  14   6 138   5  10   4   1   0
        5   0   0  15  47   6 148   2  12   0   0
        6   0   0   4  12   9   3 178   0   0   0
        7   2   0   4  23  27   9   3 169   0   0
        8  13   1   1   5   1   0   0   0 179   2
        9  14  54   3  10   1   1   2   4  13 199
# Accuracy for each category
100*diag(tab)/colSums(tab)
0 1 2 3 4
80.61224 71.71717 71.28205 43.21608 69.69697
5 6 7 8 9
80.00000 82.40741 87.56477 82.48848 98.02956
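To interpret these numeric labels, they can be mapped back to class names. Here is a minimal sketch using the standard CIFAR-10 label order; the `class_names` vector is a helper introduced for illustration, not part of the code above:

```r
# CIFAR-10 class names in label order (0-9); this is the dataset's
# standard ordering, used here to read the confusion matrix.
class_names <- c("airplane", "automobile", "bird", "cat", "deer",
                 "dog", "frog", "horse", "ship", "truck")

# R vectors are 1-indexed, so label k maps to class_names[k + 1]
class_names[3 + 1]   # "cat"
class_names[9 + 1]   # "truck"
```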
From the preceding output, we can make the following observations:
- The loss and accuracy based on the test data are 4.437 and 0.768, respectively.
- Although the test performance is inferior to that on the training data, it is a significant improvement over the results from the first model.
- The confusion matrix provides further insights into the model's performance. The best performance is for category 9 (truck), with 199 correct classifications and an accuracy of 98%.
- For the test data, the model is most confused about category 3 (cat), which has the most misclassifications; its accuracy is only 43.2%.
- The highest misclassification count for a single pair of categories (54 images) occurs where category 1 (automobile) is misclassified as category 9 (truck).
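Observations like the worst confusion pair can also be found programmatically. The following sketch rebuilds the confusion matrix printed above (so it runs on its own) and locates the largest off-diagonal count; in practice you would use the `tab` object directly:

```r
# Rebuild the confusion matrix shown above (rows = predicted, columns = actual)
tab <- matrix(c(
  158,   1,  12,   0,   5,   1,   6,   2,  15,   0,
    3, 142,   0,   2,   0,   2,   3,   1,   9,   2,
    2,   0, 139,   8,   6,   3,   6,   0,   0,   0,
    0,   0,   3,  86,   5,  13,   6,   1,   0,   0,
    4,   0,  14,   6, 138,   5,  10,   4,   1,   0,
    0,   0,  15,  47,   6, 148,   2,  12,   0,   0,
    0,   0,   4,  12,   9,   3, 178,   0,   0,   0,
    2,   0,   4,  23,  27,   9,   3, 169,   0,   0,
   13,   1,   1,   5,   1,   0,   0,   0, 179,   2,
   14,  54,   3,  10,   1,   1,   2,   4,  13, 199
), nrow = 10, byrow = TRUE,
   dimnames = list(Predicted = 0:9, Actual = 0:9))

# Zero out the diagonal (correct classifications), then find the largest
# remaining count: the single worst predicted/actual confusion
off <- tab
diag(off) <- 0
max(off)                                  # 54
which(off == max(off), arr.ind = TRUE)    # predicted 9 (truck), actual 1 (automobile)
```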
With 76.8% accuracy, this image classification performance is decent. Using a pretrained model has allowed us to transfer the learning of a model trained on over 1 million images to new data containing 2,000 images from the CIFAR10 dataset. This is a major advantage over building an image classification model entirely from scratch, which would require considerably more time and computing resources. Now that we've achieved a decent performance from the model, we can explore how to improve it even further.