The loss and accuracy on the training data come out at 0.115 and 0.960, respectively, as the following code and output show:
# Model evaluation
model %>% evaluate(trainx, trainy)
OUTPUT
$loss
[1] 0.1151372

$acc
[1] 0.9603167
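For intuition, the `acc` value reported by `evaluate()` is simply the share of images whose predicted class matches the true label. A minimal sketch (the two vectors here are made-up stand-ins, not the real predictions):

```r
# Accuracy is the fraction of predicted labels that equal the true labels.
# Stand-in vectors for illustration only:
pred_demo <- c(9, 0, 0, 3, 0, 6)
true_demo <- c(9, 0, 0, 3, 0, 0)
mean(pred_demo == true_demo)  # 5 of the 6 labels match
```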
Next, we create a confusion matrix based on the predicted and actual values:
# Prediction and confusion matrix
pred <- model %>% predict_classes(trainx)
table(Predicted=pred, Actual=mnist$train$y)
OUTPUT
         Actual
Predicted    0    1    2    3    4    5    6    7    8    9
        0 5655    1   53   48    1    0  359    0    2    0
        1    1 5969    2    8    1    0    3    0    0    0
        2   50    0 5642   23  219    0  197    0    2    0
        3   42   23   20 5745   50    0   50    0    3    0
        4    7    1  156  106 5566    0  122    0    4    0
        5    0    0    0    0    0 5971    0    6    1   12
        6  230    3  121   68  159    0 5263    0   11    0
        7    0    0    0    0    0   22    0 5958    3  112
        8   15    3    6    2    4    4    6    0 5974    0
        9    0    0    0    0    0    3    0   36    0 5876
From the preceding confusion matrix, we can make the following observations:
- The correct classifications, shown on the diagonal, are large for all 10 categories; the lowest is 5,263 out of 6,000, for item 6 (shirt).
- The best classification performance is seen for item 8 (bag), where this model correctly classifies 5,974 bag images out of 6,000.
- Among the off-diagonal numbers, which represent misclassifications, the highest value is 359, where item 6 (shirt) is mistaken for item 0 (t-shirt/top). Conversely, item 0 (t-shirt/top) is misclassified as item 6 (shirt) on 230 occasions. So, this model clearly has some difficulty differentiating between item 0 and item 6.
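These observations can be checked by recomputing per-class accuracy directly from the confusion matrix. A minimal sketch, with the matrix typed in from the output above (rows are predicted labels, columns are actual labels):

```r
# Rebuild the confusion matrix printed above (rows = Predicted, cols = Actual)
cm <- matrix(c(
  5655,    1,   53,   48,    1,    0,  359,    0,    2,    0,
     1, 5969,    2,    8,    1,    0,    3,    0,    0,    0,
    50,    0, 5642,   23,  219,    0,  197,    0,    2,    0,
    42,   23,   20, 5745,   50,    0,   50,    0,    3,    0,
     7,    1,  156,  106, 5566,    0,  122,    0,    4,    0,
     0,    0,    0,    0,    0, 5971,    0,    6,    1,   12,
   230,    3,  121,   68,  159,    0, 5263,    0,   11,    0,
     0,    0,    0,    0,    0,   22,    0, 5958,    3,  112,
    15,    3,    6,    2,    4,    4,    6,    0, 5974,    0,
     0,    0,    0,    0,    0,    3,    0,   36,    0, 5876),
  nrow = 10, byrow = TRUE,
  dimnames = list(Predicted = 0:9, Actual = 0:9))

# Per-class accuracy: diagonal entries divided by the column totals
# (each class has exactly 6,000 training images)
recall <- diag(cm) / colSums(cm)
round(recall, 3)

# Overall accuracy: total correct over total images
sum(diag(cm)) / sum(cm)
```

The overall figure recovers the 0.960 accuracy reported by `evaluate()`, and the per-class values confirm that item 6 (shirt) is the weakest class and item 8 (bag) the strongest.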
Let's also look deeper by calculating prediction probabilities for the first five items, as shown in the following code:
# Prediction probabilities
prob <- model %>% predict_proba(trainx)
prob <- round(prob, 3)
cbind(prob, Predicted_class = pred, Actual = mnist$train$y)[1:5,]
OUTPUT
                                                     Predicted_class Antml
[1,] 0.000 0.000 0.000 0.000 0 0 0.000 0.001 0 0.999               9      9
[2,] 1.000 0.000 0.000 0.000 0 0 0.000 0.000 0 0.000               0      0
[3,] 0.969 0.000 0.005 0.003 0 0 0.023 0.000 0 0.000               0      0
[4,] 0.023 0.000 0.000 0.968 0 0 0.009 0.000 0 0.000               3      3
[5,] 0.656 0.001 0.000 0.007 0 0 0.336 0.000 0 0.000               0      0
We can observe from the preceding output that all five fashion items are correctly classified. The winning class probabilities range from 0.656 (item 0 in the fifth row) to 1.000 (item 0 in the second row). Note that even in the least confident case, the fifth row, the remaining probability mass of 0.336 sits on item 6, echoing the t-shirt/top versus shirt confusion we saw in the confusion matrix; the other four predictions are made with little or no competition from other classes.
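This reading can be reproduced from the printed probabilities. A small sketch, with `p5` typed in from the output above: the predicted class is the column holding the largest probability, and the size of that maximum serves as a rough confidence score:

```r
# First five rows of the rounded probability matrix printed above
p5 <- matrix(c(
  0.000, 0.000, 0.000, 0.000, 0, 0, 0.000, 0.001, 0, 0.999,
  1.000, 0.000, 0.000, 0.000, 0, 0, 0.000, 0.000, 0, 0.000,
  0.969, 0.000, 0.005, 0.003, 0, 0, 0.023, 0.000, 0, 0.000,
  0.023, 0.000, 0.000, 0.968, 0, 0, 0.009, 0.000, 0, 0.000,
  0.656, 0.001, 0.000, 0.007, 0, 0, 0.336, 0.000, 0, 0.000),
  nrow = 5, byrow = TRUE)

# Predicted class: the column with the largest probability
# (columns 1-10 correspond to class labels 0-9)
predicted <- max.col(p5) - 1

# Confidence: the size of that largest probability in each row
confidence <- apply(p5, 1, max)
</imports>
```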
Now, let's see whether this performance is replicated with test data.