Multi-class cross-entropy loss is used in multi-class classification, such as the MNIST digit classification problem from Chapter 2, Deep Learning and Convolutional Neural Networks. As above, we start from the cross-entropy function; after a few calculations, we obtain the multi-class cross-entropy loss L for each training example:

L = -Σ_c y_c log(p_c)
Here, y_c is 0 or 1, indicating whether class label c is the correct classification for the example, and p_c is the predicted probability for class c. To use this loss, we first need to add a softmax activation to the output of the final FC layer in our model. Substituting the softmax, the combined cross-entropy with softmax looks like this:

L = -Σ_c y_c log(e^(s_c) / Σ_j e^(s_j))
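For concreteness, here is the loss worked out for a single hypothetical three-class example (made-up numbers) where the true class is the second one:

y = (0, 1, 0), p = (0.1, 0.7, 0.2)
L = -(0·log 0.1 + 1·log 0.7 + 0·log 0.2) = -log 0.7 ≈ 0.357

Only the term for the correct class contributes, so the loss is simply the negative log of the probability the model assigns to the correct class; it shrinks toward zero as that probability approaches 1.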
It is useful to know that the raw outputs of our model are called logits; logits are what is passed to the softmax function. The softmax function is the multi-class generalization of the sigmoid function: it converts a vector of logits into a probability distribution over the classes. Once the logits have been passed through softmax, we can apply our multi-class cross-entropy loss. TensorFlow actually combines all these steps into one operation, as shown:
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=model_logits, labels=labels_in))
Because tf.nn.softmax_cross_entropy_with_logits returns one loss value per image in the batch, we wrap it in tf.reduce_mean to get the average loss over the batch.
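To make the combined operation concrete, here is a small NumPy sketch of what it computes (an illustration with made-up logits and one-hot labels, not the TensorFlow implementation): softmax over each row of logits, then the cross-entropy against the labels, then the mean over the batch, mirroring the tf.reduce_mean call above.

```python
import numpy as np

def softmax(logits):
    # Subtract the per-row max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy_loss(logits, labels):
    # labels are one-hot rows; per-example loss is -sum(y_c * log(p_c))
    probs = softmax(logits)
    per_example = -np.sum(labels * np.log(probs), axis=1)
    # Average over the batch, as tf.reduce_mean does
    return per_example.mean()

# A hypothetical batch of two examples with three classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
print(cross_entropy_loss(logits, labels))
```

Note how a confident, correct prediction (a large logit for the true class) drives its per-example loss toward zero, while a uniform prediction yields a loss of log(number of classes).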