The softmax function is a generalization of the sigmoid function to multiple classes. It is typically applied to the final layer of a network when performing multi-class classification tasks. It outputs a probability for each class, and these probabilities always sum to 1.
It can be represented as follows:
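The equation itself does not survive in this copy; the standard formulation of softmax, consistent with the description above, is:

$$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

where $x_i$ is the input for class $i$ and the sum in the denominator runs over all classes, which is what guarantees the outputs sum to 1.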
As shown in the following diagram, the softmax function converts its inputs to probabilities:
The softmax function can be implemented in Python as follows:
import numpy as np

def softmax(x):
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the result
    x = x - np.max(x, axis=0)
    return np.exp(x) / np.exp(x).sum(axis=0)
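As a quick sanity check (a minimal usage sketch; the input values here are arbitrary examples), the outputs behave as probabilities: each is between 0 and 1, the largest score receives the largest probability, and they sum to 1:

```python
import numpy as np

def softmax(x):
    # Subtract the maximum for numerical stability; the result is unchanged
    x = x - np.max(x, axis=0)
    return np.exp(x) / np.exp(x).sum(axis=0)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # largest score maps to the largest probability
print(probs.sum())  # probabilities sum to 1
```

Note that `axis=0` sums over the class dimension; for a batch of row vectors you would sum over `axis=1` (or use `axis=-1`) instead.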