Softmax activation

Softmax activation is most frequently used for classification tasks. Softmax rescales the output of the previous layer: it first takes the exponential of each neuron's input, then divides each exponential by the sum of the exponentials over all neurons in the layer, so that the activations lie between zero and one and sum to one.

The softmax equation looks like this:
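$$ \mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}, \qquad i = 1, \dots, K $$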

Softmax activation is used in the last layer of a deep learning neural network for multi-class classification. This layer has the same number of nodes as there are classes, and it rescales the output so that it sums to 1.0, thereby producing a probability for each class, that is, P(y|x).
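To make the computation concrete, here is a minimal softmax sketch in plain base R (illustrative code, not part of the MXNet API; subtracting the maximum is a standard trick for numerical stability and does not change the result):

> softmax <- function(x) {
+   ex <- exp(x - max(x))  # exponential of each input (shifted for stability)
+   ex / sum(ex)           # normalize so the outputs sum to one
+ }
> softmax(c(1, 2, 3))
[1] 0.09003057 0.24472847 0.66524096

The three outputs lie between zero and one and sum to one, as described above.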

Let us build our model and test it on an XOR truth table:

> model <- mx.model.FeedForward.create(out, X=X
+ , y=Y
+ , ctx = mx.ctx.default()
+ , array.layout = "rowmajor"
+ , learning.rate = 0.01
+ , momentum = 0.9
+ , array.batch.size = 50
+ , num.round = 20
+ , eval.metric = mx.metric.accuracy
+ # , initializer = mx.init.normal(c(0.0,0.1))
+ )
Start training with 1 devices
[1] Train-accuracy=0.506075949367089
[2] Train-accuracy=0.497
[3] Train-accuracy=0.497
[4] Train-accuracy=0.5095
[5] Train-accuracy=0.558
[6] Train-accuracy=0.67625
[7] Train-accuracy=0.74075
[8] Train-accuracy=0.75
[9] Train-accuracy=0.75
[10] Train-accuracy=0.75
[11] Train-accuracy=0.75
[12] Train-accuracy=0.75
[13] Train-accuracy=0.75
[14] Train-accuracy=0.912
[15] Train-accuracy=1
[16] Train-accuracy=1
[17] Train-accuracy=1
[18] Train-accuracy=1
[19] Train-accuracy=1
[20] Train-accuracy=1
> X_test = data.matrix(rbind(c(0,0), c(1,1), c(1,0), c(0,1)))
> preds = predict(model, X_test, array.layout = "rowmajor")
> pred.label <- max.col(t(preds)) - 1
> pred.label
[1] 0 0 1 1

We complete our model building by calling mx.model.FeedForward.create. We pass our final output layer, out, to this function. We also pass our training dataset and set up some hyperparameters for the multi-layer perceptron.

Finally, we pass the model to the predict function to see the predictions. predict returns one column of class probabilities per test sample, so max.col(t(preds)) selects the most probable class for each input, and subtracting 1 maps R's 1-based column index back to the class labels 0 and 1.
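For example, with a hypothetical probability matrix laid out the same way as preds (one row per class, one column per test sample; the values below are invented purely for illustration):

> probs <- matrix(c(0.9, 0.1,   # hypothetical class probabilities,
+                   0.8, 0.2,   # one column per test sample
+                   0.3, 0.7,
+                   0.2, 0.8), nrow = 2)
> max.col(t(probs)) - 1
[1] 0 0 1 1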

Wow, we have built a neural network with 100% accuracy on our training data. You can see that the model's predictions match the simple XOR truth table.

We can further investigate our model by looking at the graph generated by our symbolic programming:

> graph.viz(model$symbol)
> model$arg.params
$fullyconnected4_weight
          [,1]     [,2]
[1,] -1.773083 1.751193
[2,] -1.774175 1.751324

$fullyconnected4_bias
[1] 1.769027 -1.754247

$fullyconnected5_weight
         [,1]      [,2]
[1,] 2.171195 -2.166867
[2,] 2.145441 -2.132973

$fullyconnected5_bias
[1] -1.662504 1.662504

>

The computational graph is shown in the following diagram:

Furthermore, we can look into the weights our model has generated for this network.
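As an illustrative sanity check, we can push the four XOR inputs through these learned weights by hand. This sketch assumes the hidden layer defined in the previous section uses a ReLU activation, and that each printed weight matrix W is applied as t(W) %*% x; both are assumptions about the earlier model definition rather than something visible in the output above:

> W1 <- matrix(c(-1.773083, -1.774175, 1.751193, 1.751324), nrow = 2)
> b1 <- c(1.769027, -1.754247)
> W2 <- matrix(c(2.171195, 2.145441, -2.166867, -2.132973), nrow = 2)
> b2 <- c(-1.662504, 1.662504)
> relu <- function(z) pmax(z, 0)                    # assumed hidden activation
> forward <- function(x) as.vector(t(W2) %*% relu(t(W1) %*% x + b1) + b2)
> scores <- t(sapply(list(c(0,0), c(1,1), c(1,0), c(0,1)), forward))
> max.col(scores) - 1   # same labels as pred.label above
[1] 0 0 1 1

Under these assumptions, the hand-computed class labels reproduce the XOR truth table, matching the output of predict.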

Hopefully, this section gave you a good overview of the MXNet R package. Using this knowledge, let us go ahead and solve our time series problem.
