Regularization

There are several ways of controlling the training of a neural network to prevent overfitting, for example, L2/L1 regularization, max-norm constraints, and dropout (each is sketched in code after the list):

  • L2 regularization: This is probably the most common form of regularization. For each weight w, we add the term ½λw² to the objective; under the gradient descent parameter update, this means that every weight is decayed linearly towards zero.
  • L1 regularization: For each weight w, we add the term λ∣w∣ to the objective, which drives many weights towards exactly zero (sparse solutions). It is also possible to combine L1 and L2 regularization to obtain elastic net regularization.
  • Max-norm constraints: Used to enforce an absolute upper bound on the magnitude of the weight vector of each hidden-layer neuron. Projected gradient descent is then used to enforce the constraint: after each update, the weight vector is rescaled if its norm exceeds the bound.
  • Dropout: When building the network, we need another placeholder for the dropout keep probability, which is a hyperparameter to be tuned and is applied at training time but not at test time. Dropout is implemented by keeping a neuron active with some probability p < 1.0 and setting its output to zero otherwise. The idea is to use a single neural net at test time without dropout, whose weights are scaled-down versions of the trained weights: if a unit is retained with probability p (dropout_keep_prob < 1.0) during training, the outgoing weights of that unit are multiplied by p at test time (Figure 17).
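To make the first three techniques concrete, here is a minimal NumPy sketch of a single gradient-descent step with an L2 penalty, an L1 penalty, and a max-norm projection. The values and names (lambda_l2, lambda_l1, max_norm, and the toy gradient) are illustrative assumptions, not part of any particular library:

```python
import numpy as np

# Toy weight vector of one hidden neuron and the gradient of the data loss
# with respect to it (both values are made up for illustration).
w = np.array([0.8, -1.5, 0.3])
grad_data_loss = np.array([0.05, -0.02, 0.10])

learning_rate = 0.1
lambda_l2 = 0.01   # strength of the L2 penalty (1/2 * lambda * w^2)
lambda_l1 = 0.01   # strength of the L1 penalty (lambda * |w|)
max_norm = 2.0     # upper bound on ||w|| for the max-norm constraint

# L2 regularization: the penalty's gradient is lambda * w, so every weight
# is decayed linearly towards zero during the update.
grad = grad_data_loss + lambda_l2 * w

# L1 regularization: the penalty's (sub)gradient is lambda * sign(w),
# which pushes weights towards exactly zero (sparsity).
grad += lambda_l1 * np.sign(w)

# Plain gradient-descent update with both penalties applied.
w = w - learning_rate * grad

# Max-norm constraint: project w back onto the ball ||w|| <= max_norm.
norm = np.linalg.norm(w)
if norm > max_norm:
    w = w * (max_norm / norm)

print(w)
```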
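The dropout bullet can be sketched the same way: at training time each unit is kept with probability p and zeroed otherwise; at test time no units are dropped and the outgoing weights are scaled by p so the expected input to the next layer matches training. The layer sizes and keep_prob value below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

keep_prob = 0.5                      # dropout keep probability p < 1.0
activations = rng.random((4, 8))     # toy hidden-layer activations (batch of 4)
w_out = rng.standard_normal((8, 3))  # toy outgoing weights of that layer

# Training time: keep each unit with probability p, set it to zero otherwise.
mask = (rng.random(activations.shape) < keep_prob).astype(activations.dtype)
train_output = (activations * mask) @ w_out

# Test time: no units are dropped; instead the outgoing weights are
# multiplied by p, so on average the next layer sees the same scale of input.
test_output = activations @ (keep_prob * w_out)
```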

Apart from these hyperparameters, another advantage of using H2O-based deep learning algorithms is that we can obtain the relative variable/feature importance. In previous chapters, we saw that it is also possible to compute variable importance using the random forest algorithm in Spark.

So, the idea is that if your model does not perform well, it would be worth dropping the less important features and training again. Feature importance can be computed during supervised training; I observed the following relative importances:

Figure 25: Relative variable importance

Now the question would be: why not drop the least important features and train again to see whether the accuracy increases? Well, I leave that to the reader; a possible starting point is sketched below.
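As a starting point, the following sketch shows how one might retrieve the relative variable importances from an H2O deep learning model via the Python h2o client and retrain on only the stronger features. The file name, target column, hidden-layer sizes, and the 0.1 importance threshold are placeholders, not values from this chapter:

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

# Placeholder file and column names; substitute your own dataset.
frame = h2o.import_file("train.csv")
target = "label"
features = [c for c in frame.columns if c != target]

# Train once with variable_importances enabled so varimp() is populated.
model = H2ODeepLearningEstimator(hidden=[64, 64], epochs=10,
                                 variable_importances=True)
model.train(x=features, y=target, training_frame=frame)

# Keep only features whose scaled importance exceeds an arbitrary threshold.
# Note: categorical columns may appear as one-hot expanded names here.
varimp = model.varimp(use_pandas=True)
strong = varimp.loc[varimp["scaled_importance"] > 0.1, "variable"].tolist()

# Retrain on the reduced feature set and compare the two models.
reduced = H2ODeepLearningEstimator(hidden=[64, 64], epochs=10)
reduced.train(x=strong, y=target, training_frame=frame)

print(model.model_performance(frame))
print(reduced.model_performance(frame))
```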
