Hyperparameter tuning and feature selection

Here are some ways of improving the accuracy by tuning hyperparameters, such as the number of hidden layers, the neurons in each hidden layer, the number of epochs, and the activation function. The current implementation of the H2O-based deep learning model supports the following activation functions:

  • ExpRectifier
  • ExpRectifierWithDropout
  • Maxout
  • MaxoutWithDropout
  • Rectifier
  • RectifierWithDropout
  • Tanh
  • TanhWithDropout

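To make the list above concrete, here is a small sketch of what these activations compute. These are the textbook definitions, not H2O's internal implementation; the `*WithDropout` variants apply the same functions but with dropout on the hidden units during training:

```python
import math

def rectifier(x):
    # Rectifier (ReLU): max(0, x)
    return max(0.0, x)

def tanh_act(x):
    # Tanh: squashes input into (-1, 1)
    return math.tanh(x)

def exp_rectifier(x, alpha=1.0):
    # ExpRectifier (ELU): x for x > 0, alpha * (e^x - 1) otherwise
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def maxout(x, weights, biases):
    # Maxout: maximum over several affine pieces of the input
    return max(w * x + b for w, b in zip(weights, biases))
```

In H2O itself you would simply pass the activation's name as a string (for example, `activation = "Tanh"`) when configuring the deep learning model.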
Apart from Tanh, I have not tried the other activation functions for this project, but you should definitely experiment with them.

One of the biggest advantages of using H2O-based deep learning algorithms is that we can extract the relative variable/feature importance. In previous chapters, we saw that the random forest algorithm in Spark can also compute variable importance. So the idea is this: if your model does not perform well, it is worth dropping the less important features and training again.
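As an illustration of what "relative importance" means, the sketch below turns raw importance scores into the relative/scaled/percentage view that a variable-importance table typically reports. This is an illustrative stand-in, not H2O's own code, and the scores are made-up numbers:

```python
def relative_importance(raw):
    """Rank features and report (name, raw, scaled, percentage),
    where scaled is relative to the top feature and percentage
    is the share of the total importance (illustrative only)."""
    total = sum(raw.values())
    top = max(raw.values())
    rows = sorted(raw.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score, score / top, score / total)
            for name, score in rows]
```

With H2O, the same table comes directly from the trained model via its variable-importance accessor, so you never compute it by hand in practice.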

Let's see an example. In Figure 13, we saw the most important features found during unsupervised training with the autoencoder. Feature importance can also be computed during supervised training; the importances I observed are shown here:

Figure 25: False positives across different prediction thresholds in [0.0, 1.0]

Therefore, from Figure 25, it can be observed that the features Time, V21, V17, and V6 are among the least important. So why not drop them, train again, and observe whether the accuracy improves?
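Dropping those features amounts to removing them from the predictor list before retraining. A minimal sketch, where the predictor names follow this dataset's Time/V*/Amount convention but the full list here is abbreviated for illustration:

```python
def drop_features(columns, to_drop):
    # Remove the low-importance columns, preserving the original order
    dropped = set(to_drop)
    return [c for c in columns if c not in dropped]

# Abbreviated predictor list for illustration
predictors = ["Time", "V1", "V6", "V17", "V21", "Amount"]
reduced = drop_features(predictors, ["Time", "V21", "V17", "V6"])
```

The reduced list is then passed as the set of input columns when the H2O model is trained again, and the two models' accuracies compared.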

Nevertheless, grid search or cross-validation techniques could still yield higher accuracy. I'll leave that to you as an exercise.
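The core of a grid search is just enumerating every hyperparameter combination and training a model for each. The sketch below shows that enumeration in plain Python; the grid values are hypothetical examples, and in H2O you would hand such a dictionary to the grid-search facility rather than expanding it yourself:

```python
from itertools import product

# Hypothetical hyperparameter grid (example values only)
grid = {
    "hidden": [[64, 64], [128, 64, 32]],
    "activation": ["Tanh", "Rectifier", "Maxout"],
    "epochs": [10, 50],
}

def expand_grid(grid):
    # Enumerate every combination (Cartesian product) of the grid values
    keys = sorted(grid)
    return [dict(zip(keys, vals))
            for vals in product(*(grid[k] for k in keys))]

combos = expand_grid(grid)  # 2 * 3 * 2 = 12 candidate configurations
```

Each candidate would then be trained and scored, typically with k-fold cross-validation, and the best-scoring configuration kept.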
