Modeling

As we will see, the deep learning function has quite a few arguments and parameters that you can tune. The thing that I like about the package is the ability to keep it as simple as possible and let the defaults do their thing. If you want to see all the possibilities along with the defaults, consult the help file or run the following command:

    > args(h2o.deeplearning)

Documentation on all the arguments and tuning parameters is available online at http://h2o.ai/docs/master/model/deep-learning/.
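
To get a feel for those defaults before any tuning, you can fit a quick baseline with nearly everything left alone. The sketch below assumes the train and test H2OFrames and the predictor/response column indices (1:63 and 64) used throughout this section; dl_baseline is just an illustrative name:

    > # A baseline fit that relies on the package defaults (two hidden layers
    > # of 200 rectifier neurons, adaptive learning rate, 10 epochs)
    > dl_baseline <- h2o.deeplearning(
        x = 1:63,
        y = 64,
        training_frame = train,
        validation_frame = test,
        seed = 123
      )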

As a side note, you can run a demo of the various machine learning methods with demo(). For instance, you can go through the deep learning demo with demo(h2o.deeplearning).

Our next goal is to tune the hyper-parameters using a random search, which takes less time than a full grid search. We will look at the tanh activation, with and without dropout, three different hidden layer/neuron combinations, two different input dropout ratios, and two different learning rates:

    > hyper_params <- list(
        activation = c("Tanh", "TanhWithDropout"),
        hidden = list(c(20, 20), c(30, 30), c(30, 30, 30)),
        input_dropout_ratio = c(0, 0.05),
        rate = c(0.01, 0.25)
      )
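
As a quick sanity check, you can count how many combinations a full Cartesian grid over this list would have to train; the random search only samples from this space:

    > # 2 activations x 3 hidden configurations x 2 dropout ratios x 2 rates
    > prod(lengths(hyper_params))
    [1] 24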

Next, specify the random search criteria in a list. Since we want a random search, we set the strategy to RandomDiscrete; a full grid search would require Cartesian. For a random search, it is recommended to specify one or more early stopping criteria, such as max_runtime_secs or max_models. We also specify here that the search will stop when the top five models are within 1% error of each other:

    > search_criteria = list(
        strategy = "RandomDiscrete", max_runtime_secs = 420,
        max_models = 100, seed = 123, stopping_rounds = 5,
        stopping_tolerance = 0.01
      )
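
For comparison, an exhaustive search would simply swap the strategy to Cartesian, and no early stopping settings are required; cartesian_criteria below is just an illustrative name:

    > # Train every combination in hyper_params instead of sampling
    > cartesian_criteria <- list(strategy = "Cartesian")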

Now, this is where the magic happens, using the h2o.grid() function. We tell it that we want to use the deep learning algorithm, our training data, the validation data (we will use the test set), our input features, and the response variable:

    > randomSearch <- h2o.grid(
        algorithm = "deeplearning",
        grid_id = "randomSearch",
        training_frame = train,
        validation_frame = test,
        x = 1:63,
        y = 64,
        epochs = 1,
        stopping_metric = "misclassification",
        hyper_params = hyper_params,
        search_criteria = search_criteria
      )
    |===================================================================| 100%

A progress bar tracks the run, and with this dataset, it should only take a few seconds.

We now examine the results of the top five models:

    > grid <- h2o.getGrid("randomSearch",sort_by = "auc", decreasing = 
FALSE)


> grid

    H2O Grid Details
    ================

    Grid ID: randomSearch
    Used hyper parameters:
      - activation
      - hidden
      - input_dropout_ratio
      - rate
    Number of models: 71
    Number of failed models: 0

    Hyper-Parameter Search Summary: ordered by decreasing auc
           activation       hidden input_dropout_ratio rate             model_ids                auc
    1 TanhWithDropout [30, 30, 30]                0.05 0.25 randomSearch_model_57 0.8636778964667214
    2 TanhWithDropout     [20, 20]                0.05 0.01  randomSearch_model_8 0.8623894823336072
    3 TanhWithDropout [30, 30, 30]                0.05 0.25 randomSearch_model_10  0.856568611339359
    4 TanhWithDropout     [40, 40]                0.05 0.01 randomSearch_model_39 0.8565258833196385
    5 TanhWithDropout [30, 30, 30]                 0.0 0.25  randomSearch_model_3 0.8544026294165982
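
If you would rather work with these results programmatically than read the printed summary, the grid's summary table can be pulled into an ordinary data frame; a small sketch:

    > # The leaderboard columns match the printed summary above
    > leaderboard <- as.data.frame(grid@summary_table)
    > head(leaderboard, 5)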

So the winning model is #57, with TanhWithDropout activation, three hidden layers of 30 neurons each, an input dropout ratio of 0.05, and a learning rate of 0.25, which achieved an AUC of almost 0.864.

We now take a look at the error rates on the validation/test data with a confusion matrix:

    > best_model <- h2o.getModel(grid@model_ids[[1]])
    > h2o.confusionMatrix(best_model, valid = T)
    Confusion Matrix (vertical: actual; across: predicted) for max f1 @ threshold = 0.0953170555399435:

             no yes    Error        Rate
    no     1128  89 0.073131  = 89/1217
    yes      60  65 0.480000   = 60/125
    Totals 1188 154 0.111028 = 149/1342

Even though the overall error is only about 11 percent, the error rate for the yes label is high at 48 percent, which suggests that class imbalance may be an issue. We have also only just begun the hyper-parameter tuning process, so there is much that could be done to improve the outcome. I'll leave that task to you!
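
If you suspect class imbalance, a quick frequency count of the response (column 64, as in the code above) will confirm it; a minimal sketch:

    > # Counts of the "no" and "yes" labels in the training frame
    > h2o.table(train[, 64])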

Now let's examine how to build a model using cross-validation. Notice how the hyper-parameters are included in the h2o.deeplearning() function, with the exception of the learning rate; here I turned off the adaptive learning rate (adaptive_rate = F) and relied on the default rate. I also included the functionality to up-sample the minority class to achieve balanced labels during training. On another note, the folds are a stratified sample based on the response variable:

    > dlmodel <- h2o.deeplearning(
        x = 1:63,
        y = 64,
        training_frame = train,
        hidden = c(30, 30, 30),
        epochs = 3,
        nfolds = 5,
        fold_assignment = "Stratified",
        balance_classes = T,
        activation = "TanhWithDropout",
        seed = 123,
        adaptive_rate = F,
        input_dropout_ratio = 0.05,
        stopping_metric = "misclassification",
        variable_importances = T
      )

If you call the object dlmodel, you will receive rather lengthy output. In this instance, let's examine the performance on the holdout folds:

    > dlmodel
    Model Details:
    ==============
    AUC: 0.8571054599
    Gini: 0.7142109198

    Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
             no yes    Error        Rate
    no     2492 291 0.104563 = 291/2783
    yes     160 236 0.404040  = 160/396
    Totals 2652 527 0.141869 = 451/3179
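
Rather than scrolling through the full printed output, you can also pull the cross-validated metrics directly with h2o.performance(), the same function we will use on the test data next; for example:

    > # xval = TRUE returns the metrics computed on the holdout folds
    > xval_perf <- h2o.performance(dlmodel, xval = TRUE)
    > h2o.auc(xval_perf)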

Given these results, I think more tuning of the hyper-parameters is in order, particularly for the hidden layers/neurons. Examining out-of-sample performance on the test data works much the same way, using the h2o.performance() function, which produces a comprehensive set of metrics:

    > perf <- h2o.performance(dlmodel, test)
    > perf
    H2OBinomialMetrics: deeplearning
    MSE: 0.07237450145
    RMSE: 0.2690250945
    LogLoss: 0.2399027004
    Mean Per-Class Error: 0.2326113394
    AUC: 0.8319605588
    Gini: 0.6639211175

    Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
             no yes    Error        Rate
    no     1050 167 0.137223 = 167/1217
    yes      41  84 0.328000   = 41/125
    Totals 1091 251 0.154993 = 208/1342

    Maximum Metrics: Maximum metrics at their respective thresholds
                            metric threshold    value idx
    1                       max f1  0.323529 0.446809  62
    2                       max f2  0.297121 0.612245 166
    3                 max f0point5  0.323529 0.372011  62
    4                 max accuracy  0.342544 0.906110   0
    5                max precision  0.323529 0.334661  62
    6                   max recall  0.013764 1.000000 355
    7              max specificity  0.342544 0.999178   0
    8             max absolute_mcc  0.297121 0.411468 166
    9   max min_per_class_accuracy  0.313356 0.799507 131
    10 max mean_per_class_accuracy  0.285007 0.819730 176

The overall error increased on the test data, but the error rate for the yes label improved. As before, additional tuning is required.
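
To put the model to work on new observations, h2o.predict() returns the predicted class along with the class probabilities; a minimal sketch using the test frame:

    > # One row per observation: predicted label plus per-class probabilities
    > preds <- h2o.predict(dlmodel, test)
    > head(preds)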

Finally, variable importance can be produced. It is calculated based on the so-called Gedeon method. Keep in mind that these results can be misleading. The table shows the order of variable importance, but this importance is subject to sampling variation; if you change the seed value, the order of variable importance could change quite a bit. These are the top five variables by importance:

    > dlmodel@model$variable_importances
    Variable Importances:
              variable relative_importance scaled_importance percentage
    1         duration            1.000000          1.000000   0.147006
    2 poutcome_success            0.806309          0.806309   0.118532
    3        month_oct            0.329299          0.329299   0.048409
    4        month_mar            0.223847          0.223847   0.032907
    5 poutcome_failure            0.199272          0.199272   0.029294
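
If you prefer a visual summary, h2o.varimp_plot() charts the scaled importances, for example the top ten variables:

    > # Bar chart of scaled variable importances
    > h2o.varimp_plot(dlmodel, num_of_features = 10)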

With this, we have completed our introduction to deep learning in R using the capabilities of the H2O package. It is simple to use while offering plenty of flexibility to tune the hyper-parameters and create deep neural networks. Enjoy!
