Modeling

As we will see, the deep learning function has quite a few arguments and parameters that you can tune. The thing that I like about the package is the ability to keep it as simple as possible and let the defaults do their thing. If you want to see all the possibilities along with the defaults, consult the help file or run the following command:

    > args(h2o.deeplearning)

Documentation on all the arguments and tuning parameters is available online at http://h2o.ai/docs/master/model/deep-learning/.
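
To get a feel for those defaults before any tuning, you can fit a quick baseline with nearly everything left alone. The sketch below assumes the train and test H2OFrames and the predictor/response column indices (1:63 and 64) used throughout this section; dl_baseline is just an illustrative name:

    > # A baseline fit that relies on the package defaults (two hidden layers
    > # of 200 rectifier neurons, adaptive learning rate, 10 epochs)
    > dl_baseline <- h2o.deeplearning(
        x = 1:63,
        y = 64,
        training_frame = train,
        validation_frame = test,
        seed = 123
      )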

As a side note, you can run a demo of the various machine learning methods with demo(). For instance, you can go through the deep learning demo with demo(h2o.deeplearning).

Our next goal is to tune the hyper-parameters using a random search, which takes less time than a full grid search. We will look at the tanh activation, with and without dropout, three different hidden layer/neuron combinations, two different input dropout ratios, and two different learning rates:

    > hyper_params <- list(
        activation = c("Tanh", "TanhWithDropout"),
        hidden = list(c(20, 20), c(30, 30), c(30, 30, 30)),
        input_dropout_ratio = c(0, 0.05),
        rate = c(0.01, 0.25)
      )
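
As a quick sanity check, you can count how many combinations a full Cartesian grid over this list would have to train; the random search only samples from this space:

    > # 2 activations x 3 hidden configurations x 2 dropout ratios x 2 rates
    > prod(lengths(hyper_params))
    [1] 24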

Next, specify the random search criteria in a list. Since we want a random search, we set the strategy to RandomDiscrete; a full grid search would require Cartesian. For a random search, it is recommended to specify one or more early stopping criteria, such as max_runtime_secs or max_models. We also specify here that the search will stop when the top five models are within 1% error of each other:

    > search_criteria = list(
        strategy = "RandomDiscrete", max_runtime_secs = 420,
        max_models = 100, seed = 123, stopping_rounds = 5,
        stopping_tolerance = 0.01
      )
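
For comparison, an exhaustive search would simply swap the strategy to Cartesian, and no early stopping settings are required; cartesian_criteria below is just an illustrative name:

    > # Train every combination in hyper_params instead of sampling
    > cartesian_criteria <- list(strategy = "Cartesian")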

Now, this is where the magic happens, using the h2o.grid() function. We tell it that we want to use the deep learning algorithm, our training data, the validation data (we will use the test set), our input features, and the response variable:

    > randomSearch <- h2o.grid(
        algorithm = "deeplearning",
        grid_id = "randomSearch",
        training_frame = train,
        validation_frame = test,
        x = 1:63,
        y = 64,
        epochs = 1,
        stopping_metric = "misclassification",
        hyper_params = hyper_params,
        search_criteria = search_criteria
      )
    |===================================================================| 100%

A progress bar tracks the run, and with this dataset, it should only take a few seconds.

We now examine the results of the top five models:

    > grid <- h2o.getGrid("randomSearch",sort_by = "auc", decreasing = 
FALSE)


> grid

    H2O Grid Details
    ================

    Grid ID: randomSearch
    Used hyper parameters:
      - activation
      - hidden
      - input_dropout_ratio
      - rate
    Number of models: 71
    Number of failed models: 0

    Hyper-Parameter Search Summary: ordered by decreasing auc
           activation       hidden input_dropout_ratio rate             model_ids                auc
    1 TanhWithDropout [30, 30, 30]                0.05 0.25 randomSearch_model_57 0.8636778964667214
    2 TanhWithDropout     [20, 20]                0.05 0.01  randomSearch_model_8 0.8623894823336072
    3 TanhWithDropout [30, 30, 30]                0.05 0.25 randomSearch_model_10  0.856568611339359
    4 TanhWithDropout     [40, 40]                0.05 0.01 randomSearch_model_39 0.8565258833196385
    5 TanhWithDropout [30, 30, 30]                 0.0 0.25  randomSearch_model_3 0.8544026294165982
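
If you would rather work with these results programmatically than read the printed summary, the grid's summary table can be pulled into an ordinary data frame; a small sketch:

    > # The leaderboard columns match the printed summary above
    > leaderboard <- as.data.frame(grid@summary_table)
    > head(leaderboard, 5)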

So the winning model is #57, with TanhWithDropout activation, three hidden layers of 30 neurons each, an input dropout ratio of 0.05, and a learning rate of 0.25, which achieved an AUC of almost 0.864.

We now take a look at the error rates on the validation/test data with a confusion matrix:

    > best_model <- h2o.getModel(grid@model_ids[[1]])
    > h2o.confusionMatrix(best_model, valid = T)
    Confusion Matrix (vertical: actual; across: predicted) for max f1 @ threshold = 0.0953170555399435:

             no yes    Error        Rate
    no     1128  89 0.073131  = 89/1217
    yes      60  65 0.480000   = 60/125
    Totals 1188 154 0.111028 = 149/1342

Even though the overall error is only about 11 percent, the error rate for the yes label is high at 48 percent, which suggests that class imbalance may be an issue. We have also only just begun the hyper-parameter tuning process, so there is much that could be done to improve the outcome. I'll leave that task to you!
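
If you suspect class imbalance, a quick frequency count of the response (column 64, as in the code above) will confirm it; a minimal sketch:

    > # Counts of the "no" and "yes" labels in the training frame
    > h2o.table(train[, 64])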

Now let's examine how to build a model using cross-validation. Notice how the hyper-parameters are included in the h2o.deeplearning() function, with the exception of the learning rate; here I turned off the adaptive learning rate (adaptive_rate = F) and relied on the default rate. I also included the functionality to up-sample the minority class to achieve balanced labels during training. On another note, the folds are a stratified sample based on the response variable:

    > dlmodel <- h2o.deeplearning(
        x = 1:63,
        y = 64,
        training_frame = train,
        hidden = c(30, 30, 30),
        epochs = 3,
        nfolds = 5,
        fold_assignment = "Stratified",
        balance_classes = T,
        activation = "TanhWithDropout",
        seed = 123,
        adaptive_rate = F,
        input_dropout_ratio = 0.05,
        stopping_metric = "misclassification",
        variable_importances = T
      )

If you call the object dlmodel, you will receive rather lengthy output. In this instance, let's examine the performance on the holdout folds:

    > dlmodel
    Model Details:
    ==============
    AUC: 0.8571054599
    Gini: 0.7142109198

    Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
             no yes    Error        Rate
    no     2492 291 0.104563 = 291/2783
    yes     160 236 0.404040  = 160/396
    Totals 2652 527 0.141869 = 451/3179
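
Rather than scrolling through the full printed output, you can also pull the cross-validated metrics directly with h2o.performance(), the same function we will use on the test data next; for example:

    > # xval = TRUE returns the metrics computed on the holdout folds
    > xval_perf <- h2o.performance(dlmodel, xval = TRUE)
    > h2o.auc(xval_perf)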

Given these results, I think more tuning of the hyper-parameters is in order, particularly for the hidden layers/neurons. Examining out-of-sample performance on the test data works much the same way, using the h2o.performance() function, which produces a comprehensive set of metrics:

    > perf <- h2o.performance(dlmodel, test)
    > perf
    H2OBinomialMetrics: deeplearning
    MSE: 0.07237450145
    RMSE: 0.2690250945
    LogLoss: 0.2399027004
    Mean Per-Class Error: 0.2326113394
    AUC: 0.8319605588
    Gini: 0.6639211175

    Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
             no yes    Error        Rate
    no     1050 167 0.137223 = 167/1217
    yes      41  84 0.328000   = 41/125
    Totals 1091 251 0.154993 = 208/1342

    Maximum Metrics: Maximum metrics at their respective thresholds
                            metric threshold    value idx
    1                       max f1  0.323529 0.446809  62
    2                       max f2  0.297121 0.612245 166
    3                 max f0point5  0.323529 0.372011  62
    4                 max accuracy  0.342544 0.906110   0
    5                max precision  0.323529 0.334661  62
    6                   max recall  0.013764 1.000000 355
    7              max specificity  0.342544 0.999178   0
    8             max absolute_mcc  0.297121 0.411468 166
    9   max min_per_class_accuracy  0.313356 0.799507 131
    10 max mean_per_class_accuracy  0.285007 0.819730 176

The overall error increased on the test data, but the error rate for the yes label improved. As before, additional tuning is required.
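
To put the model to work on new observations, h2o.predict() returns the predicted class along with the class probabilities; a minimal sketch using the test frame:

    > # One row per observation: predicted label plus per-class probabilities
    > preds <- h2o.predict(dlmodel, test)
    > head(preds)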

Finally, variable importance can be produced. It is calculated based on the so-called Gedeon method. Keep in mind that these results can be misleading. The table shows the order of variable importance, but this importance is subject to sampling variation; if you change the seed value, the order of variable importance could change quite a bit. These are the top five variables by importance:

    > dlmodel@model$variable_importances
    Variable Importances:
              variable relative_importance scaled_importance percentage
    1         duration            1.000000          1.000000   0.147006
    2 poutcome_success            0.806309          0.806309   0.118532
    3        month_oct            0.329299          0.329299   0.048409
    4        month_mar            0.223847          0.223847   0.032907
    5 poutcome_failure            0.199272          0.199272   0.029294
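
If you prefer a visual summary, h2o.varimp_plot() charts the scaled importances, for example the top ten variables:

    > # Bar chart of scaled variable importances
    > h2o.varimp_plot(dlmodel, num_of_features = 10)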

With this, we have completed our introduction to deep learning in R using the capabilities of the H2O package. It is simple to use while offering plenty of flexibility to tune the hyper-parameters and create deep neural networks. Enjoy!
