Step 9 - Pre-trained supervised model

We can now use the autoencoder as a pre-training input for a supervised model. Here, I am again using a neural network, which is initialized with the weights learned by the autoencoder. First, however, the class column has to be converted from Int to Categorical so that the model is trained for classification. Otherwise, the H2O training algorithm will treat the problem as a regression:

toCategorical(train_supervised, 29)

Now that the training set (that is, train_supervised) is ready for supervised learning, let's jump into it:

val train_supervised_H2O = asH2OFrame(train_supervised)
val dlParams = new DeepLearningParameters()
// Start from the weights of the autoencoder trained earlier
dlParams._pretrained_autoencoder = model_nn._key
dlParams._train = train_supervised_H2O
dlParams._reproducible = true
dlParams._ignore_const_cols = false
dlParams._seed = 42
dlParams._hidden = Array[Int](10, 2, 10)
dlParams._epochs = 100
dlParams._activation = Activation.Tanh
// Supervised target; it must be categorical for classification
dlParams._response_column = "Class"
dlParams._balance_classes = true

val dl = new DeepLearning(dlParams)
val model_nn_2 = dl.trainModel.get

Well done! We have now completed the supervised training. Now, to see the predicted versus actual classes:

val predictions = model_nn_2.score(test, "predict")
test.add("predict", predictions.vec("predict"))
asDataFrame(test).groupBy("Class", "predict").count.show //print
>>>
+-----+-------+-----+
|Class|predict|count|
+-----+-------+-----+
|    1|      0|   19|
|    0|      1|   57|
|    0|      0|56804|
|    1|      1|   83|
+-----+-------+-----+
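
The counts in this table can be turned into rates, which are easier to compare between models than raw counts. The following is a minimal sketch in plain Scala, with the four counts copied by hand from the output above. The names ConfusionMatrix and FraudMetrics are illustrative and not part of H2O or Spark, and class 1 is assumed to denote fraud:

```scala
// Counts copied by hand from the groupBy("Class", "predict") output above.
// Assumption: Class = 1 is fraud, so (Class = 1, predict = 0) is a missed fraud.
final case class ConfusionMatrix(tp: Long, fn: Long, fp: Long, tn: Long) {
  // Share of actual fraud cases that the model caught
  def recall: Double = tp.toDouble / (tp + fn)
  // Share of actual fraud cases that the model missed
  def missRate: Double = fn.toDouble / (tp + fn)
  // Share of predicted fraud cases that really were fraud
  def precision: Double = tp.toDouble / (tp + fp)
  // Share of non-fraud cases wrongly flagged as fraud
  def falsePositiveRate: Double = fp.toDouble / (fp + tn)
}

object FraudMetrics {
  val cm: ConfusionMatrix = ConfusionMatrix(tp = 83, fn = 19, fp = 57, tn = 56804)

  def main(args: Array[String]): Unit = {
    println(f"recall              = ${cm.recall}%.4f")
    println(f"miss rate           = ${cm.missRate}%.4f")
    println(f"precision           = ${cm.precision}%.4f")
    println(f"false positive rate = ${cm.falsePositiveRate}%.6f")
  }
}
```

With a class distribution this skewed, recall and precision on the fraud class say more about the model than overall accuracy, which is dominated by the 56,804 correctly classified non-fraud rows.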

Now, this looks much better! We did miss about 19% of the fraud cases (19 of 102), but we also did not misclassify too many of the non-fraudulent cases. In real life, we would spend more time trying to improve the model, for example, by performing grid searches for hyperparameter tuning, going back to the original features and trying different engineered features, and/or trying different algorithms. Now, what about visualizing the preceding result? Let's do it using the Vegas package:

Vegas()
  .withDataFrame(asDataFrame(test))
  .mark(Bar)
  .encodeY(field = "*", dataType = Quantitative, AggOps.Count, axis = Axis(title = "", format = ".2f"), hideAxis = true)
  .encodeX("Class", Ord)
  .encodeColor("predict", Nominal, scale = Scale(rangeNominals = List("#EA98D2", "#659CCA")))
  .configMark(stacked = StackOffset.Normalize)
  .show
>>>

Figure 17: Predicted versus actual classes using the supervised trained model
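
The grid search mentioned above as a possible improvement boils down to enumerating every combination of candidate hyperparameter values and scoring each one. The following is a minimal sketch of that enumeration in plain Scala. It does not call any H2O API, and DlConfig, GridSketch, and all candidate values other than (10, 2, 10), Tanh, and 100 epochs are hypothetical choices made for illustration:

```scala
// One candidate setting for the supervised network.
// Activations are plain labels here, not H2O Activation values.
final case class DlConfig(hidden: Seq[Int], activation: String, epochs: Int)

object GridSketch {
  // Hypothetical candidates; only (10, 2, 10), "Tanh" and 100 come from the text.
  val hiddenLayouts: Seq[Seq[Int]] = Seq(Seq(10, 2, 10), Seq(20, 5, 20), Seq(10, 10))
  val activations: Seq[String] = Seq("Tanh", "Rectifier")
  val epochOptions: Seq[Int] = Seq(50, 100)

  // Cartesian product: one DlConfig per combination of candidate values
  val grid: Seq[DlConfig] = for {
    h <- hiddenLayouts
    a <- activations
    e <- epochOptions
  } yield DlConfig(h, a, e)

  // Returns the configuration with the highest score. The scoring function is
  // supplied by the caller; in practice it would train a model with the given
  // settings and return a validation metric such as recall on the fraud class.
  def best(score: DlConfig => Double): DlConfig = grid.maxBy(score)

  def main(args: Array[String]): Unit = {
    grid.foreach(println)
    println(s"${grid.size} configurations to evaluate")
  }
}
```

The number of configurations is the product of the candidate list sizes, so the grid grows quickly as more hyperparameters are added. This is why the candidate lists are usually kept short or sampled at random.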
