Step 9 - Pre-trained supervised model

We can now use the autoencoder model as a pre-training input for a supervised model. Here, I am again using a neural network, which will start from the autoencoder's weights during model fitting. First, however, the class column must be converted from Int to Categorical so that the model is trained for classification. Otherwise, the H2O training algorithm will treat the task as a regression:

toCategorical(train_supervised, 29)

Now that the training set (that is, train_supervised) is ready for supervised learning, let's jump into it:

val train_supervised_H2O = asH2OFrame(train_supervised)
dlParams = new DeepLearningParameters()
dlParams._pretrained_autoencoder = model_nn._key
dlParams._train = train_supervised_H2O
dlParams._reproducible = true
dlParams._ignore_const_cols = false
dlParams._seed = 42
dlParams._hidden = Array[Int](10, 2, 10)
dlParams._epochs = 100
dlParams._activation = Activation.Tanh
dlParams._response_column = "Class"
dlParams._balance_classes = true

dl = new DeepLearning(dlParams)
val model_nn_2 = dl.trainModel.get

Well done! We have now completed the supervised training. Now, let's compare the predicted classes with the actual classes on the test set:

val predictions = model_nn_2.score(test, "predict")
test.add("predict", predictions.vec("predict"))
asDataFrame(test).groupBy("Class", "predict").count.show //print
>>>
+-----+-------+-----+
|Class|predict|count|
+-----+-------+-----+
|    1|      0|   19|
|    0|      1|   57|
|    0|      0|56804|
|    1|      1|   83|
+-----+-------+-----+
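
The counts in this table can be turned into the usual classification rates. The following is a minimal sketch in plain Scala, not part of the H2O pipeline: the `ConfusionMatrix` case class and the `ConfusionMetrics` object are hypothetical names introduced here for illustration, and the sketch assumes that fraud (Class = 1) is the positive class.

```scala
// Confusion-matrix counts copied from the groupBy output above.
// Assumption: fraud (Class = 1) is treated as the positive class.
final case class ConfusionMatrix(tp: Long, fp: Long, fn: Long, tn: Long) {
  // Share of actual fraud cases that the model caught.
  def recall: Double = tp.toDouble / (tp + fn)
  // Share of actual fraud cases that the model missed.
  def missRate: Double = fn.toDouble / (tp + fn)
  // Share of predicted fraud cases that really were fraud.
  def precision: Double = tp.toDouble / (tp + fp)
  // Share of non-fraud cases that were wrongly flagged as fraud.
  def falsePositiveRate: Double = fp.toDouble / (fp + tn)
}

object ConfusionMetrics {
  // Class=1/predict=1 -> tp, Class=0/predict=1 -> fp,
  // Class=1/predict=0 -> fn, Class=0/predict=0 -> tn.
  val cm: ConfusionMatrix = ConfusionMatrix(tp = 83, fp = 57, fn = 19, tn = 56804)

  def main(args: Array[String]): Unit = {
    println(f"recall              = ${cm.recall}%.4f")
    println(f"miss rate           = ${cm.missRate}%.4f")
    println(f"precision           = ${cm.precision}%.4f")
    println(f"false positive rate = ${cm.falsePositiveRate}%.4f")
  }
}
```

With classes this imbalanced, recall and precision on the fraud class are usually more informative than overall accuracy, which is dominated by the 56,804 correctly classified non-fraud cases.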

Now, this looks much better! We did miss about 19% of the fraud cases (19 of 102), but we also did not misclassify too many of the non-fraudulent cases. In real life, we would spend more time trying to improve the model, for example, by performing grid searches for hyperparameter tuning, by going back to the original features and trying different engineered features, and/or by trying different algorithms. Now, what about visualizing the preceding result? Let's do it using the Vegas package:

Vegas()
  .withDataFrame(asDataFrame(test))
  .mark(Bar)
  .encodeY(field = "*", dataType = Quantitative, AggOps.Count, axis = Axis(title = "", format = ".2f"), hideAxis = true)
  .encodeX("Class", Ord)
  .encodeColor("predict", Nominal, scale = Scale(rangeNominals = List("#EA98D2", "#659CCA")))
  .configMark(stacked = StackOffset.Normalize)
  .show
>>>
Figure 17: Predicted versus actual classes using the supervised trained model