Step 6 - Unsupervised pre-training using autoencoder

As described earlier, we will be using Scala with the H2O autoencoder. Now it's time to start the unsupervised autoencoder training. Because the training is unsupervised, we need to exclude the response column from the unsupervised training set:

val response = "Class"
val features = train_unsupervised.names.filterNot(_ == response)
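
To see what the filter does in isolation, here is a minimal sketch in plain Scala. The column names are hypothetical stand-ins (the real frame has many more columns), and the object name FeatureSelection is only for illustration:

```scala
object FeatureSelection {
  // Hypothetical column names, used only for illustration.
  val columnNames: Array[String] = Array("Time", "V1", "V2", "Amount", "Class")
  val response = "Class"

  // Keep every column except the response, preserving the original order.
  val features: Array[String] = columnNames.filterNot(_ == response)

  def main(args: Array[String]): Unit =
    println(features.mkString(", "))
}
```

Every name other than the response is kept, so the autoencoder never sees the label during training.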

The next task is to define the hyperparameters: the number of hidden layers and the neurons in each, the seed for reproducibility, the number of training epochs, and the activation function for the deep learning model. For unsupervised pre-training, just set the autoencoder parameter to true:

val dlParams = new DeepLearningParameters()
dlParams._ignored_columns = Array(response) // since unsupervised, we ignore the label
dlParams._train = train_unsupervised._key // use the train_unsupervised frame for training
dlParams._autoencoder = true // use H2O built-in autoencoder
dlParams._reproducible = true // ensure reproducibility
dlParams._seed = 42 // random seed for reproducibility
dlParams._hidden = Array[Int](10, 2, 10) // three hidden layers with a 2-node bottleneck
dlParams._epochs = 100 // number of training epochs
dlParams._activation = Activation.Tanh // Tanh as an activation function
dlParams._force_load_balance = false

val dl = new DeepLearning(dlParams)
val model_nn = dl.trainModel.get

In the preceding code, we are applying a technique called bottleneck training, where the hidden layer in the middle is very small. This means that the model has to reduce the dimensionality of the input data (in this case, down to two nodes/dimensions).
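
The shape of such a bottleneck network can be sketched in plain Scala. This is only an illustration: the input width of 30 is an assumption (use the real feature count), BottleneckShape is a hypothetical name, and the parameter count is for plain fully connected layers, which may not match what H2O reports:

```scala
object BottleneckShape {
  // Assumed input width, for illustration only.
  val inputDim = 30
  // Hidden layer sizes, as passed to _hidden in the text.
  val hidden: Array[Int] = Array(10, 2, 10)

  // An autoencoder reconstructs its input, so the output is as wide as the input.
  val layerSizes: Array[Int] = (inputDim +: hidden) :+ inputDim

  // Weights plus biases for each pair of consecutive fully connected layers.
  val paramsPerLayer: Array[Int] =
    layerSizes.zip(layerSizes.tail).map { case (in, out) => in * out + out }

  // The narrowest layer is the bottleneck.
  val bottleneckDim: Int = layerSizes.min

  def main(args: Array[String]): Unit = {
    println(layerSizes.mkString(" -> "))
    println(s"bottleneck width: $bottleneckDim, parameters: ${paramsPerLayer.sum}")
  }
}
```

The encoder half squeezes each row down to the two middle values, and the decoder half has to rebuild the full row from those two values alone.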

The autoencoder model will then learn the patterns of the input data, irrespective of given class labels. Here, it will learn which credit card transactions are similar and which transactions are outliers or anomalies. We need to keep in mind, though, that autoencoder models will be sensitive to outliers in our data, which might throw off otherwise typical patterns.

Once the pre-training is completed, we should save the model in the same directory as the input CSV file:

val uri = new File(new File(inputCSV).getParentFile, "model_nn.bin").toURI
ModelSerializationSupport.exportH2OModel(model_nn, uri)
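
The path construction can be tried on its own with the JDK's java.io.File, without H2O. In this sketch the value of inputCSV is a hypothetical placeholder, and ModelPath is just an illustrative name:

```scala
import java.io.File
import java.net.URI

object ModelPath {
  // Hypothetical input path; substitute the location of your own CSV file.
  val inputCSV = "data/creditcard.csv"

  // getParentFile yields the CSV's directory, so the model file sits beside the CSV.
  val uri: URI = new File(new File(inputCSV).getParentFile, "model_nn.bin").toURI

  def main(args: Array[String]): Unit =
    println(uri)
}
```

Because toURI resolves relative paths against the current working directory, the printed URI is absolute and will differ from machine to machine.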

Reload the model and restore it for further use:

val model: DeepLearningModel = ModelSerializationSupport.loadH2OModel(uri)

Now let's print the model's metrics to see how the training went:

println(model)
>>>

Figure 13: Autoencoder model's metrics

Fantastic! The pre-training went very well: the RMSE and MSE are both pretty low. We can also see that some features, such as v16, v1, and v25, are pretty unimportant. We will try to analyze this later on.
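
As a reminder of what these two metrics measure, here is a minimal plain-Scala sketch of MSE and RMSE between one input row and its reconstruction. The object and function names are hypothetical, and H2O's own reported figures are aggregated over the whole training frame, so this illustrates the formulas rather than reproducing H2O's output:

```scala
object ReconstructionError {
  // Mean squared error between an input row and its reconstruction.
  def mse(original: Array[Double], reconstructed: Array[Double]): Double = {
    require(original.nonEmpty, "rows must not be empty")
    require(original.length == reconstructed.length, "rows must have the same width")
    original.zip(reconstructed).map { case (x, r) => (x - r) * (x - r) }.sum / original.length
  }

  // Root mean squared error: the square root of the MSE, in the units of the data.
  def rmse(original: Array[Double], reconstructed: Array[Double]): Double =
    math.sqrt(mse(original, reconstructed))

  def main(args: Array[String]): Unit = {
    val row = Array(0.0, 0.0)
    val rebuilt = Array(2.0, 2.0)
    println(s"MSE = ${mse(row, rebuilt)}, RMSE = ${rmse(row, rebuilt)}")
  }
}
```

A row that the autoencoder reconstructs poorly gets a large error, which is what makes this quantity useful for spotting unusual transactions.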
