In this chapter, we saw how to develop a machine learning (ML) project using H2O on a bank marketing dataset for predictive analytics. We were able to predict whether a client would subscribe to a term deposit with an accuracy of 80%. Furthermore, we saw how to tune typical neural network hyperparameters. Since this is a small-scale dataset, a final suggestion for improvement would be to try Spark-based random forests, decision trees, or gradient-boosted trees for better accuracy.
In the next chapter, we will use a dataset of 284,807 credit card transactions, of which only 0.172% are fraudulent; in other words, the data is highly unbalanced. It therefore makes sense to use autoencoders to pretrain a classification model and to apply anomaly detection to predict possible fraudulent transactions. The expectation is that the fraud cases will stand out as anomalies within the whole dataset.
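To see why plain accuracy is a poor guide here, note that 0.172% of 284,807 is only about 490 transactions, so a model that always answers "legitimate" would already be roughly 99.8% accurate. Anomaly detection sidesteps this. An autoencoder trained mostly on legitimate transactions should reconstruct them well, so a transaction with a large reconstruction error becomes a candidate fraud case. The Java sketch below shows only that scoring step. It assumes the reconstructions were already produced by a trained autoencoder (not shown). The class name `ReconstructionErrorDetector`, the use of mean squared error as the score, the toy data, and the threshold value are all hypothetical illustrations, not the next chapter's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class ReconstructionErrorDetector {

    // Mean squared error between an input vector and its reconstruction.
    static double reconstructionError(double[] input, double[] reconstruction) {
        if (input.length != reconstruction.length) {
            throw new IllegalArgumentException("input and reconstruction lengths differ");
        }
        double sum = 0.0;
        for (int i = 0; i < input.length; i++) {
            double diff = input[i] - reconstruction[i];
            sum += diff * diff;
        }
        return sum / input.length;
    }

    // Returns the indices of rows whose reconstruction error exceeds the threshold.
    static List<Integer> flagAnomalies(double[][] inputs, double[][] reconstructions, double threshold) {
        List<Integer> flagged = new ArrayList<>();
        for (int row = 0; row < inputs.length; row++) {
            if (reconstructionError(inputs[row], reconstructions[row]) > threshold) {
                flagged.add(row);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: rows 0 and 1 are reconstructed well, row 2 is not.
        double[][] inputs = {
            {0.1, 0.2, 0.3},
            {0.2, 0.1, 0.4},
            {5.0, -3.0, 4.0}
        };
        // Assumed to come from an already trained autoencoder.
        double[][] reconstructions = {
            {0.1, 0.2, 0.3},
            {0.2, 0.1, 0.35},
            {0.2, 0.1, 0.3}
        };
        double threshold = 0.5; // hypothetical cut-off; in practice tuned on validation data
        System.out.println(flagAnomalies(inputs, reconstructions, threshold)); // prints [2]
    }
}
```

Only the third row, which the (assumed) autoencoder fails to reconstruct, is flagged. In practice the threshold would be chosen on held-out data to trade off missed frauds against false alarms, rather than fixed by hand as in this sketch.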