Implementing random forest for predicting credit card defaults using H2O

H2O is an open source, distributed machine learning platform for building models on large datasets. It supports both supervised and unsupervised algorithms, and it is fast, scalable, and easy to use. H2O's REST API gives external programs, such as those written in R and Python, access to all of its capabilities. The H2O Python API is designed to be very similar to scikit-learn. At the time of writing this book, the latest version of H2O is H2O v3.

H2O's own documentation explains why the platform can deliver such fast machine learning to enterprises:

"H2O's core code is written in Java. Inside H2O, a distributed key/value store is used to access and reference data, models, objects, and so on, across all nodes and machines. The algorithms are implemented on top of H2O's distributed Map/Reduce framework and utilize the Java fork/join framework for multi-threading. The data is read in parallel and is distributed across the cluster and stored in memory in a columnar format in a compressed way. H2O's data parser has built-in intelligence to guess the schema of the incoming dataset and supports data ingest from multiple sources in various formats"
- from h2o.ai
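
To make the map/reduce idea in this description more concrete, here is a toy sketch in plain Python. It is not H2O code and does not reflect H2O's internals: the function names, the chunking scheme, and the use of a thread pool are all assumptions made purely for illustration. It splits one column into chunks, summarizes each chunk in parallel (the map step), and then combines the partial summaries into a single result (the reduce step).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(column, n_chunks):
    """Split one column into contiguous chunks of roughly equal size."""
    size = max(1, -(-len(column) // n_chunks))  # ceiling division
    return [column[i:i + size] for i in range(0, len(column), size)]


def map_chunk(chunk):
    """Map step: summarize a single chunk as (sum, count)."""
    return sum(chunk), len(chunk)


def reduce_partials(partials):
    """Reduce step: combine the per-chunk summaries into a global mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def distributed_mean(column, n_chunks=4):
    """Compute the mean of a non-empty column chunk by chunk."""
    chunks = split_into_chunks(column, n_chunks)
    # Threads stand in for the nodes of a cluster in this toy example.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_partials(partials)


# Hypothetical column of credit limits, used only for illustration.
credit_limits = [20000, 120000, 90000, 50000, 50000, 500000, 100000, 140000]
mean_limit = distributed_mean(credit_limits)
```

Because each chunk is reduced to a small summary before anything is combined, no single worker ever needs to hold the whole column, which is the property that lets this style of computation scale across machines.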

H2O provides distributed random forests (DRF), a powerful tool for classification and regression tasks. A DRF builds a forest of many trees rather than a single tree. For both classification and regression, the final result is the average of the predictions made by all of the trees in the forest.
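
The averaging step can be sketched in a few lines of plain Python. This is a minimal illustration rather than H2O's implementation: the three "trees" are hand-written one-split stumps with made-up thresholds and probabilities, and the feature names `late_payments` and `utilization` are hypothetical. In a real random forest, each tree is instead learned from a random sample of the training rows and columns.

```python
from statistics import mean


def make_stump(feature, threshold, low_value, high_value):
    """Return a one-split 'tree' that outputs a default probability."""
    def predict(row):
        return high_value if row[feature] > threshold else low_value
    return predict


def forest_predict(trees, row):
    """Average the predictions of every tree in the forest."""
    return mean(tree(row) for tree in trees)


# Hand-written stumps standing in for trained trees (illustrative values).
trees = [
    make_stump("late_payments", 1, 0.10, 0.80),
    make_stump("utilization", 0.9, 0.20, 0.70),
    make_stump("late_payments", 3, 0.30, 0.90),
]

# A hypothetical customer record.
customer = {"late_payments": 2, "utilization": 0.95}

# The trees vote 0.80, 0.70, and 0.30, so the forest averages to 0.6.
p_default = forest_predict(trees, customer)

# Turn the averaged probability into a class label with a 0.5 cutoff.
predicted_default = int(p_default >= 0.5)
```

The same `forest_predict` function covers regression unchanged: if each tree returned a numeric estimate instead of a probability, the forest's output would simply be the mean of those estimates.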
