Chapter 9. Growing Trees

In this chapter, we will cover the following recipes:

  • Going from trees to forest – Random Forest
  • Growing extremely randomized trees
  • Growing a rotation forest

Introduction

In this chapter, we will look at more bagging methods built on tree-based algorithms. They are very popular in the data science community because of their robustness to noise and their applicability to a wide variety of problems.

The claim to fame of most of these methods is that they can achieve very good results with little to no data preparation compared to other methods, and that they can be handed to software engineers as black-box tools.

Beyond the claims made in the previous paragraphs, these methods have some other advantages as well.

By design, bagging lends itself nicely to parallelization, since each tree is built independently on its own bootstrap sample. Hence, these methods can easily be applied to very large datasets in a cluster environment.
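As a minimal sketch of this (with a made-up dataset and illustrative parameter values, not part of any recipe), the following snippet builds a bagged ensemble of decision trees in scikit-learn and uses n_jobs=-1 so that the trees are grown in parallel on all available cores:

    # Parallel bagging sketch; dataset and settings are illustrative only.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=7)

    # n_jobs=-1 grows the bootstrapped trees on all available cores;
    # each tree is built independently of the others.
    bagging = BaggingClassifier(n_estimators=50, n_jobs=-1, random_state=7)
    bagging.fit(X, y)
    print(bagging.score(X, y))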

Decision tree algorithms split the input data into different regions at each level of the tree, choosing at every split the attribute that best separates the data. In doing so, they perform implicit feature selection, which is one of the most important tasks in building a good model. This puts decision trees at an advantage compared with other techniques, and bagging with decision trees inherits that advantage.
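The following small sketch, on assumed toy data, illustrates this implicit feature selection: only a few columns are informative, and a single decision tree assigns most of the importance to them while the uninformative columns receive values near zero:

    # Implicit feature selection sketch; the data is synthetic and illustrative.
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                               n_redundant=0, random_state=7)

    tree = DecisionTreeClassifier(random_state=7).fit(X, y)
    # Uninformative columns are rarely chosen for splits, so their
    # importance is close to zero.
    for i, importance in enumerate(tree.feature_importances_):
        print("feature %d importance: %0.3f" % (i, importance))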

Decision trees need almost no data preparation. Consider the scaling of attributes, for example: because splits depend only on the ordering of values within an attribute, the attribute scale has no impact on the structure of the tree. Decision trees are also comparatively tolerant of missing values in many implementations, and the effect of outliers on a decision tree is minimal.
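A quick sketch of the scaling point, again on assumed toy data: since standardizing each column preserves the ordering of its values, fitting the same tree on raw and on standardized inputs yields identical predictions:

    # Scale invariance sketch; toy data, illustrative settings.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=5, random_state=7)
    X_scaled = StandardScaler().fit_transform(X)

    tree_raw = DecisionTreeClassifier(random_state=7).fit(X, y)
    tree_scaled = DecisionTreeClassifier(random_state=7).fit(X_scaled, y)

    # The two trees make identical predictions on their respective inputs.
    print(np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled)))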

In some of our earlier recipes, we used polynomial features, retaining only the interaction components. With an ensemble of trees, these interactions are captured implicitly; we don't have to craft explicit feature transformations to accommodate feature interactions.

Linear regression-based models fail when there is a non-linear relationship in the input data; we saw this effect when we explained the Kernel PCA recipes. Tree-based algorithms are unaffected by non-linear relationships in the data.
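The following sketch, using made-up sinusoidal data, contrasts a linear model with a shallow regression tree when the true relationship is non-linear; the tree approximates the curve piecewise, while the straight line cannot:

    # Non-linearity sketch; synthetic sine-shaped data for illustration.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(7)
    X = np.sort(5 * rng.rand(200, 1), axis=0)
    y = np.sin(X).ravel() + 0.1 * rng.randn(200)

    linear = LinearRegression().fit(X, y)
    tree = DecisionTreeRegressor(max_depth=4, random_state=7).fit(X, y)

    print("Linear R^2: %0.2f" % linear.score(X, y))
    print("Tree   R^2: %0.2f" % tree.score(X, y))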

One of the major complaints against tree-based methods is the difficulty of pruning trees to avoid overfitting. Large trees tend to fit the noise present in the underlying data as well, and hence end up with low bias and high variance. However, when we grow a lot of trees and make the final prediction an average (or majority vote) of the outputs of all the trees in the ensemble, the variance problem is largely avoided while the low bias is retained.
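The following illustrative sketch (toy data with label noise; the settings are assumptions, not part of any recipe) makes the variance argument concrete: a single fully grown tree fits the noise, while averaging a hundred bootstrapped trees typically cross-validates noticeably better:

    # Variance reduction sketch; synthetic noisy data for illustration.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1,
                               random_state=7)

    single_tree = DecisionTreeClassifier(random_state=7)
    # BaggingClassifier uses decision trees as its base estimator by default.
    ensemble = BaggingClassifier(n_estimators=100, random_state=7)

    print("Single tree:  %0.3f" % cross_val_score(single_tree, X, y, cv=5).mean())
    print("Bagged trees: %0.3f" % cross_val_score(ensemble, X, y, cv=5).mean())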

In this chapter, we will see three tree-based ensemble methods.

Our first recipe is about implementing Random Forest for a classification problem. Random Forest, invented by Leo Breiman, is an ensemble technique that internally leverages a large number of trees to produce a model for regression or classification problems.

Our second recipe is about Extremely Randomized Trees, an algorithm that varies only slightly from Random Forest. By introducing more randomization into the way splits are chosen than a Random Forest does, it claims to address the variance problem more effectively. Moreover, it has a slightly lower computational cost.

Our final recipe is about Rotation Forest. The first two recipes require a large number of trees in their ensemble to achieve good performance. Rotation Forest claims that it can achieve similar or better performance with fewer trees. Furthermore, the authors of this algorithm claim that the underlying estimator can be something other than a tree. In this way, it is positioned as a new framework for building ensembles, similar to Gradient Boosting.
