Summary

In this chapter, you learned about both the power and the limitations of tree-based learning methods for classification and regression problems. Single trees, while easy to build and interpret, may not have the predictive power needed for many of the problems we are trying to solve. To improve predictive ability, we have the tools of random forests and gradient boosted trees at our disposal. With a random forest, dozens or hundreds of trees are built and their results aggregated for an overall prediction. Each tree in the forest is built on a bootstrap sample of the data and a random sample of the predictor variables. With gradient boosting, an initial, relatively small tree is produced; subsequent trees are then fit to the residuals of the trees that came before. The intended result of this technique is a series of trees, each improving on the weaknesses of the prior ones, yielding decreased bias and variance.
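To make the two approaches concrete, here is a minimal sketch of both ensembles on a sample dataset, assuming scikit-learn is available; the dataset, parameter values, and variable names are illustrative choices, not recommendations from this chapter.

```python
# Illustrative sketch: random forest vs. gradient boosting (assumed dataset and settings).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Random forest: many trees, each grown on a bootstrap sample of the rows and a
# random subset of the predictors; predictions are aggregated across the trees.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)

# Gradient boosting: shallow trees built sequentially, each one fit to the
# residual errors of the ensemble built so far.
gbm = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                                 random_state=42)
gbm.fit(X_train, y_train)

for name, model in [("Random forest", rf), ("Gradient boosting", gbm)]:
    print(name, "test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```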

While these methods are indeed extremely powerful, they are not a panacea in the world of machine learning. Different datasets require judgment on the part of the analyst as to which techniques are applicable, and the selection of the tuning parameters is just as important as the choice of technique. This fine-tuning can make all the difference between a good predictive model and a great one.
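As a rough illustration of what such tuning might look like, the sketch below runs a cross-validated grid search over a few gradient boosting parameters. It assumes the data and imports from the previous sketch, and the grid values are only examples.

```python
# Illustrative sketch of hyperparameter tuning via cross-validated grid search
# (assumes X_train, y_train from the earlier example; grid values are arbitrary).
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300],
              "max_depth": [2, 3, 4],
              "learning_rate": [0.05, 0.1]}
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```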
