How to train and tune GBM models

The two key drivers of gradient boosting performance are the size of the ensemble and the complexity of its constituent decision trees.

Controlling tree complexity aims to avoid highly specific rules, which typically rely on a very small number of samples in the leaf nodes. In the previous chapter, we covered the most effective constraints used to limit a decision tree's ability to overfit to the training data. They include requiring (see the sketch after this list):

  • A minimum number of samples to either split a node or accept it as a terminal node, or
  • A minimum improvement in node quality, as measured by purity or entropy for classification, or by mean squared error in the case of regression.
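
The following minimal sketch shows how these constraints might be expressed using scikit-learn's GradientBoostingClassifier; the library choice and all parameter values are illustrative assumptions, not tuned recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Toy data to keep the example self-contained
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

gbm = GradientBoostingClassifier(
    n_estimators=100,            # size of the ensemble
    max_depth=3,                 # limit the depth of each tree
    min_samples_split=10,        # min samples required to split an internal node
    min_samples_leaf=5,          # min samples required in a terminal (leaf) node
    min_impurity_decrease=0.01,  # min improvement in node quality to allow a split
    random_state=42,
)
print(cross_val_score(gbm, X, y, cv=5).mean())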

In addition to directly controlling the size of the ensemble, several regularization techniques apply, such as the shrinkage we encountered in the context of the Ridge and Lasso linear regression models in Chapter 7, Linear Models. Furthermore, the randomization techniques used for random forests are also commonly applied to gradient boosting machines.
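
As a hedged sketch of how shrinkage and random-forest-style randomization might be combined, again assuming scikit-learn's GradientBoostingClassifier with illustrative parameter values:

```python
from sklearn.ensemble import GradientBoostingClassifier

gbm = GradientBoostingClassifier(
    learning_rate=0.05,   # shrinkage: scales down each tree's contribution
    n_estimators=500,     # smaller learning rates typically require more trees
    subsample=0.8,        # stochastic boosting: train each tree on a random 80% of rows
    max_features='sqrt',  # random subset of features at each split, as in random forests
    random_state=42,
)
# gbm.fit(X, y)  # fit as usual; X, y as in the previous sketch
```

Note the trade-off implied here: lowering the learning rate shrinks each tree's contribution, so the ensemble usually needs more trees to reach the same training fit.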
