Gradient tree boosting (GTB)

Gradient boosting is another improved version of boosting. Like AdaBoost, it builds an ensemble of weak learners sequentially, but each new learner is fitted to the gradient of a loss function, which amounts to performing gradient descent in function space. The algorithm has proven to be one of the most effective ensemble methods, though it is characterized by an increased variance of the estimates, a greater sensitivity to noise in the data (both problems can be attenuated by sub-sampling), and significant computational costs, since the trees are built sequentially and cannot be trained in parallel.
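
To make the mechanics concrete, here is a minimal sketch (not the scikit-learn implementation) of gradient boosting for a regression problem with squared loss: each new tree is fitted to the residuals, which are the negative gradient of the loss with respect to the current predictions. The toy data, tree depth, learning rate, and number of trees are arbitrary choices for illustration only:

In: import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    # toy regression data: a noisy sine wave
    rng = np.random.RandomState(101)
    X = np.sort(rng.uniform(0., 6., size=200)).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
    learning_rate, n_trees = 0.1, 50
    prediction = np.full_like(y, y.mean())  # start from a constant model
    for _ in range(n_trees):
        residuals = y - prediction  # negative gradient of the squared loss
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        prediction += learning_rate * tree.predict(X)
    print("training MSE: %.4f" % np.mean((y - prediction) ** 2))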

To demonstrate how GTB performs, we will again try checking whether we can improve our predictive performance on the covertype dataset, which was already examined when illustrating linear SVM and ensemble algorithms:

In: import pickle
    covertype_dataset = pickle.load(open("covertype_dataset.pickle", "rb"))
    # the target labels are shifted by one so that they start from 0
    covertype_X = covertype_dataset.data[:15000, :]
    covertype_y = covertype_dataset.target[:15000] - 1
    covertype_val_X = covertype_dataset.data[15000:20000, :]
    covertype_val_y = covertype_dataset.target[15000:20000] - 1
    covertype_test_X = covertype_dataset.data[20000:25000, :]
    covertype_test_y = covertype_dataset.target[20000:25000] - 1

After loading the data, the training sample is limited to 15,000 observations to keep training times reasonable. We also extract a validation sample of 5,000 examples and a test sample of another 5,000 cases. We now proceed to train our model:

In: import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    hypothesis = GradientBoostingClassifier(max_depth=5,
                                            n_estimators=50,
                                            random_state=101)
    hypothesis.fit(covertype_X, covertype_y)

In: from sklearn.metrics import accuracy_score
    print("GradientBoostingClassifier -> test accuracy:",
          accuracy_score(covertype_test_y,
                         hypothesis.predict(covertype_test_X)))

Out: GradientBoostingClassifier -> test accuracy: 0.8202
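
Since we also set aside a validation sample, we can exploit the staged_predict method of the fitted classifier to monitor accuracy as trees are added and estimate a suitable number of estimators without retraining; the following is just a sketch based on the hypothesis fitted above:

In: # accuracy on the validation sample after each boosting stage
    val_scores = [accuracy_score(covertype_val_y, staged_pred)
                  for staged_pred in
                  hypothesis.staged_predict(covertype_val_X)]
    best_n = int(np.argmax(val_scores)) + 1
    print("best number of trees on validation: %i (accuracy %.4f)"
          % (best_n, max(val_scores)))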

To obtain the best performance from GradientBoostingClassifier and GradientBoostingRegressor, you have to tweak the following parameters (a tuning sketch follows the list):

  • n_estimators: Too many estimators increase variance, while too few leave the algorithm with high bias. The right number cannot be known a priori and has to be found heuristically, by testing several configurations with cross-validation.
  • max_depth: Increasing it increases the complexity of the individual learners and the variance of the ensemble.
  • subsample: Values below 1.0 (typically between 0.7 and 0.9) can effectively reduce the variance of the estimates.
  • learning_rate: Smaller values can improve optimization during training, though they require more estimators to converge, and thus more computational time.
  • min_samples_leaf: This can reduce the variance caused by noisy data, limiting overfitting to rare cases.
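
As a sketch of how such tuning might be organized, the following snippet runs a small grid search with cross-validation over some of the parameters listed above; the grid values and the reduced training subset are arbitrary choices intended only to keep the search fast:

In: from sklearn.model_selection import GridSearchCV
    search_grid = {'n_estimators': [50, 100],
                   'max_depth': [3, 5],
                   'subsample': [0.7, 1.0],
                   'learning_rate': [0.05, 0.1]}
    search = GridSearchCV(GradientBoostingClassifier(random_state=101),
                          param_grid=search_grid, scoring='accuracy',
                          cv=3, n_jobs=-1)
    search.fit(covertype_X[:3000], covertype_y[:3000])  # subset for speed
    print("best parameters:", search.best_params_)
    print("best CV accuracy: %.4f" % search.best_score_)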

Apart from deep learning, gradient boosting is arguably the most developed machine learning algorithm. Since AdaBoost and the original gradient boosting implementation developed by Jerome Friedman, various other implementations have appeared, the most recent ones being XGBoost, LightGBM, and CatBoost. In the following paragraphs, we will explore these new solutions and put them to the test using the Forest Covertype data.
