AdaBoost classifier

Boosting is another state-of-the-art ensemble technique that many data scientists use to win competitions. In this section, we will cover the AdaBoost algorithm, followed by gradient boosting and extreme gradient boosting (XGBoost). Boosting is a general approach that can be applied to many statistical models; however, in this book, we will discuss it in the context of decision trees. In bagging, we take multiple bootstrap samples from the training data and combine the results of the individual trees into a single predictive model; the method runs in parallel, as each bootstrap sample does not depend on the others. Boosting, in contrast, works sequentially and does not involve bootstrap sampling; instead, each tree is fitted on a modified version of the original dataset, and the trees are finally added up to create a strong classifier:

The preceding figure illustrates the methodology behind AdaBoost; we will cover the step-by-step procedure in detail in the following algorithm description. Initially, a simple classifier is fitted on the data (also called a decision stump, which splits the data into just two regions). In the next iteration (iteration 2), the observations that were classified correctly are given less weight, and the misclassified observations are given higher weight. Another decision stump/weak classifier is then fitted on the reweighted data, and the weights are adjusted again for the next iteration (iteration 3). Once the iterations finish, the weak classifiers are combined using weights (calculated automatically for each classifier at each iteration, based on its error rate) to form a strong classifier, which predicts the classes with surprisingly high accuracy.

The algorithm for AdaBoost consists of the following steps:

  1. Initialize the observation weights wi = 1/N, i = 1, 2, …, N, where N is the number of observations.
  2. For m = 1 to M:
    • Fit a classifier Gm(x) to the training data using the weights wi
    • Compute the weighted error: errm = Σi wi I(yi ≠ Gm(xi)) / Σi wi
    • Compute the classifier weight: αm = log((1 - errm) / errm)
    • Set the updated observation weights: wi ← wi · exp(αm · I(yi ≠ Gm(xi))), i = 1, 2, …, N
  3. Output the combined classifier: G(x) = sign(Σm αm Gm(x))

In the first step, all the observations are given equal weight.

In bagging and random forest algorithms, we work with resampled data (and, in random forest, with random subsets of the columns); whereas in boosting, we adjust the weights of each observation rather than selecting a subset of columns.

Within each iteration, we fit a classifier on the weighted data and evaluate its overall error. This error determines the weight (α) given to that classifier in the final additive model: intuitively, a higher weight is given to a model with fewer errors. Finally, the weight of each observation is updated: weights are increased for incorrectly classified observations, so that the next iteration focuses more on them, and reduced for correctly classified observations.
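To make these steps concrete, the following is a minimal from-scratch sketch of AdaBoost with decision stumps (an illustrative sketch, not the library implementation used later); x_arr and y_arr are hypothetical NumPy arrays, with the class labels assumed to be coded as -1/+1, and M is the number of boosting rounds:

# Minimal AdaBoost sketch (assumes binary labels coded as -1/+1)
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_sketch(x_arr, y_arr, M=50):
    N = len(y_arr)
    w = np.ones(N) / N                            # step 1: w_i = 1/N
    stumps, alphas = [], []
    for m in range(M):                            # step 2: M boosting rounds
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(x_arr, y_arr, sample_weight=w)  # fit weak classifier G_m(x)
        miss = (stump.predict(x_arr) != y_arr).astype(float)
        err = np.clip(np.dot(w, miss) / w.sum(), 1e-10, 1 - 1e-10)  # err_m
        alpha = np.log((1 - err) / err)           # classifier weight alpha_m
        w = w * np.exp(alpha * miss)              # boost misclassified observations
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    # step 3: strong classifier = sign of the weighted sum of weak classifiers
    def strong_classifier(x_new):
        return np.sign(sum(a * s.predict(x_new) for a, s in zip(alphas, stumps)))
    return strong_classifier

Each stump's vote is scaled by its alpha value, so weak classifiers with lower weighted error contribute more to the final prediction.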

All the weak classifiers are combined with their respective weights to form a strong classifier. The following figure gives a quick idea of how the weights have changed in the last iteration compared with the initial iteration:

# AdaBoost classifier
>>> import pandas as pd
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.ensemble import AdaBoostClassifier
>>> from sklearn.metrics import accuracy_score, classification_report

A decision stump is used as the base classifier for AdaBoost. If we observe the following code, the depth of the tree is restricted to 1, so it can make only a single split decision (which is why it is considered a weak classifier):

>>> dtree = DecisionTreeClassifier(criterion='gini',max_depth=1) 
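As a quick illustration of the weak classifier's limited capacity, the following sketch (on hypothetical toy data, not the HR attrition data) fits a stump and inspects its single split:

>>> import numpy as np
>>> x_toy = np.array([[1], [2], [3], [4]])
>>> y_toy = np.array([0, 0, 1, 1])
>>> stump = DecisionTreeClassifier(criterion='gini', max_depth=1).fit(x_toy, y_toy)
>>> print (stump.tree_.node_count)      # 3: one split node and two leaf nodes
>>> print (stump.tree_.threshold[0])    # the single splitting threshold (2.5 here)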

In AdaBoost, the decision stump is used as the base estimator: it is first fitted on the whole dataset, and additional copies of the classifier are then fitted on the same dataset, with adjusted weights, up to 5,000 times. The learning rate shrinks the contribution of each classifier by 0.05. There is a trade-off between the learning rate and the number of estimators: by choosing a low learning rate and a large number of estimators, one can get very close to the optimum, though at the expense of computing power:

>>> adabst_fit = AdaBoostClassifier(base_estimator=dtree, n_estimators=5000, learning_rate=0.05, random_state=42)

>>> adabst_fit.fit(x_train, y_train)
>>> print ("AdaBoost - Train Confusion Matrix", pd.crosstab(y_train, adabst_fit.predict(x_train), rownames=["Actual"], colnames=["Predicted"]))
>>> print ("AdaBoost - Train accuracy", round(accuracy_score(y_train, adabst_fit.predict(x_train)), 3))
>>> print ("AdaBoost - Train Classification Report", classification_report(y_train, adabst_fit.predict(x_train)))

The results of AdaBoost seem to be much better than the best-known random forest classifier in terms of recall on class 1. Though there is a slight decrease in accuracy, to 86.8% compared with the best accuracy of 87.8%, the number of 1's correctly identified is 23, compared with 14 from the random forest, at the expense of a small increase in misclassified 0's; so AdaBoost makes good progress in identifying the actual attriters.
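Before moving on to the R version, the trade-off between the learning rate and the number of estimators mentioned earlier can be inspected with staged predictions; the following sketch assumes the fitted adabst_fit above and that x_test and y_test from the same train/test split are available:

# Sketch: track test accuracy as boosting rounds accumulate
>>> test_curve = [accuracy_score(y_test, pred) for pred in adabst_fit.staged_predict(x_test)]
>>> print (len(test_curve), test_curve[99], test_curve[-1])   # rounds run, accuracy after 100 rounds, final accuracy

If the accuracy curve flattens early, a similar result can often be obtained with far fewer estimators (or a slightly higher learning rate), saving computing power.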

R Code for AdaBoost classifier applied on HR attrition data:

# AdaBoost classifier using C5.0 with trials included for boosting
library(C50)
class_zero_wgt = 0.3
class_one_wgt = 1 - class_zero_wgt
cstvr = class_one_wgt/class_zero_wgt
# Cost matrix with asymmetric misclassification costs (ratio cstvr = class_one_wgt/class_zero_wgt)
error_cost <- matrix(c(0, 1, cstvr, 0), nrow = 2)
# Fitting AdaBoost model (note: the C50 package caps boosting at 100 trials,
# so a value above 100 may be rejected)
ada_fit = C5.0(train_data[-31], train_data$Attrition_ind, costs = error_cost,
               trials = 5000, control = C5.0Control(minCases = 1))
summary(ada_fit)

tr_y_pred = predict(ada_fit, train_data, type = "class")
ts_y_pred = predict(ada_fit, test_data, type = "class")

tr_y_act = train_data$Attrition_ind; ts_y_act = test_data$Attrition_ind

# accrcy, prec_zero, recl_zero, prec_one, recl_one and the class-fraction variables
# (frac_trzero, frac_trone, frac_tszero, frac_tsone) are assumed to be the helper
# functions and variables defined earlier in the chapter
tr_tble = table(tr_y_act, tr_y_pred)
print(paste("AdaBoost - Train Confusion Matrix"))
print(tr_tble)
tr_acc = accrcy(tr_y_act, tr_y_pred)
trprec_zero = prec_zero(tr_y_act, tr_y_pred); trrecl_zero = recl_zero(tr_y_act, tr_y_pred)
trprec_one = prec_one(tr_y_act, tr_y_pred); trrecl_one = recl_one(tr_y_act, tr_y_pred)
trprec_ovll = trprec_zero*frac_trzero + trprec_one*frac_trone
trrecl_ovll = trrecl_zero*frac_trzero + trrecl_one*frac_trone
print(paste("AdaBoost Train accuracy:", tr_acc))
print(paste("AdaBoost - Train Classification Report"))
print(paste("Zero_Precision", trprec_zero, "Zero_Recall", trrecl_zero))
print(paste("One_Precision", trprec_one, "One_Recall", trrecl_one))
print(paste("Overall_Precision", round(trprec_ovll,4), "Overall_Recall", round(trrecl_ovll,4)))

ts_tble = table(ts_y_act, ts_y_pred)
print(paste("AdaBoost - Test Confusion Matrix"))
print(ts_tble)

ts_acc = accrcy(ts_y_act, ts_y_pred)
tsprec_zero = prec_zero(ts_y_act, ts_y_pred); tsrecl_zero = recl_zero(ts_y_act, ts_y_pred)
tsprec_one = prec_one(ts_y_act, ts_y_pred); tsrecl_one = recl_one(ts_y_act, ts_y_pred)

tsprec_ovll = tsprec_zero*frac_tszero + tsprec_one*frac_tsone
tsrecl_ovll = tsrecl_zero*frac_tszero + tsrecl_one*frac_tsone

print(paste("AdaBoost Test accuracy:", ts_acc))
print(paste("AdaBoost - Test Classification Report"))
print(paste("Zero_Precision", tsprec_zero, "Zero_Recall", tsrecl_zero))
print(paste("One_Precision", tsprec_one, "One_Recall", tsrecl_one))
print(paste("Overall_Precision", round(tsprec_ovll,4), "Overall_Recall", round(tsrecl_ovll,4)))