How to do it...

We will now look at how to use AdaBoost to train our model:

  1. Before we build our first AdaBoost model, let's train a baseline model using the DecisionTreeClassifier:
dtree = DecisionTreeClassifier(max_depth=3, random_state=0)
dtree.fit(X_train, Y_train)
  1. We can see our accuracy and Area Under the Curve (AUC) with the following code:
# Mean accuracy
print('The mean accuracy is: ',(dtree.score(X_test,Y_test))*100,'%')

#AUC score
y_pred_dtree = dtree.predict_proba(X_test)
fpr_dtree, tpr_dtree, thresholds = roc_curve(Y_test, y_pred_dtree[:,1])
auc_dtree = auc(fpr_dtree, tpr_dtree)
print ('AUC Value: ', auc_dtree)

We get an accuracy score of 91.81% and an AUC value of 0.91. Note that your values might differ slightly, for example, if your train/test split is different.
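To make the AUC figure less of a black box, the following minimal sketch reproduces by hand what the roc_curve and auc calls compute: it sweeps a threshold over the predicted scores, records the false positive and true positive rates, and integrates the curve with the trapezoidal rule. The helper names roc_points and trapezoid_auc, as well as the four-sample toy data, are assumptions made for illustration and are not part of scikit-learn or of our dataset.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by lowering a threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Rank samples from the highest score to the lowest
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Integrate the ROC curve with the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(fpr)):
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
    return area


# Hypothetical toy data: true labels and predicted probabilities of class 1
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y_true, scores)
auc_value = trapezoid_auc(fpr, tpr)
print('AUC Value: ', auc_value)  # 0.75 for this toy data
```

An AUC of 1.0 means every positive sample is scored above every negative one, while 0.5 corresponds to random ranking.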

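  1. Now, we will build our AdaBoost model using the AdaBoostClassifier from the scikit-learn library. We pass dtree as the base classifier; if no base classifier is specified, AdaBoostClassifier defaults to a decision tree with max_depth=1. Note that in scikit-learn 1.2 and later, the base_estimator parameter is named estimator:
AdaBoost = AdaBoostClassifier(n_estimators=100, base_estimator=dtree, learning_rate=0.1, random_state=0)
AdaBoost.fit(X_train, Y_train)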
  1. We check the accuracy and AUC value of the model on our test data:
# Mean accuracy
print('The mean accuracy is: ',(AdaBoost.score(X_test,Y_test))*100,'%')

#AUC score
y_pred_adaboost = AdaBoost.predict_proba(X_test)
fpr_ab, tpr_ab, thresholds = roc_curve(Y_test, y_pred_adaboost[:,1])
auc_adaboost = auc(fpr_ab, tpr_ab)
print ('AUC Value: ', auc_adaboost)

We get an accuracy score of 92.82% and an AUC value of 0.97. Both metrics are higher than those of the decision tree model we built in Step 1.
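To see why boosting can outperform a single tree, it helps to look at the reweighting loop itself. The following is a simplified sketch of discrete AdaBoost with one-dimensional decision stumps. It is not scikit-learn's implementation, and the function names, the -1/+1 label encoding, and the toy data are all assumptions made for illustration. Each round fits a stump on the weighted data, gives it a vote proportional to learning_rate * log((1 - err) / err), and increases the weights of the samples it misclassified.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict +polarity at or above the threshold, -polarity below it."""
    return polarity if x >= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (error, threshold, polarity) with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost_fit(xs, ys, n_estimators=10, learning_rate=1.0):
    """Fit a list of (vote, threshold, polarity) stumps on labels in {-1, +1}."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []
    for _ in range(n_estimators):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        if err <= 0:                 # a perfect stump needs no further boosting
            ensemble.append((1.0, threshold, polarity))
            break
        if err >= 0.5:               # no better than chance, so stop
            break
        alpha = learning_rate * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Up-weight the misclassified samples, then renormalize
        weights = [w * math.exp(alpha)
                   if stump_predict(x, threshold, polarity) != y else w
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, xs, ys):
    return sum(adaboost_predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data that no single stump can separate
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1]

single = adaboost_fit(xs, ys, n_estimators=1)
boosted = adaboost_fit(xs, ys, n_estimators=3)
single_accuracy = accuracy(single, xs, ys)
train_accuracy = accuracy(boosted, xs, ys)
print('One stump:    ', single_accuracy)
print('Three stumps: ', train_accuracy)
```

On this toy data, a single stump misclassifies three of the nine samples, whereas three boosted stumps classify all of them correctly. A smaller learning_rate shrinks each stump's vote, so it typically needs more estimators to reach the same fit.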

  1. Next, we tune the hyperparameters. We keep n_estimators at 100 and raise learning_rate from 0.1 to 0.4:
# Tuning the hyperparams
AdaBoost_with_tuning = AdaBoostClassifier(n_estimators=100, base_estimator=dtree, learning_rate=0.4, random_state=0)
AdaBoost_with_tuning.fit(X_train, Y_train)
  1. Now, we will check the accuracy and AUC values of our new model on our test data:
# Mean accuracy
print('The mean accuracy is: ',(AdaBoost_with_tuning.score(X_test,Y_test))*100,'%')

#AUC score
y_pred_adaboost_tune = AdaBoost_with_tuning.predict_proba(X_test)
fpr_ab_tune, tpr_ab_tune, thresholds = roc_curve(Y_test, y_pred_adaboost_tune[:,1])
auc_adaboost_tune = auc(fpr_ab_tune, tpr_ab_tune)
print ('AUC Value: ', auc_adaboost_tune)

The accuracy drops slightly to 92.39%, but the AUC value improves to 0.98.
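Accuracy and AUC can move in opposite directions because they measure different things: accuracy depends on where the predicted probabilities fall relative to a decision cut-off, whereas AUC depends only on how well positives are ranked above negatives. The following sketch illustrates this with two sets of made-up probabilities. The names model_a and model_b, the 0.5 cut-off, and the helper functions are assumptions for illustration and are unrelated to the models trained above.

```python
def accuracy_at_half(y_true, probs):
    """Accuracy when predicting class 1 for probabilities of at least 0.5."""
    preds = [1 if p >= 0.5 else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def pairwise_auc(y_true, probs):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [p for p, y in zip(probs, y_true) if y == 1]
    neg = [p for p, y in zip(probs, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities of class 1
y_true = [0, 0, 0, 1, 1, 1]
model_a = [0.1, 0.2, 0.6, 0.55, 0.7, 0.9]   # one negative scored above a positive
model_b = [0.1, 0.2, 0.3, 0.4, 0.45, 0.9]   # perfect ranking, but low probabilities

acc_a, auc_a = accuracy_at_half(y_true, model_a), pairwise_auc(y_true, model_a)
acc_b, auc_b = accuracy_at_half(y_true, model_b), pairwise_auc(y_true, model_b)
print('Model A: accuracy', acc_a, 'AUC', auc_a)
print('Model B: accuracy', acc_b, 'AUC', auc_b)
```

Here, model_b ranks every positive above every negative and so has the higher AUC, yet two of its positives fall below the cut-off, which gives it the lower accuracy. When comparing tuned models, it is therefore worth deciding in advance which metric matters more for the problem at hand.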

