We will now look at how to use AdaBoost to train our model:
- Before we build our first AdaBoost model, let's train a single decision tree with DecisionTreeClassifier to serve as a baseline:
from sklearn.tree import DecisionTreeClassifier
# A shallow tree (depth 3) is our baseline model
dtree = DecisionTreeClassifier(max_depth=3, random_state=0)
dtree.fit(X_train, Y_train)
- We can check the model's accuracy and Area Under the ROC Curve (AUC) on the test data with the following code:
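from sklearn.metrics import auc, roc_curve
# Mean accuracy on the test set
print('The mean accuracy is: ',(dtree.score(X_test,Y_test))*100,'%')
# AUC score: column 1 of predict_proba is the probability of the second class in dtree.classes_ (the positive class for 0/1 labels)
y_pred_dtree = dtree.predict_proba(X_test)
fpr_dtree, tpr_dtree, thresholds = roc_curve(Y_test, y_pred_dtree[:,1])
auc_dtree = auc(fpr_dtree, tpr_dtree)
print ('AUC Value: ', auc_dtree)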
We get an accuracy of 91.81% and an AUC of 0.91. Your exact figures may differ, for example if your train/test split or scikit-learn version differs from ours.
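The roc_curve and auc calls above integrate the ROC curve with the trapezoidal rule. For a binary problem, that area equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as one half. The following pure-Python sketch computes AUC that way; the helper name roc_auc and the toy labels and scores are hypothetical and only illustrate the metric, they are not a replacement for scikit-learn.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    Assumes binary labels encoded as 0 (negative) and 1 (positive).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: two negatives and two positives with predicted probabilities
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))
```

Here three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75; an AUC of 0.91 for our tree means roughly 91% of such pairs in the test set are ordered correctly.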
- Now, we will build our AdaBoost model with scikit-learn's AdaBoostClassifier. By default, AdaBoostClassifier boosts decision stumps (decision trees with max_depth=1); here, we pass our depth-3 dtree as the base estimator instead:
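from sklearn.ensemble import AdaBoostClassifier
# `estimator` was called `base_estimator` before scikit-learn 1.2 (the old name was removed in 1.4)
AdaBoost = AdaBoostClassifier(n_estimators=100, estimator=dtree, learning_rate=0.1, random_state=0)
AdaBoost.fit(X_train, Y_train)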
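To see what fit does for each of the n_estimators rounds, here is a minimal sketch of one round of the discrete SAMME reweighting step for a binary problem. It assumes the base estimator's predictions are already available as a list, and samme_round is a hypothetical helper; scikit-learn's internal code differs in details (older releases defaulted to the real-valued SAMME.R variant).

```python
import math


def samme_round(y_true, y_pred, weights, learning_rate=1.0, n_classes=2):
    """One boosting round of discrete SAMME.

    Returns the weighted error of this round's base estimator, its vote
    weight (alpha) and the renormalised sample weights for the next round.
    """
    total = sum(weights)
    miss = [yt != yp for yt, yp in zip(y_true, y_pred)]
    # Weighted error: share of the total weight sitting on misclassified samples
    err = sum(w for w, m in zip(weights, miss) if m) / total
    # This sketch requires an imperfect estimator that beats chance
    if err <= 0.0 or err >= 1.0 - 1.0 / n_classes:
        raise ValueError("error must lie strictly between 0 and chance level")
    # Vote weight of this estimator, shrunk by the learning rate
    alpha = learning_rate * (math.log((1.0 - err) / err) + math.log(n_classes - 1))
    # Up-weight misclassified samples, then renormalise so the weights sum to 1
    new = [w * math.exp(alpha) if m else w for w, m in zip(weights, miss)]
    s = sum(new)
    return err, alpha, [w / s for w in new]


y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0]  # the base estimator misclassifies sample 2
w0 = [0.2] * 5            # uniform starting weights
err, alpha, w1 = samme_round(y_true, y_pred, w0, learning_rate=0.1)
print(round(err, 3), round(alpha, 3), [round(w, 3) for w in w1])
```

After this round, the misclassified sample carries more weight than the others, so the next base estimator concentrates on it; the final prediction is a vote of all rounds weighted by their alphas.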
- We check the accuracy and AUC value of the model on our test data:
# Mean accuracy
print('The mean accuracy is: ',(AdaBoost.score(X_test,Y_test))*100,'%')
#AUC score
y_pred_adaboost = AdaBoost.predict_proba(X_test)
fpr_ab, tpr_ab, thresholds = roc_curve(Y_test, y_pred_adaboost[:,1])
auc_adaboost = auc(fpr_ab, tpr_ab)
print ('AUC Value: ', auc_adaboost)
This time we get an accuracy of 92.82% and an AUC of 0.97, both higher than the single decision tree we trained in the first step.
- Next, we fine-tune the hyperparameters. We keep n_estimators at 100 and raise learning_rate from 0.1 to 0.4:
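# Tuning the hyperparameters: same ensemble size, larger learning rate
# (`estimator` was called `base_estimator` before scikit-learn 1.2)
AdaBoost_with_tuning = AdaBoostClassifier(n_estimators=100, estimator=dtree, learning_rate=0.4, random_state=0)
AdaBoost_with_tuning.fit(X_train, Y_train)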
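To see what raising learning_rate changes, consider the discrete SAMME form of AdaBoost (an assumption: older scikit-learn releases defaulted to the real-valued SAMME.R variant, where the learning rate enters differently). Writing ν for learning_rate, K for the number of classes, and err_m for the weighted error of the m-th base estimator h_m, each estimator's vote weight and the sample-weight update (followed by renormalisation) are:

```latex
\alpha_m = \nu \left( \ln\frac{1 - \mathrm{err}_m}{\mathrm{err}_m} + \ln(K - 1) \right),
\qquad
w_i \leftarrow w_i \, \exp\!\big( \alpha_m \, \mathbb{1}[\, y_i \neq h_m(x_i) \,] \big)

% Worked example: K = 2, err_m = 0.2, so ln(0.8 / 0.2) = ln 4 ≈ 1.386
\nu = 0.1:\ \alpha_m \approx 0.139
\qquad
\nu = 0.4:\ \alpha_m \approx 0.555
```

For the same weighted error, a learning rate of 0.4 gives each estimator four times the vote of a learning rate of 0.1 and reweights misclassified samples more aggressively, which is why learning_rate and n_estimators are usually tuned together.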
- Now, we will check the accuracy and AUC values of our new model on our test data:
# Mean accuracy
print('The mean accuracy is: ',(AdaBoost_with_tuning.score(X_test,Y_test))*100,'%')
#AUC score
y_pred_adaboost_tune = AdaBoost_with_tuning.predict_proba(X_test)
fpr_ab_tune, tpr_ab_tune, thresholds = roc_curve(Y_test, y_pred_adaboost_tune[:,1])
auc_adaboost_tune = auc(fpr_ab_tune, tpr_ab_tune)
print ('AUC Value: ', auc_adaboost_tune)
We notice that the accuracy drops to 92.39%, but the AUC improves to 0.98.
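Accuracy and AUC can move in opposite directions because accuracy is measured at a single cut-off on the predicted probability, whereas AUC only measures how well positives are ranked above negatives across all cut-offs. The following sketch uses invented toy scores (not from our dataset) and hypothetical helpers accuracy_at and rank_auc to show two models where the one with higher accuracy has the lower AUC.

```python
def accuracy_at(labels, scores, threshold=0.5):
    """Fraction of samples whose score, cut at `threshold`, matches the label."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def rank_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 0, 1, 1, 1]
model_a = [0.1, 0.2, 0.6, 0.55, 0.7, 0.8]   # one negative is scored above one positive
model_b = [0.1, 0.2, 0.3, 0.35, 0.45, 0.9]  # perfect ranking, but two positives fall below 0.5

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name, round(accuracy_at(labels, scores), 3), round(rank_auc(labels, scores), 3))
```

Model A is more accurate at the 0.5 cut-off (5/6 versus 4/6), yet model B has the higher AUC (1.0 versus 8/9), because B orders every positive above every negative even though its default threshold is poorly placed.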