How to do it...

We will now look at how to use an AdaBoost to train our model:

  1. Before we build our first AdaBoost model, let's train our model using the DecisionTreeClassifier:
dtree = DecisionTreeClassifier(max_depth=3, random_state=0)
dtree.fit(X_train, Y_train)
  1. We can see our accuracy and Area Under the Curve (AUC) with the following code:
# Mean accuracy
print('The mean accuracy is: ',(dtree.score(X_test,Y_test))*100,'%')

#AUC score
y_pred_dtree = dtree.predict_proba(X_test)
fpr_dtree, tpr_dtree, thresholds = roc_curve(Y_test, y_pred_dtree[:,1])
auc_dtree = auc(fpr_dtree, tpr_dtree)
print ('AUC Value: ', auc_dtree)

We get an accuracy score and an AUC value of 91.81% and 0.91, respectively. Note that these values might be different for different users due to randomness.

  1. Now, we will build our AdaBoost model using the scikit-learn library. We will use the AdaBoostClassifier to build our AdaBoost model. AdaBoost uses dtree as the base classifier by default:
AdaBoost = AdaBoostClassifier(n_estimators=100, base_estimator=dtree, learning_rate=0.1, random_state=0)
AdaBoost.fit(X_train, Y_train)
  1. We check the accuracy and AUC value of the model on our test data:
# Mean accuracy
print('The mean accuracy is: ',(AdaBoost.score(X_test,Y_test))*100,'%')

#AUC score
y_pred_adaboost = AdaBoost.predict_proba(X_test)
fpr_ab, tpr_ab, thresholds = roc_curve(Y_test, y_pred_adaboost[:,1])
auc_adaboost = auc(fpr_ab, tpr_ab)
print ('AUC Value: ', auc_adaboost)

We notice that we get an accuracy score of 92.82% and an AUC value of 0.97. Both of these metrics are higher than the decision tree model we built in Step 1.

  1. Then, we must fine-tune our hyperparameters. We set n_estimators to 100 and learning_rate to 0.4:
# Tuning the hyperparams
AdaBoost_with_tuning = AdaBoostClassifier(n_estimators=100, base_estimator=dtree, learning_rate=0.4, random_state=0)
AdaBoost_with_tuning.fit(X_train, Y_train)
  1. Now, we will check the accuracy and AUC values of our new model on our test data:
# Mean accuracy
print('The mean accuracy is: ',(AdaBoost_with_tuning.score(X_test,Y_test))*100,'%')

#AUC score
y_pred_adaboost_tune = AdaBoost.predict_proba(X_test)
fpr_ab_tune, tpr_ab_tune, thresholds = roc_curve(Y_test, y_pred_adaboost_tune[:,1])
auc_adaboost_tune = auc(fpr_ab_tune, tpr_ab_tune)
print ('AUC Value: ', auc_adaboost_tune)

We notice the accuracy drops to 92.39%, but that we get an improved AUC value of 0.98.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset