Ensemble of ensembles with bootstrap samples using a single type of classifier

In this methodology, bootstrap samples are drawn from the training data and, each time, a separate model (the individual models could be decision trees, random forests, and so on) is fitted on the drawn sample; all these results are combined at the end to create an ensemble. This method suits highly flexible models, where variance reduction still improves performance.
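To make the mechanics concrete before the main example, the following is a minimal sketch of this idea on synthetic data: draw bootstrap samples, fit one decision tree per sample, and combine the predictions by majority vote. The synthetic dataset, the count of ten models, and the manual voting loop are illustrative assumptions, not code from this chapter:

# Illustrative sketch: manual bagging via bootstrap samples
>>> import numpy as np
>>> from sklearn.datasets import make_classification
>>> from sklearn.tree import DecisionTreeClassifier
>>> x_demo, y_demo = make_classification(n_samples=200, random_state=42)
>>> rng = np.random.RandomState(42)
>>> models = []
>>> for i in range(10):  # ten bootstrap rounds
...     idx = rng.choice(len(x_demo), size=len(x_demo), replace=True)
...     models.append(DecisionTreeClassifier(random_state=i).fit(x_demo[idx], y_demo[idx]))
>>> preds = np.array([m.predict(x_demo) for m in models])
>>> vote = (preds.mean(axis=0) >= 0.5).astype(int)  # majority vote over the ten models
>>> print("Manual bagging train accuracy:", (vote == y_demo).mean())

In practice, the BaggingClassifier shown later automates exactly these steps (sampling, fitting, and vote aggregation).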

In the following example, AdaBoost is used as the base classifier, and the results of the individual AdaBoost models are combined using the bagging classifier to generate the final outcome. In addition, each AdaBoost model is itself made up of decision trees with a depth of 1 (decision stumps). Here, we would like to show that a classifier inside a classifier inside a classifier is possible (it sounds like the movie Inception!):

# Ensemble of Ensembles - by applying bagging on a simple classifier
>>> import pandas as pd
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.ensemble import BaggingClassifier
>>> from sklearn.ensemble import AdaBoostClassifier
>>> from sklearn.metrics import accuracy_score, classification_report
# Class weights: give class 1 more weight (0.7) than class 0 (0.3)
>>> clwght = {0: 0.3, 1: 0.7}

The following is the base classifier (decision stump) used in the AdaBoost classifier:

>>> eoe_dtree = DecisionTreeClassifier(criterion='gini', max_depth=1, class_weight=clwght)

Each AdaBoost classifier consists of 500 decision trees with a learning rate of 0.05:

# Note: recent scikit-learn versions rename base_estimator to estimator
>>> eoe_adabst_fit = AdaBoostClassifier(base_estimator=eoe_dtree, n_estimators=500, learning_rate=0.05, random_state=42)
>>> eoe_adabst_fit.fit(x_train, y_train)

>>> print (" AdaBoost - Train Confusion Matrix ",pd.crosstab(y_train, eoe_adabst_fit.predict(x_train),rownames = ["Actuall"],colnames = ["Predicted"]))
>>> print (" AdaBoost - Train accuracy",round(accuracy_score(y_train, eoe_adabst_fit.predict(x_train)),3))
>>> print (" AdaBoost - Train Classification Report ",classification_report(y_train, eoe_adabst_fit.predict(x_train)))

>>> print (" AdaBoost - Test Confusion Matrix ",pd.crosstab(y_test, eoe_adabst_fit.predict(x_test),rownames = ["Actuall"],colnames = ["Predicted"]))
>>> print (" AdaBoost - Test accuracy",round(accuracy_score(y_test, eoe_adabst_fit.predict(x_test)),3))
>>> print (" AdaBoost - Test Classification Report ",classification_report(y_test, eoe_adabst_fit.predict(x_test)))

The bagging classifier wraps 50 AdaBoost classifiers, creating the ensemble of ensembles:

>>> bag_fit = BaggingClassifier(base_estimator=eoe_adabst_fit, n_estimators=50,
        max_samples=1.0, max_features=1.0, bootstrap=True,
        bootstrap_features=False, n_jobs=-1, random_state=42)
>>> bag_fit.fit(x_train, y_train)

>>> print("\nEnsemble of AdaBoost - Train Confusion Matrix\n", pd.crosstab(y_train, bag_fit.predict(x_train), rownames=["Actual"], colnames=["Predicted"]))
>>> print("\nEnsemble of AdaBoost - Train accuracy", round(accuracy_score(y_train, bag_fit.predict(x_train)), 3))
>>> print("\nEnsemble of AdaBoost - Train Classification Report\n", classification_report(y_train, bag_fit.predict(x_train)))

>>> print("\nEnsemble of AdaBoost - Test Confusion Matrix\n", pd.crosstab(y_test, bag_fit.predict(x_test), rownames=["Actual"], colnames=["Predicted"]))
>>> print("\nEnsemble of AdaBoost - Test accuracy", round(accuracy_score(y_test, bag_fit.predict(x_test)), 3))
>>> print("\nEnsemble of AdaBoost - Test Classification Report\n", classification_report(y_test, bag_fit.predict(x_test)))

The results of the ensemble of AdaBoost models show some improvement: the test accuracy obtained is 87.1%, which is almost equal to that of gradient boosting at 87.5%, the best value we have seen so far. However, the number of 1's identified is 25 here, which is greater than with gradient boosting. Hence, it has been proven that an ensemble of ensembles does work! Unfortunately, these types of functions are not available in R software, so we are not writing the equivalent R code here.
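As a side note, if you want to read the count of correctly identified 1's programmatically rather than off the printed confusion matrix, a small lookup like the following works (test_ct is a hypothetical variable name, and this assumes both classes appear among the predictions):

# True positives: actual 1's that the bagging ensemble also predicted as 1
>>> test_ct = pd.crosstab(y_test, bag_fit.predict(x_test), rownames=["Actual"], colnames=["Predicted"])
>>> print("1's correctly identified:", test_ct.loc[1, 1])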
