Random forest classifier method

Another method of modeling that tells us which features have an impact on our model is the feature importance that comes out of a random forest classifier. This more accurately reflects the true impact of a given feature.

Let's run our data through this type of model and examine the results:

from sklearn.ensemble import RandomForestClassifier 
clf_rf = RandomForestClassifier(n_estimators=1000) 
clf_rf.fit(X_train, y_train) 
 
f_importances = clf_rf.feature_importances_ 
 
f_names = X_train.columns 
f_std = np.std([tree.feature_importances_ for tree in clf_rf.estimators_], axis=0) 
 
zz = zip(f_importances, f_names, f_std) 
zzs = sorted(zz, key=lambda x: x[0], reverse=True) 
 
n_features = 10 
imps = [x[0] for x in zzs[:n_features]] 
labels = [x[1] for x in zzs[:n_features]] 
errs = [x[2] for x in zzs[:n_features]] 
 
fig, ax = plt.subplots(figsize=(16, 8)) 
ax.bar(range(n_features), imps, color="r", yerr=errs) 
plt.xticks(range(n_features), labels) 
plt.setp( ax.xaxis.get_majorticklabels(), rotation=-70, ha="left" );

The preceding code generates the following output:

In the preceding code, we ran a random forest classifier, extracted and sorted the importance of the features, and then graphed those values with their error bars.

From this data, we see that what has the most impact on the model is the opening gap, the offer price, and the number of managers involved in the deal. These would all seem to make sense as having predictive values, since they indicate that there is strong demand for the deal.

Table of Contents for Random forest classifier method

Create new playlist

Sign In

Sign Up

Table of Contents for
Random forest classifier method