Decision tree classifier

The DecisionTreeClassifier from scikit-learn, available in the tree submodule, is used for modeling:

# Decision Tree Classifier 
>>> from sklearn.tree import DecisionTreeClassifier 

The parameters selected for the DT classifier are shown in the following code: the splitting criterion is Gini, the maximum depth is 5, the minimum number of observations required to qualify a split is 2, and the minimum number of samples that must be present in a terminal node is 1:

>>> dt_fit = DecisionTreeClassifier(criterion="gini", max_depth=5, min_samples_split=2, min_samples_leaf=1, random_state=42)
>>> dt_fit.fit(x_train, y_train)

>>> print ("\nDecision Tree - Train Confusion Matrix\n\n", pd.crosstab(y_train, dt_fit.predict(x_train), rownames=["Actual"], colnames=["Predicted"]))
>>> from sklearn.metrics import accuracy_score, classification_report
>>> print ("\nDecision Tree - Train accuracy\n\n", round(accuracy_score(y_train, dt_fit.predict(x_train)), 3))
>>> print ("\nDecision Tree - Train Classification Report\n", classification_report(y_train, dt_fit.predict(x_train)))

>>> print ("\n\nDecision Tree - Test Confusion Matrix\n\n", pd.crosstab(y_test, dt_fit.predict(x_test), rownames=["Actual"], colnames=["Predicted"]))
>>> print ("\nDecision Tree - Test accuracy", round(accuracy_score(y_test, dt_fit.predict(x_test)), 3))
>>> print ("\nDecision Tree - Test Classification Report\n", classification_report(y_test, dt_fit.predict(x_test)))

By carefully observing the results, we can infer that even though the test accuracy is high (84.6%), the precision and recall of one category (Attrition = Yes) are low (precision = 0.39, recall = 0.20). This could be a serious issue when management tries to use this model to proactively provide extra benefits to employees with a high chance of attrition before they actually leave, because the model is unable to identify the employees who will really be leaving. Hence, we need to look for other modifications; one way is to control the model by using class weights. By utilizing class weights, we can increase the importance of a particular class at the cost of an increase in other errors.

For example, by increasing the class weight for category 1, we can identify more employees with the characteristics of actual attrition, but in doing so we will also mark some non-churning employees as potential attriters (which should be acceptable).
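The effect described above can be sketched as follows. This is an illustrative example on synthetic imbalanced data (not the book's HR attrition dataset), using scikit-learn's `class_weight` parameter; the weight of 5 on class 1 is an assumed value chosen for demonstration:

```python
# Sketch: raising the weight of class 1 trades some precision for higher
# recall on the minority (attrition-like) class. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.85, 0.15], random_state=42)

# Baseline tree with equal class weights
plain = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=42)
plain.fit(X, y)

# class_weight={0: 1, 1: 5} makes each class-1 error five times as costly
# in the impurity calculation, pushing splits to capture more true attriters
weighted = DecisionTreeClassifier(criterion="gini", max_depth=5,
                                  class_weight={0: 1, 1: 5}, random_state=42)
weighted.fit(X, y)

recall_plain = recall_score(y, plain.predict(X))
recall_weighted = recall_score(y, weighted.predict(X))
print(recall_plain, recall_weighted)
```

On the training data, the weighted tree recovers at least as many minority-class cases as the unweighted one, at the cost of flagging more false positives.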

Another classic example of the importance of class weights is in banking scenarios. When giving loans, it is better to reject some good applications than to accept bad loans. Hence, even in this case, it is a good idea to assign a higher weight to defaulters than to non-defaulters:
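The cost asymmetry behind this reasoning can be made concrete with a small back-of-the-envelope calculation. The cost figures and error counts below are entirely hypothetical, chosen only to show why a model with more false alarms can still be cheaper overall:

```python
# Hypothetical cost comparison for the lending analogy: rejecting a good
# applicant forfeits some interest income, while approving a defaulter
# loses most of the principal, so false negatives are far more expensive.
cost_fp = 500      # assumed cost of rejecting one good applicant
cost_fn = 10000    # assumed cost of approving one defaulter

def total_cost(fp, fn):
    """Total misclassification cost for given false-positive/negative counts."""
    return fp * cost_fp + fn * cost_fn

# Unweighted model: few false alarms but misses many defaulters
unweighted = total_cost(fp=10, fn=40)   # 10*500 + 40*10000 = 405000
# Weighted model: more false alarms, far fewer missed defaulters
weighted = total_cost(fp=60, fn=8)      # 60*500 + 8*10000 = 110000
print(unweighted, weighted)
```

Even though the weighted model rejects six times as many good applicants, its total cost is far lower, which is exactly the trade-off class weights encode.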

R Code for Decision Tree Classifier Applied on HR Attrition Data:

# Decision Trees using C5.0 package
library(C50)
dtree_fit = C5.0(train_data[-31], train_data$Attrition_ind, costs = NULL,
    control = C5.0Control(minCases = 1))
summary(dtree_fit)
tr_y_pred = predict(dtree_fit, train_data, type = "class")
ts_y_pred = predict(dtree_fit, test_data, type = "class")
tr_y_act = train_data$Attrition_ind; ts_y_act = test_data$Attrition_ind

tr_tble = table(tr_y_act, tr_y_pred)
print(paste("Train Confusion Matrix"))
print(tr_tble)

# accrcy, prec_zero, recl_zero, prec_one, recl_one and the frac_* class
# fractions are the helper functions/values defined earlier in this chapter
tr_acc = accrcy(tr_y_act, tr_y_pred)
trprec_zero = prec_zero(tr_y_act, tr_y_pred)
trrecl_zero = recl_zero(tr_y_act, tr_y_pred)
trprec_one = prec_one(tr_y_act, tr_y_pred)
trrecl_one = recl_one(tr_y_act, tr_y_pred)
trprec_ovll = trprec_zero * frac_trzero + trprec_one * frac_trone
trrecl_ovll = trrecl_zero * frac_trzero + trrecl_one * frac_trone

print(paste("Decision Tree Train accuracy:", tr_acc))
print(paste("Decision Tree - Train Classification Report"))
print(paste("Zero_Precision", trprec_zero, "Zero_Recall", trrecl_zero))
print(paste("One_Precision", trprec_one, "One_Recall", trrecl_one))
print(paste("Overall_Precision", round(trprec_ovll, 4), "Overall_Recall",
    round(trrecl_ovll, 4)))

ts_tble = table(ts_y_act, ts_y_pred)
print(paste("Test Confusion Matrix"))
print(ts_tble)

ts_acc = accrcy(ts_y_act, ts_y_pred)
tsprec_zero = prec_zero(ts_y_act, ts_y_pred); tsrecl_zero = recl_zero(ts_y_act, ts_y_pred)
tsprec_one = prec_one(ts_y_act, ts_y_pred); tsrecl_one = recl_one(ts_y_act, ts_y_pred)

tsprec_ovll = tsprec_zero * frac_tszero + tsprec_one * frac_tsone
tsrecl_ovll = tsrecl_zero * frac_tszero + tsrecl_one * frac_tsone

print(paste("Decision Tree Test accuracy:", ts_acc))
print(paste("Decision Tree - Test Classification Report"))
print(paste("Zero_Precision", tsprec_zero, "Zero_Recall", tsrecl_zero))
print(paste("One_Precision", tsprec_one, "One_Recall", tsrecl_one))
print(paste("Overall_Precision", round(tsprec_ovll, 4),
    "Overall_Recall", round(tsrecl_ovll, 4)))
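The overall precision and recall in the R code above are built by combining the per-class values with the class fractions (the frac_* variables). In scikit-learn this corresponds to the "weighted" average reported by classification_report. A minimal sketch, using a small made-up label vector, confirms the two computations agree:

```python
# Sketch: support-weighted average of per-class precision equals
# scikit-learn's precision_score(..., average="weighted").
import numpy as np
from sklearn.metrics import precision_score

y_act = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])   # toy labels
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])  # toy predictions

prec_per_class = precision_score(y_act, y_pred, average=None)  # [P(0), P(1)]
frac = np.bincount(y_act) / len(y_act)                         # class fractions

manual = (prec_per_class * frac).sum()                 # the R code's approach
auto = precision_score(y_act, y_pred, average="weighted")
print(round(manual, 4), round(auto, 4))
```

Here P(0) = 5/7 and P(1) = 2/3, with class fractions 0.6 and 0.4, so both routes give about 0.6952.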