Tuning class weights in decision tree classifier

In the following code, class weights are tuned to observe the change in decision tree performance while keeping the other parameters fixed. A dummy DataFrame is created to store the precision and recall details for each weight combination:

>>> dummyarray = np.empty((6,10))
>>> dt_wttune = pd.DataFrame(dummyarray)

The metrics captured are the weights for the zero and one categories (for example, if the weight given to the zero category is 0.2, the weight for the one category is automatically 0.8, as the total weight must equal 1), training and test accuracy, and precision for the zero category, the one category, and overall. Similarly, recall for the zero category, the one category, and overall is also captured:

>>> dt_wttune.columns = ["zero_wght","one_wght","tr_accuracy", "tst_accuracy", "prec_zero","prec_one", "prec_ovll", "recl_zero","recl_one","recl_ovll"] 
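The code that follows extracts precision and recall by splitting the string output of classification_report and indexing into fixed token positions, which is fragile if the report layout changes between scikit-learn versions. As an alternative sketch only (not the approach used in this book's code), the same per-class and weighted-average numbers can be read directly from precision_recall_fscore_support, assuming a fitted classifier dt_fit and the x_test, y_test split used below:

>>> from sklearn.metrics import precision_recall_fscore_support
>>> # per-class metrics; output order follows labels=[0, 1]
>>> prec, recl, f1, supp = precision_recall_fscore_support(y_test, dt_fit.predict(x_test), labels=[0, 1])
>>> # weighted averages correspond to the "overall" columns captured below
>>> prec_ovll, recl_ovll, _, _ = precision_recall_fscore_support(y_test, dt_fit.predict(x_test), average='weighted')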

Weights for the zero category are varied from 0.01 to 0.5, as we do not want to explore cases where the zero category is given a higher weight than the one category:

>>> zero_clwghts = [0.01,0.1,0.2,0.3,0.4,0.5] 
 
>>> for i in range(len(zero_clwghts)): 
...    clwght = {0:zero_clwghts[i], 1:1.0-zero_clwghts[i]} 
...    dt_fit = DecisionTreeClassifier(criterion="gini", max_depth=5, 
...        min_samples_split=2, min_samples_leaf=1, random_state=42, 
...        class_weight=clwght) 
...    dt_fit.fit(x_train, y_train) 
...    dt_wttune.loc[i, 'zero_wght'] = clwght[0] 
...    dt_wttune.loc[i, 'one_wght'] = clwght[1] 
...    dt_wttune.loc[i, 'tr_accuracy'] = round(accuracy_score(y_train, dt_fit.predict(x_train)), 3) 
...    dt_wttune.loc[i, 'tst_accuracy'] = round(accuracy_score(y_test, dt_fit.predict(x_test)), 3) 
 
...    clf_sp = classification_report(y_test, dt_fit.predict(x_test)).split() 
...    dt_wttune.loc[i, 'prec_zero'] = float(clf_sp[5]) 
...    dt_wttune.loc[i, 'prec_one'] = float(clf_sp[10]) 
...    dt_wttune.loc[i, 'prec_ovll'] = float(clf_sp[17]) 
 
...    dt_wttune.loc[i, 'recl_zero'] = float(clf_sp[6]) 
...    dt_wttune.loc[i, 'recl_one'] = float(clf_sp[11]) 
...    dt_wttune.loc[i, 'recl_ovll'] = float(clf_sp[18]) 
...    print ("\nClass Weights", clwght, "Train accuracy:", round(accuracy_score(y_train, dt_fit.predict(x_train)), 3), "Test accuracy:", round(accuracy_score(y_test, dt_fit.predict(x_test)), 3)) 
...    print ("Test Confusion Matrix\n\n", pd.crosstab(y_test, dt_fit.predict(x_test), rownames=["Actual"], colnames=["Predicted"]))

From the preceding output, we can see that with class weights of 0.3 (for the zero category) and 0.7 (for the one category), the decision tree identifies a higher number of attriters (25 out of 61) without compromising test accuracy (83.9 percent).
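If the 0.3/0.7 combination is chosen, the final model can be refit outside the loop with those weights fixed. The following is a minimal sketch, assuming the same imports and the x_train, y_train, x_test, and y_test splits used above:

>>> dt_final = DecisionTreeClassifier(criterion="gini", max_depth=5, 
...     min_samples_split=2, min_samples_leaf=1, random_state=42, 
...     class_weight={0:0.3, 1:0.7}) 
>>> dt_final.fit(x_train, y_train)
>>> print (pd.crosstab(y_test, dt_final.predict(x_test), rownames=["Actual"], colnames=["Predicted"]))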

R Code for Decision Tree Classifier with Class Weights Applied on HR Attrition Data:

# Decision Trees using C5.0 package - Error Costs
library(C50)
class_zero_wgt = c(0.01,0.1,0.2,0.3,0.4,0.5)

for (cwt in class_zero_wgt){
  cwtz = cwt
  cwto = 1-cwtz
  cstvr = cwto/cwtz
  error_cost <- matrix(c(0, 1, cstvr, 0), nrow = 2)
  dtree_fit = C5.0(train_data[-31], train_data$Attrition_ind,
                   costs = error_cost, control = C5.0Control(minCases = 1))
  summary(dtree_fit)
  tr_y_pred = predict(dtree_fit, train_data, type = "class")
  ts_y_pred = predict(dtree_fit, test_data, type = "class")
  tr_y_act = train_data$Attrition_ind; ts_y_act = test_data$Attrition_ind
  tr_acc = accrcy(tr_y_act, tr_y_pred)
  ts_acc = accrcy(ts_y_act, ts_y_pred)
  print(paste("Class weights","{0:",cwtz,"1:",cwto,"}",
              "Decision Tree Train accuracy:",tr_acc,
              "Decision Tree Test accuracy:",ts_acc))
  ts_tble = table(ts_y_act, ts_y_pred)
  print(paste("Test Confusion Matrix"))
  print(ts_tble)
}