Tuning of k-value in KNN classifier

In the previous section, we checked only a k-value of three. In practice, any machine learning algorithm has knobs that must be tuned to find where it performs best. For KNN, the main tuning parameter is the k-value. In the following code, we therefore determine the best k-value with a simple grid search over candidate values:

# Tuning of K-value for Train & Test data
# (assumes the imports and the x_train/x_test split from the previous section)
>>> dummyarray = np.empty((5,3)) 
>>> k_valchart = pd.DataFrame(dummyarray) 
>>> k_valchart.columns = ["K_value","Train_acc","Test_acc"] 
 
>>> k_vals = [1,2,3,4,5] 
 
>>> for i in range(len(k_vals)):
...     knn_fit = KNeighborsClassifier(n_neighbors=k_vals[i],p=2,metric='minkowski')
...     knn_fit.fit(x_train,y_train)
...     # Predict once per data set and reuse the predictions below
...     tr_pred = knn_fit.predict(x_train)
...     ts_pred = knn_fit.predict(x_test)
...
...     print ("\nK-value",k_vals[i])
...
...     tr_accscore = round(accuracy_score(y_train,tr_pred),3)
...     print ("\nK-Nearest Neighbors - Train Confusion Matrix\n\n",pd.crosstab(y_train,tr_pred,rownames = ["Actual"],colnames = ["Predicted"]))
...     print ("\nK-Nearest Neighbors - Train accuracy:",tr_accscore)
...     print ("\nK-Nearest Neighbors - Train Classification Report\n",classification_report(y_train,tr_pred))
...
...     ts_accscore = round(accuracy_score(y_test,ts_pred),3)
...     print ("\n\nK-Nearest Neighbors - Test Confusion Matrix\n\n",pd.crosstab(y_test,ts_pred,rownames = ["Actual"],colnames = ["Predicted"]))
...     print ("\nK-Nearest Neighbors - Test accuracy:",ts_accscore)
...     print ("\nK-Nearest Neighbors - Test Classification Report\n",classification_report(y_test,ts_pred))
...
...     # Store the accuracies for plotting
...     k_valchart.loc[i, 'K_value'] = k_vals[i]
...     k_valchart.loc[i, 'Train_acc'] = tr_accscore
...     k_valchart.loc[i, 'Test_acc'] = ts_accscore
 
# Plotting accuracies over varied K-values
>>> import matplotlib.pyplot as plt 
>>> plt.figure() 
>>> plt.xlabel('K-value') 
>>> plt.ylabel('Accuracy') 
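>>> # Label each line so that plt.legend() has entries to display
>>> plt.plot(k_valchart["K_value"],k_valchart["Train_acc"],label='Train accuracy')
>>> plt.plot(k_valchart["K_value"],k_valchart["Test_acc"],label='Test accuracy')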
 
>>> plt.axis([0.9,5, 0.92, 1.005]) 
>>> plt.xticks([1,2,3,4,5]) 
 
>>> for a,b in zip(k_valchart["K_value"],k_valchart["Train_acc"]): 
...     plt.text(a, b, str(b),fontsize=10) 
 
>>> for a,b in zip(k_valchart["K_value"],k_valchart["Test_acc"]): 
...     plt.text(a, b, str(b),fontsize=10) 
     
>>> plt.legend(loc='upper right')     
>>> plt.show()

With small k-values, the model shows more overfitting: accuracy is very high on the train data and lower on the test data. As the k-value increases, the train and test accuracies converge and the model becomes more robust. This gap between train and test performance for a highly flexible model is a typical pattern in machine learning. For further analysis, readers are encouraged to try k-values higher than five and see how the train and test accuracies change. The R code for tuning of k-value in KNN classifier is as follows:

# Tuning of K-value on Train & Test Data 
k_valchart = data.frame(matrix( nrow=5, ncol=3)) 
colnames(k_valchart) = c("K_value","Train_acc","Test_acc") 
k_vals = c(1,2,3,4,5) 

i = 1
for (kv in k_vals) { 
  tr_y_pred = knn(train_data,train_data,train_data$Cancer_Ind,k=kv)
  ts_y_pred = knn(train_data,test_data,train_data$Cancer_Ind,k=kv)
  tr_y_act = train_data$Cancer_Ind;ts_y_act = test_data$Cancer_Ind
  tr_tble = table(tr_y_act,tr_y_pred) 
  print(paste("Train Confusion Matrix")) 
  print(tr_tble) 
  tr_acc = accrcy(tr_y_act,tr_y_pred) 
  trprec_zero = prec_zero(tr_y_act,tr_y_pred); trrecl_zero = recl_zero(tr_y_act, tr_y_pred) 
  trprec_one = prec_one(tr_y_act,tr_y_pred); trrecl_one = recl_one(tr_y_act,tr_y_pred) 
  trprec_ovll = trprec_zero *frac_trzero + trprec_one*frac_trone
  trrecl_ovll = trrecl_zero *frac_trzero + trrecl_one*frac_trone
  print(paste("KNN Train accuracy:",tr_acc)) 
  print(paste("KNN - Train Classification Report"))
  print(paste("Zero_Precision",trprec_zero,"Zero_Recall",trrecl_zero))
  print(paste("One_Precision",trprec_one,"One_Recall",trrecl_one))
  print(paste("Overall_Precision",round(trprec_ovll,4),"Overall_Recall",round(trrecl_ovll,4)))
  ts_tble = table(ts_y_act,ts_y_pred) 
  print(paste("Test Confusion Matrix")) 
  print(ts_tble)
  ts_acc = accrcy(ts_y_act,ts_y_pred) 
  tsprec_zero = prec_zero(ts_y_act,ts_y_pred); tsrecl_zero = recl_zero(ts_y_act,ts_y_pred) 
  tsprec_one = prec_one(ts_y_act,ts_y_pred); tsrecl_one = recl_one(ts_y_act,ts_y_pred) 
  tsprec_ovll = tsprec_zero *frac_tszero + tsprec_one*frac_tsone
  tsrecl_ovll = tsrecl_zero *frac_tszero + tsrecl_one*frac_tsone

  print(paste("KNN Test accuracy:",ts_acc)) 
  print(paste("KNN - Test Classification Report"))
  print(paste("Zero_Precision",tsprec_zero,"Zero_Recall",tsrecl_zero))
  print(paste("One_Precision",tsprec_one,"One_Recall",tsrecl_one))
  print(paste("Overall_Precision",round(tsprec_ovll,4),"Overall_Recall",round(tsrecl_ovll,4)))

  # Store the k-value and accuracies, then move to the next row
  k_valchart[i,1] = kv
  k_valchart[i,2] = tr_acc
  k_valchart[i,3] = ts_acc
  i = i+1
}

# Plotting the graph 
library(ggplot2) 
library(grid) 
# Each line must end with "+" so that R continues the ggplot expression
ggplot(k_valchart, aes(K_value)) +
  geom_line(aes(y = Train_acc, colour = "Train_Acc")) +
  geom_line(aes(y = Test_acc, colour = "Test_Acc")) +
  labs(x="K_value",y="Accuracy") +
  geom_text(aes(label = Train_acc, y = Train_acc), size = 3) +
  geom_text(aes(label = Test_acc, y = Test_acc), size = 3)
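
Readers who want to try k-values higher than five may find it convenient to let scikit-learn run the search. The following is a minimal sketch, not the code used in this section: it replaces the single train/test comparison with 5-fold cross-validation through GridSearchCV, and it assumes scikit-learn's bundled breast cancer dataset as a stand-in for the x_train and x_test data of the earlier sections. The range of k-values (odd numbers from 1 to 29) and the 70/30 split are likewise illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset,
# not the x_train/x_test split used earlier in the text.
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42, stratify=y)

# Candidate k-values: odd numbers from 1 to 29 (an arbitrary illustrative range).
param_grid = {"n_neighbors": list(range(1, 31, 2))}

# 5-fold cross-validated grid search over k, scored by accuracy.
grid = GridSearchCV(
    KNeighborsClassifier(p=2, metric="minkowski"),
    param_grid,
    cv=5,
    scoring="accuracy",
    return_train_score=True,
)
grid.fit(x_train, y_train)

# Mean accuracy on the training folds and on the held-out folds for each k.
cv = grid.cv_results_
for k, tr, va in zip(cv["param_n_neighbors"],
                     cv["mean_train_score"],
                     cv["mean_test_score"]):
    print("k =", k, "train:", round(tr, 3), "validation:", round(va, 3))

best_k = grid.best_params_["n_neighbors"]
print("Best k by 5-fold CV:", best_k)
print("Mean CV accuracy at best k:", round(grid.best_score_, 3))
# grid.score uses the model refit on all of x_train with the best k.
print("Test accuracy at best k:", round(grid.score(x_test, y_test), 3))
```

Choosing k by cross-validation on the training data keeps the test set out of the tuning loop, so the final test accuracy is a less optimistic estimate than one obtained by picking the k-value that happens to score best on the test data.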
