Tuning of k-value in KNN classifier

In the previous section, we evaluated the classifier with a single k-value of three. In any machine learning algorithm, the tuning parameters need to be varied to find the setting that gives the best performance. In the case of KNN, the principal tuning parameter is the k-value. Hence, the following code runs a simple grid search over k-values from one to five and records the train and test accuracy for each:

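# Tuning of K-value for Train & Test data 
>>> import numpy as np 
>>> import pandas as pd 
>>> from sklearn.neighbors import KNeighborsClassifier 
>>> from sklearn.metrics import accuracy_score, classification_report 
 
# Empty 5 x 3 table to hold one row of results per k-value 
>>> dummyarray = np.empty((5,3)) 
>>> k_valchart = pd.DataFrame(dummyarray) 
>>> k_valchart.columns = ["K_value","Train_acc","Test_acc"] 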
 
>>> k_vals = [1,2,3,4,5] 
 
>>> for i in range(len(k_vals)): 
...     knn_fit = KNeighborsClassifier(n_neighbors=k_vals[i],p=2,metric='minkowski') 
...     knn_fit.fit(x_train,y_train) 
 
...     print ("\nK-value",k_vals[i]) 
     
...     tr_pred = knn_fit.predict(x_train)   # predict once, reuse below 
...     tr_accscore = round(accuracy_score(y_train,tr_pred),3) 
...     print ("\nK-Nearest Neighbors - Train Confusion Matrix\n\n",pd.crosstab(y_train,tr_pred,rownames = ["Actual"],colnames = ["Predicted"])) 
...     print ("\nK-Nearest Neighbors - Train accuracy:",tr_accscore) 
...     print ("\nK-Nearest Neighbors - Train Classification Report\n",classification_report(y_train,tr_pred)) 
 
...     ts_pred = knn_fit.predict(x_test)   # predict once, reuse below 
...     ts_accscore = round(accuracy_score(y_test,ts_pred),3) 
...     print ("\n\nK-Nearest Neighbors - Test Confusion Matrix\n\n",pd.crosstab(y_test,ts_pred,rownames = ["Actual"],colnames = ["Predicted"])) 
...     print ("\nK-Nearest Neighbors - Test accuracy:",ts_accscore) 
...     print ("\nK-Nearest Neighbors - Test Classification Report\n",classification_report(y_test,ts_pred)) 
     
...     k_valchart.loc[i, 'K_value'] = k_vals[i]       
...     k_valchart.loc[i, 'Train_acc'] = tr_accscore      
...     k_valchart.loc[i, 'Test_acc'] = ts_accscore                
 
# Plotting accuracies over varied K-values 
>>> import matplotlib.pyplot as plt 
>>> plt.figure() 
>>> plt.xlabel('K-value') 
>>> plt.ylabel('Accuracy') 
# Labels are required for plt.legend() to show entries 
>>> plt.plot(k_valchart["K_value"],k_valchart["Train_acc"],label="Train_acc") 
>>> plt.plot(k_valchart["K_value"],k_valchart["Test_acc"],label="Test_acc") 
 
>>> plt.axis([0.9,5, 0.92, 1.005]) 
>>> plt.xticks([1,2,3,4,5]) 
 
>>> for a,b in zip(k_valchart["K_value"],k_valchart["Train_acc"]): 
...     plt.text(a, b, str(b),fontsize=10) 
 
>>> for a,b in zip(k_valchart["K_value"],k_valchart["Test_acc"]): 
...     plt.text(a, b, str(b),fontsize=10) 
     
>>> plt.legend(loc='upper right')     
>>> plt.show() 
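
The loop above scores each k-value on one fixed train/test split. As a complementary sketch, the same search can be run with cross-validation on the train data only, so that the test data is held back for a final check. This is not the code from this chapter: `GridSearchCV`, the 5-fold setting, and the bundled `load_breast_cancer` dataset (used here only as a stand-in so the block runs on its own) are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset,
# not necessarily the same file used elsewhere in this chapter.
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42, stratify=y)

# Same candidate k-values as the manual loop.
param_grid = {"n_neighbors": [1, 2, 3, 4, 5]}

# 5-fold cross-validation on the train data only (fold count is an assumption).
grid = GridSearchCV(
    KNeighborsClassifier(p=2, metric="minkowski"),
    param_grid, cv=5, scoring="accuracy")
grid.fit(x_train, y_train)

best_k = grid.best_params_["n_neighbors"]
cv_acc = grid.best_score_               # mean accuracy across the folds
test_acc = grid.score(x_test, y_test)   # accuracy of the refit best model

print("Best k by cross-validation:", best_k)
print("Mean cross-validated accuracy:", round(cv_acc, 3))
print("Held-out test accuracy:", round(test_acc, 3))
```

Because k is chosen without looking at the test data, the final test accuracy is a less optimistic estimate than one obtained by picking the k that happens to score best on the test set.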

In the plot of accuracy against k-value, small k-values show signs of overfitting: accuracy on the train data is very high while accuracy on the test data is lower. As the k-value increases, the train and test accuracies converge and the model becomes more robust. This pattern is typical in machine learning: a more flexible model fits the train data closely but generalizes less well. As further analysis, readers are encouraged to try k-values higher than five and see how the train and test accuracies change. The R code for tuning the k-value in the KNN classifier is as follows:

# Tuning of K-value on Train & Test Data 
k_valchart = data.frame(matrix( nrow=5, ncol=3))
colnames(k_valchart) = c("K_value","Train_acc","Test_acc")
k_vals = c(1,2,3,4,5)

i = 1
for (kv in k_vals) {
  tr_y_pred = knn(train_data,train_data,train_data$Cancer_Ind,k=kv)
  ts_y_pred = knn(train_data,test_data,train_data$Cancer_Ind,k=kv)
  tr_y_act = train_data$Cancer_Ind; ts_y_act = test_data$Cancer_Ind

  # Train metrics
  tr_tble = table(tr_y_act,tr_y_pred)
  print(paste("Train Confusion Matrix"))
  print(tr_tble)
  tr_acc = accrcy(tr_y_act,tr_y_pred)
  trprec_zero = prec_zero(tr_y_act,tr_y_pred); trrecl_zero = recl_zero(tr_y_act,tr_y_pred)
  trprec_one = prec_one(tr_y_act,tr_y_pred); trrecl_one = recl_one(tr_y_act,tr_y_pred)
  trprec_ovll = trprec_zero*frac_trzero + trprec_one*frac_trone
  trrecl_ovll = trrecl_zero*frac_trzero + trrecl_one*frac_trone
  print(paste("KNN Train accuracy:",tr_acc))
  print(paste("KNN - Train Classification Report"))
  print(paste("Zero_Precision",trprec_zero,"Zero_Recall",trrecl_zero))
  print(paste("One_Precision",trprec_one,"One_Recall",trrecl_one))
  print(paste("Overall_Precision",round(trprec_ovll,4),"Overall_Recall",round(trrecl_ovll,4)))

  # Test metrics
  ts_tble = table(ts_y_act,ts_y_pred)
  print(paste("Test Confusion Matrix"))
  print(ts_tble)
  ts_acc = accrcy(ts_y_act,ts_y_pred)
  tsprec_zero = prec_zero(ts_y_act,ts_y_pred); tsrecl_zero = recl_zero(ts_y_act,ts_y_pred)
  tsprec_one = prec_one(ts_y_act,ts_y_pred); tsrecl_one = recl_one(ts_y_act,ts_y_pred)
  tsprec_ovll = tsprec_zero*frac_tszero + tsprec_one*frac_tsone
  tsrecl_ovll = tsrecl_zero*frac_tszero + tsrecl_one*frac_tsone
  print(paste("KNN Test accuracy:",ts_acc))
  print(paste("KNN - Test Classification Report"))
  print(paste("Zero_Precision",tsprec_zero,"Zero_Recall",tsrecl_zero))
  print(paste("One_Precision",tsprec_one,"One_Recall",tsrecl_one))
  print(paste("Overall_Precision",round(tsprec_ovll,4),"Overall_Recall",round(tsrecl_ovll,4)))

  # Store the results for this k-value
  k_valchart[i,1] = kv
  k_valchart[i,2] = tr_acc
  k_valchart[i,3] = ts_acc
  i = i+1
}
# Plotting the graph
library(ggplot2)
library(grid)
ggplot(k_valchart, aes(K_value)) +
geom_line(aes(y = Train_acc, colour = "Train_Acc")) +
geom_line(aes(y = Test_acc, colour = "Test_Acc"))+
labs(x="K_value",y="Accuracy") +
geom_text(aes(label = Train_acc, y = Train_acc), size = 3)+
geom_text(aes(label = Test_acc, y = Test_acc), size = 3)
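
Returning to the suggestion of trying k-values higher than five, the following Python sketch extends the search to k-values from 1 to 15 and tabulates the gap between train and test accuracy. It is an illustration under stated assumptions, not the chapter's own code: the bundled `load_breast_cancer` dataset stands in for the chapter's data, and the upper limit of 15 and the `Gap` column are arbitrary choices.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42, stratify=y)

rows = []
for k in range(1, 16):  # upper limit of 15 is an arbitrary choice
    knn_fit = KNeighborsClassifier(n_neighbors=k, p=2, metric="minkowski")
    knn_fit.fit(x_train, y_train)
    rows.append({
        "K_value": k,
        "Train_acc": round(accuracy_score(y_train, knn_fit.predict(x_train)), 3),
        "Test_acc": round(accuracy_score(y_test, knn_fit.predict(x_test)), 3),
    })

k_valchart = pd.DataFrame(rows)
# A large positive gap suggests overfitting at that k-value.
k_valchart["Gap"] = k_valchart["Train_acc"] - k_valchart["Test_acc"]
print(k_valchart.to_string(index=False))
```

With a k-value of one, each train observation is its own nearest neighbor, so train accuracy is usually at or near 1.0 regardless of how well the model generalizes. This is why the train curve alone is a poor guide when choosing k.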