Clustering data is computationally efficient and can speed up the classification of new features: a new feature is assigned the class represented in the cluster to which it belongs. An appropriate number of clusters can be determined by cross-validation, choosing the number that results in the most accurate classification.
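As a rough illustration of classifying through clusters, the sketch below assigns a new feature the class of its nearest cluster. It assumes that the clusters have already been computed, that each cluster is summarized by a centroid, and that each cluster carries a single class label; the function name classify, the centroids, and the labels "low" and "high" are hypothetical and chosen only for this example.

```python
def squared_distance(a, b):
    # Squared Euclidean distance; enough for comparing which centroid is nearer.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(feature, centroids, cluster_classes):
    # Find the index of the centroid closest to the feature
    # and return the class label associated with that cluster.
    nearest = min(
        range(len(centroids)),
        key=lambda i: squared_distance(feature, centroids[i]),
    )
    return cluster_classes[nearest]


# Hypothetical clusters: two centroids, each labelled with one class (assumption).
centroids = [(1.0, 1.0), (8.0, 8.0)]
cluster_classes = ["low", "high"]

label = classify((2.0, 1.5), centroids, cluster_classes)
print(label)
```

The new feature is compared only against one centroid per cluster rather than against every stored feature, which is where the speed-up comes from.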
Clustering groups data by similarity. The more clusters there are, the greater the similarity between the features in a cluster, but the fewer features each cluster contains.
The k-means clustering algorithm tries to cluster features in such a way that the mutual distance between the features in a cluster is minimized. To do this, the algorithm computes the centroid of each cluster, and a feature belongs to the cluster whose centroid is closest to it. These two steps are repeated, and the algorithm finishes as soon as the clusters or their centroids no longer change.
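The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, takes the initial centroids as an argument instead of choosing them, and leaves the centroid of an empty cluster unchanged. The names k_means, nearest, and mean, the max_iterations safeguard, and the sample points are all assumptions made for this example.

```python
import math


def distance(a, b):
    # Euclidean distance between two points of equal dimension.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    # Index of the centroid closest to the given point.
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


def mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, centroids, max_iterations=100):
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iterations):
        # Step 1: each feature joins the cluster with the closest centroid.
        new_assignment = [nearest(p, centroids) for p in points]
        # Stop as soon as the clusters no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 2: recompute the centroid of every non-empty cluster.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = mean(members)
    return assignment, centroids


# Hypothetical sample data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
assignment, centroids = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(assignment)
print(centroids)
```

In this sketch the result depends on the initial centroids passed in; different starting centroids can lead to different final clusters.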