Summary

Clustering data is efficient and can speed up the classification of new features: a new feature is assigned the class represented by the cluster it falls into. An appropriate number of clusters can be determined by cross-validation, choosing the number that results in the most accurate classification.
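The idea of classifying by cluster and picking the number of clusters by cross-validation can be sketched as follows. This is only an illustrative sketch under several assumptions that are not taken from the text: each cluster is labeled with the majority class of its training members, the centroids start from evenly spaced training points (so the run is reproducible), the folds are formed by taking every third feature, and all function names (`fit_cluster_classifier`, `classify`, `cross_validated_accuracy`, `choose_k`) are hypothetical.

```python
import math
from collections import Counter

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, k, max_iterations=100):
    """Small k-means; assumes k <= len(points)."""
    # Assumption: deterministic start from evenly spaced points in the list.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [nearest(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # the clusters no longer change
        assignment = new_assignment
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

def fit_cluster_classifier(points, labels, k):
    """Cluster the training features and label each cluster by majority class."""
    centroids, assignment = k_means(points, k)
    fallback = Counter(labels).most_common(1)[0][0]  # used for empty clusters
    cluster_labels = []
    for i in range(k):
        members = [lab for lab, a in zip(labels, assignment) if a == i]
        cluster_labels.append(Counter(members).most_common(1)[0][0] if members else fallback)
    return centroids, cluster_labels

def classify(point, centroids, cluster_labels):
    """Assign a new feature the class of the cluster it falls into."""
    return cluster_labels[nearest(point, centroids)]

def cross_validated_accuracy(points, labels, k, folds=3):
    """Fraction of held-out features classified correctly over all folds."""
    correct = 0
    for fold in range(folds):
        train = [(p, l) for i, (p, l) in enumerate(zip(points, labels)) if i % folds != fold]
        held_out = [(p, l) for i, (p, l) in enumerate(zip(points, labels)) if i % folds == fold]
        centroids, cluster_labels = fit_cluster_classifier(
            [p for p, _ in train], [l for _, l in train], k)
        correct += sum(classify(p, centroids, cluster_labels) == l for p, l in held_out)
    return correct / len(points)

def choose_k(points, labels, candidates):
    """Return the candidate number of clusters with the best held-out accuracy."""
    return max(candidates, key=lambda k: cross_validated_accuracy(points, labels, k))

# Made-up toy data: two well-separated groups with classes "a" and "b".
points = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (2, 1.5),
          (8, 8), (8, 9), (9, 8), (9, 9), (8.5, 8.5), (9, 8.5)]
labels = ["a"] * 6 + ["b"] * 6

best_k = choose_k(points, labels, [1, 2, 3])
centroids, cluster_labels = fit_cluster_classifier(points, labels, best_k)
print(best_k, classify((1.2, 1.4), centroids, cluster_labels))
```

Classifying a new feature only requires comparing it against the centroids rather than against every stored feature, which is where the speed-up comes from. When several candidates tie on accuracy, `max` keeps the first one listed, so listing candidates in increasing order favors the smaller number of clusters.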

Clustering groups data by similarity. The more clusters there are, the greater the similarity between the features within a cluster, but the fewer features each cluster contains.

The k-means clustering algorithm tries to cluster features in such a way that the mutual distance between the features in a cluster is minimized. To do this, the algorithm computes the centroid of each cluster and assigns each feature to the cluster whose centroid is closest to it. The algorithm terminates as soon as the clusters or their centroids no longer change.
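The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: it assumes Euclidean distance, takes the starting centroids from the caller (the text does not say how they are chosen), and leaves an empty cluster's centroid where it was. The names `k_means`, `closest_centroid` and `mean_point` are hypothetical.

```python
import math

def closest_centroid(point, centroids):
    """Return the index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def mean_point(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, initial_centroids, max_iterations=100):
    """Cluster points around len(initial_centroids) centroids.

    Returns (centroids, assignment), where assignment[j] is the index of
    the cluster that points[j] belongs to.
    """
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iterations):
        # Each feature belongs to the cluster whose centroid is closest to it.
        new_assignment = [closest_centroid(p, centroids) for p in points]
        if new_assignment == assignment:
            break  # the clusters no longer change, so the computation is done
        assignment = new_assignment
        # Recompute the centroid of each cluster from its current members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[i] = mean_point(members)
    return centroids, assignment

# Made-up example: two visibly separate groups of 2D features.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignment = k_means(points, initial_centroids=[(1, 1), (8, 8)])
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids, so in practice the algorithm is often run from several different starting points and the best clustering is kept.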
