The elbow method

The elbow method is a heuristic for choosing the number of clusters in k-means clustering. It plots the value of the cost function (the average distortion) produced by different values of k. As k increases, average distortion decreases: each cluster has fewer constituent instances, and the instances are closer to their respective centroids. However, the improvement in average distortion diminishes as k increases. The value of k at which the improvement declines the most is called the elbow; dividing the data into further clusters beyond that point yields little benefit.
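The procedure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy data set, the helper names (`init_centroids`, `assign`, `kmeans`, `average_distortion`, `elbow`), and the deterministic farthest-first seeding are all choices made for this example, and distortion is taken here as the mean Euclidean distance from each instance to its centroid (some texts use the squared distance instead).

```python
import math


def init_centroids(points, k):
    """Deterministic farthest-first seeding (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the centroid to the mean of its members;
            # keep the old centroid if the cluster is empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def average_distortion(points, k):
    """Mean distance from each point to its assigned centroid."""
    centroids, labels = kmeans(points, k)
    return sum(math.dist(p, centroids[lab])
               for p, lab in zip(points, labels)) / len(points)


def elbow(distortions):
    """Return the k after which the improvement in distortion drops the most."""
    ks = sorted(distortions)
    gains = {k: distortions[k - 1] - distortions[k] for k in ks[1:]}
    # Compare the gain from reaching k with the gain from going one step further.
    return max(ks[1:-1], key=lambda k: gains[k] - gains[k + 1])


# Toy data: three 3x3 grids of points around well-separated centers.
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (5, 9)]
          for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

distortions = {k: average_distortion(points, k) for k in range(1, 7)}
for k, d in distortions.items():
    print(k, round(d, 3))
print("elbow at k =", elbow(distortions))
```

On real data the curve is rarely this clean, so the elbow is usually read off a plot of distortion against k rather than computed automatically.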

Evaluation of clusters with silhouette coefficient

The silhouette coefficient is a measure of the compactness and separation of the clusters. It is higher for compact clusters that are well separated and lower for overlapping clusters. Its values range from -1 to +1, and higher values indicate better clustering.

The silhouette coefficient is calculated per instance. For a set of instances, it is calculated as the mean of the individual instances' scores.

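For a single instance, the coefficient is s = (b - a) / max(a, b), where a is the mean distance between the instance and the other instances in its own cluster, and b is the mean distance between the instance and the instances in the next closest cluster.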
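As a rough illustration of how a and b combine into a per-instance score, the following sketch computes the coefficient in plain Python. The function name `silhouette_samples`, the four-point data set, and the two labelings are assumptions made for this example. The sketch requires at least two clusters and assigns a score of 0 to an instance that is alone in its cluster, which is a common convention.

```python
import math


def silhouette_samples(points, labels):
    """Per-instance silhouette coefficient, s = (b - a) / max(a, b)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters (assumed here)
            continue
        # a: mean distance to the other members of the instance's own cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return scores


def mean(values):
    return sum(values) / len(values)


# Two tight pairs of points, far apart from each other.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]

good = mean(silhouette_samples(points, [0, 0, 1, 1]))  # pairs kept together
bad = mean(silhouette_samples(points, [0, 1, 0, 1]))   # pairs split apart

print("well-separated labeling:", round(good, 3))
print("overlapping labeling:", round(bad, 3))
```

The labeling that keeps each tight pair together scores close to +1, while the labeling that splits the pairs scores below 0, matching the interpretation given above.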
