Household incomes - clustering into k clusters

For example let us take households with the yearly earnings in USD dollars 40k, 55k, 70k, 100k, 115k, 130k, 135k. Then if we require to cluster the households into the two clusters taking their earnings as a measure of similarity, then the first cluster would have the households earning 40k, 55k, 70k; the second cluster would have the households earning 100k, 115k, 130k, 135k.

This is because 40k and 135k are furthest away from each other, and we require to have two clusters, so they have to be in the different clusters. 55K is closer to 40k than to 135k, so 40k and 55k will be in the same cluster. Similarly, 130k and 135k will be in the same cluster. 70K is closer to 40k and 55k than to 130k and 135k, so 70k should be in the cluster with 40k and 55k. 115K is closer to 130k and 135k than to the first cluster with 40k, 55k and 70k, so it will be in the second cluster. Finally, 100k is closer to the second cluster with 115k, 130k and 135k, so it will be there. Therefore the first cluster will contain 40k, 55k and 70k households. The second cluster will contain 100k, 115k, 130k and 135k households.

Clustering groups features with similar properties and assigning a cluster to a feature is a form of classification. It is up to a data scientist to interpret the result of the clustering and what classification it induces. Here the cluster with the households with the annual incomes 40k, 55k, 70k USD represents a class of households with a low income. The second cluster with the households with the annual incomes 100k, 115k, 130k and 135k represents a class of households with a high income.

We clustered the households into the two clusters in an informal way based on the intuition and the common sense. There are clustering algorithms that cluster the data according to the precise rules. The algorithms include fuzzy c-means clustering algorithm, hierarchical clustering algorithm, Gaussian(EM) clustering algorithm, Quality Threshold clustering algorithm and k-means clustering algorithm which is the focus of this chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset