Interpreting your results

Sometimes groupings in data make immediate sense. When clustering by income and age, one could come across a group that can be labeled as young professionals.

In the UN development indicators dataset, the Describe dialog makes it clear that Cluster 1, Cluster 2, and Cluster 3 correspond to Underdeveloped, Developing, and Highly Developed countries, respectively. In effect, we are using k-means to compress the information contained in three columns and 180+ rows into just three labels. Clustering can also sometimes surface patterns that your dataset cannot sufficiently explain by itself.

For example, when clustering health records, you may find two distinct groups whose difference is not immediately clear or describable with the available data. That may lead you to ask more questions, and perhaps later realize that one group exercised regularly while the other didn't, or that one had an immunity to a certain disease. It may even point to things such as fraudulent activity or drug abuse that you would otherwise not have noticed. Because it is hard to anticipate and collect all relevant data, such hidden patterns are not uncommon in real life.
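To make the country example concrete, here is a minimal sketch of the same idea in plain Python: cluster a few rows with k-means, summarize each cluster, and attach a readable label. It stands in for what the Describe dialog shows and is not the tool's own implementation. The nine rows are made-up, pre-scaled values rather than real UN figures, and the helper names (`sq_dist`, `kmeans`, `cluster_name`) and the farthest-first initialization are assumptions chosen to keep the example deterministic.

```python
from statistics import mean

# Made-up, pre-scaled indicator rows (NOT real UN figures):
# (life expectancy index, education index, income index)
rows = [
    (0.20, 0.10, 0.15), (0.90, 0.85, 0.90), (0.50, 0.55, 0.50),
    (0.25, 0.15, 0.10), (0.55, 0.50, 0.45), (0.85, 0.90, 0.95),
    (0.45, 0.50, 0.55), (0.15, 0.20, 0.20), (0.95, 0.90, 0.85),
]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Basic k-means; returns (labels, centroids)."""
    # Farthest-first initialization (an assumption of this sketch):
    # start from the first point, then keep adding the point that is
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            updated.append(
                tuple(mean(col) for col in zip(*members)) if members else centroids[c]
            )
        if updated == centroids:  # nothing moved, so we have converged
            break
        centroids = updated
    return labels, centroids

labels, centroids = kmeans(rows, k=3)

# Rank clusters by their average indicator value, then attach names.
names = ["Underdeveloped", "Developing", "Highly Developed"]
ranked = sorted(range(3), key=lambda c: mean(centroids[c]))
cluster_name = {c: names[rank] for rank, c in enumerate(ranked)}
named = [cluster_name[l] for l in labels]

# One summary line per cluster: label, centroid, number of rows.
for c in ranked:
    print(cluster_name[c], [round(v, 2) for v in centroids[c]], labels.count(c))
```

Ranking clusters by their centroid averages only works here because all three toy indicators point in the same direction. On real data, the per-cluster summary is something to read and interpret, and the labels remain a judgment call.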
