Identifying Groups of Data Using Clustering Methods

Clustering methods are designed to find hidden patterns or groupings in a dataset. Unlike the supervised learning methods covered in previous chapters, these algorithms have no labels to learn from; they form clusters based on the similarities between elements.

This is an unsupervised learning technique that groups statistical units so as to minimize the intragroup distance and maximize the intergroup distance. The distance between groups is quantified by means of similarity/dissimilarity measures defined between the statistical units.
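
To make the intragroup/intergroup idea concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the chapter's own code: the function names, the choice of Euclidean distance as the dissimilarity measure, and the four sample points are all assumptions made for this example. It compares two candidate groupings of the same points and shows that the more natural grouping has a smaller average intragroup distance and a larger average intergroup distance.

```python
import numpy as np

def pairwise_euclidean(a, b):
    """Matrix of Euclidean distances between the rows of a and the rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def mean_intragroup_distance(points, labels):
    """Average distance between distinct points that share a label."""
    total, count = 0.0, 0
    for label in np.unique(labels):
        group = points[labels == label]
        d = pairwise_euclidean(group, group)
        # The strict upper triangle skips self-distances and counts each pair once.
        rows, cols = np.triu_indices(len(group), k=1)
        total += d[rows, cols].sum()
        count += len(rows)
    return total / count if count else 0.0

def mean_intergroup_distance(points, labels):
    """Average distance between points that carry different labels."""
    total, count = 0.0, 0
    keys = np.unique(labels)
    for i, first in enumerate(keys):
        for second in keys[i + 1:]:
            d = pairwise_euclidean(points[labels == first],
                                   points[labels == second])
            total += d.sum()
            count += d.size
    return total / count if count else 0.0

# Hypothetical data: two tight pairs of points placed far apart.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good_labels = np.array([0, 0, 1, 1])  # groups nearby points together
bad_labels = np.array([0, 1, 0, 1])   # groups distant points together

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    intra = mean_intragroup_distance(points, labels)
    inter = mean_intergroup_distance(points, labels)
    print(f"{name}: intragroup={intra:.3f}, intergroup={inter:.3f}")
```

A clustering algorithm can be thought of as searching for the labeling that scores well on both quantities at once; the methods in this chapter differ mainly in how they define the distance and how they carry out that search.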

To perform cluster analysis, no prior interpretative model is required. Unlike other multivariate statistical techniques, it makes no a priori assumption about the fundamental typologies that may characterize the observed sample. Such an assumption is made, by contrast, in discriminant analysis, which splits a set of individuals into groups that are predetermined from the beginning according to the different values taken by one or more variables. Cluster analysis has an exploratory role: it looks for existing but not-yet-identified structures in order to deduce the most likely grouping. It is a purely empirical method of classification and, as such, primarily an inductive technique.

This chapter shows you how to divide data into clusters, or groupings of similar items. You'll learn how to find groups of data with the K-means and K-medoids methods. We'll also cover grouping techniques using hierarchical clustering.

We will cover the following topics:

  • Hierarchical clustering
  • K-means method
  • K-medoids method
  • Gaussian mixture models
  • Dendrograms

By the end of the chapter, we will be able to perform different types of clustering. We will learn how to apply clustering methods to our data and how the clustering algorithms work, including the basic concepts they use to group data by means of similarity measures. We will discover how to prepare data for clustering analysis. We'll also cover K-means and K-medoids techniques, cluster trees and dendrograms, and finally Gaussian mixture models.
