Gower and PAM

To begin this step, we will need to wrangle our data a little bit. As this method can take variables that are factors, we will convert alcohol to either high or low content. It takes only one line of code, using the ifelse() function, to change the variable to a factor. If alcohol in df is greater than zero, it will be High; otherwise, it will be Low. Assuming df holds the standardized data from the earlier steps, zero corresponds to the mean, so High means above-average alcohol content:

> wine$Alcohol <- as.factor(ifelse(df$Alcohol > 0, "High", "Low"))

We are now ready to create the dissimilarity matrix using the daisy() function from the cluster package, specifying the metric as gower:

> gower_dist <- cluster::daisy(wine[, -1], metric = "gower")
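To get a feel for what daisy() computes, here is a minimal sketch on a made-up three-row data frame (toy, x, and g are hypothetical names, not part of the wine data). Assuming the default equal weights, the Gower dissimilarity averages a range-scaled absolute difference for numeric columns and a 0/1 mismatch for factors:

```r
library(cluster)

# Hypothetical toy data: one numeric column and one factor column
toy <- data.frame(
  x = c(1, 2, 5),
  g = factor(c("a", "a", "b"))
)

toy_dist <- daisy(toy, metric = "gower")
toy_mat <- as.matrix(toy_dist)

# Rows 1 and 2: numeric part |1 - 2| / (5 - 1) = 0.25, factor part 0 (same level),
# so the dissimilarity is (0.25 + 0) / 2 = 0.125
toy_mat[1, 2]

# Rows 1 and 3: numeric part |1 - 5| / 4 = 1, factor part 1 (different level),
# so the dissimilarity is (1 + 1) / 2 = 1
toy_mat[1, 3]
```

Every entry falls between 0 and 1, which is what lets numeric and categorical columns contribute on the same footing.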

We create the cluster object with the pam() function, which is also part of the cluster package. We will create three clusters in this example and create a table of the cluster size:

> set.seed(123)

> pam_cluster <- cluster::pam(gower_dist, k = 3)

> table(pam_cluster$clustering)

 1  2  3
62 71 45

Now, let's see how it does compared to the cultivar labels:

> table(pam_cluster$clustering, wine$Class)

     1  2  3
  1 57  5  0
  2  2 64  5
  3  0  2 43
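As a rough summary of that cross-tabulation, the sketch below rebuilds the table from the printed counts and computes the share of observations on the diagonal. The names tab and agreement are illustrative, and the diagonal is only meaningful here because the cluster numbers happen to line up with the cultivar numbers:

```r
# Counts copied from the cross-tabulation above (rows = clusters, columns = cultivars)
tab <- matrix(c(57,  5,  0,
                 2, 64,  5,
                 0,  2, 43),
              nrow = 3, byrow = TRUE,
              dimnames = list(cluster = c("1", "2", "3"),
                              class   = c("1", "2", "3")))

# Share of observations whose cluster number matches the cultivar number
agreement <- sum(diag(tab)) / sum(tab)
round(agreement, 3)  # 164 / 178, about 0.921
```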

You can run a similar aggregation and exploration exercise with this method as described previously. Let's see how alcohol is distributed across the three clusters:

> table(pam_cluster$clustering, wine$Alcohol)

    High Low
  1   62   0
  2    1  70
  3   29  16

This table shows the counts of each factor level by cluster. The Gower metric is very powerful for data with labels, factors, characters, missing values, and so on. I highly recommend it. One drawback of any distance matrix is that it can become a computational problem with large datasets, since the number of pairwise distances grows with the square of the number of observations. An effective workaround is to cluster several random samples and compare the results. Done well, you can then build a classifier on the cluster assignments to predict the cluster for the rest of your population.
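One possible reading of the sampling idea is sketched below. It is not the procedure used in this chapter: the built-in iris data stands in for a large mixed-type dataset, and n_samples and sample_size are hypothetical settings. Each pass clusters a random subset with Gower and PAM and records the sorted cluster sizes so that the runs can be compared:

```r
library(cluster)

set.seed(123)
dat <- iris          # stand-in data with numeric columns and a factor
n_samples <- 5       # hypothetical number of repeated samples
sample_size <- 50    # hypothetical number of rows per sample

sizes <- t(sapply(seq_len(n_samples), function(i) {
  idx <- sample(nrow(dat), sample_size)          # draw a random subset of rows
  d <- daisy(dat[idx, ], metric = "gower")       # Gower dissimilarity on the subset
  fit <- pam(d, k = 3)                           # PAM with three clusters
  sort(tabulate(fit$clustering, nbins = 3), decreasing = TRUE)
}))

# One row per sample, cluster sizes sorted from largest to smallest
sizes
```

If the sorted sizes, and the medoids behind them, look alike from sample to sample, that is some evidence the structure is stable enough to label a sample and train a classifier on it.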

Finally, we'll create a dissimilarity matrix with random forest and create three clusters with PAM.
