PAM

To understand Partitioning Around Medoids (PAM), let's first define a medoid.

A medoid is the observation in a cluster that minimizes the total dissimilarity (in our case, calculated using the Gower metric) to the other observations in that cluster. Unlike a k-means centroid, which is an average of the cluster's points, a medoid is always an actual observation from the data. So, as with k-means, if you specify five clusters, you will have five partitions of the data.
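To make the definition concrete, here is a minimal sketch in base R that picks the medoid of a single cluster from a dissimilarity matrix. The matrix values are made up for illustration, and find_medoid() is a hypothetical helper, not a function from any package.

```r
# Toy symmetric dissimilarity matrix for four observations (made-up values).
d <- matrix(c(0.0, 0.2, 0.5, 0.9,
              0.2, 0.0, 0.4, 0.8,
              0.5, 0.4, 0.0, 0.3,
              0.9, 0.8, 0.3, 0.0),
            nrow = 4, byrow = TRUE)

# Hypothetical helper: return the index of the cluster member whose summed
# dissimilarity to the other members is smallest.
find_medoid <- function(d, members) {
  totals <- rowSums(d[members, members, drop = FALSE])
  members[which.min(totals)]
}

medoid_index <- find_medoid(d, 1:4)
```

Here the third observation has the smallest row sum (1.2), so it would serve as the medoid if all four observations formed one cluster.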

With the objective of minimizing the dissimilarity of all the observations to the nearest medoid, the PAM algorithm iterates over the following steps:

  1. Randomly select k observations as the initial medoids.
  2. Assign each observation to the closest medoid.
  3. For each medoid and each non-medoid observation, consider swapping the two and compute the resulting total dissimilarity cost.
  4. Select the configuration that minimizes the total dissimilarity.
  5. Repeat steps 2 through 4 until there is no change in the medoids.
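
The steps above can be sketched in a few lines of base R. This is only an illustration of the listed procedure, assuming a full symmetric dissimilarity matrix as input; simple_pam() and total_cost() are hypothetical names, and the pam() function in the cluster package is implemented differently (by default it chooses its starting medoids with a build phase rather than at random).

```r
# Hypothetical helper: total dissimilarity of every observation to its
# nearest medoid. `d` is a full symmetric dissimilarity matrix.
total_cost <- function(d, medoids) {
  sum(apply(d[, medoids, drop = FALSE], 1, min))
}

simple_pam <- function(d, k) {
  n <- nrow(d)
  medoids <- sample(n, k)                  # step 1: random initial medoids
  best <- total_cost(d, medoids)           # step 2 is implicit in the cost
  repeat {
    best_swap <- NULL
    for (i in seq_len(k)) {                # step 3: evaluate every swap
      for (h in setdiff(seq_len(n), medoids)) {
        candidate <- medoids
        candidate[i] <- h
        cost <- total_cost(d, candidate)
        if (cost < best) {
          best <- cost
          best_swap <- candidate
        }
      }
    }
    if (is.null(best_swap)) break          # step 5: stop when nothing improves
    medoids <- best_swap                   # step 4: keep the best configuration
  }
  # Final assignment of each observation to its closest medoid.
  clustering <- apply(d[, medoids, drop = FALSE], 1, which.min)
  list(medoids = medoids, clustering = clustering, cost = best)
}

# Usage on made-up data: two well-separated groups on a line.
set.seed(42)
x <- c(1, 1.2, 0.8, 10, 10.3, 9.7)
d <- as.matrix(dist(x))
fit <- simple_pam(d, k = 2)
```

Because every candidate swap is scored against the whole data set, the cost of one pass grows quickly with the number of observations, which is why this exhaustive form is best suited to small data.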

Both Gower and PAM are available in the cluster package in R. We will use the daisy() function to calculate the Gower dissimilarity matrix and the pam() function for the actual partitioning. With this, let's get started with putting these methods to the test.
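
As a quick preview of how the two functions fit together, here is a minimal sketch on a small made-up data frame. The toy data and the choice of k = 2 are assumptions for illustration only, not the data set used in this chapter.

```r
library(cluster)

# Made-up mixed-type data: one numeric and one categorical column.
toy <- data.frame(
  age    = c(23, 25, 24, 61, 58, 64),
  region = factor(c("north", "north", "north", "south", "south", "south"))
)

# Gower dissimilarities handle the numeric and factor columns together.
gower_dist <- daisy(toy, metric = "gower")

# Partition around two medoids; diss = TRUE marks the input as dissimilarities.
pam_fit <- pam(gower_dist, k = 2, diss = TRUE)

pam_fit$id.med       # row indices of the chosen medoids
pam_fit$clustering   # cluster membership for each row
```

Passing the dissimilarity object rather than the raw data frame is what lets pam() work with mixed data types, since the Gower calculation has already reduced each pair of rows to a single number.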
