K-means clustering using Python

To recap from Chapter 4, Machine Learning Workouts on IBM Cloud, k-means clustering is an unsupervised machine learning algorithm that is commonly used to find groups within unlabeled data. Since the goal here is to demonstrate how you can apply this method to some data using Python in Watson Studio, we won't dissect the details of how k-means works. Instead, we'll show a working example of the algorithm, using Watson Studio as a proof of concept.

There are numerous examples available online and elsewhere demonstrating the use of Python to implement k-means logic. Here, we'll use an example that is simple to follow and uses available Python modules, such as matplotlib, pandas, and scipy.

Our exercise, using IBM Watson Studio and the notebook we created in the earlier sections of this chapter, will:

Create a DataFrame for a two-dimensional dataset
Find centroids for three clusters, and then for four clusters
Add a graphical user interface (GUI) to display the results
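As a rough sketch of the first two steps, and not necessarily the exact code used in this exercise, the following builds a small two-dimensional DataFrame and finds centroids for three clusters and then for four. The data values, the column names x and y, and the choice of scipy's kmeans2 function with k-means++ initialization and a fixed seed are all assumptions made for illustration.

```python
import pandas as pd
from scipy.cluster.vq import kmeans2

# Hypothetical two-dimensional dataset: three loose groups of five points each.
data = {
    "x": [1.0, 1.5, 2.0, 2.5, 1.8, 8.0, 8.5, 7.5, 9.0, 8.2, 8.0, 7.5, 9.0, 8.5, 8.3],
    "y": [1.5, 2.0, 1.0, 2.5, 1.7, 8.5, 7.5, 8.0, 9.0, 8.1, 1.5, 2.5, 2.0, 1.0, 2.2],
}
df = pd.DataFrame(data, columns=["x", "y"])

# kmeans2 expects a float array with one row per observation.
points = df.to_numpy(dtype=float)

# Find centroids for three clusters. The seed is fixed only so that
# repeated runs of this sketch give the same result.
centroids_3, labels_3 = kmeans2(points, 3, minit="++", seed=0)

# Repeat for four clusters on the same data.
centroids_4, labels_4 = kmeans2(points, 4, minit="++", seed=0)

# Record which cluster each row was assigned to.
df["cluster_3"] = labels_3
df["cluster_4"] = labels_4

print(centroids_3)
print(centroids_4)
```

Each row of centroids_3 and centroids_4 is one centroid (an x value and a y value), and the label arrays give the index of the centroid assigned to each row of the DataFrame.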

The most representative point within a cluster is called the centroid. It is defined as the mean of the data points in the cluster, and each cluster should consist of the data points that are closer to its centroid than to any other centroid.
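To make the definition concrete, here is a minimal sketch that uses only the Python standard library. The two clusters, the test point, and the helper names centroid and nearest_centroid are hypothetical and exist only for this illustration.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Return the mean x and mean y of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (fmean(xs), fmean(ys))


def nearest_centroid(point, centroids):
    """Return the name of the centroid closest to the given point."""
    return min(centroids, key=lambda name: dist(point, centroids[name]))


# Hypothetical clusters of two-dimensional points.
cluster_a = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
cluster_b = [(8.0, 8.0), (9.0, 7.0), (10.0, 9.0)]

centroids = {"a": centroid(cluster_a), "b": centroid(cluster_b)}
print(centroids["a"])  # (2.0, 2.0)
print(centroids["b"])  # (9.0, 8.0)

# A new point belongs to the cluster whose centroid is nearest.
print(nearest_centroid((4.0, 3.0), centroids))  # a
```

K-means alternates between these two steps, recomputing each centroid as the mean of its points and reassigning each point to its nearest centroid, until the assignments stop changing.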
