K-means clustering using Python

To recap from Chapter 4, Machine Learning Workouts on IBM Cloud, k-means clustering is an unsupervised machine learning methodology: an algorithm that is commonly used to find groups within unlabeled data. Since the goal here is to demonstrate how you can apply this methodology to data using Python in Watson Studio, we won't dissect the details of how k-means works. Instead, we'll show a working example of the algorithm, using Watson Studio as a proof of concept.

There are numerous examples available online and elsewhere demonstrating the use of Python to implement k-means logic. Here, we'll use an example that is simple to follow and uses available Python modules, such as matplotlib, pandas, and scipy.

Our exercise, using IBM Watson Studio and the notebook we created in the earlier sections of this chapter, will:

  1. Create a DataFrame for a two-dimensional dataset
  2. Find centroids for three clusters, and then for four clusters
  3. Add a graphical user interface (GUI) to display the results
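The first two steps can be sketched as follows. This is a minimal example under stated assumptions, not the notebook code used in this chapter: the twelve data points are made up for illustration, `find_centroids` is a hypothetical helper, and the starting centroids are taken from evenly spaced rows of the data so that the result is repeatable. It uses SciPy's `scipy.cluster.vq.kmeans2` with `minit="matrix"`, which tells the function to treat its second argument as the array of initial centroids.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import kmeans2

# Step 1: a small, made-up two-dimensional dataset (illustrative only).
df = pd.DataFrame(
    {
        "x": [1.0, 1.5, 2.0, 1.0, 8.0, 9.0, 8.5, 9.0, 1.0, 2.0, 1.5, 1.0],
        "y": [1.0, 2.0, 1.0, 1.5, 8.0, 8.5, 9.0, 9.0, 8.0, 8.5, 9.0, 9.0],
    }
)


def find_centroids(frame: pd.DataFrame, k: int):
    """Hypothetical helper: run k-means on the x/y columns.

    Returns (centroids, labels). The initial centroids are k evenly spaced
    rows of the data; this deterministic seeding is an assumption of the
    sketch, chosen so repeated runs give the same answer.
    """
    points = frame[["x", "y"]].to_numpy(dtype=float)
    start_rows = np.linspace(0, len(points) - 1, k).astype(int)
    # With minit="matrix", the second argument is the (k, 2) array of
    # starting centroids rather than a cluster count.
    centroids, labels = kmeans2(points, points[start_rows], minit="matrix")
    return centroids, labels


# Step 2: centroids for three clusters, then for four clusters.
centroids_3, labels_3 = find_centroids(df, 3)
centroids_4, labels_4 = find_centroids(df, 4)

print(pd.DataFrame(centroids_3, columns=["x", "y"]))
print(pd.DataFrame(centroids_4, columns=["x", "y"]))
```

With three clusters, each of the three made-up groups gets its own centroid; with four, one group is split in two. Fixed starting centroids keep the example reproducible; recent SciPy versions also accept `minit="++"` for k-means++ seeding. For step 3, the `labels_3` or `labels_4` array can be passed as the `c` argument of `matplotlib.pyplot.scatter` to color each point by its cluster.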

The most representative point within a group is called the centroid. It is defined as the mean of the data points in the cluster. Each cluster should consist of the data points that are closer to its centroid than to any other centroid.
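The two rules in this definition (a centroid is the mean of its cluster's points, and each point belongs to its nearest centroid) are exactly what the standard Lloyd iteration alternates between. The following from-scratch NumPy sketch is for illustration only and is not the code used in this chapter; the function names and the four sample points are hypothetical, and it assumes no cluster ever becomes empty.

```python
import numpy as np


def assign_to_nearest(points: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Return, for each point, the index of its closest centroid (Euclidean)."""
    # distances[i, j] is the squared distance from point i to centroid j.
    diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    distances = (diffs ** 2).sum(axis=2)
    return distances.argmin(axis=1)


def update_centroids(points: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Make each centroid the mean of the points currently assigned to it."""
    # Assumes every cluster has at least one member.
    return np.array([points[labels == j].mean(axis=0) for j in range(k)])


def lloyd_kmeans(points: np.ndarray, initial_centroids: np.ndarray, max_iter: int = 100):
    """Alternate assignment and update steps until no point changes cluster."""
    centroids = np.asarray(initial_centroids, dtype=float)
    labels = assign_to_nearest(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, len(centroids))
        new_labels = assign_to_nearest(points, centroids)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centroids are too
        labels = new_labels
    return centroids, labels


points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
centroids, labels = lloyd_kmeans(points, points[[0, 2]])
# centroids holds (0.0, 0.5) and (5.5, 5.0); labels is [0, 0, 1, 1]
```

Library routines such as SciPy's k-means functions follow the same alternation but add seeding strategies and handling for empty clusters.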
