The Python code

First, we can take a look at the Python code:

  1. This step imports the DataFrame class from pandas and then defines our two-dimensional DataFrame. Note that the data is simply two lists of numbers, labeled x and y:
from pandas import DataFrame

# Two equal-length lists of numbers, keyed by column name
Data = {'x': [25,34,22,27,33,33,31,22,35,34,67,54,57,43,50,57,59,52,65,47,49,48,35,33,44,45,38,43,51,46],
        'y': [79,51,53,78,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,25,20,14,12,20,5,29,27,8,7]
       }
df = DataFrame(Data, columns=['x', 'y'])
print(df)

The last command, print(df), is included so that running the code displays the DataFrame, which should match the dataset that was defined.
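Beyond printing the whole DataFrame, a quick way to confirm that the data was entered correctly is to look at its shape and summary statistics. The following is a minimal sketch that rebuilds the same DataFrame; the calls to shape and describe() are additions for illustration and are not part of the original listing:

```python
from pandas import DataFrame

# Same data as in the listing above
Data = {'x': [25,34,22,27,33,33,31,22,35,34,67,54,57,43,50,57,59,52,65,47,49,48,35,33,44,45,38,43,51,46],
        'y': [79,51,53,78,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,25,20,14,12,20,5,29,27,8,7]
       }
df = DataFrame(Data, columns=['x', 'y'])

# (rows, columns): 30 observations and 2 columns are expected
print(df.shape)

# Count, mean, standard deviation, min, max, and quartiles for each column
print(df.describe())
```

If the shape is not (30, 2), one of the two lists was probably mistyped.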

  1. In the next step, we use the sklearn Python module to find the centroids, first for three clusters and then for four, and the matplotlib module to create charts that visualize the results of the algorithm.

Scikit-learn provides a range of supervised and unsupervised learning algorithms through a consistent interface in Python. The library is built on SciPy (Scientific Python). Matplotlib is a plotting library for Python and its numerical mathematics extension, NumPy (Wikipedia, 2019).

  1. The next block of Python code imports the two aforementioned modules alongside pandas, creates the DataFrame from the columns of data entered, specifies the number of clusters to create with the KMeans algorithm, and finally uses matplotlib to generate scatter plots:
from pandas import DataFrame
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

Data = {'x': [25,34,22,27,33,33,31,22,35,34,67,54,57,43,50,57,59,52,65,47,49,48,35,33,44,45,38,43,51,46],
        'y': [79,51,53,78,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,25,20,14,12,20,5,29,27,8,7]
       }
df = DataFrame(Data, columns=['x', 'y'])

# Fit k-means with three clusters and read back the centroid coordinates
kmeans = KMeans(n_clusters=3).fit(df)
centroids = kmeans.cluster_centers_
print(centroids)

# Plot the points colored by cluster label, then overlay the centroids in red
plt.scatter(df['x'], df['y'], c=kmeans.labels_.astype(float), s=50, alpha=0.5)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
plt.show()

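The preceding Python code generates the following output: the coordinates of the three centroids, printed to the console, and a scatter plot in which the data points are colored by cluster and the centroids are marked in red.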
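The text mentions finding centroids for four clusters as well as three, but only the three-cluster listing appears here. A minimal sketch of the four-cluster run, assuming the same data, might look like the following. The n_init and random_state arguments are assumptions added so that repeated runs give the same result; they are not in the original listing, and the plotting calls are left out to keep the example short:

```python
from pandas import DataFrame
from sklearn.cluster import KMeans

# Same data as in the listing above
Data = {'x': [25,34,22,27,33,33,31,22,35,34,67,54,57,43,50,57,59,52,65,47,49,48,35,33,44,45,38,43,51,46],
        'y': [79,51,53,78,59,74,73,57,69,75,51,32,40,47,53,36,35,58,59,50,25,20,14,12,20,5,29,27,8,7]
       }
df = DataFrame(Data, columns=['x', 'y'])

# n_clusters=4 follows the text; n_init and random_state are assumptions
# that make the result reproducible from run to run
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(df)
centroids = kmeans.cluster_centers_

print(centroids)       # one row per centroid; columns are x and y
print(kmeans.labels_)  # cluster index assigned to each row of df
```

The two plt.scatter calls and plt.show() from the three-cluster listing can be reused unchanged to chart this result.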
