Time to Put Some Order - Cluster Your Data with Spark MLlib

"If you take a galaxy and try to make it bigger, it becomes a cluster of galaxies, not a galaxy. If you try to make it smaller than that, it seems to blow itself apart"

- Jeremiah P. Ostriker

In this chapter, we will delve deeper into machine learning and see how to use it to group the records of an unlabeled dataset into clusters of similar observations, which is an unsupervised learning task. In a nutshell, the following topics will be covered in this chapter:

  • Unsupervised learning
  • Clustering techniques
  • Hierarchical clustering (HC)
  • Centroid-based clustering (CC)
  • Distribution-based clustering (DC)
  • Determining the number of clusters
  • A comparative analysis of clustering algorithms
  • Submitting jobs on computing clusters