Part 2. Clustering

This part of the book, comprising chapters 7 through 12, explores the clustering algorithms in Apache Mahout. With the techniques described here, you can group similar-looking pieces of data into a set, or cluster. Clustering helps uncover interesting groups of information in a large volume of data. This part of the book begins with simple clustering problems, using examples written in Java. As we progress, you'll see more real-world examples and learn how to run Apache Mahout clustering as Hadoop jobs that can easily handle large data sets.

Chapter 7 introduces the notion of clustering and explains it with an example of clustering points in a 2-dimensional plane. Chapter 8 introduces the concept of vectors and explains how data can be represented using them. Chapter 9 introduces various clustering algorithms implemented in Apache Mahout. With clear examples, this chapter shows you how each clustering algorithm fits various use cases.

Chapter 10 shows how clustering quality can be measured and how it can be improved by tweaking various options and parameters in Mahout. Chapter 11 explores the distributed side of the Apache Mahout clustering implementations and explains how clustering can be run as Hadoop jobs on large data sets. Finally, chapter 12 applies what you've learned in the preceding chapters to some real-world clustering problems and solves them using Apache Mahout.
