About this Book

You may be wondering—is this a book for me?

If you are seeking a textbook on machine learning, no. This book does not attempt to fully explain the theory and derivation of the various algorithms and techniques presented here. Some familiarity with machine learning techniques and related concepts, like matrix and vector math, is useful in reading this book, but not assumed.

If you are developing modern, intelligent applications, then the answer is yes. This book provides a practical rather than a theoretical treatment of these techniques, along with complete examples and recipes for solutions. In the course of demonstrating how Mahout can be deployed to solve problems, it also shares insights gleaned by experienced practitioners.

If you are a researcher in artificial intelligence, machine learning, and related areas—yes. Chances are your biggest obstacle is translating new algorithms into practice. Mahout provides a fertile framework and collection of patterns and ready-made components for testing and deploying new large-scale algorithms. This book is an express ticket to deploying machine learning systems on top of complex distributed computing frameworks.

If you are leading a product team or startup that will leverage machine learning to create a competitive advantage, then yes, this book is also for you. Through real-world examples, it will plant ideas about the many ways these techniques can be deployed. It will also help your scrappy technical team jump directly to a cost-effective implementation that can handle volumes of data previously only realistic for organizations with large technology resources.

Roadmap

This book is divided into three parts, covering collaborative filtering, clustering, and classification in Apache Mahout, respectively.

First, chapter 1 introduces Apache Mahout as a whole. This chapter will get you set up for all of the chapters that follow.

Part 1, which includes chapters 2 through 6, is presented by Sean Owen; it covers collaborative filtering and recommendation. Chapter 2 gives you a first chance to try a Mahout-based recommender engine and evaluate its performance. Chapter 3 discusses how you can represent the data that recommenders use in an efficient way. Then, chapter 4 presents all of the recommender algorithms available in Mahout and compares their strengths and weaknesses. Given that background, chapter 5 presents a case study in which you’ll apply the recommender implementations introduced in chapter 4 to a real-world problem, adapt to some particular properties of the data, and create a production-ready recommender engine. Chapter 6 then introduces Apache Hadoop and gives you a first look at machine learning algorithms in a distributed environment by studying a recommender engine based on Hadoop.

Part 2 of the book, comprising chapters 7 through 12, explores clustering algorithms in Apache Mahout. With the techniques described in this part by Robin Anil, you can group together similar-looking pieces of data into a set or a cluster. Clustering helps uncover interesting groups of information in a large volume of data. This part begins with simple problems in clustering, with examples written in Java. It then introduces more real-world examples and shows how you can run Mahout's clustering algorithms as Hadoop jobs that can cluster large amounts of data easily.

Finally, in part 3, Ted Dunning and Ellen Friedman explore classification with Mahout in chapters 13 through 17. You will first learn how to build and train a classifier model by “teaching” an algorithm with a series of examples. Then you will learn how to evaluate and fine-tune a classifier’s model to give better answers. This part concludes with a real-world case study of classification in action.

Code conventions and downloads

Source code in this book is printed in a monospaced font, called out in listings, and annotated with notes about important points. The code listings are intended to be brief and show only essentials. They will not generally show Java imports, class declarations, Java annotations, and other elements that are not essential to the discussion of the code.

Class names in this book are generally printed in a monospaced font, inline with the text, to indicate they are classes that can be located and studied within the Apache Mahout source code. For example, LogLikelihoodSimilarity is a Java class in Mahout.

Some listings show commands that can be executed. These are written for Unix-like environments such as Mac OS X and Linux distributions. They should work on Microsoft Windows if executed through the Unix-like Cygwin environment.

Compilable copies of the source code in key listings throughout the book are available for download from the publisher’s website at www.manning.com/MahoutinAction. These are standalone Java source files and do not include a build script. For simplicity, they can be unpacked and added into a copy of the complete Mahout source distribution under the examples/src/main/java directory. The existing Mahout build environment will then be able to compile the code automatically.

Multimedia extras

All four authors have recorded audio and video segments that accompany specific sections in most of the chapters and provide additional information on selected topics. These segments can be activated in the ebook version of Mahout in Action, which is free to all owners of the print book, or accessed at no charge from the publisher’s website at www.manning.com/MahoutinAction/extras. On the printed pages, audio and video icons indicate the topics covered and who is speaking in each segment. A full list of these extras begins on page xxiii.

Author Online

The purchase of Mahout in Action includes free access to a private forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the authors and other users. You can access and subscribe to the forum at www.manning.com/MahoutinAction. This page provides information on how to get on the forum once you’re registered, what kind of help is available, and the rules of conduct in the forum.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It isn’t a commitment to any specific amount of participation on the part of the authors, whose contributions to the book’s forum remain voluntary (and unpaid). We suggest you try asking the authors some challenging questions, lest their interest stray!

The Author Online forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
