Preface

The path to here, for me (Sean), began in 2005. A friend was starting a company that would lean heavily on collaborative filtering. There were mature, open source packages for this purpose at the time, but they seemed in some ways too elaborate for simple use cases, and in other ways they seemed built for research purposes. For better or worse, I instead prototyped a simple recommender for my friend’s startup, from scratch. The startup, unfortunately, cancelled itself. Nevertheless, I couldn’t bring myself to delete the prototype. It was certainly interesting, so I cleaned and documented it and released it as an open source project called Taste.

Nothing happened for a year. In my spare time, I added pieces and fixed problems, and then a user or two popped up with bugs and patches—and a few more, and then several more. By 2008, there was a small but unmistakable user base out there. And the Apache Lucene folks who had just spun off machine-learning-related efforts into Apache Mahout suggested we merge. This book project began in late 2009. I find myself surprised and pleased to still be rolling along with this growing snowball of a project in 2011 as it’s beginning to be used by large companies in production.

So, I’m only accidentally here. While I have been a senior engineer, formerly at Google, nobody would mistake me for an expert researcher in the field. I am more like a museum curator than a painter: collecting, organizing, and packaging for wider use the great ideas of a field. It turns out that’s useful work too.

Someone recently described the book, after reading a draft, as a “pop” machine learning book. It was meant as a compliment, and I couldn’t agree more. Machine learning is a bit of magic, though much of the research-oriented writing on the subject can look like arcane spells to anyone but the specialist, and can seem divorced from the reality of applying the techniques. Mahout in Action aims to be accessible, to unearth the interesting nuggets of insight for the enthusiast, and to save the practitioner time in getting work done. I hope it provides you more “a-ha!” moments than “wha...?” moments.

SEAN OWEN

My (Robin’s) interest in machine learning started during my days in college, back in 2006. At that time, I was working as an intern with a group of people designing a personalized recommendation engine. That group flourished and became a company called Minekey; I was invited to join as one of its core developers. The next four years of my life were spent implementing and experimenting with machine learning techniques. Somewhere along that path, I stumbled across Mahout and started contributing as a Google Summer of Code student. The next thing I knew, I was contributing algorithms and patches to its codebase, tuning and optimizing performance, and helping other folks on the mailing list.

I am really fortunate to be part of a wonderful and growing community of developers, researchers, and enthusiasts of machine learning. As more and more companies adopt Mahout, it is becoming a mainstream machine learning library. I really hope you enjoy reading this book.

ROBIN ANIL

I (Ted) came to the application side of projects from research in machine learning. Formerly an academic, I have subsequently been involved in a number of startups, and I have applied machine learning to all of these practical application settings.

Previously, I (Ellen) worked in research laboratories in biochemistry and molecular biology. In addition to having lots of experience with data, I’ve written extensively on technical subjects. Throughout it all, I’ve remained fascinated by data and how it speaks to us. I have tried to bring this insight to Mahout in Action.

Both of us see that open source only works with input from an active and broad community of participants. A major part of Mahout’s success comes from those who have used the software and brought their experience back to the project via discussions in mailing lists, bug fixes, and suggestions.

For this reason, Mahout in Action provides not only useful explanations of code but also guidance regarding the concepts behind the code. This introduction to the framework behind the code will enable you to effectively join in and benefit from the interactive Mahout discussion. We hope this book not only helps its readers, but also helps to expand and enrich Mahout itself.

TED DUNNING AND ELLEN FRIEDMAN
