Hadoop provides a framework for implementing large-scale data processing applications. Users often implement their applications directly on MapReduce from scratch, or write them using a higher-level programming model such as Pig or Hive.
However, implementing some algorithms using MapReduce can be very complex. For example, algorithms such as collaborative filtering, clustering, and recommendations require complex code. This is further aggravated by the need to maximize parallel execution.
Mahout is an effort to implement well-known machine learning and data mining algorithms using the MapReduce framework, so that users can reuse them in their data processing applications without having to rewrite them from scratch. This recipe explains how to install Mahout.
Download the Mahout distribution and unpack it. We will refer to the unpacked directory as MAHOUT_HOME:

>tar xvf mahout-distribution-0.6.tar.gz
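The later steps refer to the unpacked directory as MAHOUT_HOME. As a convenience, you can record that location in an environment variable; the following is a minimal sketch, assuming a bash shell and that the tarball unpacks into a directory named mahout-distribution-0.6:

```shell
# Unpack the distribution (assumes the tarball is in the current directory)
tar xvf mahout-distribution-0.6.tar.gz

# Point MAHOUT_HOME at the unpacked directory and add its bin/ directory
# to the PATH, so the mahout command can be run from anywhere
export MAHOUT_HOME="$PWD/mahout-distribution-0.6"
export PATH="$MAHOUT_HOME/bin:$PATH"
```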
You can run and verify the Mahout installation by carrying out the following steps:
Download the sample input dataset (the synthetic control data used by this example) and copy it to the MAHOUT_HOME/testdata directory, from which the example job reads its input.
>bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
If all goes well, it will process the data and print out the resulting clusters:
12/06/19 21:18:15 INFO kmeans.Job: Running with default arguments
12/06/19 21:18:15 INFO kmeans.Job: Preparing Input
12/06/19 21:18:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
.....
12/06/19 21:19:38 INFO clustering.ClusterDumper: Wrote 6 clusters
12/06/19 21:19:38 INFO driver.MahoutDriver: Program took 83559 ms (Minutes: 1.39265)