The Hadoop distribution comes with several benchmarks. We can use them to verify our Hadoop installation and measure Hadoop's performance. This recipe introduces these benchmarks and explains how to run them.
Start the Hadoop cluster. You can run these benchmarks either on a cluster setup or on a pseudo-distributed setup.
Let us run the sort benchmark. The sort benchmark consists of two jobs. First, we generate some random data using the randomwriter Hadoop job, and then sort it using the sort sample.

Change the directory to HADOOP_HOME.

Run the randomwriter Hadoop job using the following command:

>bin/hadoop jar hadoop-examples-1.0.0.jar randomwriter -Dtest.randomwrite.bytes_per_map=100 -Dtest.randomwriter.maps_per_host=10 /data/unsorted-data
Here, the two parameters, test.randomwrite.bytes_per_map and test.randomwriter.maps_per_host, specify the amount of data generated by each map task and the number of map tasks run on each host, respectively.
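As a rough illustration (not part of the recipe itself), the total amount of data randomwriter produces can be estimated from these two parameters and the number of hosts in the cluster:

```shell
# Rough sizing sketch: total data written by randomwriter is approximately
# bytes_per_map * maps_per_host * number_of_hosts.
# HOSTS=1 assumes a pseudo-distributed (single-node) setup.
BYTES_PER_MAP=100
MAPS_PER_HOST=10
HOSTS=1
TOTAL=$((BYTES_PER_MAP * MAPS_PER_HOST * HOSTS))
echo "randomwriter will write roughly $TOTAL bytes"
```

With the tiny values used in this recipe that is only about 1 KB, which is convenient for a quick installation check; for a real performance measurement you would raise these values substantially.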
Sort the generated data using the sort sample with the following command:

>bin/hadoop jar hadoop-examples-1.0.0.jar sort /data/unsorted-data /data/sorted-data
Verify the sorted results by running the testmapredsort job with the following command:

>bin/hadoop jar hadoop-test-1.0.0.jar testmapredsort -sortInput /data/unsorted-data -sortOutput /data/sorted-data
Finally, when everything is successful, the following message will be displayed:
The job took 66 seconds. SUCCESS! Validated the MapReduce framework's 'sort' successfully.
First, the randomwriter application runs a Hadoop job to generate random data that can be used by the second sort program. Then, we verify the results through the testmapredsort job. If your computer has more capacity, you may run the initial randomwriter step with increased output sizes.
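The three steps of this recipe can be collected into a single wrapper script. The following is only a sketch: the JAR names and HDFS paths are the ones used in this recipe, and the HADOOP variable defaults to a dry run that merely prints each command; set HADOOP=bin/hadoop and run it from HADOOP_HOME to execute the benchmark for real.

```shell
#!/bin/sh
# Sketch of the full sort benchmark sequence from this recipe.
# HADOOP defaults to a dry run that only prints the commands;
# override it (HADOOP=bin/hadoop) to actually run them from HADOOP_HOME.
HADOOP=${HADOOP:-echo bin/hadoop}

# Command lines as used in the recipe.
GEN="jar hadoop-examples-1.0.0.jar randomwriter \
  -Dtest.randomwrite.bytes_per_map=100 \
  -Dtest.randomwriter.maps_per_host=10 /data/unsorted-data"
SORT="jar hadoop-examples-1.0.0.jar sort /data/unsorted-data /data/sorted-data"
CHECK="jar hadoop-test-1.0.0.jar testmapredsort \
  -sortInput /data/unsorted-data -sortOutput /data/sorted-data"

$HADOOP $GEN    # step 1: generate random input data
$HADOOP $SORT   # step 2: sort the generated data
$HADOOP $CHECK  # step 3: validate the sorted output
```

Running the steps from one script makes it easy to repeat the benchmark after changing the randomwriter parameters.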
Hadoop includes several other benchmarks, such as TeraSort, TestDFSIO, nnbench, and mrbench. More information about these benchmarks can be found at http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/.