Running benchmarks to verify the Hadoop installation

The Hadoop distribution comes with several benchmarks. We can use them to verify our Hadoop installation and measure Hadoop's performance. This recipe introduces these benchmarks and explains how to run them.

Getting ready

Start the Hadoop cluster. You can run these benchmarks either on a cluster setup or on a pseudo-distributed setup.

How to do it...

Let us run the sort benchmark, which consists of two jobs. First, we generate some random data using the randomwriter Hadoop job, and then we sort that data using the sort sample.

  1. Change the directory to HADOOP_HOME.
  2. Run the randomwriter Hadoop job using the following command:
    >bin/hadoop jar hadoop-examples-1.0.0.jar randomwriter -Dtest.randomwrite.bytes_per_map=100 -Dtest.randomwriter.maps_per_host=10 /data/unsorted-data
    

    Here, the two parameters test.randomwrite.bytes_per_map and test.randomwriter.maps_per_host specify the size of the data generated by each map task and the number of map tasks per host, respectively.

  3. Run the sort program:
    >bin/hadoop jar hadoop-examples-1.0.0.jar sort /data/unsorted-data /data/sorted-data
    
  4. Verify the final results by running the following command:
    >bin/hadoop jar hadoop-test-1.0.0.jar testmapredsort -sortInput /data/unsorted-data -sortOutput /data/sorted-data
    

If everything succeeds, a message similar to the following is displayed:

The job took 66 seconds.
SUCCESS! Validated the MapReduce framework's 'sort' successfully.
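
The three commands above can be chained in a small wrapper so that each step runs only if the previous one succeeded. The following is a minimal sketch, assuming it is run from HADOOP_HOME and that the jar names match the 1.0.0 release used in this recipe. The DRY_RUN switch and the function names are hypothetical helpers, not part of Hadoop; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: chain the three steps of the sort benchmark.
# Assumptions: current directory is HADOOP_HOME; jar names match Hadoop 1.0.0.
EXAMPLES_JAR="hadoop-examples-1.0.0.jar"
TEST_JAR="hadoop-test-1.0.0.jar"

# DRY_RUN is a hypothetical switch of this script (not a Hadoop option).
# 1 (default) prints each command; any other value executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# sort_benchmark INPUT_DIR OUTPUT_DIR
sort_benchmark() {
    input="$1"
    output="$2"
    # Step 1: generate random data. Step 2: sort it. Step 3: validate the result.
    run bin/hadoop jar "$EXAMPLES_JAR" randomwriter \
        -Dtest.randomwrite.bytes_per_map=100 \
        -Dtest.randomwriter.maps_per_host=10 "$input" &&
    run bin/hadoop jar "$EXAMPLES_JAR" sort "$input" "$output" &&
    run bin/hadoop jar "$TEST_JAR" testmapredsort \
        -sortInput "$input" -sortOutput "$output"
}
```

Calling sort_benchmark /data/unsorted-data /data/sorted-data prints the three commands from this recipe. Setting DRY_RUN=0 before the call executes them instead.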

How it works...

First, the randomwriter application runs a Hadoop job that generates random data. The sort program then sorts this data, and the testmapredsort job verifies that the output is sorted correctly. If your cluster has more capacity, you can run the randomwriter step with larger output sizes.
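
Before increasing the output size, it helps to estimate how much data a run will write. The sketch below assumes that randomwriter starts maps_per_host map tasks on every TaskTracker host and that each map writes bytes_per_map bytes, so the total is the product of the two parameters and the number of hosts. This reading of the Hadoop 1.x behavior is an assumption worth checking against your version, and total_bytes is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/sh
# total_bytes BYTES_PER_MAP MAPS_PER_HOST HOSTS
# Estimates the data written by randomwriter, assuming
# total = bytes_per_map * maps_per_host * number of TaskTracker hosts.
total_bytes() {
    echo $(( $1 * $2 * $3 ))
}

# The values used in this recipe on a single-host (pseudo-distributed) setup:
total_bytes 100 10 1    # prints 1000
```

With the recipe's settings, the run writes only about 1,000 bytes per host, which is enough to verify the installation but far too small to measure performance.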

There's more...

Hadoop includes several other benchmarks.

  • TestDFSIO: This tests the input/output (I/O) performance of HDFS
  • nnbench: This stress-tests the NameNode hardware
  • mrbench: This runs many small jobs
  • TeraSort: This sorts one terabyte of data

More information about these benchmarks can be found at http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/.
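
As a starting point, some of these benchmarks can be invoked at a reduced scale as sketched below. The jar names assume the 1.0.0 release used in this recipe, the /data/tera-* paths and the sizes are illustrative choices, and DRY_RUN, run, and other_benchmarks are hypothetical helpers. nnbench is left out because it takes a longer list of options; see the article above for those. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: small-scale invocations of the other benchmarks.
# Assumptions: current directory is HADOOP_HOME; jar names match Hadoop 1.0.0.
EXAMPLES_JAR="hadoop-examples-1.0.0.jar"
TEST_JAR="hadoop-test-1.0.0.jar"

# DRY_RUN is a hypothetical switch of this script: 1 (default) prints, else executes.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

other_benchmarks() {
    # TestDFSIO: write and then read 10 files of 100 MB each, then clean up.
    run bin/hadoop jar "$TEST_JAR" TestDFSIO -write -nrFiles 10 -fileSize 100
    run bin/hadoop jar "$TEST_JAR" TestDFSIO -read -nrFiles 10 -fileSize 100
    run bin/hadoop jar "$TEST_JAR" TestDFSIO -clean

    # mrbench: run a small job 20 times.
    run bin/hadoop jar "$TEST_JAR" mrbench -numRuns 20

    # TeraSort at reduced scale: 1,000,000 rows of 100 bytes (about 100 MB).
    # A full one-terabyte run would need 10,000,000,000 rows.
    run bin/hadoop jar "$EXAMPLES_JAR" teragen 1000000 /data/tera-in
    run bin/hadoop jar "$EXAMPLES_JAR" terasort /data/tera-in /data/tera-out
    run bin/hadoop jar "$EXAMPLES_JAR" teravalidate /data/tera-out /data/tera-report
}
```

Calling other_benchmarks prints the commands for review. Setting DRY_RUN=0 before the call executes them against the running cluster.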
