Benchmarking HDFS

Running benchmarks is a good way to verify that your HDFS cluster is set up properly and performs as expected. DFSIO is a benchmark that ships with Hadoop and can be used to analyze the I/O performance of an HDFS cluster. This recipe shows how to use DFSIO to benchmark the read and write performance of an HDFS cluster.

Getting ready

You must set up and deploy HDFS and Hadoop MapReduce prior to running these benchmarks. Export the HADOOP_HOME environment variable to point to your Hadoop installation root directory:

>export HADOOP_HOME=/../hadoop-1.0.4

The benchmark programs are in the $HADOOP_HOME/hadoop-test-*.jar file.

How to do it...

The following steps show you how to run the write performance benchmark:

  1. To run the write performance benchmark, execute the following command from the $HADOOP_HOME directory. The -nrFiles parameter specifies the number of files to write and the -fileSize parameter specifies the size of each file in MB.
    >bin/hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -write -nrFiles 5 -fileSize 100
    
  2. The benchmark writes its results to the console and also appends them to a file named TestDFSIO_results.log. You can specify your own result file name using the -resFile parameter.

The following steps show you how to run the read performance benchmark:

  1. The read performance benchmark uses the files written by the write benchmark, so run the write benchmark first and make sure the files it wrote still exist in HDFS before running the read benchmark.
  2. Execute the following command to run the read benchmark. Like the write benchmark, it prints the results to the console and appends them to a logfile:
    >bin/hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -read -nrFiles 5 -fileSize 100
    
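Before running the read benchmark, you can confirm that the write benchmark's data files are still in HDFS. This is a minimal sketch; the path below assumes TestDFSIO's default base directory (/benchmarks/TestDFSIO), so adjust it if your installation is configured differently:

```shell
# Hedged sketch: list the data files produced by the write benchmark.
# /benchmarks/TestDFSIO is an assumed default base directory; verify it
# against your own cluster's configuration.
bin/hadoop fs -ls /benchmarks/TestDFSIO/io_data
```

If the listing is empty or the directory does not exist, rerun the write benchmark before attempting the read benchmark.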

To clean the files generated by these benchmarks, use the following command:

>bin/hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -clean

How it works...

DFSIO executes a MapReduce job in which the map tasks write and read the files in parallel, while the reduce tasks collect and summarize the performance numbers.
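After several runs, the appended results log can be summarized with standard text tools. The following is a minimal sketch that assumes each run appends a line of the form "Throughput mb/sec: <value>" to the log; the sample line created here is illustrative, not real benchmark output:

```shell
# Hedged sketch: pull the throughput figures out of a DFSIO results log.
# The sample line below stands in for a real TestDFSIO_results.log entry.
printf 'Throughput mb/sec: 31.5\n' > sample_results.log

# Extract the value after the colon from every throughput line.
grep 'Throughput' sample_results.log | awk -F': ' '{print $2}'
```

Comparing these throughput figures across runs (for example, before and after a configuration change) makes it easier to spot regressions.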

There's more...

Running these tests together with monitoring systems can help you identify bottlenecks much more easily.

See also

  • The Running benchmarks to verify the Hadoop installation recipe in Chapter 3, Advanced Hadoop MapReduce Administration.