Running benchmarks is a good way to verify that your HDFS cluster is set up properly and performs as expected. DFSIO is a benchmark that ships with Hadoop and can be used to analyze the I/O performance of an HDFS cluster. This recipe shows how to use DFSIO to benchmark the read and write performance of an HDFS cluster.
You must set up and deploy HDFS and Hadoop MapReduce prior to running these benchmarks. Export the HADOOP_HOME
environment variable to point to your Hadoop installation root directory:
>export HADOOP_HOME=/../hadoop-1.0.4
The benchmark programs are in the $HADOOP_HOME/hadoop-test-*.jar file.
The following steps show you how to run the write performance benchmark:
Run the following command from the $HADOOP_HOME directory. The -nrFiles parameter specifies the number of files and the -fileSize parameter specifies the file size in MB.
>bin/hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -write -nrFiles 5 -fileSize 100
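As a back-of-the-envelope check on the throughput figure such a run reports, the number is essentially total data volume divided by aggregate task I/O time. The helper below is an illustrative calculation only, not part of Hadoop, and the per-task timings are invented sample values.

```python
# Illustrative only: how a DFSIO-style throughput number can be derived.
# The per-task write times below are made-up sample values, not real output.

def throughput_mb_per_sec(file_size_mb, task_times_sec):
    """Total MB written divided by the sum of per-task I/O times."""
    total_mb = file_size_mb * len(task_times_sec)
    return total_mb / sum(task_times_sec)

# Five files of 100 MB each (matching -nrFiles 5 -fileSize 100),
# with hypothetical write times for the five map tasks:
times = [11.0, 12.5, 10.5, 13.0, 13.0]
print(round(throughput_mb_per_sec(100, times), 2))  # prints 8.33
```

With these sample timings, 500 MB over 60 seconds of aggregate task time gives roughly 8.33 MB/s.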
The benchmark writes its results to the console and also appends them to a file named TestDFSIO_results.log. You can provide your own result file name using the -resFile parameter.
The following steps show you how to run the read performance benchmark:
>bin/hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -read -nrFiles 5 -fileSize 100
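The results file is plain text, so the summary numbers can be pulled out with a few lines of scripting. The sketch below assumes a `key: value` layout (lines such as `Throughput mb/sec:`) with invented numbers; check your own TestDFSIO_results.log for the exact format produced by your Hadoop version.

```python
# Parse "key : value" summary lines from a DFSIO-style results log.
# The sample text below uses an assumed layout and made-up numbers.
sample = """\
----- TestDFSIO ----- : read
       Number of files: 5
Total MBytes processed: 500
     Throughput mb/sec: 15.2
"""

results = {}
for line in sample.splitlines():
    if ":" in line:
        key, _, value = line.partition(":")
        results[key.strip()] = value.strip()

print(results["Throughput mb/sec"])
```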
To clean the files generated by these benchmarks, use the following command:
>bin/hadoop jar $HADOOP_HOME/hadoop-test-*.jar TestDFSIO -clean
DFSIO executes a MapReduce job where the map tasks write and read the files in parallel, while the reduce tasks are used to collect and summarize the performance numbers.
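That division of labor can be sketched outside Hadoop: worker tasks do timed I/O in parallel, and a collector step reduces their measurements into summary numbers. The following is a minimal Python analogue of the pattern, not DFSIO itself; the file count, sizes, and temporary paths are illustrative.

```python
# Minimal analogue of DFSIO's structure: parallel "map" tasks each write a
# file and report (bytes, seconds); a "reduce" step aggregates the numbers.
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def write_task(path, size_bytes):
    """Write size_bytes of random data and report the bytes and time taken."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(os.urandom(size_bytes))
    return size_bytes, time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(write_task, os.path.join(d, f"f{i}"), 1 << 20)
                   for i in range(4)]
        stats = [f.result() for f in futures]

# "Reduce" step: total bytes written and aggregate task time.
total_bytes = sum(b for b, _ in stats)
total_secs = sum(t for _, t in stats)
print(total_bytes, round(total_secs, 3))
```

The real benchmark distributes these tasks across the cluster as map tasks, so the measured throughput reflects HDFS and network behavior rather than a single machine's disk.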
Running these tests together with monitoring systems can help you identify bottlenecks much more easily.