Using multiple disks/volumes and limiting HDFS disk usage

Hadoop supports specifying multiple directories as the DataNode data directory. This feature allows us to use multiple disks/volumes to store the data blocks on each DataNode. Hadoop will try to store equal amounts of data in each directory. Hadoop also supports limiting the amount of disk space used by HDFS.

How to do it...

The following steps will show you how to add multiple disk volumes:

  1. Create HDFS data storage directories in each volume.
  2. In $HADOOP_HOME/conf/hdfs-site.xml, provide a comma-separated list of the directories corresponding to the data storage locations in each volume as the value of the dfs.data.dir property:
    <property>
      <name>dfs.data.dir</name>
      <value>/u1/hadoop/data,/u2/hadoop/data</value>
    </property>
  3. To limit the HDFS disk usage, add the following property to $HADOOP_HOME/conf/hdfs-site.xml to reserve space for non-DFS usage. The value specifies the number of bytes on each volume that HDFS must leave free:
    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>6000000000</value>
      <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
      </description>
    </property>
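
To get a feel for what the reservation in step 3 means in practice, the following minimal sketch estimates how much space remains available to HDFS on each volume. It assumes a simplified model in which the usable space is the volume capacity minus the reserved bytes; the function name hdfs_usable_bytes and the volume capacities are hypothetical and chosen only for illustration.

```python
# Simplified model (an assumption, not the DataNode's exact accounting):
# usable space per volume = volume capacity - reserved bytes.
RESERVED_BYTES = 6_000_000_000  # value of dfs.datanode.du.reserved in step 3


def hdfs_usable_bytes(volume_capacities, reserved=RESERVED_BYTES):
    """Return the bytes left for HDFS on each volume, never below zero."""
    return [max(0, capacity - reserved) for capacity in volume_capacities]


# Hypothetical capacities for the /u1 and /u2 volumes (500 GB and 1 TB).
capacities = [500_000_000_000, 1_000_000_000_000]
usable = hdfs_usable_bytes(capacities)

print(usable)       # space left for HDFS on each volume
print(sum(usable))  # total space left for HDFS on this DataNode
```

Because the reservation applies per volume, a DataNode with two data directories on separate volumes sets aside twice the configured value in total.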