Decommissioning DataNodes

There are several situations in which you may want to decommission one or more DataNodes from an HDFS cluster. This recipe shows how to decommission DataNodes gracefully, without incurring data loss and without having to restart the cluster.

How to do it...

The following steps show you how to decommission DataNodes gracefully:

  1. If your cluster does not already have one, add an exclude file to the cluster. Create an empty file on the NameNode host and point to it from the conf/hdfs-site.xml file by adding the following property:
    <property>
      <name>dfs.hosts.exclude</name>
      <value>[FULL_PATH_TO_THE_EXCLUDE_FILE]</value>
      <description>Names a file that contains a list of hosts that are not permitted to connect to the namenode. The full pathname of the file must be specified. If the value is empty, no hosts are excluded.</description>
    </property>
  2. Add the hostnames of the nodes that are to be decommissioned to the exclude file.
  3. Run the following command to make the NameNode reload its configuration, which starts the decommissioning process. Decommissioning can take a significant amount of time, as it requires the blocks on the decommissioning nodes to be re-replicated without overwhelming the cluster's other workloads. An end-to-end sketch of these steps is given after this list.
    >bin/hadoop dfsadmin -refreshNodes
    
  4. The decommissioning progress is shown on the Decommissioning Nodes page of the HDFS web UI. It can also be monitored using the following command. Do not shut down the nodes until the decommissioning is complete.
    >bin/hadoop dfsadmin -report
    .....
    .....
    Name: myhost:50010
    Decommission Status : Decommission in progress
    Configured Capacity: ....
    .....
    
  5. When you want to add the nodes back into the cluster, remove them from the exclude file and execute the bin/hadoop dfsadmin -refreshNodes command.
  6. The decommissioning process can be stopped by removing the node's name from the exclude file and then executing the bin/hadoop dfsadmin -refreshNodes command.
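
The following is a minimal end-to-end sketch of the above steps as shell commands. The path /opt/hadoop/conf/dfs.exclude and the hostname dn-host-01 are placeholders for illustration; substitute the exclude file path you configured in dfs.hosts.exclude and the actual hostnames of the nodes being decommissioned.

    # Step 1: create the exclude file at the path named by dfs.hosts.exclude
    touch /opt/hadoop/conf/dfs.exclude

    # Step 2: add the hostname of the node to be decommissioned
    echo "dn-host-01" >> /opt/hadoop/conf/dfs.exclude

    # Step 3: make the NameNode reload its host lists, starting decommissioning
    bin/hadoop dfsadmin -refreshNodes

    # Step 4: check the node's status; repeat until it reads "Decommissioned"
    bin/hadoop dfsadmin -report | grep -A 1 "dn-host-01"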

How it works...

When a node is being decommissioned, HDFS replicates the blocks stored on that node to other nodes in the cluster. Decommissioning can be a slow process because HDFS deliberately throttles this re-replication to avoid overwhelming the cluster. Shutting down nodes without decommissioning them may result in data loss.
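
One rough way to observe this re-replication, assuming you have read access to the whole filesystem, is to watch the under-replicated block count reported by the fsck tool fall back toward zero (the exact label may vary between Hadoop versions):

    >bin/hadoop fsck / | grep -i 'under-replicated'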

After the decommissioning is completed, the nodes mentioned in the exclude file are not allowed to communicate with the NameNode.
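
Once the report shows a Decommission Status of Decommissioned for a node, the DataNode process on it can be stopped. A sketch, assuming it is run on the decommissioned node itself from the Hadoop installation directory:

    >bin/hadoop-daemon.sh stop datanode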

See also

  • The Rebalancing HDFS section of the Adding a new node recipe in this chapter.