Erasure Coding

Erasure Coding (EC) is a key change in Hadoop 3.x, promising a significant improvement in HDFS storage efficiency compared to the replication-only approach of earlier versions. There, a replication factor of 3, for instance, stored every block three times (200% extra storage), wasting precious cluster file system capacity on all kinds of data regardless of how important it was to the tasks at hand.
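To put numbers on the savings, here is a small shell sketch comparing the extra raw storage each scheme consumes per unit of user data. The function names are hypothetical helpers, not Hadoop commands. The RS (Reed-Solomon) figures assume the 6-data/3-parity and 10-data/4-parity layouts behind the built-in RS-6-3-1024k and RS-10-4-1024k policies.

```shell
# Hypothetical helpers: extra storage overhead as a percentage of user data.

# Replication stores (factor - 1) extra full copies of every block.
replication_overhead_pct() {
  factor=$1
  echo $(( (factor - 1) * 100 ))
}

# Erasure coding stores `parity` extra blocks for every `data` blocks.
# Integer division truncates, e.g. RS-3-2 reports 66 rather than 66.7.
ec_overhead_pct() {
  data=$1
  parity=$2
  echo $(( parity * 100 / data ))
}

echo "3x replication: $(replication_overhead_pct 3)% overhead"
echo "RS-6-3:         $(ec_overhead_pct 6 3)% overhead"
echo "RS-10-4:        $(ec_overhead_pct 10 4)% overhead"
```

3x replication survives the loss of 2 of its 3 copies; RS-6-3 survives the loss of any 3 of the 9 blocks in a group, yet the extra storage drops from 200% to 50%.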

EC is set up by defining policies and assigning them to directories in HDFS. For this, HDFS provides an ec subcommand for EC-related administrative commands:

hdfs ec [generic options]
[-setPolicy -path <path> [-policy <policyName>] [-replicate]]
[-getPolicy -path <path>]
[-unsetPolicy -path <path>]
[-listPolicies]
[-addPolicies -policyFile <file>]
[-listCodecs]
[-enablePolicy -policy <policyName>]
[-disablePolicy -policy <policyName>]
[-removePolicy -policy <policyName>]
[-help [cmd ...]]

The following are the details of each command:

  • [-setPolicy -path <path> [-policy <policyName>] [-replicate]]: Sets an EC policy on a directory at the specified path.
    • path: A directory in HDFS. This is a mandatory parameter. Setting a policy affects only newly created files; existing files are not changed.
    • policyName: The EC policy to be used for files under this directory. This parameter can be omitted if the dfs.namenode.ec.system.default.policy configuration is set, in which case the path is assigned the default policy from the configuration.
    • -replicate: Applies the special REPLICATION policy to the directory, forcing it to adopt the 3x replication scheme.
    • -replicate and -policy <policyName>: Both are optional arguments, but they cannot be specified at the same time.
  • [-getPolicy -path <path>]: Get details of the EC policy of a file or directory
    at the specified path.
  • [-unsetPolicy -path <path>]: Unset an EC policy set by a previous call to setPolicy on a directory. If the directory inherits the EC policy from an ancestor directory, unsetPolicy is a no-op. Unsetting the policy on a directory which doesn't have an explicit policy set will not return an error.
  • [-listPolicies]: Lists all (enabled, disabled and removed) EC policies
    registered in HDFS. Only the enabled policies are suitable for use with the setPolicy command.
  • [-addPolicies -policyFile <file>]: Adds a list of EC policies. Refer to
    etc/hadoop/user_ec_policies.xml.template for an example policy file.
    The maximum cell size is defined by the property
    dfs.namenode.ec.policies.max.cellsize, with a default value of 4 MB.
    Currently HDFS allows users to add at most 64 policies in total, and added policy IDs fall in the range 64 to 127. Adding a policy fails if 64 policies have already been added.
  • [-listCodecs]: Gets the list of EC codecs and coders supported in the system. A
    coder is an implementation of a codec. A codec can have several implementations, and therefore several coders. The coders for a codec are listed in fallback order.
  • [-removePolicy -policy <policyName>]: Removes an EC policy.
  • [-enablePolicy -policy <policyName>]: Enables an EC policy.
  • [-disablePolicy -policy <policyName>]: Disables an EC policy.
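The built-in policy names encode their layout as <codec>-<dataUnits>-<parityUnits>-<cellSize>, for example RS-6-3-1024k. As a sketch, assuming that naming pattern, the hypothetical helpers below split a name into its parts and check a cell size against the 4 MB default of dfs.namenode.ec.policies.max.cellsize. The fields are read from the right so that a codec name containing a hyphen, such as RS-LEGACY, still splits correctly.

```shell
# Hypothetical helper: split an EC policy name of the form
# <codec>-<dataUnits>-<parityUnits>-<cellSize>, e.g. RS-6-3-1024k.
parse_ec_policy() {
  name=$1
  cell=${name##*-}        # 1024k
  rest=${name%-*}         # RS-6-3
  parity=${rest##*-}      # 3
  rest=${rest%-*}         # RS-6
  data=${rest##*-}        # 6
  codec=${rest%-*}        # RS (or RS-LEGACY)
  echo "codec=$codec data=$data parity=$parity cell=$cell"
}

# Convert a cell size such as 1024k into bytes (only a "k" suffix is handled).
cell_size_bytes() {
  size=$1
  case $size in
    *k) echo $(( ${size%k} * 1024 )) ;;
    *)  echo "$size" ;;
  esac
}

# Assumed default of dfs.namenode.ec.policies.max.cellsize: 4 MB.
MAX_CELL_BYTES=$(( 4 * 1024 * 1024 ))

# Succeeds if the cell size does not exceed the assumed default limit.
cell_size_ok() {
  [ "$(cell_size_bytes "$1")" -le "$MAX_CELL_BYTES" ]
}

parse_ec_policy RS-6-3-1024k
parse_ec_policy RS-LEGACY-6-3-1024k
cell_size_ok 1024k && echo "1024k is within the default limit"
cell_size_ok 8192k || echo "8192k exceeds the default limit"
```

This checks only the size limit described above; the NameNode remains the authority on whether a policy file added with -addPolicies is accepted.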

By using -listPolicies, you can list all the EC policies currently set up in your cluster,
along with the state of each policy, for example ENABLED or DISABLED.

Let's test out EC in our cluster. First, create two directories in HDFS as follows:
./bin/hdfs dfs -mkdir /user/normal
./bin/hdfs dfs -mkdir /user/ec

Once the two directories are created, you can set the policy on any path:

./bin/hdfs ec -setPolicy -path /user/ec -policy RS-6-3-1024k
Set RS-6-3-1024k erasure coding policy on /user/ec

Now any content copied into the /user/ec folder is written using the newly set policy.
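With RS-6-3-1024k, a file under /user/ec is not stored as three full copies. Its data is cut into 1024 KB cells written round-robin across 6 data blocks, and each stripe of 6 data cells gets 3 parity cells. The hypothetical helper below estimates the number of stripes and the parity bytes for a given file size. It assumes a parity cell in a final, partial stripe is as long as the largest data cell in that stripe, and it ignores block-group boundaries and metadata.

```shell
# Hypothetical helper: estimate the striped layout of a file of `size` bytes
# under a policy with `data` data cells and `parity` parity cells of `cell` bytes.
stripe_layout() {
  size=$1; data=$2; parity=$3; cell=$4
  stripe=$(( data * cell ))          # user bytes held by one full stripe
  full=$(( size / stripe ))          # number of complete stripes
  rem=$(( size % stripe ))           # bytes left over for a partial stripe
  parity_bytes=$(( full * parity * cell ))
  if [ "$rem" -gt 0 ]; then
    # In a partial stripe the first data cell is the largest: min(rem, cell).
    first=$rem
    if [ "$first" -gt "$cell" ]; then first=$cell; fi
    parity_bytes=$(( parity_bytes + parity * first ))
  fi
  echo "stripes=$(( full + (rem > 0) )) parity_bytes=$parity_bytes"
}

# RS-6-3-1024k: 6 data cells, 3 parity cells, 1048576-byte cells.
stripe_layout 10485760 6 3 1048576   # a hypothetical 10 MiB file
stripe_layout 102400 6 3 1048576     # a hypothetical 100 KiB file
```

For the 10 MiB file, parity adds 6 MiB (60%). For the 100 KiB file, the single partial stripe needs 3 × 100 KiB of parity (300%), which is worse than the 200% of 3x replication, so small files gain little from EC.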

Run the following command to test this:

./bin/hdfs dfs -copyFromLocal ~/Documents/OnlineRetail.csv /user/ec

The following screenshot shows the result of the copy. As expected, the system complains, because our local setup does not have enough DataNodes to implement EC. Still, this gives us an idea of what is needed and what the result looks like.
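The complaint comes down to block placement. Each block group written under RS-6-3-1024k consists of 9 internal blocks (6 data and 3 parity), and to keep its full fault tolerance each of them should sit on a different DataNode. The hypothetical pre-flight check below compares a DataNode count against that width; the count is an input you would supply yourself, for example by reading the live DataNodes listed by hdfs dfsadmin -report.

```shell
# Hypothetical pre-flight check: can a cluster with `nodes` DataNodes place
# every internal block of one block group on a distinct DataNode?
ec_placement_check() {
  data=$1; parity=$2; nodes=$3
  width=$(( data + parity ))         # internal blocks per block group
  if [ "$nodes" -ge "$width" ]; then
    echo "ok"
  else
    echo "short by $(( width - nodes )) DataNode(s)"
  fi
}

ec_placement_check 6 3 1   # RS-6-3-1024k on a single-node setup
ec_placement_check 3 2 5   # RS-3-2-1024k on five DataNodes
```

A single-node setup is 8 DataNodes short for RS-6-3-1024k, while a narrower policy such as RS-3-2-1024k needs only 5.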
