Setting failure percentages and skipping bad records

When processing large datasets, a small number of map tasks may fail while the final result still makes sense without them. This can happen for several reasons, such as:

  • Bugs in the map task
  • Small percentage of data records are not well formed
  • Bugs in third-party libraries

In the first case, it is best to debug, find the cause of the failures, and fix it. In the second and third cases, however, such errors may be unavoidable. You can tell Hadoop that the job should succeed even if a small percentage of its map tasks fail.

This can be done in two ways:

  • Setting the failure percentages
  • Asking Hadoop to skip bad records

This recipe explains how to configure this behavior.

Getting ready

Start the Hadoop setup. Refer to the Setting Hadoop in a distributed cluster environment recipe from Chapter 1, Getting Hadoop Up and Running in a Cluster.

How to do it...

Run the WordCount sample by passing the following options:

>bin/hadoop jar hadoop-examples-1.0.0.jar wordcount \
-Dmapred.skip.map.max.skip.records=1 \
-Dmapred.skip.reduce.max.skip.groups=1 \
/data/input1 /data/output1

However, passing the options this way works only if the job implements the org.apache.hadoop.util.Tool interface, which lets Hadoop parse the -D arguments. Otherwise, you should set the parameters programmatically through JobConf.set(name, value).

How it works...

Hadoop does not support skipping bad records by default. We can turn on bad record skipping by setting the following parameters to positive values:

  • mapred.skip.map.max.skip.records: The maximum number of records that a map task may skip around a bad record, including the bad record itself
  • mapred.skip.reduce.max.skip.groups: The maximum number of key groups that a reduce task may skip around a bad group
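The effect of these parameters can be illustrated with a small, self-contained sketch (plain Java, no Hadoop dependency; the class and method names here are hypothetical and only mimic the behavior, not Hadoop's internals): when processing hits a bad record, a positive skip window lets the task skip past it and continue, whereas with skipping disabled the first bad record is fatal.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual illustration of record skipping (hypothetical code, not
// Hadoop internals). With a positive skip window, a record that would
// normally kill the task is skipped instead, mimicking the effect of
// mapred.skip.map.max.skip.records=1.
public class SkipSketch {

    // Process records, skipping up to maxSkipRecords records at each
    // failure; with maxSkipRecords <= 0 the first bad record is fatal.
    static List<String> process(String[] records, int maxSkipRecords) {
        List<String> ok = new ArrayList<>();
        int i = 0;
        while (i < records.length) {
            try {
                ok.add(map(records[i]));
                i++;
            } catch (RuntimeException e) {
                if (maxSkipRecords <= 0) {
                    throw e;             // skipping disabled: task fails
                }
                i += maxSkipRecords;     // skip the bad record(s), go on
            }
        }
        return ok;
    }

    // A stand-in map function that rejects malformed (empty) records.
    static String map(String record) {
        if (record.isEmpty()) {
            throw new RuntimeException("malformed record");
        }
        return record.toUpperCase();
    }

    public static void main(String[] args) {
        String[] data = {"a", "", "c"};
        System.out.println(process(data, 1)); // prints [A, C]
    }
}
```

With the skip window set to 1, the malformed record is dropped and the remaining records are still processed, which is exactly the trade-off the Hadoop parameters configure: a small amount of lost data in exchange for a job that completes.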

There's more...

You can also limit the acceptable percentage of failed map or reduce tasks through the JobConf.setMaxMapTaskFailuresPercent(percent) and JobConf.setMaxReduceTaskFailuresPercent(percent) methods.

Also, Hadoop retries a task when it fails. You can control the number of attempts through JobConf.setMaxMapAttempts(int); for example, setMaxMapAttempts(5) allows up to five attempts per map task.
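To make the percentage threshold concrete, here is a small sketch (plain Java, hypothetical names, no Hadoop dependency) of the check the framework effectively performs: the job is allowed to succeed as long as the fraction of failed tasks stays at or below the configured percentage.

```java
// Conceptual illustration of the failure-percentage check (hypothetical
// code, not Hadoop internals): a job may succeed as long as the
// percentage of failed tasks does not exceed the configured maximum.
public class FailurePercentSketch {

    // Returns true if the job may still succeed given the configured
    // maximum failure percentage, e.g. setMaxMapTaskFailuresPercent(5).
    static boolean jobSucceeds(int failedTasks, int totalTasks,
                               int maxFailuresPercent) {
        double failedPercent = 100.0 * failedTasks / totalTasks;
        return failedPercent <= maxFailuresPercent;
    }

    public static void main(String[] args) {
        // 3 failures out of 100 tasks with 5% tolerated: still succeeds.
        System.out.println(jobSucceeds(3, 100, 5));  // prints true
        // 10 failures out of 100 tasks with 5% tolerated: job fails.
        System.out.println(jobSucceeds(10, 100, 5)); // prints false
    }
}
```

Note that a failed task here means a task that has exhausted all of its retry attempts; a task that fails once and then succeeds on a retry does not count against the threshold.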
