When processing a large amount of data, a small number of map tasks may fail, yet the final results can still be meaningful without the output of those failed tasks. This can happen for a number of reasons, such as:

- A bug in the map task code
- A bug in a third-party library used by the task
- Erroneous or corrupt records in a small portion of the input data
In the first case, it is best to debug the job, find the cause of the failures, and fix it. However, in the second and third cases, such errors may be unavoidable. It is possible to tell Hadoop that the job should succeed even if a small percentage of map tasks fail.
This can be done in two ways: by skipping bad records, or by limiting the percentage of task failures that are acceptable.
This recipe explains how to configure this behavior.
Start the Hadoop setup. Refer to the Setting Hadoop in a distributed cluster environment recipe from Chapter 1, Getting Hadoop Up and Running in a Cluster.
Run the WordCount sample by passing the following options:
>bin/hadoop jar hadoop-examples-1.0.0.jar wordcount -Dmapred.skip.map.max.skip.records=1 -Dmapred.skip.reduce.max.skip.groups=1 /data/input1 /data/output1
However, this only works if the job implements the org.apache.hadoop.util.Tool interface. Otherwise, you should set these properties through JobConf.set(name, value).
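As a sketch, a driver built on the Tool interface (so that command-line -D options are honored) might look like the following; the class name SkippingWordCount is illustrative, and the mapper/reducer wiring is elided:

```java
// Sketch of a driver using the old org.apache.hadoop.mapred API.
// Implementing Tool lets ToolRunner parse -D options into the configuration.
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SkippingWordCount extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D options parsed by ToolRunner
        JobConf conf = new JobConf(getConf(), SkippingWordCount.class);
        // Equivalent to passing the options on the command line:
        conf.set("mapred.skip.map.max.skip.records", "1");
        conf.set("mapred.skip.reduce.max.skip.groups", "1");
        // ... set the mapper, reducer, and input/output paths here ...
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new SkippingWordCount(), args));
    }
}
```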
Hadoop does not support skipping bad records by default. We can turn on bad record skipping by setting the following parameters to positive values:
- mapred.skip.map.max.skip.records: This sets the number of records to skip near a bad record, including the bad record
- mapred.skip.reduce.max.skip.groups: This sets the number of acceptable skip groups surrounding a bad group

You can also limit the percentage of failures in map or reduce tasks by setting the JobConf.setMaxMapTaskFailuresPercent(percent) and JobConf.setMaxReduceTaskFailuresPercent(percent) options.
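For instance, a configuration sketch using the old JobConf API; the class name WordCountJob and the percentage values are illustrative:

```java
// Sketch: tolerate a fraction of permanently failed tasks
// without failing the whole job.
JobConf conf = new JobConf(WordCountJob.class); // placeholder driver class
// Let the job succeed even if up to 20% of map tasks fail
conf.setMaxMapTaskFailuresPercent(20);
// Likewise, tolerate up to 10% of failed reduce tasks
conf.setMaxReduceTaskFailuresPercent(10);
```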
Also, Hadoop retries a task if it fails. You can control the maximum number of attempts per map task through JobConf.setMaxMapAttempts(5).
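A minimal sketch, again using the old JobConf API (the value 5 is illustrative, and WordCountJob is a placeholder class name):

```java
// Sketch: retry each failed map task up to 5 times before giving up on it
JobConf conf = new JobConf(WordCountJob.class); // placeholder driver class
conf.setMaxMapAttempts(5);
// The reduce-side counterpart also exists:
conf.setMaxReduceAttempts(5);
```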