Fault tolerance and speculative execution

The primary advantage of using Hadoop is its support for fault tolerance. When you run a job, especially a large job, parts of the execution can fail due to external causes such as network failures, disk failures, and node failures.

When a job starts, the Hadoop JobTracker monitors the TaskTrackers to which it has submitted the job's tasks. If a TaskTracker becomes unresponsive, Hadoop resubmits the tasks it was handling to a different TaskTracker.
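The number of times a failed task is retried before the job is declared failed is also configurable. The following is a minimal sketch using the Hadoop 1.x mapred API; the class name is illustrative and the rest of the job setup is omitted:

import org.apache.hadoop.mapred.JobConf;

public class RetryConfig {
    public static void main(String[] args) {
        JobConf conf = new JobConf(RetryConfig.class);

        // A failed task is retried (on another TaskTracker where possible)
        // up to this many times before the whole job is declared failed.
        conf.setMaxMapAttempts(4);     // default: 4
        conf.setMaxReduceAttempts(4);  // default: 4

        // ... set the mapper, reducer, and input/output paths,
        // then submit the job as usual.
    }
}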

Generally, a Hadoop cluster may be composed of heterogeneous nodes, so there can be very slow nodes as well as fast nodes. Potentially, a few slow nodes can slow down an execution significantly.

To avoid this, Hadoop supports speculative execution. If most of the map tasks have completed and Hadoop is waiting for the last few, the Hadoop JobTracker starts duplicate copies of those still-running tasks on other nodes. It uses the results from whichever copy finishes first and kills the other identical copies.

However, this model is safe only if the map tasks are free of side effects. If such duplicate executions are undesirable, Hadoop lets users turn off speculative execution.

How to do it...

Run the WordCount sample, passing the following options as arguments to turn off speculative execution:

bin/hadoop jar hadoop-examples-1.0.0.jar wordcount -Dmapred.map.tasks.speculative.execution=false -Dmapred.reduce.tasks.speculative.execution=false /data/input1 /data/output1

However, the -D options are parsed only if the job's driver implements the org.apache.hadoop.util.Tool interface and is launched through ToolRunner. Otherwise, you should set the parameters through JobConf.set(name, value), as shown in the sketch below.
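If your driver does not yet implement Tool, a minimal sketch of one looks like the following. The class name WordCountDriver is illustrative and the mapper/reducer setup is omitted; the JobConf setters shown are the programmatic equivalents of the -D options:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries any -D options parsed by ToolRunner.
        JobConf conf = new JobConf(getConf(), WordCountDriver.class);

        // Programmatic equivalents of the -D options above:
        conf.setMapSpeculativeExecution(false);
        conf.setReduceSpeculativeExecution(false);
        // or: conf.set("mapred.map.tasks.speculative.execution", "false");

        // ... configure the mapper, reducer, and input/output paths here ...
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCountDriver(), args));
    }
}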

How it works...

When these options are specified and set to false, Hadoop turns off speculative execution for the corresponding task type. Otherwise, it performs speculative execution by default.
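To verify the effective settings, you can read them back from a JobConf. This small sketch (class name illustrative) prints the values the job would use:

import org.apache.hadoop.mapred.JobConf;

public class SpeculationCheck {
    public static void main(String[] args) {
        JobConf conf = new JobConf();

        // Both getters return true unless the corresponding property
        // has been set to false on the command line or in code.
        System.out.println("map speculation: " + conf.getMapSpeculativeExecution());
        System.out.println("reduce speculation: " + conf.getReduceSpeculativeExecution());
    }
}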
