In its default configuration, Hadoop starts a new JVM for each map or reduce task. However, running multiple tasks in the same JVM avoids the JVM startup cost for each task and can sometimes speed up execution significantly. This recipe explains how to control this behavior.
> bin/hadoop jar hadoop-examples-1.0.0.jar wordcount -Dmapred.job.reuse.jvm.num.tasks=-1 /data/input1 /data/output1
Monitor the number of Hadoop processes (using the ps -ef | grep hadoop command on Unix or Task Manager on Windows). Hadoop starts only a single JVM per task slot and then reuses it for an unlimited number of tasks in the job. However, passing arguments through the -D
option only works if the job implements the org.apache.hadoop.util.Tool interface and is launched through ToolRunner. Otherwise, you should set the option through the JobConf.setNumTasksToExecutePerJvm(-1) method.
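The two ways of setting the option can be sketched in a single driver. This is a minimal sketch, assuming the Hadoop 1.x "mapred" API is on the classpath. The class name JvmReuseDriver, the job name, and the buildJobConf helper are hypothetical and used only for illustration. The mapper and reducer are left at Hadoop's defaults so that the example stays short.

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: the class name and job wiring are illustrative only.
public class JvmReuseDriver extends Configured implements Tool {

    // Builds the job configuration. getConf() already contains any
    // -Dkey=value properties that ToolRunner parsed from the command line.
    JobConf buildJobConf(boolean forceUnlimitedReuse) {
        JobConf conf = new JobConf(getConf(), JvmReuseDriver.class);
        conf.setJobName("jvm-reuse-demo");
        if (forceUnlimitedReuse) {
            // Programmatic equivalent of -Dmapred.job.reuse.jvm.num.tasks=-1
            conf.setNumTasksToExecutePerJvm(-1);
        }
        return conf;
    }

    @Override
    public int run(String[] args) throws Exception {
        // Pass false here so that the value given with -D (if any) is kept.
        JobConf conf = buildJobConf(false);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf); // blocks until the job finishes
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic options such as -D before calling run().
        System.exit(ToolRunner.run(new JvmReuseDriver(), args));
    }
}
```

Because the driver goes through ToolRunner, the -D option from the command line reaches the job without further code. A driver that does not implement Tool would instead call setNumTasksToExecutePerJvm(-1) on its JobConf, as buildJobConf(true) does here.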