Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Reusing Java VMs to improve the performance

In its default configuration, Hadoop starts a new JVM for each map or reduce task. However, running multiple tasks from the same JVM can sometimes significantly speed up the execution. This recipe explains how to control this behavior.

How to do it...

Run the WordCount sample by passing the following option as an argument:

>bin/hadoop jar hadoop-examples-1.0.0.jar wordcount –Dmapred.job.reuse.jvm.num.tasks=-1 /data/input1 /data/output1

Monitor the number of processes created by Hadoop (through ps –ef|grephadoop command in Unix or task manager in Windows). Hadoop starts only a single JVM per task slot and then reuses it for an unlimited number of tasks in the job.
However, passing arguments through the –D option only works if the job implements the org.apache.hadoop.util.Tools interface. Otherwise, you should set the option through the JobConf.setNumTasksToExecutePerJvm(-1) method.

How it works...

By setting the job configuration property through mapred.job.reuse.jvm.num.tasks, we can control the number of tasks for the JVM run by Hadoop. When the value is set to -1, Hadoop runs the tasks in the same JVM.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Reusing Java VMs to improve the performance

Create new playlist

Sign In

Sign Up

Reusing Java VMs to improve the performance

How to do it...

How it works...

Table of Contents for
Reusing Java VMs to improve the performance