Using the Hadoop Tool interface

Hadoop jobs are often executed from the command line, so each job has to read, parse, and process command-line arguments. To save each developer from rewriting this code, Hadoop provides the org.apache.hadoop.util.Tool interface.
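
The interface itself is minimal: it extends Configurable and declares a single run() method that receives whatever arguments remain after the generic Hadoop options have been consumed. Its contract, reproduced here for reference, looks like this:

    package org.apache.hadoop.util;

    import org.apache.hadoop.conf.Configurable;

    // The Tool contract: a configurable entry point whose run() method
    // receives only the application-specific arguments, after ToolRunner
    // has stripped out the generic Hadoop options.
    public interface Tool extends Configurable
    {
      int run(String[] args) throws Exception;
    }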

How to do it...

  1. In the source code for this chapter, the src/chapter3/WordcountWithTools.java class extends the WordCount example with support for the Tool interface:
    public class WordcountWithTools extends
       Configured implements Tool
    {
      public int run(String[] args) throws Exception
      {
        if (args.length < 2)
        {
          System.out.println("chapter3.WordcountWithTools <inDir> <outDir>");
          ToolRunner.printGenericCommandUsage(System.out);
          System.out.println("");
          return -1;
        }
        // getConf() returns the Configuration that ToolRunner has already
        // populated with any generic options from the command line.
        Job job = new Job(getConf(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Propagate the job's success or failure to the caller.
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args)
        throws Exception
      {
        int res = ToolRunner.run(
           new Configuration(), new WordcountWithTools(), args);
        System.exit(res);
      }
    }
  2. Set up an input folder in HDFS with /data/input/README.txt if it doesn't already exist. This can be done through the following commands (do not pre-create /data/output; the job creates it and will fail if it already exists):
    bin/hadoop fs -mkdir /data/input
    bin/hadoop fs -put README.txt /data/input
    
  3. Run the WordCount job without any arguments, and it will print the usage message listing the available options:
    bin/hadoop jar hadoop-cookbook-chapter3.jar chapter3.WordcountWithTools
    chapter3.WordcountWithTools <inDir> <outDir>
    Generic options supported are
    -conf <configuration file>     specify an application configuration file
    -D <property=value>            use value for given property
    -fs <local|namenode:port>      specify a namenode
    -jt <local|jobtracker:port>    specify a job tracker
    -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
    
    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]
    
  4. Run the WordCount sample with the mapred.job.reuse.jvm.num.tasks option to limit the number of JVMs created by the job, as we learned in an earlier recipe (a sketch following these steps shows how such a property can be read back inside run()):
    bin/hadoop jar hadoop-cookbook-chapter3.jar \
    chapter3.WordcountWithTools \
    -D mapred.job.reuse.jvm.num.tasks=1 /data/input /data/output
    
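As promised in step 4, the following is a minimal sketch showing that properties passed through the generic -D option are already present in the Configuration by the time run() executes. The EchoProperty class name is just an illustration, not part of the cookbook sources:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical Tool that echoes a property set via -D, demonstrating
    // that ToolRunner has parsed the generic options into the
    // Configuration before run() is invoked.
    public class EchoProperty extends Configured implements Tool
    {
      public int run(String[] args) throws Exception
      {
        // Falls back to 1, the mapred-default value, when -D did not
        // override the property.
        System.out.println("mapred.job.reuse.jvm.num.tasks = "
            + getConf().getInt("mapred.job.reuse.jvm.num.tasks", 1));
        return 0;
      }

      public static void main(String[] args) throws Exception
      {
        System.exit(ToolRunner.run(
            new Configuration(), new EchoProperty(), args));
      }
    }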

How it works...

When a job class implements the Tool interface and is launched through ToolRunner, Hadoop intercepts the command-line arguments, parses the generic options, and sets them in the job's Configuration object. Therefore, the job supports the standard generic options.
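
Internally, ToolRunner delegates this parsing to org.apache.hadoop.util.GenericOptionsParser, which can also be used directly when the full Tool machinery is not needed. A minimal sketch (the class name is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.GenericOptionsParser;

    // Illustrative stand-alone use of GenericOptionsParser: it loads the
    // generic options (-D, -conf, -fs, -jt, -files, -libjars, -archives)
    // into the Configuration and returns the leftover arguments.
    public class ParseGenericOptions
    {
      public static void main(String[] args) throws Exception
      {
        Configuration conf = new Configuration();
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        String[] remaining = parser.getRemainingArgs();
        System.out.println(remaining.length
            + " application-specific argument(s) remain after parsing");
      }
    }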
