Amazon also provides a Ruby-based Command Line Interface (CLI) for EMR. The EMR CLI supports creating job flows with multiple steps as well.
This recipe creates a job flow using the EMR CLI to execute the WordCount sample from the Running Hadoop MapReduce computations using Amazon ElasticMapReduce (EMR) recipe of this chapter.
The following steps show you how to create an EMR job flow using the EMR command line interface:
Verify that Ruby is installed by checking its version:
> ruby -v
ruby 1.8……
Create a file named credentials.json in the directory of the extracted EMR CLI. Fill the fields using the credentials of your AWS account. A sample credentials.json file is available in the resources/emr-cli folder of the resource bundle available for this chapter.
Set the log_uri property to the S3 location that will store the logging and debugging information. We assume that the S3 bucket for logging is named c10-logs.
{
  "access_id": "[Your AWS Access Key ID]",
  "private_key": "[Your AWS Secret Access Key]",
  "keypair": "[Your key pair name]",
  "key-pair-file": "[The path and name of your PEM file]",
  "log_uri": "s3n://c10-logs/",
  "region": "us-east-1"
}
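Before running any EMR CLI command, it can help to confirm that the credentials file contains every field from the sample above. The following is a minimal sketch, not part of the EMR CLI: the function name check_credentials is hypothetical, and the check only looks for the quoted key names, so it does not validate the JSON syntax or the values.

```shell
# Sketch: report which expected keys are missing from a credentials file.
# The key list mirrors the sample credentials.json shown in this recipe.
# check_credentials is a hypothetical helper, not an EMR CLI command.
check_credentials() {
  file="$1"
  missing=0
  for key in access_id private_key keypair key-pair-file log_uri region; do
    # Look for the quoted key followed by a colon, for example "log_uri":
    if ! grep -q "\"${key}\"[[:space:]]*:" "$file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  # Exit status 0 means every key was found; 1 means at least one is missing.
  return "$missing"
}
```

For example, `check_credentials credentials.json && echo "all fields present"` prints the confirmation only when no key is missing.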
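Upload the c10-samples.jar file to the newly created bucket.
Create the job flow by running the following command, replacing the bracketed placeholders with your S3 locations:
> ./elastic-mapreduce --create --name "Hello EMR CLI" --jar s3n://[S3 jar file bucket]/c10-samples.jar --arg chapter1.WordCount --arg s3n://[S3 input data path] --arg s3n://[S3 output data path]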
The preceding command creates a job flow and displays the job flow ID:
Created job flow x-xxxxxx
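Because later commands need the job flow ID, a script can capture it from the create output instead of copying it by hand. The sketch below assumes the output has the form "Created job flow <id>", as in the sample line above. The function name extract_job_flow_id is hypothetical.

```shell
# Sketch: pull the job flow ID out of the create command's output.
# Assumes a line of the form "Created job flow <id>"; the helper name
# extract_job_flow_id is hypothetical and not part of the EMR CLI.
extract_job_flow_id() {
  # Print the last whitespace-separated field of the matching line.
  awk '/^Created job flow / { print $NF }'
}

# Example using the sample output from this recipe:
echo "Created job flow x-xxxxxx" | extract_job_flow_id
```

In a script, the output of the `./elastic-mapreduce --create` command from this recipe could be piped into the function in the same way, with the result stored in a shell variable.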
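To view the description of the job flow, run the following command, replacing <job-flow-id> with the job flow ID displayed when the job flow was created (step 8):
> ./elastic-mapreduce --describe <job-flow-id>
{
  "JobFlows": [
    {
      "SupportedProducts": [],
………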
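You can use the following command to list your job flows and check their status:
> ./elastic-mapreduce --list
x-xxxxxxx STARTING Hello EMR CLI
PENDING Example Jar Step
……..
Once the job flow has finished, the listing shows the COMPLETED status:
> ./elastic-mapreduce --list
x-xxxxxx COMPLETED ec2-xxx.amazonaws.com Hello EMR CLI
COMPLETED Example Jar Step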
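Checking the listing by hand can be replaced with a small polling loop. The following sketch assumes that each job flow line in the `--list` output starts with the job flow ID followed by its state, as in the sample listings above. Both function names are hypothetical, and the 30-second interval is an arbitrary choice.

```shell
# Sketch: read `--list` output on stdin and print the state of one job flow.
# Assumes each job flow line begins with "<id> <STATE> ...", as in the
# sample listings in this recipe. Both helpers are hypothetical.
job_flow_state() {
  awk -v id="$1" '$1 == id { print $2; exit }'
}

# Poll until the given job flow reports COMPLETED.
# This function is only defined here; it runs when called with a job flow ID.
wait_for_job_flow() {
  id="$1"
  while :; do
    state=$(./elastic-mapreduce --list | job_flow_state "$id")
    echo "job flow $id: $state"
    [ "$state" = "COMPLETED" ] && break
    sleep 30
  done
}
```

A job flow that ends in any state other than COMPLETED would keep this loop running, so a real script would need an additional exit condition for that case.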
You can use EC2 spot instances with your job flows to reduce the cost of your computations. Add a bid price to your request by adding the following options to your job flow create command:
> ./elastic-mapreduce --create --name …. ......... --instance-group master --instance-type m1.small --instance-count 1 --bid-price 0.01 --instance-group core --instance-type m1.small --instance-count 2 --bid-price 0.01
Refer to the Saving money by using Amazon EC2 Spot Instances to execute EMR job flows recipe in this chapter for more details on Amazon Spot Instances.
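The spot instance options repeat the same four flags for each instance group, so a script can generate them from a few values. This is a minimal sketch using only the flags shown in the command above. The function name spot_group_options is hypothetical, and the bid price and instance types are the ones from this recipe, not recommendations.

```shell
# Sketch: print the instance-group options for one group with a spot bid.
# Arguments: group role, instance type, instance count, bid price.
# spot_group_options is a hypothetical helper, not part of the EMR CLI.
spot_group_options() {
  printf '%s\n' "--instance-group $1 --instance-type $2 --instance-count $3 --bid-price $4"
}

# Reproduce the two groups used in this recipe's spot instance command.
echo "$(spot_group_options master m1.small 1 0.01) $(spot_group_options core m1.small 2 0.01)"
```

The printed options can then be appended to the job flow create command.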