Chapter 10. Cloud Deployments: Using Hadoop on Clouds

In this chapter, we will cover:

  • Running Hadoop MapReduce computations using Amazon Elastic MapReduce (EMR)
  • Saving money using Amazon EC2 Spot Instances to execute EMR job flows
  • Executing a Pig script using EMR
  • Executing a Hive script using EMR
  • Creating an Amazon EMR job flow using the Command Line Interface
  • Deploying an Apache HBase Cluster on Amazon EC2 cloud using EMR
  • Using EMR Bootstrap actions to configure VMs for the Amazon EMR jobs
  • Using Apache Whirr to deploy an Apache Hadoop cluster in a cloud environment
  • Using Apache Whirr to deploy an Apache HBase cluster in a cloud environment

Introduction

Computing clouds provide on-demand, horizontally scalable computing resources with no upfront capital investment, making them an ideal environment for occasional large-scale Hadoop computations. In this chapter, we will explore several mechanisms to deploy and execute Hadoop MapReduce and Hadoop-related computations in cloud environments.

This chapter discusses how to use Amazon Elastic MapReduce (EMR), Amazon's hosted Hadoop infrastructure, to execute traditional MapReduce computations as well as Pig and Hive computations on the Amazon EC2 cloud infrastructure. It also shows how to provision an HBase cluster using Amazon EMR and how to back up and restore the data belonging to an EMR HBase cluster. Finally, we will use Apache Whirr, a cloud-neutral library for deploying services in cloud environments, to provision Apache Hadoop and Apache HBase clusters in cloud environments.
