Computing clouds provide on-demand, horizontally scalable computing resources with no upfront capital investment, making them an ideal environment for occasional large-scale Hadoop computations. In this chapter, we will explore several mechanisms to deploy and execute Hadoop MapReduce and Hadoop-related computations on cloud environments.
This chapter discusses how to use Amazon Elastic MapReduce (EMR), the hosted Hadoop infrastructure, to execute traditional MapReduce computations as well as Pig and Hive computations on the Amazon EC2 cloud infrastructure. It also shows how to provision an HBase cluster using Amazon EMR and how to back up and restore the data belonging to an EMR HBase cluster. Finally, we will use Apache Whirr, a cloud-neutral library for deploying services on cloud environments, to provision Apache Hadoop and Apache HBase clusters.