Introducing Amazon EMR

As mentioned earlier, Amazon EMR is a managed service that provides big data analytics frameworks, such as Apache Hadoop and Apache Spark, out of the box and ready for use. With Amazon EMR, you can handle a variety of use cases, such as batch processing, big data analytics, low-latency querying, and data streaming, or even use EMR as a large datastore itself!

With Amazon EMR, there is very little underlying infrastructure for you to manage. You simply decide how many instances your EMR cluster should initially run on and then start using the framework for analytics and processing. Amazon EMR also lets you scale your infrastructure as your requirements change, without disrupting your existing setup. Here is a brief look at some of the benefits that you can obtain by leveraging Amazon EMR for your own workloads:

  • Pricing: Amazon EMR relies on EC2 instances to spin up your Apache Hadoop or Apache Spark clusters. You can vary costs by selecting the instance types for your cluster, from large to extra large and so on, but the best part of EMR is that you can also combine on-demand, reserved, and spot EC2 instances based on your setup, which gives you flexibility at significantly lower costs.
  • Scalability: Amazon EMR provides you with a simple way of scaling running workloads, depending on their processing requirements. You can resize your cluster or its individual components as you see fit and, additionally, configure one or more instance groups to guarantee instance availability and processing capacity.
  • Reliability: Although you, as an end user, have to specify the initial instances and their sizes, AWS ultimately ensures the reliability of the cluster by swapping out instances that have failed or are expected to fail.
  • Integration: Amazon EMR integrates with other AWS services to meet your cluster's additional storage, network, and security requirements. You can use services such as Amazon S3 to store both the input and the output data, AWS CloudTrail to audit the requests made to EMR, Amazon VPC to secure your launched EMR instances, and much more!
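
To make the pricing and scalability points a little more concrete, here is a minimal Python sketch of how on-demand and spot instances might be mixed across instance groups. The helper name build_instance_groups, the m5.xlarge instance type, and the bid price are illustrative assumptions and do not come from this chapter. The master, core, and task roles belong to EMR's own terminology, and the dictionary keys follow the InstanceGroups structure that the boto3 run_job_flow call accepts, but treat this as a sketch rather than a recommended configuration.

```python
def build_instance_groups(core_count, task_count,
                          instance_type="m5.xlarge", spot_bid_price=None):
    """Return a list of EMR instance group definitions (hypothetical helper).

    The master and core groups use on-demand instances for stability;
    the optional task group uses spot instances to reduce cost.
    """
    groups = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        task_group = {
            "Name": "Task",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        }
        # BidPrice is passed as a string; the value is an assumed example.
        if spot_bid_price is not None:
            task_group["BidPrice"] = str(spot_bid_price)
        groups.append(task_group)
    return groups


if __name__ == "__main__":
    for group in build_instance_groups(2, 4, spot_bid_price=0.10):
        print(group["Name"], group["Market"], group["InstanceCount"])

    # With boto3 installed and AWS credentials configured, the list could be
    # passed along these lines (not executed here):
    #   boto3.client("emr").run_job_flow(
    #       Name="demo-cluster",
    #       ReleaseLabel="emr-6.15.0",  # example release label
    #       Instances={"InstanceGroups": groups,
    #                  "KeepJobFlowAliveWhenNoSteps": True},
    #       JobFlowRole="EMR_EC2_DefaultRole",
    #       ServiceRole="EMR_DefaultRole",
    #   )
```

Keeping the master and core groups on on-demand instances while placing only the task group on spot instances is one common way to lower costs without putting the cluster's stored data at risk if spot capacity is reclaimed.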

With these details in mind, let's move a step closer to launching our very own EMR cluster by first visiting some of its key concepts and terminology.
