Understanding the AWS analytics suite of services

As big data adoption grows across organizations, many cloud providers now offer services designed specifically to run massive computations and analytics on large volumes of data. AWS is one such provider, and it has invested heavily in big data and analytics, with a host of services offering ready-to-use frameworks, business insights, and data warehousing solutions. Here is a brief explanation of the AWS analytics suite of services:

  • Amazon EMR: Amazon Elastic MapReduce, or EMR, is an easy-to-use service that provides users with a scalable, managed Hadoop ecosystem and framework. You can leverage EMR to process vast amounts of data without having to worry about configuring the underlying Hadoop platform. We will explore EMR in more detail in the subsequent sections of this chapter.
  • Amazon Athena: Amazon Athena takes big data processing up a notch by providing a standard SQL interface for querying data stored directly on Amazon S3. With Athena, you do not have any underlying hardware to manage or maintain; it is all managed by AWS itself. This serverless approach makes Athena ideal for data that does not require complex ETL processing. All you need to do is create a schema, point Athena to your data on Amazon S3, and start querying it using simple SQL syntax.
  • Amazon Elasticsearch Service: Amazon Elasticsearch Service provides a managed deployment of Elasticsearch, the popular open source search and analytics engine. This service comes in handy when you wish to process streams of data originating from various sources, such as logs generated by instances.
  • Amazon Kinesis: Unlike the other services discussed so far, Amazon Kinesis is a streaming service. You can use Amazon Kinesis to push vast amounts of data originating from multiple sources into one or more streams, which other AWS services can then consume for analytics and other data processing.
  • Amazon QuickSight: Amazon QuickSight is a cost-effective business insights solution that can be used to perform fast, ad hoc analysis on data.
  • Amazon Redshift: Amazon Redshift is a petabyte-scale data warehousing solution that you can leverage for analyzing your data using your existing set of tools. We will learn more about Redshift later in this chapter. The services are depicted here:
  • AWS Data Pipeline: Moving large amounts of data between AWS services can be difficult, especially when the data sources vary. AWS Data Pipeline makes it easier to transfer data between different AWS storage and compute services, and it also helps with the initial transformation and processing of data. You can even use Data Pipeline to transfer data reliably from an on-premises location into AWS storage services.
  • AWS Glue: AWS Glue is a managed ETL (Extract, Transform, and Load) service recently launched by AWS. AWS Glue greatly simplifies the process of extracting, preparing, and loading data from large datasets into an AWS storage service.
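
To make the Athena workflow described in the list more concrete (create a schema, point it at data on Amazon S3, then query with SQL), here is a minimal Python sketch. It is an illustration under stated assumptions rather than a definitive implementation: the table name, columns, bucket paths, and the helper names build_create_table and run_query are all hypothetical, and the data is assumed to be comma-delimited text. The only AWS call used is the Athena client's start_query_execution, which you would obtain from boto3.

```python
def build_create_table(table, columns, s3_location):
    """Build an Athena DDL statement that defines a schema over files on S3.

    columns is a list of (name, type) pairs. The data is assumed to be
    comma-delimited text; other formats need a different ROW FORMAT clause.
    """
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def run_query(athena_client, sql, database, output_location):
    """Submit a SQL statement to Athena and return its query execution ID.

    athena_client is expected to be a boto3 Athena client, for example
    boto3.client("athena"). Athena writes results to output_location on S3.
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("athena")
#   ddl = build_create_table(
#       "access_logs",
#       [("request_time", "string"), ("status", "int")],
#       "s3://example-bucket/logs/",
#   )
#   run_query(client, ddl, "default", "s3://example-bucket/athena-results/")
#   run_query(client, "SELECT status, COUNT(*) FROM access_logs GROUP BY status",
#             "default", "s3://example-bucket/athena-results/")
```

Note that start_query_execution only submits the query; Athena runs it asynchronously, so a real script would also need to check the query's status before reading the results from the output location.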

With this brief overview of the AWS analytics suite of services complete, let's now move forward and take a closer look at Amazon EMR!
