Let's look at the options shown in step 3:
- Cluster name: This is where you provide an appropriate name for the cluster.
- S3 folder: This is the S3 location (bucket and folder) where the log files for this cluster will be stored.
- Launch mode:
- Cluster: The cluster will continue to run until you terminate it.
- Step execution: EMR runs the steps (jobs) you define at launch and then terminates the cluster automatically once they complete.
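As a rough sketch, the name, S3 folder, and launch-mode options map onto flags of the `aws emr create-cluster` CLI command. The cluster name, bucket path, release label, and instance settings below are placeholder values, not ones from this walkthrough:

```shell
# Sketch only (placeholder names and paths; requires AWS credentials).
# --log-uri            -> the "S3 folder" for cluster logs
# --no-auto-terminate  -> "Cluster" launch mode (runs until you terminate it);
#                         for step execution you would pass --steps ... --auto-terminate instead
aws emr create-cluster \
    --name "my-demo-cluster" \
    --log-uri "s3://my-bucket/emr-logs/" \
    --no-auto-terminate \
    --use-default-roles \
    --release-label emr-5.30.0 \
    --applications Name=Hadoop \
    --instance-type m5.xlarge \
    --instance-count 3
```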
- Software configuration:
- Vendor: This lets you choose between Amazon's EMR distribution, built on open source Hadoop, and MapR's distribution.
- Release: This is the EMR release version, which determines the versions of the bundled applications.
- Applications:
- Core Hadoop: This is focused on SQL-style querying (for example, via Hive).
- HBase: This is focused on NoSQL-oriented workloads.
- Presto: This is focused on ad-hoc query processing.
- Spark: This is focused on in-memory distributed processing with Apache Spark.
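The release and application choices above correspond to the `--release-label` and `--applications` flags of the AWS CLI. A hedged sketch, with a placeholder cluster name and an example release label:

```shell
# Sketch only: choose an EMR release and the applications to install.
# Each Name= entry is one of the application options listed above.
aws emr create-cluster \
    --name "spark-presto-cluster" \
    --release-label emr-5.30.0 \
    --applications Name=Spark Name=Presto Name=HBase \
    --use-default-roles \
    --instance-type m5.xlarge \
    --instance-count 3
```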
- Hardware configuration:
- Instance type: This topic will be covered in detail in the next section.
- Number of instances: This refers to the number of nodes in the cluster. One of them will be the master node and the rest will be slave nodes.
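In CLI terms, the hardware options above correspond to the `--instance-type` and `--instance-count` flags. A minimal sketch, assuming a placeholder name and an arbitrary instance type:

```shell
# Sketch only: a 4-node cluster (1 master + 3 slaves) of m5.xlarge instances.
aws emr create-cluster \
    --name "four-node-cluster" \
    --release-label emr-5.30.0 \
    --applications Name=Hadoop \
    --use-default-roles \
    --instance-type m5.xlarge \
    --instance-count 4
```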
- Security and access:
- EC2 key pair: You can associate an EC2 key pair with the cluster that you can use to connect to it via SSH.
- Permissions: You can allow other users besides the default Hadoop user to submit jobs.
- EMR role: This allows EMR to call other AWS services, such as EC2, on your behalf.
- EC2 instance profile: This provides access to other AWS services, such as S3 and DynamoDB, via the EC2 instances that are launched by EMR.
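The security and access options map onto the CLI as well: the key pair and instance profile are passed via `--ec2-attributes`, and the EMR role via `--service-role`. A sketch assuming a placeholder key-pair name and the default EMR role names:

```shell
# Sketch only: attach an EC2 key pair for SSH access, and name the EMR role
# (service role) and EC2 instance profile explicitly (defaults shown).
aws emr create-cluster \
    --name "secure-cluster" \
    --release-label emr-5.30.0 \
    --applications Name=Hadoop \
    --service-role EMR_DefaultRole \
    --ec2-attributes KeyName=my-key-pair,InstanceProfile=EMR_EC2_DefaultRole \
    --instance-type m5.xlarge \
    --instance-count 3
```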