How it works...

Let's look at the options shown in step 3:

  • Cluster name: This is where you provide an appropriate name for the cluster.
  • S3 folder: This is the folder location where the S3 bucket's logs for this cluster will go to.
  • Launch mode
    • Cluster: The cluster will continue to run until you terminate it.
    • Step execution: This is to add steps after the application is launched.
  • Software configuration:
    • Vendor: This is Amazon EMI with the open source Hadoop versus MapR's version.
    • Release: This is self-evident.
    • Applications
      • Core Hadoop: This is focused on the SQL interface.
      • HBase: This is focused on partial no-SQL-oriented workloads.
      • Presto: This is focused on ad-hoc query processing.
      • Spark: This is focused on Spark.
  • Hardware configuration
    • Instance type: This topic will be covered in detail in the next section.
    • Number of instances: This refers to the number of nodes in the cluster. One of them will be the master node and the rest slave nodes.
  • Security and access:
    • EC2 key pair: You can associate an EC2 key pair with the cluster that you can use to connect to it via SSH.
    • Permissions: You can allow other users besides the default Hadoop user to submit jobs.
    • EMR role: This allows EMR to call other AWS services, such as EC2, on your behalf.
    • EC2 instance profile: This provides access to other AWS services, such as S3 and DynamoDB, via the EC2 instances that are launched by EMR.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset