Chapter 2. Exploring Any Data

QuickSight can analyze data from various sources including Amazon Web Services (AWS) data stores, files in common formats, Salesforce, and popular database engines. QuickSight has a simple interface to connect to these sources and create datasets from them that can be stored in SPICE for subsequent analysis. In this chapter, we will first look at Amazon's big data ecosystem and then review how QuickSight can be used to connect to the various data stores. The following topics will be covered:

  • Amazon's big data ecosystem
  • QuickSight-supported data sources
  • QuickSight-supported data types and data sizes
  • Use case review
  • Uploading your own data from files, RDBMS, and SaaS to QuickSight
  • Editing existing datasets
  • Uploading data using Athena

AWS big data ecosystem

Amazon's big data ecosystem has several software services that enable business insights from data. These services can be broadly classified into four major categories: Collect, Store, Analyze, and Orchestrate, as shown in the following diagram:

Figure 2.1: AWS big data ecosystem

Let's look at each category in detail.

Collect

The first step in any BI initiative is to bring data from external systems into Amazon. AWS offers the following services for this purpose:

  • Direct Connect: With Direct Connect, you can establish private connectivity between AWS and your enterprise data center, which provides an easy way to move data files from your applications to AWS for analysis
  • Snowball: Snowball (originally launched as AWS Import/Export Snowball) lets you import hundreds of terabytes of data quickly into AWS using Amazon-provided secure appliances for transport
  • Kinesis and Kinesis Firehose: Kinesis services enable building custom applications that process or analyze streaming data

Store

The collected data needs to be stored, and Amazon offers several options to choose from based on your latency and budget requirements. The following is a summary:

  • S3: Amazon Simple Storage Service (S3) can be used to store and retrieve any amount of data. It is a highly reliable object store.
  • Glacier: Glacier is an extremely low-cost storage service (1 cent per GB per month) that provides secure, durable, and flexible storage for data backup and archival.
  • RDS and Aurora: The RDS service enables easy setup of the most commonly used relational databases in AWS, including Oracle, MySQL, SQL Server, and PostgreSQL, and manages time-consuming administration tasks such as backups. The Aurora service is a MySQL-compatible database engine that AWS positions as costing a fraction of what commercial databases cost.
  • Redshift: The Redshift service provides a fast, fully managed data warehouse at a low cost ($1,000 per TB per year).
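
To put the two quoted price points side by side, here is a small arithmetic sketch. It uses only the figures given in the list above (1 cent per GB per month for Glacier, $1,000 per TB per year for Redshift) and assumes decimal terabytes (1 TB = 1,000 GB). The function names are ours, and actual AWS pricing varies by region and changes over time, so treat the output as illustrative only.

```python
# Price points as quoted in the text above; illustrative, not current AWS pricing.
GLACIER_USD_PER_GB_MONTH = 0.01
REDSHIFT_USD_PER_TB_YEAR = 1000.0
GB_PER_TB = 1000  # assumption: decimal terabytes


def glacier_cost_per_year(terabytes: float) -> float:
    """Yearly Glacier storage cost in USD for the given volume."""
    return terabytes * GB_PER_TB * GLACIER_USD_PER_GB_MONTH * 12


def redshift_cost_per_year(terabytes: float) -> float:
    """Yearly Redshift cost in USD for the given volume."""
    return terabytes * REDSHIFT_USD_PER_TB_YEAR


if __name__ == "__main__":
    for tb in (1, 10, 100):
        print(
            f"{tb:>4} TB  Glacier: ${glacier_cost_per_year(tb):>10,.2f}/yr"
            f"  Redshift: ${redshift_cost_per_year(tb):>12,.2f}/yr"
        )
```

The two services solve different problems (archival versus interactive warehousing), so the comparison only shows the order of magnitude of each quoted price, not which service to choose.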

Analyze

Once the data is in Amazon, we have several options to analyze it. The following is a summary:

  • EMR: Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data at scale and on demand.
  • Machine learning: Amazon Machine Learning provides visualization tools and wizards for creating machine learning models and executing them on your big data.
  • QuickSight: QuickSight is the fast, cloud-powered BI service and the theme of this book.
  • Athena: Athena is a query service that makes it easy to analyze data directly from files in S3 using standard SQL statements. Athena is serverless, which makes it stand out, since there is no infrastructure to provision.
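
As a rough illustration of what "standard SQL over files in S3" looks like from code, the following sketch prepares a request for the Athena `start_query_execution` call in boto3 (the AWS SDK for Python). The table name `sales_orders`, the database `example_db`, and the bucket `example-bucket` are hypothetical placeholders, not names from this book. The `run_query` helper is not invoked here because it needs boto3 installed and valid AWS credentials.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build the keyword arguments for boto3's Athena start_query_execution."""
    if not output_location.startswith("s3://"):
        raise ValueError("Athena writes query results to S3; expected an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID (requires AWS credentials)."""
    import boto3  # third-party SDK, imported lazily so the sketch loads without it

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names throughout; replace with your own database, table, and bucket.
request = build_athena_request(
    sql="SELECT region, COUNT(*) AS orders FROM sales_orders GROUP BY region",
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
)
```

Note that nothing in this sketch provisions a cluster or server: the only inputs are the SQL text, a database name, and an S3 location for results, which is what serverless means in practice here.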

Orchestrate

To move, orchestrate, and integrate data between the various AWS stores, Amazon has two key products: Data Pipeline and Glue. The following is a summary of these products:

  • Data Pipeline: Amazon Data Pipeline allows reliable data movement between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.
  • Glue: Glue is a fully managed ETL service (announced in December 2016) with a data catalog. It crawls data sources, identifies data formats, allows transformations to be built using an IDE, and schedules these jobs.
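
The four categories and the services listed under each can be captured in a simple lookup table. This is only a summary of the classification used in this chapter, not an official AWS taxonomy; the `ECOSYSTEM` and `category_of` names are ours.

```python
# Classification exactly as presented in this chapter.
ECOSYSTEM = {
    "Collect": ["Direct Connect", "Snowball", "Kinesis", "Kinesis Firehose"],
    "Store": ["S3", "Glacier", "RDS", "Aurora", "Redshift"],
    "Analyze": ["EMR", "Machine Learning", "QuickSight", "Athena"],
    "Orchestrate": ["Data Pipeline", "Glue"],
}


def category_of(service: str) -> str:
    """Return the chapter's category for a service name (case-insensitive)."""
    wanted = service.casefold()
    for category, services in ECOSYSTEM.items():
        if any(name.casefold() == wanted for name in services):
            return category
    raise KeyError(f"{service!r} is not covered in this overview")


if __name__ == "__main__":
    for name in ("Athena", "Glue", "Snowball"):
        print(f"{name}: {category_of(name)}")
```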

This completes the AWS big data ecosystem overview. Next, let's look at how to onboard data to QuickSight in detail.
