Loading MySQL data to QuickSight using the AWS pipeline

In this section, we will look into the data flow path from https://www.ed.gov/ to QuickSight that uses a MySQL database. The source data is obtained from the public site and provides information about colleges in the USA.

The path to get to QuickSight involves the following steps:

  1. Uploading data to S3.
  2. Creating an AWS Data Pipeline to load data from S3 to MySQL.
  3. Loading data from MySQL to QuickSight.

Pre-requisites

The following are the pre-requisites to load data from MySQL to QuickSight:

  • Must have an RDS instance created. In this example, I will show an RDS MySQL instance.
  • Data file must be CSV. It should not contain any header rows.
  • You must have a database username and password that can connect to the database from QuickSight with the SELECT permission on some system tables so that QuickSight can estimate the table size. The following table identifies the system tables that the user account needs permission to select:

    Database type

    Access needed for tables

    MySQL

    INFORMATION_SCHEMA.STATISTICS

    INFORMATION_SCHEMA.TABLES

    PostgreSQL

    pg_stats

    pg_class

    pg_namespace

Uploading data to S3

From the public site, download the data file to your local system and upload to S3. This process is the same as described in the Loading text files to QuickSight section. Website link: https://catalog.data.gov/dataset/college-scorecard.

Uploading data to S3

Figure 2.10: College scorecard data (data.gov)

Creating and executing the AWS Data Pipeline

Next, to load the data from S3 to an RDS MySQL instance, we will use the AWS Data Pipeline service, which makes it easy to build ETL pipelines with a web interface. From the Services list, click on Data Pipeline, and then, click on Create new pipeline, as shown in the following screenshot:

Creating and executing the AWS Data Pipeline

Figure 2.11: AWS pipeline creation

The next screenshot shows the Create Pipeline form, enter the details as follows:

  • Name and Description: Provide an appropriate name and description (optional).
  • Source: Select from the drop-down menu, Load S3 data into RDS MySQL table.
  • Parameters: Provide the appropriate MySQL credentials, S3 location, insert query, MySQL table name, and the create table query.

    Creating and executing the AWS Data Pipeline

    Figure 2.12: Load S3 to MySQL pipeline

    Note

    For reference, you can import the pipeline that I have saved in GitHub at this location: https://github.com/rnadipalli/QuickSight/blob/master/awsdatapipelines/LoadS3toMySQLPipeline.json.

  • Next, provide schedule information or click on Activate to kick off the pipeline, as shown in the following screenshot:

    Creating and executing the AWS Data Pipeline

    Figure 2.13: Load S3 to MySQL pipeline schedule

  • Optionally, you can view the pipeline in a graphical form if you click on Edit in Architect as follows:

    Creating and executing the AWS Data Pipeline

    Figure 2.14: S3 to MySQL pipeline architect

  • AWS will now execute this pipeline on an EC2 instance that you have specified at configuration. The following screenshot shows the monitoring page. If the execution has any failures, you can click on details to view the error message.

    Creating and executing the AWS Data Pipeline

    Figure 2.15: Pipeline execution

Creating a new QuickSight dataset from MySQL

Now that data is in the RDS MySQL instance, follow these steps to create a new dataset in QuickSight:

  1. From the QuickSight home page, click on Manage data.
  2. Next, select New data set and then select RDS option.
  3. Next, enter details of the RDS data source, as shown in the following screenshot, and then click on Connect:

    Creating a new QuickSight dataset from MySQL

    Figure 2.16: Load RDS MySQL to SPICE

  4. You will next get a list of tables for that database. Select the table where data was just imported to collegerawdata, as shown in the following screenshot:

    Creating a new QuickSight dataset from MySQL

    Figure 2.17: MySQL state dimension

  5. Next, you have the option to prepare the data or straight import to SPICE and visualize it in QuickSight, as seen in the following screenshot:

    Creating a new QuickSight dataset from MySQL

    Figure 2.18: MySQL RDS data confirmation

This completes the loading of data from MySQL to QuickSight.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset