Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Joining data in SPICE

Joining tables is a very common requirement for data analysis and SPICE makes this effort a breeze. To understand this feature, let's go for a real use case. In the Chapter 2, Exploring Any Data, Loading data from Redshift to Quicksight section, we had uploaded USACensusSalarybyZip.csv, which has salary and population information by USA zip code. Now let's upload another file that has mapping of zip codes to USA states and cities and then we will join this with the other dataset in SPICE.

Loading data to Redshift

First we need to load this new file to Redshift as a table. Here are the steps to load the data to Redshift:

Download the data file from this GitHub location: https://github.com/rnadipalli/quicksight/blob/master/sampledata/USAZipCodes.csv.
Upload the file to your S3 bucket.
Create a new AWS data pipeline to upload S3 data to your Redshift cluster and call it USAZIPCODES. You can clone the data pipeline created in the Chapter 2, Exploring Any Data, Creating and executing an AWS Data Pipeline section. The following is a screenshot of the new data pipeline:

Figure 3.4: AWS pipeline zip code
For your reference, I have uploaded this data pipeline at https://github.com/rnadipalli/quicksight/blob/master/awsdatapipelines/LoadS3ZipcodetoRedshiftpipeline.json.
Once this data is uploaded to Redshift, you can use a database management tool like SQL Workbench/J to view and query the data.

Figure 3.5: USA zip codes table description

Note

You can download the SQL Workbench/J software from http://www.sql-workbench.net/downloads.html.

This completes the loading of new data to Redshift; next, we will review the steps to join the two tables and create a new dataset.

Creating a new joined dataset

Now that we have two tables in the Redshift instance, we follow these steps to create a new dataset and load it to SPICE:

From the QuickSight home page, click on Manage data.
Click on New data set and then select the Redshift option.
Next, enter the details of the Redshift data source and select the new table that was created, as shown in the following screenshot:

Figure 3.6: Preparing data after loading to Redshift
Now, in the data preparation page, select both the usasalarybyzip and usazipcodes tables as shown in the next figure.
Next, click on the ? sign between the two tables to configure the join criteria. You have the following options for join criteria:
- Inner: This gives only records that have matching values in the join columns from the two tables. This is the most commonly used and default option and is also shown in the next screenshot, where the join field is zip code.
- Left: This performs a left outer join, returning all records from the left-hand side table and only records that have a value in the join column for the right-hand side table.
- Right: This performs a right outer join, returning all records from the right-hand side table and only records that have a value in the join column for the left-hand side table.
- Outer: This performs a full outer join and returns records from both tables regardless of the values in the join columns.
Click on Apply to activate the join and see the results:

Figure 3.7: Join using SPICE
SPICE then shows the resultant combined dataset, as shown in the following screenshot. Next, click on Prepare data & visualize to visualize the joined dataset.

Figure 3.8: Prepare and visualize the joined dataset
Now, in QuickSight visualization, you can change the visualization type, dimensions, and measures and see an interesting stacked bar that shows which city has the best average salary in orange bars; the corresponding population is shown in blue bars. The following screenshot shows that Pound Ridge has an average salary of $219,554 and a population of 4,764:

Figure 3.9: Visualization on joined data

This completes the joining dataset section; next, let's review other ways to enrich data in SPICE.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Joining data in SPICE

Create new playlist

Sign In

Sign Up

Joining data in SPICE

Loading data to Redshift

Note

Creating a new joined dataset

Table of Contents for
Joining data in SPICE