Joining tables is a very common requirement for data analysis and SPICE makes this effort a breeze. To understand this feature, let's go for a real use case. In the Chapter 2, Exploring Any Data, Loading data from Redshift to Quicksight section, we had uploaded USACensusSalarybyZip.csv
, which has salary and population information by USA zip code. Now let's upload another file that has mapping of zip codes to USA states and cities and then we will join this with the other dataset in SPICE.
First we need to load this new file to Redshift as a table. Here are the steps to load the data to Redshift:
USAZIPCODES
. You can clone the data pipeline created in the Chapter 2, Exploring Any Data, Creating and executing an AWS Data Pipeline section. The following is a screenshot of the new data pipeline:
Figure 3.4: AWS pipeline zip code
Figure 3.5: USA zip codes table description
You can download the SQL Workbench/J software from http://www.sql-workbench.net/downloads.html.
This completes the loading of new data to Redshift; next, we will review the steps to join the two tables and create a new dataset.
Now that we have two tables in the Redshift instance, we follow these steps to create a new dataset and load it to SPICE:
Figure 3.6: Preparing data after loading to Redshift
usasalarybyzip
and usazipcodes
tables as shown in the next figure.
Figure 3.7: Join using SPICE
Figure 3.8: Prepare and visualize the joined dataset
Figure 3.9: Visualization on joined data
This completes the joining dataset section; next, let's review other ways to enrich data in SPICE.