Enriching data with mashups

Business intelligence is all about people. We need to help people to understand data as quickly as possible so that they make strategic decisions more quickly.

Unfortunately, decision makers can end up being distracted by a need to mash data together in Excel. Worse, they may even expend time in trying to understand the data in the first place, rather than using the time to analyze the data to make an informed decision. This has a pernicious impact on the organization since decision makers are diverted from their role and contribution to the organization.

One proposal is to take a step back and re-evaluate the business questions and how they are answered. Due to the requirement that the data is correct and current, users could be provisioned with data that is insulated from the operational systems and merged together. This could help to answer the business need for a decision to be made using the data on the dashboard.

In this recipe, we will load the fact table called FactInternetSales first, and then we will load the dimension tables. We will look at joining tables together and putting the tables into Tableau's memory. When we do this, we relieve some of the pressure that is produced when we create an unnecessary proliferation of snapshots of data. In turn, data silos need to be mashed up by business users in order to get the information that they need. We will join tables together so that the business users can get results more quickly using Tableau, rather than trying to merge lots of smaller spreadsheets together.

Getting ready

For the exercises in this chapter, open a new Tableau workbook and name it Chapter Four.

How to do it…

  1. On the initial Tableau page, select Connect to Data and navigate to the location where you stored the data files, as shown in the following screenshot:
    How to do it…
  2. Navigate to the FactInternetSales file and select it.
  3. Next, select the option to load multiple tables. You can see this in the next screenshot:
    How to do it…
  4. When we click on the Add Table… button for FactInternetSales, the following dialog box will appear:
    How to do it…
  5. You can see that the previous screenshot has two tabs: Table and Join.
  6. We will add the date dimension first. The connection to the table is called DimDate#csv, and you can see it in the previous screenshot.
  7. You don't need to take all of the columns. In fact, it would be a good idea to take across only the columns that you need, or it will be confusing for the end user to see too many unnecessary columns. Here, we have removed the international language columns for clarity.
  8. Now, let's click on the Join tab. We will join the tables together by the OrderDateKey column in the FactInternetSales table and the DateKey column in the DimDate table. The FactInternetSales table has more than one date: order date, ship date, and due date. The DimDate table has one key, which is called DateKey. A key is a database column that is used to establish relationships between tables in a database so that it does not have lots of redundant information. You can imagine how confusing it gets if a data store has the same column located all over the place in many different tables. So, we use keys to link tables together so that we can reuse the same information as much as possible. You can see the example in the following screenshot:
    How to do it…
  9. When you click on the Add button in the middle, a dialog box will appear as follows:
    How to do it…
  10. You can now click on OK, and you will be taken back to the Add Table dialog box.
  11. We will now add in other tables—all in the same way. Tableau attempts to work out the keys for you. In case you need a list, the following table is a summary:

    Table name

    Key

    DimProduct

    ProductKey

    DimProductSubCategory

    ProductSubCategoryKey

    DimProductCategory

    ProductCategoryKey

    DimSalesTerritory

    SalesTerritoryKey

  12. For each additional table, we will add an inner join. What are these joins? An inner join, also known as an equi join, selects only the rows from both the tables that have matching values. Rows with values in the joined field that do not appear in both of the database tables will be excluded from the result set. So, for example, if there is a row in the FactInternetSales table which does not have a value in the DimDate table, then it will not be returned, and vice versa.
  13. With a left join, the selected rows will include all of the records in the first database table. In this case, it will return all of the rows in the FactInternetSales table whether there is a match in the DimDate table or not.

    Note

    One or more fields can serve as the join fields. In this simple example, we have selected only one field for clarity.

  14. Once you have added all of the tables, your dialog box will appear as you can see in the following screenshot:
    How to do it…
  15. Once you have clicked on OK, you can import all the data or just some of it. We will import all of the data into Tableau's data engine. A list of the options is shown in the following screenshot:
    How to do it…
  16. Once you have imported your data, your Tableau worksheet will show all of the tables on the left-hand side. You can see the next screenshot as an example:
    How to do it…

    You can see that the tables are located in the Data pane on the left-hand side.

How it works…

To summarize, in this section, we have shown different ways of joining tables in order to alleviate the situation where people are copying and pasting data all over the place. By unifying all the data that the users need into one distinct place, it will save them time and energy that they could use to make better decisions.

Tableau has its own data engine, which is an analytics database. It uses compression, which means that it can store a lot of data. It also involves techniques to make data retrieval very fast. It is a flexible data model, and you can work on the data very quickly in the same way as it is represented on the disk.

We saved the data extract, which is a separate file from the Tableau workbook. It has the file extension .tde, which stands for Tableau Data Extract. This extract file can be reused directly without having the source connection details. This is extremely useful for portability. You can also use it as a way to prototype the dashboard. Further, you can anonymize a dataset and load it into an extract file. Then, you can ask the developer to work from the anonymized extract file. Once the development is complete, you could change the source connection so that Tableau connects to real-world data.

There's more…

Here is a trick: if you want to verify the number of rows loaded into Tableau's data engine, then take the Number of Records metric and put it into the Marks shelf. This will give you a quick check to see if the number of rows loaded matches what you expect. If the number is much lower or higher, then one of your joins may be wrong. It's best to do this before you start or you will have to redo the work!

What happens if you have made a mistake and want to delete worksheets? You can delete worksheets by right-clicking on the worksheet tab and selecting Delete Sheet.

When you attempt to delete a worksheet, you will notice that there is no option to delete the sheet. You can see an illustration of this feature in the following screenshot:

There's more…

If the worksheet is used in a dashboard, Tableau will not allow you to delete a worksheet if it is reused elsewhere. In order to get around this feature, you need to delete the dashboard first, and then you'd need to delete the original worksheet. Tableau doesn't let you delete all of the worksheets, however, since it needs a worksheet in order to show data!

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset