Achieving flexibility with data blends

Data blending, which we examined in detail in Chapter 2, Working with Data in Tableau, is a powerful feature in which Tableau can combine two different data sources in a single view. In order for a blend to work, Tableau requires at least one dimension in each data set that can be linked together (either automatically, when the name and data type matches, or manually, when you define a relationship via the menu Data | Edit Relationships). Sometimes, you may not have the required dimensions in one of the datasets, or the values may not match. You can use calculated fields as dimensions for blending.

In addition to the renter data we've been using, let's say we have another data source that gives us the discount given to some renters based on the apartment they rented. The discount indicates the percentage by which the rental price is reduced. Ultimately, the goal is to determine the actual price paid by all renters, including the discount if applicable:

Apartment Discount
A1 .05
A2 .02
B2 .01

 

This dataset is part of the Apartment Rentals.xlsx spreadsheet, and the connection is named Discount and is included in the chapter 04 workbook.

The Apartment Rentals and Discount data sources both contain a field called Apartment, but the original dataset contained dashes in the value, while this new dataset does not. How can we blend these sources together?

As you'll recall, whichever data source you start building a view with is the primary. Using Apartment Rental as the primary data source and Discount as the secondary, this is what happens when we accept the default blending on the Renter field:

Both Apartment and Discount from the secondary source are NULL. This is because the values of Apartment never exactly match.

Data blending occurs on the aliases of fields, so one option is to simply change the aliases for one of the fields to match the values in the other. For example, if you were to use the drop-down menu of the Apartment field in the primary source, and select Aliases..., you could change all the aliases to remove the dash. At that point, the data blend works.

However, if any new renters are added to the data, you'll need to edit the aliases again. That could become tedious, or even be un-maintainable, for a large, constantly changing data source.

Let's consider solving the data blending issue using a calculated field. Create a calculation in the primary data source called Apartment (Blend) with the following code:

REPLACE([Apartment], "-", "") 

This code replaces the dashes in Apartment with blanks. The value of this new calculated field will now exactly match the value in the secondary source.

Now we need to tell Tableau to blend on the new calculated field Renter Last Name. Manually edit the data relationships (from the menu select Data | Edit Relationships) and manually match the Apartment (blend) field from the primary data source to the Apartment field in the secondary. At this point, the blend works and we can create a view similar to this (Discount has been given a default number format of percentage):

Note that Discount is still blank in several cases. This is because the secondary data source did not have records that matched values from the primary data source. In these cases, the value from the secondary source will be NULL.

We started out with a goal of determining the actual rental price, including any discount, for all renters. We're nearly there. What we'll need to do is create another calculation that takes the initial price and multiplies it by 1 minus the discount. That is, if the discount is .05 (5%), we'll multiply the price by .95 (95%) to get the actual price.

In order to calculate the actual rental price, we might start with a calculation in the primary source, as shown:

Note that we can reference fields from the secondary source in calculations. Also, you must reference them as aggregates. In this case, we used SUM, because we knew we'd keep our view at the level of individual apartments and so the SUM of the discount would be correct. However, Tableau indicates that the calculation has an error.

The drop down menu gives the detail,Cannot mix aggregate and non-aggregate arguments with this function. This is a common error that indicates that we've tried to mix Row Level calculations with Aggregate Level calculations. To fix this, we need to identify which elements of the calculation are row-level, which are aggregate, and what function is trying to use both. In this case, we are trying to multiply the row-level field Price by the aggregate-level field Discount. We know Discount has to be aggregate, since it is from a secondary data source. So, we fix the problem by making Price an aggregate.

Our new, valid function looks similar to the following screenshot:

When you find and double-click on a secondary field in the fields list to add it to the code, Tableau will automatically insert it with a default aggregation.

However, even though we have an indication of valid syntax, our view indicates that something is still wrong:

We wanted to get the actual price for all apartments, not just the ones that had a match in the secondary data source. The reason we're not is because the calculation returns a NULL when Discount is NULL because there is no match in the blend. To resolve this, we'll need to adjust the calculation. Edit the code of Actual Price to the following:

SUM([Price]) * (1 - ZN(SUM([Discount].[Discount]))) 

Here, we've wrapped the aggregation of the secondary Discount field in ZN(). This function (which stands for Zero if Null) evaluates the expression inside the parentheses and returns a 0 if the expression is NULL, or simply the expression if it's not NULL. So, in this case, any NULL value for the sum of Discount is converted to 0 and we get the sum of price multiplied by one, which means we get the original price with no discount:

The final result is just what we wanted!

With Tableau 10.0, you have the ability to join together different data sources, which gives you some options to consider. The preceding example could might have been solved using a join because both data sources had one record per apartment. But imagine a data source that had multiple records for a single apartment. Joining would give you the discount value repeated for each row. Blending in such a case might be a more straightforward solution. A good understanding of data blending gives you additional options for working with your data.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset