In data models designed around business processes, we will often have just one source fact table. If we have additional fact tables, they tend to be at a similar grain to the main fact table, which is easier to deal with. Line-of-business documents may have fact tables from lots of different sources that are not at the same grain level at all, but we are still asked to deal with creating the associations. There are, of course, several methods to deal with this scenario.
If the fact tables have an identical grain, with the exact same set of primary keys, then it is valid to join, using a full outer join, the two tables together. Consider the following example:
Fact: Load * Inline [ Date, Store, Product, Sales Value 2014-01-01, 1, 1, 100 2014-01-01, 2, 1, 99 2014-01-01, 1, 2, 111 2014-01-01, 2, 2, 97 2014-01-02, 1, 1, 101 2014-01-02, 2, 1, 98 2014-01-02, 1, 2, 112 2014-01-02, 2, 2, 95 ]; Join (Fact) Load * Inline [ Date, Store, Product, Waste Value 2014-01-01, 1, 1, 20 2014-01-01, 2, 1, 10 2014-01-02, 2, 2, 11 2014-01-03, 2, 1, 5 ];
This will produce a table that looks like the following:
We know from our previous discussion about null values in fact tables that QlikView will perfectly handle these values for all calculations.
Concatenation of tables instead of joining them is often a go-to strategy for the creation of combined fact tables. It works well because logically we end up with the same result as joining. Also, if there is any suspicion that there are duplicate keys (so, in our example, two or more rows for the same date, Store
and Product
—which may be valid), then concatenation will still work where a join will not. In the previous example, if we were to concatenate rather than join, then the table would look like the following:
One thing that we need to consider is that this table is longer than the previous one while still being as wide. Therefore, it will take up more space in memory.
It can also work to concatenate fact tables that have a different grain. In that case, it is a good idea to populate the key values that are missing with a key value pointing to the "not applicable" value in the dimension, as we discussed earlier.
We mentioned previously that we can reduce the granularity of a fact table by aggregating the facts to a smaller subset of dimensions—for example, removing transaction time and aggregating to transaction date. There may be other occasions, and good business reasons, where you have a fact table at one grain and want to make it more granular to match with another fact table. For example, suppose that I have sales data by date and have budget data by week; I may want to split the budget down to the day level to give me more granularity in my day-by-day analysis.
Imagine a scenario where we are going to load the weekly budget but we want to apportion that over the days in a different ratio—to reflect general trading conditions. The percentages that we want per day are as follows:
Day |
Percentage |
---|---|
Monday |
10% |
Tuesday |
13% |
Wednesday |
15% |
Thursday |
17% |
Friday |
20% |
Saturday |
25% |
Sunday |
0% |
We can load a mapping table with this information and then use that to calculate the correct daily value:
Budget_Day_Percent: Mapping Load * Inline [ Day, Percentage 0, .10 1, .13 2, .15 3, .17 4, .20 5, .25 6, 0 ]; Budget: Load YearWeek, Store, Product, [Budget Value] As WeekBudget From Budget.qvd (QVD); Left Join (Budget) Load YearWeek, Date From Calendar.qvd (QVD); Left Join (Budget) Load Date, Store, Product, WeekBudget * ApplyMap('Budget_Day_Percent', WeekDay(Date), 0) As [Budget Value] Resident Budget; Drop Field WeekBudget;
If the fact tables have different grains, especially where they have quite different dimension keys, only sharing a few, it often doesn't make sense to concatenate them—we just create a wide and long fact table that has many null values. In that case, it makes more sense to create a link table to associate the two tables.
A link table is pretty much exactly like a synthetic key table, except that we are controlling the creation of composite keys. There are a couple of simple rules for the creation of link tables:
AutoNumber
.I did once have a different approach to this, using primary keys for each fact table, but the preceding approach is far simpler.
Let's look at an example. We will return to retail sales and budgets, but this time we will have very different grains that are not easily changeable. We will have a date, store, and product, but the sales information will be down to till, operator, and time. There is very little chance of us manipulating the budget data down to this level.
Now, it is valid to concatenate these tables as we discussed earlier. Once you have used both techniques a number of times, you will be able to make a good judgment of which one to use on a case-by-case basis. Most often, the overriding consideration should be memory size and lower memory equals lower cache and better performance for more users.
The following is the example load:
Sales: LOAD AutoNumber(Floor(Date) & '-' & Store & '-' & Product, 'sB_Link') As SB_Link, * INLINE [ Date, Store, Product, Till, Operator, Time, Sales Quantity, Sales Value 2014-01-01, 1, 1, 1, 1, 09:00:00, 1, 12.12 2014-01-01, 1, 2, 1, 1, 09:01:30, 2, 3.33 2014-01-01, 2, 1, 3, 5, 10:11:01, 4, 17.88 2014-01-01, 2, 2, 5, 5, 12:02:22, 1, 1.70 ]; Budget: LOAD AutoNumber(Floor(Date) & '-' & Store & '-' & Product, 'sB_Link') As SB_Link, * INLINE [ Date, Store, Product, Budget Value 2014-01-01, 1, 1, 20.00 2014-01-01, 1, 2, 3.00 2014-01-01, 2, 1, 20.00 2014-01-01, 2, 2, 3.00 ]; Link_Table: Load Distinct SB_Link, Date, Store, Product Resident Sales; Join (Link_Table) Load Distinct SB_Link, Date, Store, Product Resident Budget; Drop Fields Date, Store, Product From Sales; Drop Fields Date, Store, Product From Budget;
This will produce a model like the following:
In this case, we happen to have all the fields that we are using in the link table in both tables. What happens if we have a different table in the mix that only has two of those fields? For example, if we have a table containing the current stock levels for each product by store, we can add this to the link table in the following manner:
Store_Stock: Load AutoNumber(Store & '-' & Product, 'sS_Link') As SS_Link, * Inline [ Store, Product, Stock Level 1, 1, 12.00 1, 2, 2.00 2, 1, 6.00 2, 2, 2.00 ]; Join (Link_Table) Load Distinct SS_Link, Store, Product Resident Store_Stock; Drop Fields Store, Product from Store_Stock;
The data model will now look like the following:
We can continue to add keys to this table like this—either joining or concatenating. We can also, if necessary, build two or three link tables and then concatenate them together at the end.