Dealing with multiple fact tables in one model

In data models designed around business processes, we will often have just one source fact table. If we have additional fact tables, they tend to be at a similar grain to the main fact table, which is easier to deal with. Line-of-business documents may have fact tables from lots of different sources that are not at the same grain level at all, but we are still asked to deal with creating the associations. There are, of course, several methods to deal with this scenario.

Joining the fact tables together

If the fact tables have an identical grain, with the exact same set of primary keys, then it is valid to join, using a full outer join, the two tables together. Consider the following example:

Fact:
Load * Inline [
Date, Store, Product, Sales Value
2014-01-01, 1, 1, 100
2014-01-01, 2, 1, 99
2014-01-01, 1, 2, 111
2014-01-01, 2, 2, 97
2014-01-02, 1, 1, 101
2014-01-02, 2, 1, 98
2014-01-02, 1, 2, 112
2014-01-02, 2, 2, 95
];

Join (Fact)
Load * Inline [
Date, Store, Product, Waste Value
2014-01-01, 1, 1, 20
2014-01-01, 2, 1, 10
2014-01-02, 2, 2, 11
2014-01-03, 2, 1, 5
];

This will produce a table that looks like the following:

Joining the fact tables together

We know from our previous discussion about null values in fact tables that QlikView will perfectly handle these values for all calculations.

Concatenating fact tables

Concatenation of tables instead of joining them is often a go-to strategy for the creation of combined fact tables. It works well because logically we end up with the same result as joining. Also, if there is any suspicion that there are duplicate keys (so, in our example, two or more rows for the same date, Store and Product—which may be valid), then concatenation will still work where a join will not. In the previous example, if we were to concatenate rather than join, then the table would look like the following:

Concatenating fact tables

One thing that we need to consider is that this table is longer than the previous one while still being as wide. Therefore, it will take up more space in memory.

It can also work to concatenate fact tables that have a different grain. In that case, it is a good idea to populate the key values that are missing with a key value pointing to the "not applicable" value in the dimension, as we discussed earlier.

Changing the grain of a fact table

We mentioned previously that we can reduce the granularity of a fact table by aggregating the facts to a smaller subset of dimensions—for example, removing transaction time and aggregating to transaction date. There may be other occasions, and good business reasons, where you have a fact table at one grain and want to make it more granular to match with another fact table. For example, suppose that I have sales data by date and have budget data by week; I may want to split the budget down to the day level to give me more granularity in my day-by-day analysis.

Imagine a scenario where we are going to load the weekly budget but we want to apportion that over the days in a different ratio—to reflect general trading conditions. The percentages that we want per day are as follows:

Day

Percentage

Monday

10%

Tuesday

13%

Wednesday

15%

Thursday

17%

Friday

20%

Saturday

25%

Sunday

0%

We can load a mapping table with this information and then use that to calculate the correct daily value:

Budget_Day_Percent:
Mapping Load * Inline [
Day, Percentage
0, .10
1, .13
2, .15
3, .17
4, .20
5, .25
6, 0
];

Budget:
Load
   YearWeek,
   Store,
   Product,
   [Budget Value] As WeekBudget
From Budget.qvd (QVD);

Left Join (Budget)
Load
   YearWeek,
   Date 
From Calendar.qvd (QVD);
Left Join (Budget)
Load
   Date,
   Store,
   Product,
   WeekBudget 
      * ApplyMap('Budget_Day_Percent', WeekDay(Date), 0)
      As [Budget Value]
Resident
   Budget;

Drop Field WeekBudget;

Linking fact tables of different grains

If the fact tables have different grains, especially where they have quite different dimension keys, only sharing a few, it often doesn't make sense to concatenate them—we just create a wide and long fact table that has many null values. In that case, it makes more sense to create a link table to associate the two tables.

A link table is pretty much exactly like a synthetic key table, except that we are controlling the creation of composite keys. There are a couple of simple rules for the creation of link tables:

  • Create a key in each fact table that will associate the rows in the fact table to the link table. This will mostly be a combination of the keys that we are going to use in the link table using AutoNumber.
  • Use a mixture of concatenation and joins to create the link table.
  • Drop the key fields that have been added to the link table from the fact tables.

I did once have a different approach to this, using primary keys for each fact table, but the preceding approach is far simpler.

Let's look at an example. We will return to retail sales and budgets, but this time we will have very different grains that are not easily changeable. We will have a date, store, and product, but the sales information will be down to till, operator, and time. There is very little chance of us manipulating the budget data down to this level.

Now, it is valid to concatenate these tables as we discussed earlier. Once you have used both techniques a number of times, you will be able to make a good judgment of which one to use on a case-by-case basis. Most often, the overriding consideration should be memory size and lower memory equals lower cache and better performance for more users.

The following is the example load:

Sales:
LOAD 
   AutoNumber(Floor(Date) & '-' & Store & '-' & Product, 'sB_Link') As SB_Link,
   * 
INLINE [
  Date, Store, Product, Till, Operator, Time, Sales Quantity, Sales Value
  2014-01-01, 1, 1, 1, 1, 09:00:00, 1, 12.12
  2014-01-01, 1, 2, 1, 1, 09:01:30, 2, 3.33
  2014-01-01, 2, 1, 3, 5, 10:11:01, 4, 17.88
  2014-01-01, 2, 2, 5, 5, 12:02:22, 1, 1.70
];


Budget:
LOAD 
   AutoNumber(Floor(Date) & '-' & Store & '-' & Product, 'sB_Link') As SB_Link,
   * 
INLINE [
  Date, Store, Product, Budget Value
  2014-01-01, 1, 1, 20.00
  2014-01-01, 1, 2, 3.00
  2014-01-01, 2, 1, 20.00
  2014-01-01, 2, 2, 3.00
];

Link_Table:
Load Distinct
   SB_Link,
   Date,
   Store,
   Product
Resident
   Sales;

Join (Link_Table)
Load Distinct
   SB_Link,
   Date,
   Store,
   Product
Resident
   Budget;

Drop Fields Date, Store, Product From Sales;
Drop Fields Date, Store, Product From Budget;

This will produce a model like the following:

Linking fact tables of different grains

In this case, we happen to have all the fields that we are using in the link table in both tables. What happens if we have a different table in the mix that only has two of those fields? For example, if we have a table containing the current stock levels for each product by store, we can add this to the link table in the following manner:

Store_Stock:
Load
   AutoNumber(Store & '-' & Product, 'sS_Link') As SS_Link, 
   * 
Inline [
Store, Product, Stock Level
1, 1, 12.00
1, 2, 2.00
2, 1, 6.00
2, 2, 2.00
];

Join (Link_Table)
Load Distinct
   SS_Link,
   Store,
   Product
Resident
   Store_Stock;

Drop Fields Store, Product from Store_Stock;

The data model will now look like the following:

Linking fact tables of different grains

We can continue to add keys to this table like this—either joining or concatenating. We can also, if necessary, build two or three link tables and then concatenate them together at the end.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset