After reviewing the theory of dimensional data modeling, it is now time to apply those concepts to how QlikView works. By now, we know that a QlikView document is, in general terms, used to visualize data. This data can come from disparate sources: a database, Excel files, a legacy system, or even the Web. But how do we put all this data together? The answer is: through the associative data model.
The associative data model can be structurally identical to a dimensional model. However, any data model in QlikView becomes an associative data model because it not only contains the different source tables from which the charts get their data, but also keeps those tables associated in a way that allows the QlikView document and its users to consume information and aggregate data cross-dimensionally in any possible way. In a QlikView data model, all of the field values from all of the tables in the model are automatically associated among themselves based purely on the field names.
Let's look at the Airline Operations data model built in an earlier example, from Chapter 4, Data Sources:
In the previous screenshot, we can see how the fact table (that is, the Main Data table) is directly associated with all of the other dimension tables. The purpose of these dimension tables is to provide context to the values stored in the fact table. Furthermore, the dimension tables are not only associated with the fact table, but at the same time they are indirectly associated with each other through the fact table.
With the data model shown we can, for instance, cross-reference the Origin Airport with the Destination Airport (via the Main Data table) and get the Distance Interval value between any two of them. These three fields are stored in three different dimension tables in the data model, and the fact that they are associated allows QlikView to naturally perform this cross-dimensional reference and support the associative analysis we just described. This is shown in the following screenshot:
In an associative data model, any field can act as a dimension in a chart. They can all be used within expressions to aggregate their data too.
In order to design and build a data model in QlikView, we need to understand how the associations between tables are created. We also need to consider some basic rules to avoid performance and data consistency issues. In this section, we will describe and review these guidelines.
QlikView creates associations between tables in a simple and straightforward manner: through the field names. This means that, for any given set of tables, an association is automatically created between two of them if they both contain a field with exactly the same name. Simple enough.
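To illustrate (with invented table and field names), consider two inline tables that share only a field named Airline. QlikView will associate them automatically, with no join defined by the developer:

```
// Both tables contain a field named Airline, so an
// association is created between them automatically
Flights:
LOAD * INLINE [
Flight, Airline, Passengers
101, Oceanic, 120
102, Ajira, 85
];

Airlines:
LOAD * INLINE [
Airline, Country
Oceanic, Australia
Ajira, India
];
```

Selecting a value in either table will filter the associated values in the other, which is the associative behavior described above.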
Based on this concept, we can say that QlikView will automatically build up the data model, with its respective associations, even if the developer does not explicitly define how the tables are linked to each other. However, this functionality requires that the source tables contain matching field names. Since this scenario seldom occurs, especially if we are loading tables from several different source systems, the most basic and fundamental tool for data model design in QlikView is renaming fields.
There are two main reasons for a developer to rename a field:

- To create an intended association between two tables whose linking fields have different names in their respective sources
- To prevent an unwanted association between two tables that happen to share a field name, even though the fields hold unrelated data
To rename a field, we can simply use the as keyword in the Load script to assign an alias to the original field name. For example, take the following Load script, in which we hypothetically load the Airport descriptions:

[Origin Airports]:
LOAD
    Code as [Origin Airport ID],
    Description as [Origin Airport]
FROM [..\Data Files\QVDs\Airport Descriptions.qvd] (qvd);
What this code does is load the table contained in the Airport Descriptions.qvd file. This table has two fields: Code and Description. In this case, we are changing the original names from Code to Origin Airport ID and from Description to Origin Airport.

This way, we are ensuring an association between the Origin Airports table and any other table containing either a field named Origin Airport ID or one named Origin Airport. At the same time, we are ensuring that the table doesn't associate with other tables which contain fields named Code or Description.
The Qualify keyword can be used to qualify field names with their corresponding table name, which basically renames the specified fields in the form tablename.fieldname, thus ensuring no unwanted associations are created.
Let's look at our previous example, in which we needed to rename the Code and Description fields. We can rename these fields by using the Qualify keyword as follows:
Qualify Code, Description;

[Origin Airports]:
LOAD
    Code,
    Description
FROM [..\Data Files\QVDs\Airport Descriptions.qvd] (qvd);
The above Load statement will result in a table with two fields: Origin Airports.Code and Origin Airports.Description.
As you can see, we have specifically entered, as the Qualify statement parameter, the two fields we want to rename. In some cases, we might need to rename a long list of fields, making it impractical to list them all in a Qualify statement. Luckily, the Qualify statement allows the use of wildcard characters in the fieldlist parameter. For example, we can use a star symbol to specify that all subsequently loaded fields should be qualified. We can also combine the star symbol with a string, or with a question mark symbol (another wildcard character), to specify that a set of fields matching a given criterion is to be qualified. For instance:
- Qualify Code, Description;: This command will only qualify fields named Code or Description
- Qualify "*ID";: This command will qualify all fields whose names end with ID
- Qualify *;: This command will qualify all fields that are loaded from that point forward

Sometimes, the Qualify feature is required to be activated for only part of the script, but then should be turned off after loading certain tables. To do that, we can simply use the Unqualify statement, specifying the field names in the same way as described previously.
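As a hypothetical sketch of this on/off pattern (the table contents and names are invented for illustration):

```
// Turn qualification on for these two fields
Qualify Code, Description;

// Tables loaded here get qualified field names,
// for example Airlines.Code and Airlines.Description
Airlines:
LOAD * INLINE [
Code, Description
AA, American Airlines
DL, Delta Air Lines
];

// Turn qualification off again for the same fields
Unqualify Code, Description;

// Tables loaded from this point keep the original
// field names Code and Description
Aircraft:
LOAD * INLINE [
Code, Description
B738, Boeing 737-800
A320, Airbus A320
];
```

This keeps the two Code/Description pairs from associating with each other, while later tables are loaded with unqualified names.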
In some cases, we are required to turn on qualification for all fields except one or two (for example, the key fields). To accomplish that, the Qualify and Unqualify instructions can be used in conjunction, as in the following example:
Qualify *;
Unqualify Key_Field1, Key_Field2;
The above combination of instructions will cause all fields loaded from that point forward to be qualified with their corresponding table name, except the fields named Key_Field1 and Key_Field2. This is useful when we want to ensure key fields are not affected by the Qualify instruction.
With the simplicity QlikView provides in building the associative data model, it's very likely we will sometimes find one of the following two issues:

- Synthetic keys, created when two tables share more than one common field
- Circular references, created when there is more than one association path between any two tables
Both of these issues need to be avoided since they can cause performance degradation in the QlikView application, along with data inconsistency.
When any two tables share more than one common field, QlikView creates a complex key, or synthetic key, to try and associate both tables through the combination of all of the common fields between them. This takes the form of an additional table containing the shared fields and an additional key field added to all involved tables.
An example of a data model with synthetic keys is presented in the following screenshot:
As you can see from the earlier screenshot, the data model is primarily composed of two tables: Main Data and Calendar. These two tables have two fields in common: Year and Month.
Because of QlikView's associative engine, the two tables are automatically linked through both fields, creating a complex key out of the combination of their values.
There is also a third table in our data model, called $Syn 1 Table. This is the synthetic table which stores the combination of values for the two fields which, as pointed out, form the synthetic key.
The presence of synthetic keys in a data model can cause the application to have slow response time and sometimes even consume all available resources. Therefore, they need to be avoided when possible.
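As a minimal, hypothetical illustration of how a synthetic key appears (the tables and values here are invented for the sketch), the following script loads two tables that share both the Year and Month fields:

```
// Both tables contain Year and Month, so QlikView will
// link them through a synthetic key and add a synthetic
// table ($Syn 1 Table) holding the Year/Month combinations
MainData:
LOAD * INLINE [
Year, Month, Total
2012, 1, 100
2012, 2, 150
];

Calendar:
LOAD * INLINE [
Year, Month, Quarter
2012, 1, Q1
2012, 2, Q1
];
```

After reloading a script like this, the Table Viewer shows the additional $Syn table described above instead of a single direct association.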
There are several methods we can use to remove synthetic keys:

- Renaming one of the conflicting fields in one of the tables, so that the unwanted association is not created
- Removing one of the conflicting fields from one of the tables, when it is redundant
- Creating a complex (composite) key out of the conflicting fields in the Load script, when the association is legitimate and requires all of the shared fields

The following flowchart shows the decision process a developer should follow to decide which of the methods mentioned earlier should be used:
Getting back to our synthetic keys example shown earlier, let's see which of these workarounds would best solve our problem. If we follow the flowchart, we can arrive at the following conclusions:

- The association between the Main Data table and the Calendar table is a valid one, so simply renaming one of the shared fields to break the link is not an option
- Both the Year and the Month fields are needed to uniquely identify a record in the Calendar table, so neither field can be removed on its own
Therefore, we should use the third method and create a complex key using the Month and the Year fields. At the same time, we will need to remove the individual fields from one of the tables.
We decided the unique field was composed of both fields because, if we had used only one of them, the key would not be unique. That is:

- The Month field alone is not unique, since the same month number repeats every year
- The Year field alone is not unique, since the same year spans twelve different months
A relation between a fact table and a dimension table should always be at the same granularity.
While we are at it, let's see how we are going to create the composite key needed in our simulated scenario to solve the synthetic key issue.
First, you should be familiar with the values that exist in each of the fields at play. Let's assume the following:

- The Month field contains the numeric values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
- The Year field contains the values 2010, 2011, 2012
Complex keys can be created from the Load script. The following script will create the corrected data model by loading both tables, creating the complex key in both tables, and removing the conflicting fields from the Main Data table while keeping them in the Calendar table:
[Main Data]:
Load
    Year & '|' & Month as YearMonth,
    Total
From FactTable.qvd (qvd);

Calendar:
Load
    Year & '|' & Month as YearMonth,
    Month,
    Year,
    Quarter,
    [Month Name]
From Calendar.qvd (qvd);
We are using the ampersand operator to merge the values from the two fields into one. We then assign an alias to the new calculated field by using the as keyword.
The resulting data model, created using compound keys, is shown in the following screenshot:
The synthetic key has been successfully eliminated from the data model and the associations between both tables have been explicitly defined.
Similar to how synthetic keys are created, circular references can also be the result of unwanted associations in our data model and, as such, they can be fixed using the same principles described earlier. But before getting into how to solve them, let's first see what they are.
We can think of the data model as a map that shows the paths through which we can walk to get from one point to another. For instance, considering the data model seen in the previous section, the only path to get from the fact table to the Calendar table is the YearMonth route. This means in order to get there you must know what Month and Year you are looking for in the Calendar table.
However, as the data model becomes more and more complex, with a larger number of tables and more and more destination points, we might reach a point where there is more than one route connecting point A to point B. All roads lead to Rome, they say. Well, in our case, there must always be exactly one road between any two points. Otherwise, we would have a circular reference.
To better understand what a circular reference is, let's look at the following data model:
As you can see, a circular reference exists when the connections among the tables form a circle, giving us two routes to get from one given point to another. For example, we can get from the Main Data table to the Calendar table either directly, through the YearMonth route, or by going first to the Promotions table and then moving on to the Calendar table.
This is an issue that needs to be addressed, and one that can create severe data inconsistency problems, performance degradation, and even crashes.
To solve the presented scenario, and based on how we deal with synthetic keys in the flow chart, we should start by asking which of the created associations are correct and which aren't. In this case, the association between the Promotions table and the Calendar table is incorrect since the Month field stored in the former table does not necessarily represent the month in which the promotion was used, but rather the month in which the promotion was created. They just happen to have the same name.
Most of the time, as in the earlier example, we will find that the problem arises from unwanted associations and the issue is easy to solve (through the first method from the section on synthetic keys). Other times, there are design challenges that need to be analyzed thoroughly.
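The renaming fix described above can be sketched as follows. The field list and the source file are hypothetical; the point is simply that giving the Promotions table's Month field a more descriptive name breaks the accidental link to the Calendar table:

```
// Renaming Month removes the unwanted association with
// the Calendar table while preserving the field's meaning
// (it holds the month the promotion was created, not used)
Promotions:
LOAD
    [Promotion ID],
    [Promotion Name],
    Month as [Promotion Creation Month]
FROM [Promotions.qvd] (qvd);  // hypothetical source file
```

After this change, the only route from the Main Data table to the Calendar table is the YearMonth association, and the circle disappears.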
In Chapter 9, Data Modeling Best Practices, there is a full section dedicated to addressing the main design challenges a developer could come across when designing a data model.
Now, let's quickly describe how the Table Viewer window works and how it can become our best ally when data modeling.