The associative data model

After reviewing the theory of dimensional data modeling, it is now time to apply those concepts to how QlikView works. By now, we know that a QlikView document is, in general terms, used to visualize data. This data can come from disparate sources: a database, Excel files, a legacy system, or even the Web. But how do we put all this data together? The answer is: through the associative data model.

The associative data model can be said to be equal in structure to a dimensional model. However, a data model of any type in QlikView becomes an associative data model because it not only contains the different source tables from which the charts get the data, but also keeps them associated in a way which allows the QlikView document and its users to consume information and aggregate data cross dimensionally in any possible way. In a QlikView data model, all of the field values from all of the tables in the model are automatically associated among themselves based purely on the field names.

Let's look at the Airline Operations data model built in an earlier example, from Chapter 4, Data Sources:

The associative data model

In the previous screenshot, we can see how the fact table (that is, the Main Data table) is directly associated with all of the other dimension tables. The purpose of these dimension tables is to provide context to the values stored in the fact table. Furthermore, the dimension tables are not only associated with the fact table, but at the same time they are indirectly associated with each other through the fact table.

With the data model shown we can, for instance, cross-reference the Origin Airport with the Destination Airport (via the Main Data table) and get the Distance Interval value between any two of them. These three fields are stored in three different dimension tables in the data model, and the fact that they are associated allows QlikView to naturally perform this cross-dimensional reference and support the associative analysis we just described. This is shown in the following screenshot:

The associative data model

In an associative data model, any field can act as a dimension in a chart. They can all be used within expressions to aggregate their data too.

Guidelines for table associations

In order to design and build a data model in QlikView, we need to understand how the associations between tables are created. We also need to consider some basic rules to avoid performance and data consistency issues. In this section, we will describe and review these guidelines.

How associations are created

QlikView creates associations between tables in a simple and straightforward manner: through the field names. This means that, for any given set of tables, an association is automatically created between two of them if they both contain a field with exactly the same name. Simple enough.

Note

In QlikView, field names are case-sensitive.

Based on this concept, we can say that QlikView will automatically build up the data model with its respective associations even if the developer does not explicitly define how the tables are linked to each other. However, this functionality requires that the source tables contain the correct field names. Since this scenario seldom occurs, especially if we are loading tables from several different source systems, the most basic and fundamental tool for the data model design in QlikView is Renaming fields.

Renaming fields

There are two main reasons for a developer to rename a field:

  • To ensure that two tables are associated through the correct fields when originally these two tables did not share a field with the same name, but a link does in fact exist between them.
  • To prevent unwanted associations between tables when they share a field with the same name but that field does not actually represent the link between them.

To rename a field, we can simply use the as keyword in the Load script to assign an alias to the original field name. For example, take the following Load script, in which we hypothetically load the Airport descriptions:

[Origin Airports]:
LOAD
   Code             as [Origin Airport ID], 
   Description    as [Origin Airport]
FROM
[..Data FilesQVDsAirport Descriptions.qvd]
(qvd);

What this code does is load the table contained in the Airport Descriptions.qvd file. This table has two fields: Code and Description. In this case, we are changing the original names from Code to Origin Airport ID and from Description to Origin Airport.

This way, we are ensuring an association between the Origin Airports table and any other table containing either a field named Origin Airport ID or one named Origin Airport. At the same time, we are ensuring that the table doesn't associate with other tables which contain fields named Code or Description.

Renaming fields with the Qualify statement

The Qualify keyword can be used to qualify field names with their corresponding table name, which basically renames the specified fields in the form of tablename.fieldname, thus ensuring no unwanted associations are created.

Let's look at our previous example in which we needed to rename the Code and Description fields. We can rename these fields by using the Qualify keyword as follows:

Qualify Code, Description;

[Origin Airports]:
LOAD
   Code, 
   Description
FROM
[..Data FilesQVDsAirport Descriptions.qvd]
(qvd);

The above Load statement will result in a table with two fields: Origin Airports.Code and Origin Airports.Description.

As you can see, we have specifically entered, as the Qualify statement parameter, the two fields which we want to rename. In some cases, we might need to rename a long list of fields, making it impractical to list them all in a Qualify statement. Luckily, the Qualify statement allows the use of wildcard characters in the fieldlist parameter. For example, we can use a star symbol to specify that all subsequently loaded fields should be qualified. We can also combine the star symbol with a string or with a question mark symbol (another wildcard character) to specify that a set of fields that match a given criteria are to be qualified. For instance:

  • Qualify Code, Description;: This command will only qualify fields named Code or Description
  • Qualify "*ID";: This command will qualify all fields whose name ends with ID
  • Qualify *;: This command will qualify all fields that are loaded from that point forward

    Sometimes, the Qualify feature is required to be activated for only part of the script but then should be turned off after loading certain tables. To do that, we can simply use the Unqualify statement, specifying the field names in the same way as described previously.

    In some cases, we are required to turn on qualification for all fields except one or two (for example, the key fields). To accomplish that, the Qualify and Unqualify instructions can be used in conjunction, like in the following example:

    Qualify *;
    Unqualify Key_Field1, Key_Field2;

    The above combination of instructions will cause all fields loaded from that point forward to be qualified with their corresponding table name, except the fields named Key_Field1 and Key_Field2. This is useful when we want to ensure key fields are not affected by the Qualify instruction.

    Note

    Remember that a Qualify instruction can be turned off at any point in the script with a corresponding Unqualify statement.

Avoiding data model conflicts

With the simplicity QlikView provides in building the associative data model, it's very likely we will sometimes find one of the following two issues:

  • The creation of what is called "Synthetic Keys" (described in the next section)
  • The creation of circular references in the data model

Both of these issues need to be avoided since they can cause performance degradation in the QlikView application, along with data inconsistency.

Dealing with synthetic keys

When any two tables share more than one common field, QlikView creates a complex key, or synthetic key, to try and associate both tables through the combination of all of the common fields between them. This takes the form of an additional table containing the shared fields and an additional key field added to all involved tables.

An example of a data model with synthetic keys is presented in the following screenshot:

Dealing with synthetic keys

As you can see from the earlier screenshot, the data model is primarily composed of two tables: Main Data and Calendar. These two tables have two fields in common: Year and Month.

Because of QlikView's associative engine, the two tables are automatically linked through both fields, creating a complex key out of the combination of their values.

There is also a third table in our data model, called $Syn 1 Table. This is the synthetic table which stores the combination of values for the two fields which, as pointed out, form the synthetic key.

The presence of synthetic keys in a data model can cause the application to have slow response time and sometimes even consume all available resources. Therefore, they need to be avoided when possible.

There are several methods we can use to remove synthetic keys:

  • We can rename those fields that are a part of the synthetic key but should not be a part of the association between the two tables.
  • We can remove conflicting fields from one of the two tables. To remove a field, we just erase the corresponding line of code from the Load script.
  • We can create an explicit complex key with the concatenation of all common fields that actually represent the link between the two tables.
    • After creating the new complex key, we can remove the conflicting fields from either table.

The following flowchart shows the decision process a developer should follow to decide which of the methods mentioned earlier should be used:

Dealing with synthetic keys

Dealing with Synthetic Keys – Flow Chart.

Getting back to our synthetic keys example shown earlier, let's see which of these workarounds would best solve our problem. If we follow the flow chart, we can arrive at the following conclusions:

  • Yes, the tables should be associated with each other
  • The unique key is formed by two fields
  • The fields that constitute the unique key are Year and Month

Therefore, we should use the third method and create a complex key using the Month and the Year fields. At the same time, we will need to remove the individual fields from one of the tables.

We decided the unique field was composed of both fields because if we had used only one of them, the key would not be unique. That is:

  • If we had used the field Year as our key, one record in the fact table would be associated to 12 records in the Calendar table, since there are 12 months corresponding to one year.
  • Likewise, if we had used the field Month as our key, one record in the fact table would be associated to as many records as the number of years exist in the Calendar table.

A relation between a fact table and a dimension table should always be at the same granularity.

Creating a composite key

While we are at it, let's see how we are going to create the composite key needed in our simulated scenario to solve the synthetic key issue.

First, you should be familiar with the values that exist in each of the fields at play. Let's assume the following:

  • The Month field has the following values:

    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

  • The Year field has the following values:

    2010, 2011, 2012

Complex keys can be created from the Load script. The following script will create the corrected data model by loading both tables, creating the complex key in both tables, and removing the conflicting fields from the Main Data table while keeping them on the Calendar table:

[Main Data]:
Load 
    Year & '|' & Month as YearMonth,
    Total
From FactTable.qvd (qvd);


Calendar:
Load
    Year & '|' & Month as YearMonth,
    Month,
    Year,
    Quarter,
    [Month Name]
From Calendar.qvd (qvd);

We are using the ampersand operator to merge the values from the two fields into one. We then assign an alias to the new calculated field by using the as keyword.

Tip

Adding a delimiter to concatenated fields

It's always a good idea to add a separator between fields when concatenating them to ensure data consistency. For example: Year & ' | ' & Month as YearMonth

The resulting data model, created using compound keys, is shown in the following screenshot:

Creating a composite key

The synthetic key has been successfully eliminated from the data model and the associations between both tables have been explicitly defined.

Dealing with circular references

Similar to how synthetic keys are created, a circular reference can also be the result of unwanted associations in our data model and, as such, they can be fixed using the same principles described earlier. But before getting into how to solve them, let's first see what they are.

We can think of the data model as a map that shows the paths through which we can walk to get from one point to another. For instance, considering the data model seen in the previous section, the only path to get from the fact table to the Calendar table is the YearMonth route. This means in order to get there you must know what Month and Year you are looking for in the Calendar table.

However, when the data model becomes more and more complex, with a larger number of tables, and more and more destination points, we might also get to a point where we have more than one route connecting point A to point B. All roads lead to Rome, they say. Well, in our case, we must always have one road between any two points. Otherwise, we would be having a circular reference.

To better understand what a circular reference is, let's look at the following data model:

Dealing with circular references

As you can see, a circular reference exists when the connections among the tables simulate a circle and we have two routes to get from any given point to another. For example, we can get from the Main Data table to the Calendar table either directly, through the YearMonth route, or by going first to the Promotions table and then moving to the Calendar table.

This is an issue that needs to be addressed, and one that can create severe data inconsistency problems, performance degradation, and even crashes.

To solve the presented scenario, and based on how we deal with synthetic keys in the flow chart, we should start by asking which of the created associations are correct and which aren't. In this case, the association between the Promotions table and the Calendar table is incorrect since the Month field stored in the former table does not necessarily represent the month in which the promotion was used, but rather the month in which the promotion was created. They just happen to have the same name.

Most of the time, as in the earlier example, we will find that the problem arises from unwanted associations and the issue is easy to solve (through the first method from the section on synthetic keys). Other times, there are design challenges that need to be analyzed thoroughly.

In Chapter 9, Data modeling best practices, there is a full section dedicated to address the main design challenges a developer could come across when designing a data model.

Now, let's quickly describe how the Table Viewer window works and how it can become our best ally when data modeling.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset