Chapter 9. Advanced Data Transformation

In this chapter we will dive into advanced transformation functions and techniques available through QlikView's extraction engine. This will allow you, as a developer, to finely process the source data and turn it into a clean design, while at the same time keeping an efficient script.

The goals of this chapter are:

  • To provide an overview of the most commonly used data architectures that can ease QlikView's development and administration
  • To describe the available functions for data aggregation
  • To learn how to take advantage of some of QlikView's most powerful data transformation functions.

Data architecture

Now that we have a decent amount of QlikView development experience under our belt, we will introduce the concept of data architecture. This refers to the process of structuring the different layers of data processing that exist between the source tables and the final document(s). Having a well-designed data architecture will greatly simplify the administration of the QlikView deployment. It also makes the QlikView solution scalable when new applications need to be developed and when the QlikView environment grows. There can be a lot of different data architectures, but in this section we will discuss two of the most commonly used in QlikView enterprise deployments.

Two-stage architecture

The following diagram depicts the two-stage architecture:

Two-stage architecture

The two-stage architecture is composed of the following layers:

  • Source Layer: composed of the source databases and original tables.
  • Extract Layer: composed of QlikView documents, containing mainly script. These are used to pull the data from the source layer and store it into QVD files. The extraction scripts can either create a straight copy of the source tables to store them into the corresponding QVD files, or perform certain transformations before storing the result.
  • QVD Layer: the set of QVDs resulting from the Extract Layer. These QVDs become the data sources used by the final QlikView document.
  • Presentation Layer: the set of QlikView documents used to provide the data to the end user. These QlikView files will use the QVDs created in the previous layer as data sources, and sometimes perform additional transformations to create the final data model. No database calls are performed from the presentation layer.

The advantages of using this approach and having a QVD Layer are reuse and consistency. This approach promotes re-use because, in deployments where multiple documents make use of the same source data, the original database (Source Layer) is not overloaded with redundant requests. At the same time, the re-use process ensures consistency across all different QlikView documents that make use of the same data.

Note

If you look closely, you'll notice that this architecture is the one we've been using in the previous chapters, since we've mainly loaded data into our QlikView document from previously-created QVDs.

This approach is mainly used when the source data is good enough to be included into the QlikView data model with little or no modification. However, when major data transformation is needed, the administration gets a little messy with this architecture since it is not clearly defined at which stage these transformations take place.

Three-stage architecture

Now, let's take a look at the three-stage architecture:

Three-stage architecture

In this architecture, two additional layers are added: Transformation Layer and Transformed QVD Layer. The role of these two additional layers is to hold all transformations that need to be performed upon the source data before it can be integrated into the target data model.

This also suggests that all base QVDs will ideally keep a straight copy of the source table, which will optimize the extraction process. Then, the Transformation Layer, in which several base QVDs will be combined to create denormalized QVDs, performs any required aggregation or segmentation, and adds new calculated fields or composite keys to prepare the transformed QVDs for a clean and simple load into the final data model.

Since the documents in the Presentation Layer will use the transformed QVDs, and sometimes some base QVDs that required no modification, and will (ideally) read them "as-is", optimized loads will be ensured at this stage.

QVDs can also be reused with this architecture when the data model of two or more QlikView documents require the same source data.

Note

This approach is the one we will work from this point onwards since new transformations will be made to our base QVDs.

A well-designed data architecture, as those presented in this section, can also enable the possibility of having different QlikView teams working at different stages. For example, IT developers can prepare the base and transformed QVDs, while business teams can make use of those to build the end documents without requiring access to the source database.

Setting up our environment

Now that we've discussed the advantages of using the three-stage architecture, let's take a moment to set up our Windows folder structure following the described guidelines.

By copying the files corresponding to this chapter into your QlikView Development folder, you will have a structure like this:

Setting up our environment

The 0.Includes folder is used to store re-usable code that is called from the end documents via an Include statement. The 1.Source Data folder represents the Source Layer; this folder is used because our source database is composed of CSV files but wouldn't be required otherwise. The 2.Workbooks folder holds all QVD Generators (QVW files) for both the Base Layer and the Transformation Layer. The 3.QVD folder is used to store the resulting QVD files from both the Base Layer (using the Source subfolder) and the Transformation Layer (using the Transformed subfolder). The 4.Applications folder represents the Presentation Layer.

Inside these folders, you will find all source tables in CSV format, as well as the extract QVWs and the base QVDs used in previous chapters. We will work directly with the Transformation Layer in the coming sections.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset