In this chapter we will dive into advanced transformation functions and techniques available through QlikView's extraction engine. This will allow you, as a developer, to finely process the source data and turn it into a clean design, while at the same time keeping an efficient script.
The goals of this chapter are:
Now that we have a decent amount of QlikView development experience under our belt, we will introduce the concept of data architecture. This refers to the process of structuring the different layers of data processing that exist between the source tables and the final document(s). Having a well-designed data architecture will greatly simplify the administration of the QlikView deployment. It also makes the QlikView solution scalable when new applications need to be developed and when the QlikView environment grows. There can be a lot of different data architectures, but in this section we will discuss two of the most commonly used in QlikView enterprise deployments.
The following diagram depicts the two-stage architecture:
The two-stage architecture is composed of the following layers:
The advantages of using this approach and having a QVD Layer are reuse and consistency. This approach promotes re-use because, in deployments where multiple documents make use of the same source data, the original database (Source Layer) is not overloaded with redundant requests. At the same time, the re-use process ensures consistency across all different QlikView documents that make use of the same data.
This approach is mainly used when the source data is good enough to be included into the QlikView data model with little or no modification. However, when major data transformation is needed, the administration gets a little messy with this architecture since it is not clearly defined at which stage these transformations take place.
Now, let's take a look at the three-stage architecture:
In this architecture, two additional layers are added: Transformation Layer and Transformed QVD Layer. The role of these two additional layers is to hold all transformations that need to be performed upon the source data before it can be integrated into the target data model.
This also suggests that all base QVDs will ideally keep a straight copy of the source table, which will optimize the extraction process. Then, the Transformation Layer, in which several base QVDs will be combined to create denormalized QVDs, performs any required aggregation or segmentation, and adds new calculated fields or composite keys to prepare the transformed QVDs for a clean and simple load into the final data model.
Since the documents in the Presentation Layer will use the transformed QVDs, and sometimes some base QVDs that required no modification, and will (ideally) read them "as-is", optimized loads will be ensured at this stage.
QVDs can also be reused with this architecture when the data model of two or more QlikView documents require the same source data.
A well-designed data architecture, as those presented in this section, can also enable the possibility of having different QlikView teams working at different stages. For example, IT developers can prepare the base and transformed QVDs, while business teams can make use of those to build the end documents without requiring access to the source database.
Now that we've discussed the advantages of using the three-stage architecture, let's take a moment to set up our Windows folder structure following the described guidelines.
By copying the files corresponding to this chapter into your QlikView Development folder, you will have a structure like this:
The 0.Includes folder is used to store re-usable code that is called from the end documents via an Include
statement. The 1.Source Data folder represents the Source Layer; this folder is used because our source database is composed of CSV files but wouldn't be required otherwise. The 2.Workbooks folder holds all QVD Generators (QVW files) for both the Base Layer and the Transformation Layer. The 3.QVD folder is used to store the resulting QVD files from both the Base Layer (using the Source subfolder) and the Transformation Layer (using the Transformed subfolder). The 4.Applications folder represents the Presentation Layer.
Inside these folders, you will find all source tables in CSV format, as well as the extract QVWs and the base QVDs used in previous chapters. We will work directly with the Transformation Layer in the coming sections.