While QlikView shines in dealing with massive data volumes, sometimes we just do not need to load everything at an atomic level. Data aggregation can, for example, be used in deployments where document segmentation by detail is needed, in which case two documents are created to serve different user groups and analysis needs: one document will have all data with the highest level of detail and another one will have a similar data model but with aggregated (reduced) tables. This way, users are better served by keeping a balance between performance and analysis needs.
In this section, we will implement a document segmentation scenario by aggregating the Flight Data table to create a second document intended for executive users, who only require summary data.
When aggregating data, the first step is always to define which dimension fields will be left out and which ones will be kept in the summarized table. We should analyze this question by looking at the data from the ground up, that is, by reviewing each dimension from the most granular to the most general. The following list shows the most important dimension fields in the Flight Data
table, sorted by granularity:
If we analyze how removing each dimension would individually affect the result of the summarization process, we can find that the most impact would come from removing the Airport dimensions, both Origin and Destination, since those are the ones with the greatest granularity. At the same time, we can say that the Airport dimension does not add much value to the analyses we are looking to deliver in our document, so it's a good choice to leave it out.
We could remove additional dimensions, for example, Aircraft Type or Carrier, but as we move up the detail ladder to the most general dimensions, those dimensions become more and more important to accomplish different analyses.
We must add that leaving dimensions out should be a thorough decision process, thinking both in terms of analytical requirements and the aggregation rate we can achieve. For example, removing the Country dimension would not result in any substantial aggregation if we keep the State field. Also, what happens if we remove the Airport dimensions but keep Origin City and Destination City? What happens is, not surprisingly, that the table will not be significantly reduced since both fields keep a close relation and their granularity is almost the same (there is only one airport in most cities). Therefore, and for the sake of simplicity, we will also leave out all city, state, and country fields.
Finally, before proceeding, we should keep in mind how many records the original table has, in order to be able to measure how much reduction we achieved in the summarization. In our case, the Flight Data
table originally contains 1,256,075 rows.
Moving on to the aggregation process, follow these steps:
2.Workbooks
folder with the name Transform - Flight Data.qvw
.3.QVDSource
folder.Flight Data.qvd
file and click Finish on the File Wizard window.Load
script, find the lines corresponding to those fields related to origin and destination airports and erase them. The fields we should remove are:%Origin Airport ID
%Origin Airport Sequence ID
%Origin Airport Market ID
%Origin World Area Code
%Destination Airport ID
%Destination Airport Sequence ID
%Destination Airport Market ID
%Destination World Area Code Distance
Origin Airport Code
Origin City
Origin State Code
Origin State FIPS
Origin State
Origin Country Code
Origin Country
Destination Airport Code, Destination City
Destination State Code
Destination State FIPS
Destination State
Destination Country Code Destination Country
From - To Airport Code
From - To Airport ID
From - To City
From - To State Code
From - To State
# Departures Scheduled
# Departures Performed
# Payload
# Available Seats
# Transported Passengers
# Transported Freight
# Transported Mail
# Ramp-To-Ramp Time
# Air Time
sum
the # of Departures
, or sum
the # Transported Passengers
. Identify where each of the listed fields are in the created load
statement and replace the field name with the following expression:Sum(Field Name) as Field Name
where Field Name
represents each of the listed measures.
Group By
clause to the end of the Load
statement, and list all dimension fields that have been kept in the script, separated by a comma.Load
keyword.aggregation
script will look like this:Flights: LOAD [%Airline ID], [%Carrier Group ID], [%Unique Carrier Code], [%Unique Carrier Entity Code], [%Region Code], [%Aircraft Group ID], [%Aircraft Type ID], [%Aircraft Configuration ID], [%Distance Group ID], [%Service Class ID], [%Datasource ID], [Unique Carrier], [Carrier Code], [Carrier Name], Year, Period, Quarter, [Month (#)], Month, Sum([# Departures Scheduled]) as [# Departures Scheduled], Sum([# Departures Performed]) as [# Departures Performed], Sum([# Payload]) as [# Payload], Sum([# Available Seats]) as [# Available Seats], Sum([# Transported Passengers]) as [# Transported Passengers], Sum([# Transported Freight]) as [# Transported Freight], Sum([# Transported Mail]) as [# Transported Mail], Sum([# Ramp-To-Ramp Time]) as [# Ramp-To-Ramp Time], Sum([# Air Time]) as [# Air Time] FROM [..3.QVDSourceFlight Data.qvd] (qvd) Group By [%Airline ID], [%Carrier Group ID], [%Unique Carrier Code], [%Unique Carrier Entity Code], [%Region Code], [%Aircraft Group ID], [%Aircraft Type ID], [%Aircraft Configuration ID], [%Distance Group ID], [%Service Class ID], [%Datasource ID], [Unique Carrier], [Carrier Code], [Carrier Name], Year, Period, Quarter, [Month (#)], Month;
The resulting table will turn our 1,256,075 rows into only 100,091. A brief example of what just happened is shown in the following screenshot:
Notice how the totals remain the same for both tables.
A smaller table will occupy fewer resources (RAM and CPU) and, therefore, calculations will be faster. If the performance gain attained with data aggregation doesn't mean reducing business value and/or functionality for the end user, then it's a winning approach any day.
We have loaded the base QVD containing flight data and transformed it by applying aggregations, now what? Well, the next
steps would be to store the transformed table, using the store
command, into a new QVD file that will reside in the 3.QVDsTransformed
folder.
After that, a new data model could be created in the Presentation Layer based on the Airline Operations
document, but using the newly aggregated QVD and without the Origin
and Destination
dimensions. This new QlikView document is intended to serve the users who only need summarized information about the Airline Operations
document.
Of course, QlikView offers more aggregation options than summing. The most commonly used options are shown in the following table:
Function |
Explanation |
Example |
---|---|---|
Sums numeric expressions. Optionally a |
| |
Returns the lowest value within a numeric range. Optionally a rank can be specified, this will return the nth lowest number. So |
| |
Returns the highest value within a numeric range. Optionally a rank can be specified, this will return the nth highest number. So |
| |
If the aggregation of a value returns only a single value, that value is returned, otherwise the function returns null. For example, when an expression contains the values {1, 1, 1} then the |
| |
Similar to the |
| |
Similar to the |
| |
Concatenates all the values of an expression into a single string, which is separated by a delimiter given as a function's parameter. Has an optional |
| |
Counts the number of items in the input expression. Has an optional |
|