There are a few techniques for data loading that you need to spend some time learning to be a true master of the subject. We will have a look at some examples of them in this section.
It has already been mentioned that there is an excellent article on incremental load in both the help file and the reference manual. We will work through some examples here to help give you a good grounding in the subject. We will also look at a couple of other load techniques that will be very useful in your arsenal—binary load and partial load.
The basic process of an incremental load is to have most of the data stored in QVDs on the server and then connect to the database to obtain just those records that are needed to bring the QVDs up to date.
Thinking about this, there must be a few pieces that are needed before we can implement the strategy:
- We need a way to identify new or modified records. An ID field will work for this purpose. A create date can also be used. However, if the data might be modified, we will need to have a date field that stores the modified date and time.
- We need the primary key of the table. We can use it with the Exists clause, or we can use it with joins to the source data, to handle deletions.

Note that because we are using QVD files to persist data, it is a good idea to ensure that those QVD files are backed up on a regular basis. Although they can, in theory, be recreated from the original data source, it may be a lot quicker to just restore the files from an archive. In the case where the original data is no longer available, backup becomes even more critical.
The script for the basic process will be as follows:
// When was the last load?
// Do we have a value in our variable?
If Len('$(vLastExtractionDate)')=0 Then

   // Do we have a QVD with the date?
   Let vFileLen=FileSize('..\QVD\Sales.Transactions.LastQVDExtractionDate.qvd');

   If Len('$(vFileLen)')=0 Then

      // Set the last extraction date to an arbitrary date.
      // For example, the first day of this year
      Let vLastExtractionDate=
         TimeStamp(YearStart(Today()), 'YYYYMMDD HH:mm:ss');

   Else

      LastExtraction:
      Load
         LastExtractionDate
      From [..\QVD\Sales.Transactions.LastQVDExtractionDate.qvd] (QVD);

      Let vLastExtractionDate=Peek('LastExtractionDate');

      Drop Table LastExtraction;

      // It is possible that there was no date in the file
      If Len('$(vLastExtractionDate)')=0 Then
         Let vLastExtractionDate=YearStart(Today());
      End if

   End if

End if
// Record the current date and time
Let vCurrentExtractionDate=TimeStamp(Now(), 'YYYYMMDD HH:mm:ss');
// Load the modified records
Orders:
Load
   OrderID, OrderDate,
   CustomerID, EmployeeID, Freight;
SQL SELECT * FROM QVTraining.dbo.OrderHeader
Where OrderDate >= '$(vLastExtractionDate)'
and OrderDate < '$(vCurrentExtractionDate)';
// Concatenate QVD data - if it exists
Let vFileLen=FileSize('..\QVD\E_Sales.Transactions.qvd');
// Note that if the file doesn't exist, vFileLen will be blank
If Len('$(vFileLen)')>0 Then

   Concatenate (Orders)
   Load * From [..\QVD\E_Sales.Transactions.qvd] (QVD)
   Where Not Exists(OrderID);

End if
// Store the data back to the QVD
Store Orders into [..\QVD\E_Sales.Transactions.qvd] (QVD);
Drop Table Orders;
// Update the Last Extract date
Let vLastExtractionDate=vCurrentExtractionDate;

// Persist the value to QVD
LastExtraction:
Load
   '$(vLastExtractionDate)' As LastExtractionDate
AutoGenerate(1);

Store LastExtraction into
   [..\QVD\Sales.Transactions.LastQVDExtractionDate.qvd] (QVD);
Drop Table LastExtraction;
In many transactional systems, rows are only allowed to be added to the system. This is true for many bookkeeping systems. If you make a mistake, you are not allowed to edit or delete the row: you need to add a new transaction to correct the error.
In that case, our basic process is actually too complex. It will work perfectly as it is, but we can modify it to remove the Not Exists
clause when loading the QVD. In theory, the QVD should never contain records that fall within the date range we have just loaded. However, in the real world, it is always better to leave the check in place: a Where Exists clause with a single parameter does not prevent an optimized load from the QVD.
Other systems allow users to make adjustments directly to the transactional data. If they do, they will usually (although not universally!) have a field that contains the timestamp for when the modification was made.
In this case, our basic script should work perfectly. You just need to modify the extraction query and make sure that you include a where
clause on the field that contains the modified date.
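For example, assuming the source table maintains a ModifiedDate column (the field name here is an illustration; use whichever column your system actually provides), the extraction query from the basic script becomes:

```
// Load records created or modified since the last extraction.
// ModifiedDate is an assumed field name.
Orders:
Load
   OrderID, OrderDate, ModifiedDate,
   CustomerID, EmployeeID, Freight;
SQL SELECT * FROM QVTraining.dbo.OrderHeader
Where ModifiedDate >= '$(vLastExtractionDate)'
and ModifiedDate < '$(vCurrentExtractionDate)';
```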
It is possible that the system you are reading data from allows transaction rows to be deleted. The problem for us is that we may already have one of the deleted rows stored in our QVD, and we will get no indication (because there can be no modified date on a deleted row!) that the row is gone.
In that situation, we will add an Inner Join
load of the primary key value from the data source, just after we have concatenated the rows from the QVD to the modified rows, but just before we store the data to QVD. The Inner Join
load will remove any rows from the in-memory table that do not exist in the data source. We can then store the table to file and the deleted rows will no longer exist, for example:
// Check for deleted records
Inner Join (Orders)
SQL SELECT OrderID
FROM QVTraining.dbo.OrderHeader
Where OrderDate >= '20140101';
Note that there is a date restriction on this query. We are assuming here that previous years' data is stored in separate QVD files, so we would not be modifying those.
Handling situations when there is no modify date present is tricky and you will need to utilize the assistance of the local DBA or application developer. Often the system will keep a log of changes and you may be able to query this log to obtain a list of the primary keys for the records that have changed in the period since the last extraction.
If there is no such set of records, you may be able to get the DBA or developer to create a database trigger that creates a separate record in a table when a row is inserted or modified. You can then query this table to obtain your list of primary keys.
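As a sketch of the latter approach, assuming SQL Server and illustrative table, column, and trigger names (none of these are from the original system), such a trigger might look like this:

```
-- Hypothetical change-log table; all names are illustrative
CREATE TABLE dbo.OrderHeaderChanges (
   OrderID    int      NOT NULL,
   ChangeDate datetime NOT NULL DEFAULT (GETDATE())
);
GO

-- Record the primary key of every inserted or updated row
CREATE TRIGGER trg_OrderHeader_Change
ON dbo.OrderHeader
AFTER INSERT, UPDATE
AS
INSERT INTO dbo.OrderHeaderChanges (OrderID)
SELECT OrderID FROM inserted;
```

You can then query this change-log table for the extraction window to obtain the list of primary keys, and use them with Exists or an Inner Join as shown earlier.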
Whatever the situation, there is often some kind of solution available.
Partial reload in QlikView is a very useful feature. It allows us to either completely replace a whole table in the data model or add new rows to a table, without modifying the data in any of the other tables. This can be used to really speed up data loads for more real-time applications.
A partial reload can be executed from the File menu in QlikView Desktop, or by selecting the Partial checkbox when configuring the reload schedule in QlikView Server or Publisher. When the partial reload is executed, it will completely ignore tables that are loaded normally and will not modify them in any way. However, tables that have a load statement prefixed with the Replace
or Add
keyword will be modified. During a normal reload, these keywords are ignored.
To completely replace a whole table, we put the Replace
keyword before the load statement for that table, for example:
Orders:
Replace Load * From [..\QVD\Orders.qvd] (QVD);
In this case, we assume that the QVD has already been updated (perhaps using the incremental load process) and we need to replace the whole table.
We can also add new rows to a table without having to remove the table. By placing the Add
keyword before the load statement, we can leave what we have already loaded and then just add new rows. This can be an effective method of running incremental loads:
Orders:
Add LOAD
   OrderID, OrderDate,
   CustomerID, EmployeeID, Freight;
SQL SELECT * FROM QVTraining.dbo.OrderHeader
Where OrderDate>'$(vLastReloadTime)';

// Update the last reload time
Let vLastReloadTime=Timestamp(Now(), 'YYYYMMDD HH:mm:ss');
In the last example, the final step was to update a variable. We may notice, however, that there was no option to say whether this should happen if the load is a partial load or a normal reload. All such assignments will happen either way. We can, however, manage this process by using the IsPartialReload
function, which returns true
or false
, depending on the reload type:
If IsPartialReload() Then
   // Do partial reload stuff
Else
   // Do normal reload stuff
End if
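As a minimal sketch of how this might be used with the Add load above (the QVD path is illustrative), we could persist the newly added rows only during a partial reload:

```
If IsPartialReload() Then
   // Partial reload: persist the rows just added to the table
   Store Orders into [..\QVD\Orders.qvd] (QVD);
Else
   // Normal reload: reset the marker so that the next partial
   // reload picks up everything since the start of the year
   Let vLastReloadTime=
      Timestamp(YearStart(Today()), 'YYYYMMDD HH:mm:ss');
End if
```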
We can extract the entire contents of one QVW into another using a process called Binary
load. The Binary
statement takes the path to a QlikView QVW file and loads all of the data tables, symbol tables, and so forth into the loading document.
Because this process essentially creates a new data model in the loading document, there is a rule about Binary
, in that it must be the very first statement executed in the script. Also, we can have only one Binary
statement in any one application.
Once the Binary
load has completed, you can then add additional script to do whatever you need to do. For example, you may wish to add a new table. Another thing that you may want to do is extract tables from the original data into QVD files. You may also want to drop tables.
One use case that I have for this is the creation of several documents that have an identical data model but different UIs. You may want to give one set of users a more structured UI, with a locked-down ability to add new objects or export data, while giving another set of users a different UI with full collaboration and export.
Henric Cronström from Qlik has written an excellent blog post on how the QlikView Server cache works. It indicates that, because the cache is global, there are cache efficiencies that mean this approach is not necessarily a bad thing for your server:
http://community.qlik.com/blogs/qlikviewdesignblog/2014/04/14/the-qlikview-cache