Mastering loading techniques

There are a few data loading techniques that you need to spend some time learning to become a true master of the subject. We will work through examples of them in this section.

It has already been mentioned that there is an excellent article on incremental load in both the help file and the reference manual. We will work through some examples here to help give you a good grounding in the subject. We will also look at a couple of other load techniques that will be very useful in your arsenal—binary load and partial load.

Loading data incrementally

The basic process of an incremental load is to have most of the data stored in QVDs on the server and then connect to the database to obtain just those records that are needed to bring the QVDs up to date.

Thinking about this, there must be a few pieces that are needed before we can implement the strategy:

  • There will need to be an initial load of the data. This may be a complete load of the data table into one QVD, or it may be a partitioned load of the data into several QVD files, most likely based on a date field.
  • We will need to be able to establish which field in the data identifies new or updated records. If the data is transactional, with only new rows ever being added, a sequential ID field will work for this purpose. A create date can also be used. However, if the data might be modified, we will need a date field that stores the modified date and time.
  • We need to have a primary key for the data stored. We can use this value in an Exists clause or we can use it with joins to the source data to handle deletions.
  • We will need to establish a way of storing the last time that we ran the extraction process. I like to use a variable for this, as its value will persist with a given QVW file. However, we can also add some resilience by storing the value to a QVD.

    Note

    Because we are using QVD files to persist data, it is a good idea to ensure that those QVD files are backed up on a regular basis. Although they can, in theory, be recreated from the original data source, it may be a lot quicker to simply restore the files from an archive. In cases where the original data is no longer available, backup becomes even more critical.

Establishing the script for the basic process

The script for the basic process will be as follows:

  1. Establish the date and time that the extraction was last run:
    // When was the last load?
    
    // Do we have a value in our variable?
    If Len('$(vLastExtractionDate)')=0 Then
    
      // Do we have a QVD with the date?
      Let vFileLen=FileSize('..\QVD\Sales.Transactions.LastQVDExtractionDate.qvd');
      If Len('$(vFileLen)')=0 Then
        // Set the last extraction date to an arbitrary date.
        // For example, the first day of this year
        Let vLastExtractionDate=
             TimeStamp(YearStart(Today()), 'YYYYMMDD hh:mm:ss');
      Else 
        LastExtraction:
        Load
          LastExtractionDate
        From
          [..\QVD\Sales.Transactions.LastQVDExtractionDate.qvd] (QVD);
        
        Let vLastExtractionDate=Peek('LastExtractionDate');
        
        Drop Table LastExtraction;
        
        // It is possible that there was no date in the file
        If Len('$(vLastExtractionDate)')=0 Then
          Let vLastExtractionDate=
               TimeStamp(YearStart(Today()), 'YYYYMMDD hh:mm:ss');
        End if
      End if
    
    End if
  2. Record the current date and time:
    // Record the current date and time
    Let vCurrentExtractionDate=TimeStamp(Now(), 'YYYYMMDD hh:mm:ss');
  3. Extract the records from the database where the modified date lies between the two dates:
    // Load the modified records
    Orders:
    LOAD OrderID,
        OrderDate,
        CustomerID,
        EmployeeID,
        Freight;
    SQL SELECT *
    FROM QVTraining.dbo.OrderHeader
    Where OrderDate >= '$(vLastExtractionDate)' 
    and OrderDate < '$(vCurrentExtractionDate)';
  4. Concatenate data from the stored QVD—if it exists—where we have not already loaded that row:
    // Concatenate QVD data - if it exists
    Let vFileLen=FileSize('..\QVD\E_Sales.Transactions.qvd');
    // Note that if the file doesn't exist, vFileLen will be blank
    If Len('$(vFileLen)')>0 Then
    
      Concatenate (Orders)
      Load *
      From 
      [..\QVD\E_Sales.Transactions.qvd] (QVD)
      Where Not Exists(OrderID);
    
    End if
  5. Store the entire table back to the QVD:
    // Store the data back to the QVD
    Store Orders into [..\QVD\E_Sales.Transactions.qvd] (QVD);
    
    Drop Table Orders;
  6. Update the date and time for the last extraction:
    // Update the Last Extract date
    Let vLastExtractionDate=vCurrentExtractionDate;
    
    // Persist the value to QVD
    LastExtraction:
    Load
      '$(vLastExtractionDate)' As LastExtractionDate
    AutoGenerate(1);
    
    Store LastExtraction into [..\QVD\Sales.Transactions.LastQVDExtractionDate.qvd] (QVD);
    
    Drop Table LastExtraction;

Running an incremental load when data is only added

In many transactional systems, rows are only allowed to be added to the system. This is true for many bookkeeping systems. If you make a mistake, you are not allowed to edit or delete the row: you need to add a new transaction to correct the error.

In that case, our basic process is actually too complex. It will work perfectly as it is, but we could modify it to remove the Not Exists clause when loading the QVD because, in theory, the QVD should never contain records that we have already loaded within the date range. In the real world, however, it is better to leave the check in place: a Where Not Exists clause on a single field does not prevent an optimized load from the QVD.

Loading incrementally when data might be modified

Other systems allow users to make adjustments directly to the transactional data. If they do, they will usually (although not universally!) have a field that contains the timestamp for when the modification was made.

In this case, our basic script should work perfectly. You just need to modify the extraction query and make sure that you include a where clause on the field that contains the modified date.
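
As a minimal sketch, assuming a hypothetical ModifiedDate field on the source table, the extraction query from step 3 would become:

// Load rows created or changed since the last extraction
Orders:
LOAD OrderID,
    OrderDate,
    CustomerID,
    EmployeeID,
    Freight;
SQL SELECT *
FROM QVTraining.dbo.OrderHeader
Where ModifiedDate >= '$(vLastExtractionDate)'
and ModifiedDate < '$(vCurrentExtractionDate)';

The rest of the process, including the Not Exists check when concatenating the QVD, remains unchanged.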

Handling deletions from the source system

The system that you are reading data from may allow transaction rows to be deleted. The problem for us is that we may already have one of the deleted rows stored in our QVD, and we will get no indication that the row is gone (because there can be no modified date on a deleted row!).

In that situation, we will add an Inner Join load of the primary key value from the data source, just after we have concatenated the rows from the QVD to the modified rows, but just before we store the data to QVD. The Inner Join load will remove any rows from the in-memory table that do not exist in the data source. We can then store the table to file and the deleted rows will no longer exist, for example:

// Check for deleted records
Inner Join (Orders)
SQL SELECT OrderID
FROM QVTraining.dbo.OrderHeader
Where OrderDate >= '20140101';

Note that there is a date restriction on this query. We are assuming here that previous years' data is stored in separate QVD files, so we will not be modifying it.

Handling situations where there is no modify date

Handling situations where there is no modify date is tricky, and you will need to enlist the assistance of the local DBA or application developer. Often, the system will keep a log of changes, and you may be able to query this log to obtain a list of the primary keys for the records that have changed since the last extraction.

If there is no such set of records, you may be able to get the DBA or developer to create a database trigger that creates a separate record in a table when a row is inserted or modified. You can then query this table to obtain your list of primary keys.
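
As a minimal sketch, assuming a hypothetical change log table, QVTraining.dbo.OrderChangeLog, with OrderID and ChangeDate fields, the extraction query from step 3 could be rewritten to load only the changed orders:

// Load only orders whose keys appear in the change log
// since the last extraction
Orders:
LOAD OrderID,
    OrderDate,
    CustomerID,
    EmployeeID,
    Freight;
SQL SELECT DISTINCT OH.*
FROM QVTraining.dbo.OrderHeader OH
INNER JOIN QVTraining.dbo.OrderChangeLog CL
  ON CL.OrderID = OH.OrderID
Where CL.ChangeDate >= '$(vLastExtractionDate)'
and CL.ChangeDate < '$(vCurrentExtractionDate)';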

Whatever the situation, there is often some kind of solution available.

Partially reloading only one part of the data model

Partial reload in QlikView is a very useful feature. It allows us to either completely replace a whole table in the data model or add new rows to a table, without modifying the data in any of the other tables. This can be used to really speed up data loads for more real-time applications.

A partial reload can be executed from the File menu in QlikView Desktop, or by selecting the Partial checkbox when configuring the reload schedule in QlikView Server or Publisher. When the partial reload is executed, it will completely ignore tables that are loaded normally and will not modify them in any way. However, tables that have a load statement prefixed with the Replace or Add keyword will be modified. During a normal reload, these keywords are ignored.

Note

Mapping tables are removed from the data model at the end of a load, so if we are going to use them in the partial reload, we will also need to reload them with the Replace keyword.
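
For example, a minimal sketch, assuming a hypothetical CountryMap mapping table persisted in a Countries.qvd file:

// Recreate the mapping table during a partial reload
CountryMap:
Replace Mapping
Load CountryCode,
    CountryName
From [..\QVD\Countries.qvd] (QVD);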

Replacing a table

To completely replace a whole table, we put the Replace keyword before the load statement for that table, for example:

Orders:
Replace
Load * 
From [..QVDOrders.qvd] (QVD);

In this case, we assume that the QVD has already been updated (perhaps using the incremental load process) and we need to replace the whole table.

Adding new rows to a table

We can also add new rows to a table without having to remove the table. By placing the Add keyword before the load statement, we can leave what we have already loaded and then just add new rows. This can be an effective method of running incremental loads:

Orders:
Add
LOAD OrderID,
    OrderDate,
    CustomerID,
    EmployeeID,
    Freight;
SQL SELECT *
FROM QVTraining.dbo.OrderHeader
Where OrderDate>'$(vLastReloadTime)';
// Update the last reload time
Let vLastReloadTime=Timestamp(Now(), 'YYYYMMDD hh:mm:ss');

Managing script execution in partial reloads

In the last example, the final step was to update a variable. Notice, however, that there was no option to specify whether this should happen during a partial load or a normal reload: such assignments will happen either way. We can manage this by using the IsPartialReload function, which returns true or false depending on the reload type:

If IsPartialReload() Then
  // Do partial reload stuff
Else
  // Do normal reload stuff
End if
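
For example, a minimal sketch that keeps separate timestamps for the two reload types (vLastPartialReloadTime and vLastFullReloadTime are hypothetical variable names):

If IsPartialReload() Then
  // Only update the partial reload timestamp
  Let vLastPartialReloadTime=Timestamp(Now(), 'YYYYMMDD hh:mm:ss');
Else
  // A full reload resets both timestamps
  Let vLastPartialReloadTime=Timestamp(Now(), 'YYYYMMDD hh:mm:ss');
  Let vLastFullReloadTime=Timestamp(Now(), 'YYYYMMDD hh:mm:ss');
End if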

Loading the content of another QVW

We can extract the entire contents of one QVW into another using a process called Binary load. The Binary statement takes the path to a QlikView QVW file and loads all of the data tables, symbol tables, and so forth into the loading document.

Because this process essentially creates a new data model in the loading document, there is a rule about Binary: it must be the very first statement executed in the script. Also, we can have only one Binary statement in any one application.

Once the Binary load has completed, you can add additional script to do whatever you need to do: for example, add a new table, extract tables from the inherited data model into QVD files, or drop tables, as in the following sketch.
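
A minimal sketch, assuming a hypothetical SalesModel.qvw in the same folder whose data model contains Orders and TempCalendar tables:

// Binary must be the very first statement in the script
Binary [SalesModel.qvw];

// Extract a table from the inherited model to QVD
Store Orders into [..\QVD\Orders.qvd] (QVD);

// Drop a table that this document does not need
Drop Table TempCalendar;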

One use case that I have for this is the creation of several documents that share an identical data model but have different UIs. You might give one set of users a more structured UI, with the ability to add new objects or export data locked down, while giving another set of users a different UI with full collaboration and export.

Henric Cronström of Qlik has written an excellent blog post on how the QlikView Server cache works. Because the cache is global, there are cache efficiencies that mean this approach is not necessarily a bad thing for your server:

http://community.qlik.com/blogs/qlikviewdesignblog/2014/04/14/the-qlikview-cache
