Chapter 6. Advanced Scripting

 

"In my opinion, the vast majority of scripts written … are not very original, well-written, or interesting. It has always been that way, and I think it always will be."

 
 --Viggo Mortensen

In anything more than the simplest of QlikView applications, the script is where we spend a very large percentage of our development time.

When we discussed the performance tuning of our applications (Chapter 1, Performance Tuning and Scalability), we discussed that almost all of the effort to make our applications more efficient and to consume less memory will be made in the script. Even when we tune expressions in the frontend, then this is more than likely going to be supported by script work.

All data modeling work is going to be in the script. Of course, implementing an ETL process is something that we do in the script. We can use the script to simplify advanced expressions.

Almost everything we discussed in this book so far is either directly script-related or directly influenced by what we do in the script. Therefore, to truly be a QlikView master, you need to master the QlikView script.

This chapter is all about learning great ways of manipulating data in the script. If you can master these methods, then you are well on your way to mastery of the whole product.

These are the topics we'll cover in this chapter:

  • Counting records
  • Loading data quickly
  • Applying variables and the Dollar-sign Expansion in the script
  • Using control structures
  • Examining advanced Table File Wizard options
  • Looking at data from different directions
  • Reusing the code

Reviewing the basic concepts

We will have a quick look at some of the basic concepts that we should be aware of when first starting to load data. Anyone who has done basic QlikView training should be familiar with the concepts here, but it is worth reviewing them.

Using Table Files Wizard

We don't have to use Table Files Wizard to load data from file sources, but it is very useful to help us generate the necessary script to load the data correctly. We have some buttons in the script editor that give us access to Table Files Wizard:

Using Table Files Wizard

These buttons are listed in the following table:

Button

Description

Table Files

This button opens a standard File Open dialog. Once a file is selected, the main Table Files Wizard will open with an appropriate file type, based on QlikView's interpretation of the file's content, already selected for us.

QlikView File

This won't actually open the wizard because it only allows a QlikView QVW file to be selected. It will insert a BINARY statement at the beginning of the active tab in the script. We need to be careful here because BINARY, if used, must be the very first statement in the script, so it should be on the very first tab in the script.

Web Files

This allows us to enter a web URI to point at a file source located on the Internet. This can be HTML, but can also be any of the other supported file types.

Field Data

This allows you to point at a field that you have already loaded into the script (therefore, you must have run the script at least once) and parse the contents of the field using a delimiter or fixed record rules.

Some of us might have (accidentally or on purpose) clicked on the Back button in the first screen that appears on Table Files Wizard and discovered the actual first page, which corresponds to three of the buttons:

Using Table Files Wizard

Using relative paths

When we load a file in QlikView, we can either use an absolute or a relative path.

When discussing file paths, an absolute path means the full path to a file starting with either a drive letter or a UNC path, for example, C:QVDocumentsFinanceSourcesBudget.xls or \QVServerQVDocumentsFinanceSourcesBudget.xls.

A relative path means that the path is expressed relative to another path. The default start path is the location of the QlikView QVW file. So, if we have our QVW in:

C:QVDocumentsFinanceApps

Then, the relative path to the source file is:

..SourcesBudget.xls

Note

The . and .. are relative path indicators that have been around since the earliest Unix days. The . indicates the current folder and .. indicates the parent folder. You can concatenate several of them, so .... indicates the parent of the parent folder.

We can also specify an alternate start path using the Directory statement. So, if we issue this command:

Directory 'C:QVDocumentsFinance';

Then, the relative path the source file becomes:

SourcesBudget.xls

Alternatively, you can also use:

.SourcesBudget.xls

If we turn on the Relative Paths checkbox on the Data tab in the script editor, then Table Files Wizard will return the path as a relative path, relative to the document location. It will also automatically add a Directory statement like this:

Directory;

A Directory statement without specifying a path is actually superfluous, as it means to just use the default path—the location of the QVW file. Therefore, we can feel free to delete this statement if we don't want to use it.

The main reason why it is preferable to use relative paths instead of absolute paths is transportability—we can move a folder system from one server (for example, a development server) to another (for example, a preproduction system) and all of the paths should still work without having to make any edits to the script.

Delimited files

If the Wizard detects that the file content is just text, it usually guesses that we are dealing with a delimited file and will have a guess at what the delimiter is from the data:

Delimited files

It is often quite good at detecting some of these settings, especially the Character Set value, but we might need to tweak these sometimes. Usually, the tweak is just setting the Labels option from None to Embedded Labels.

Note

The usual options in the Labels section that we need to be concerned with are either None or Embedded Labels. The third option, Explicit, is only relevant for certain file types, specifically Data Interchange Format (DIF), which includes a header section that contains explicit labels.

If the first line of the file contains the labels, then we should choose the Embedded Labels option. If we choose None, then the fields will be named @1, @2, @3, and so forth. We can, of course, rename these fields like this:

LOAD
     @1 as SalesPerson, 
     @2 as Company, 
     @3 as [Sales Value], 
     @4 as [Number of Orders]
FROM
[..SourcesSalesReport.csv]
(txt, utf8, no labels, comment is #, delimiter is ',', msq);

To facilitate this, the wizard allows us to change the name in the data display grid, and then it will generate the As statement for us:

Delimited files

Besides changing the Labels options, we might also add an entry under Comment. Here, we can define a value that might appear at the beginning of a line in the text file (# or // are common), which indicates that this line is a comment and we don't need to load it. Rows beginning with this text will be ignored.

Fixed width files

When the data source has been outputted by a reporting system, it is quite common that the data is in a fixed width format. Every value in each report column takes up the same amount of space, with spaces added wherever necessary to pad the values out to fit. Consider the following example:

Country        Sales $  No. Orders
Germany      92,981.20          29
USA          26,265.16          16
France       25,002.56          15

To load this in QlikView, we just need to tell the wizard exactly the width each row takes up, and this can be done by a click of the mouse:

Fixed width files

The field names that are generated by the wizard contain the position of the first character and the last character separated by a colon:

LOAD @1:12, 
     @13:24, 
     @25:n
FROM
[..SourcesCountryReport.txt]
(fix, utf8);

The last field will usually have n specified as the ending character. This indicates the end of line position.

We are free to modify these manually, if we need to, as well as adding field aliases. We can even have fixed positions overlapping if it makes sense to do so:

LOAD [@1:20] As Field1, 
     [@8:24] As Field2, 
     [@16:n] As Field3

XML files

The wizard is very good at dealing with XML data, from simple tables to more complex relationships. For example, we can have an XML file with data like this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<CountryCity>
<Country name="USA">
  <City>New York</City>
  <City>Dallas</City>
  <City>Boston</City>
</Country>
<Country name="Austria">
  <City>Graz</City>
<City>Salzburg</City>
</Country>
<Country name="Belgium">
  <City>Bruxelles</City>
  <City>Charleroi</City>
</Country>
</CountryCity>

We can see that the previous data includes values in both tags and elements and also that there is a hierarchy of data between country and city.

When we load this data into Table Files Wizard, the tool automatically recognizes the hierarchies as different tables:

XML files

When we click on Finish in the wizard, it will generate the code to load each of the tables, with an automatically generated ID field to associate them:

// Start of [CountryCity.xml] LOAD statements
City:
LOAD City%Table,
    %Key_Country_DDD45FBB422C070A    
// Key to parent table: CountryCity/Country
FROM [..SourcesCountryCity.xml] 
(XmlSimple, Table is [CountryCity/Country/City]);

Country:
LOAD name,
    %Key_Country_DDD45FBB422C070A    
// Key for this table: CountryCity/Country
FROM [..SourcesCountryCity.xml] 
(XmlSimple, Table is [CountryCity/Country]);
// End of [CountryCity.xml] LOAD statements

Of course, we should probably think about joining these tables together and then dropping the key field.

HTML files

QlikView can handle most HTML files that have tables defined (sometimes, it has difficulty with XHTML). You can either connect to a file locally or a web URL.

For example, if you want to grab the currency conversion rates from the front page of http://www.xe.com/, enter the link as follows:

HTML files

The wizard will connect to the website and retrieve information about all of the tables on the page. In this case, there is just one:

HTML files

In other cases, you might need to click through the list of tables offered, @1, @2, @3, and so forth, and use the preview window to identify the correct one. The script might look like this:

LOAD [Auto-refresh  15x      0 : 59],
     [Auto-refresh  15x      0 : 591], 
     USD, 
     EUR, 
     GBP, 
     INR, 
     AUD, 
     CAD, 
     ZAR, 
     NZD, 
     JPY
FROM
[http://www.xe.com/]
(html, codepage is 1252, embedded labels, table is @1);

It could be that the field name that is identified doesn't actually work when you try the reload (as in this case). You could try playing with the spacing—this works here:

LOAD [Auto-refresh15x0 : 59],
     [Auto-refresh15x0 : 591], 

You can replace the fields with just a *:

LOAD *
FROM
[http://www.xe.com]
(html, codepage is 1252, embedded labels, table is @1);

QVD/QVX files

When it comes to QVD or QVX files, we don't get to modify any settings in the wizard to change the way the file is handled. Setting such as Embedded Labels, Delimiter, Header Size, and so forth, are meaningless when loading a QVD, as all of the information that is needed to interpret the file is already embedded in the file.

Connecting to databases

QlikView can connect to almost every on-premise database system in the world. In fact, the only ones that we might have trouble with are very archaic ones that do not have open drivers.

QlikView can use one of the three different driver types to connect to databases:

Driver type

Description

ODBC

An Open Database Connectivity driver allows us to connect to the majority of the world's database systems because most of the world's database vendors will either issue a driver for free, along with their client tools, or will have licenses for a third party to create a driver. ODBC drivers are configured at the operating system level and their settings are stored in the system registry. Therefore, if documents are moved from development to test/production systems, we need to ensure that the same driver is configured on all systems.

OLEDB

OLEDB is Microsoft's standard for connecting programmatically to databases. It is quite different in implementation from ODBC, but we don't really need to worry about that. Most of the larger database vendors will have an OLEDB driver available as well as the ODBC one. The OLEDB option tends to be faster, especially for Microsoft databases. The configuration information for an OLEDB connection is stored within the QlikView document's CONNECT statement, so it can be a little more portable; we just need to ensure that the drivers are installed on every server that needs them.

Custom

A custom driver can be written, using QlikView's APIs, to allow connections to many more systems. For example, Qlik has custom drivers available for both SAP and SalesForce.com—systems that we cannot otherwise connect directly to. They also have a custom driver that talks to their own server management service and can read information from that into QlikView. In theory, a custom connector can be built for almost any database system that we can think of.

Using the Connect button

When we look at the Data tab in the script editor, we see a dropdown that allows us to select the driver type that we want to use:

Using the Connect button

Once we have selected the driver type that we want to use, we click on the Connect… button, which will open a dialog that is appropriate to the selected driver. For ODBC, we don't need to provide any of the connection detail, just the username and password, and the dialog will look like this:

Using the Connect button

The connection dialog for OLEDB is different because we need to provide connection information. The OLEDB dialog is one that many developers will be familiar with because it comes from the operating system, not from QlikView. We first need to select the correct database driver, and then we can provide connection-specific information. For example, if we used a SQL Server connection, it might look like this:

Using the Connect button

Any custom connector will have its own dialog. Some have no dialogs at all!

Understanding the Connect To statement

The purpose of all of the dialogs is to generate a Connect To statement. This is the statement that tells QlikView how to connect to the driver that is being used.

The Connect To statement is usually preceded by an indication of the connection type: ODBC, OLEDB, or CUSTOM. If the connection type is omitted, then ODBC is assumed.

Here is an example ODBC Connect To statement:

ODBC CONNECT TO QVData_ODBC (XUserId is IMcKXZFMCC, XPassword is GRdHfABOQDbKWZJFeE);

We can see that all we need is the ODBC name because the rest of the information necessary to make the connection is already configured within the ODBC connection. We have provided the username and password in the dialog box and QlikView will encrypt them so that casual viewers will not be able to see them.

Compare the ODBC Connect To statement to an OLEDB Connect To statement:

OLEDB CONNECT TO [Provider=SQLNCLI11.1;Integrated Security=SSPI;Persist Security Info=False;User ID="";Initial Catalog=QVData;Data Source=QVENTSQLWH;Use Procedure for Prepare=1;Auto Translate=True;Packet Size=4096;Workstation ID=SRVR1;Initial File Name="";Use Encryption for Data=False;Tag with column collation when possible=False;MARS Connection=False;DataTypeCompatibility=0;Trust Server Certificate=False;Application Intent=READWRITE];

In this case, all of the information necessary to make the connection will be listed in the Connect string. This is similar to a Custom Connect To statement:

CUSTOM CONNECT TO "Provider=QvsAdminDataProvider.dll;host=localhost;XUserId=HONSdKD;XPassword=bfAXSUC;";

Explaining the Force 32 Bit option

Prior to QlikView Version 10, if your database vendor only supplied a 32-bit version of its driver, you can only connect to it with a 32-bit version of QlikView. This causes a lot of problems for QlikView customers running 64-bit server versions that could not perform automatic reloads without having to run a 32-bit QlikView desktop from the command line.

In QlikView Version 10.0, the Force 32 Bit option was introduced to overcome this problem. Now, along with specifying the connection string, we can also specify whether a 32- or 64-bit connection should be used in the Connect To statement:

ODBC CONNECT32 TO [QVData] (XUserId is WAKVcARMNLacWYB);

QlikView actually calls separate processes to open the connections and run queries. They are QVConnect32.exe and QVConnect64.exe, which are 32-bit and 64-bit applications, respectively. If we call a Connect To or Connect64 To statement using a 64-bit version of QlikView, QVConnect64.exe will be executed. If we call Connect32 To, then QVConnect32.exe will be executed. QlikView running on a 32-bit system can only execute QVConnect32.exe.

The Force 32 Bit option in the Data tab will mean that clicking on the Connect button will open 32-bit versions of the dialog that have access to 32-bit drivers. These dialogs will also generate a Connect32 To statement instead of just a Connect To statement.

The Select wizard

Once we have created a connection of any type, its details are cached in the document. This allows us to access the Select button and retrieve information about the tables and views in our database:

The Select wizard

This wizard is a very useful tool because it allows us to interrogate the data structures in the database, preview the data that these tables and views contained, and generate appropriate SQL to retrieve the data.

The default option is for the wizard to generate a very simple Select * query to retrieve the data:

The Select wizard

We can also select specific fields from the list of fields to create a more specific, yet still quite simple, query:

The Select wizard

We can also turn on the (highly recommended) Preceding Load checkbox. This places a QlikView Load statement above the SQL statement. This preceding Load statement allows us to apply QlikView functions to the data as we are loading it from the database. A SQL statement with a preceding Load statement might look like this:

LOAD OrderID,
OrderDate,
CustomerID,
EmployeeID,
//Freight,
    "LineNo",
ProductID,
    Quantity,
SalesPrice,
LineValue,
LineCost;
SQL SELECT *
FROM QVTraining.dbo."Order_Fact";

We might note that even though the SQL query is a very simple Select *, we still get the full field list in the preceding Load statement.

Note the piece of script that the Freight field is commented out. Even though there will be a Freight field loaded from the database, if we do not load it in the preceding Load statement, then the field will not make it into the final data model.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset