Looking at data from different directions

Sometimes, we need to consider data from different directions. In this section, we will examine some advanced techniques for data manipulation that will make our lives easier.

Putting things first

We will often come across situations where we need to consider the earliest values in a dataset, either just the first n rows or the first value in an ordered set. QlikView has functions for this.

First

The First prefix precedes a Load statement and specifies the number of records that should be loaded:

First 10
Load *
From Data.qvd (qvd);

If we want to extract just a fixed set of rows from a file, we can use the Header option in Table Files Wizard to remove any rows preceding the rows we want, and then use the First prefix to grab the lines that we do want. This can be an effective strategy when there are several datasets in one Excel worksheet.
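The combination of skipping header rows and then taking only the first N rows can be sketched in plain Python (illustrative only, not QlikView syntax; the sheet contents are made up):

```python
# Illustrative analogue of the Header option plus the First prefix:
# drop a fixed number of leading lines, then keep only the next N rows.
def first_rows(lines, skip_header, first_n):
    """Drop `skip_header` leading lines, then return the next `first_n` rows."""
    return lines[skip_header:skip_header + first_n]

sheet = ["junk", "junk", "Header", "row1", "row2", "row3", "row4"]

# Skip the two junk lines plus the header row, then load the first two rows.
print(first_rows(sheet, 3, 2))  # → ['row1', 'row2']
```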

FirstSortedValue

FirstSortedValue is a very advanced function that can be used both in the script and in charts. We use it to obtain the top value in a sorted list of values.

As an example, say that we want to retrieve the ProductID with the highest sales value for each order; the following code can be used:

TopProd:
Load 
  OrderID, 
  FirstSortedValue(ProductID, -LineValue) As TopProduct
Resident
  Fact
Group by OrderID;

We retrieve the top product based on the order of the LineValue field. The minus sign preceding the field name indicates that this is a descending sort, so the first product should correspond to the highest value.
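The per-group logic that FirstSortedValue performs can be sketched in Python (illustrative only; the fact rows are made up):

```python
# Sketch of FirstSortedValue(ProductID, -LineValue) grouped by OrderID:
# within each group, the value with the highest weight wins.
from collections import defaultdict

fact = [
    # (OrderID, ProductID, LineValue)
    (1, "PR0001", 366),
    (1, "PR0002", 1954),
    (2, "PR0001", 360),
    (2, "PR0002", 999),
]

groups = defaultdict(list)
for order_id, product_id, line_value in fact:
    groups[order_id].append((product_id, line_value))

top_prod = {
    order_id: max(rows, key=lambda r: r[1])[0]  # highest LineValue wins
    for order_id, rows in groups.items()
}
print(top_prod)  # → {1: 'PR0002', 2: 'PR0002'}
```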

Note

Note that this is an aggregation function, so in the script, there must be a Group by clause.

We can pass some other parameters. By default, if more than one product shares the same top value, the function returns null. If we specify Distinct before ProductID, duplicates are disregarded.

We can also pass a third parameter after the sort weight to retrieve the second, third, or nth sorted value instead of the first.
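The tie, Distinct, and rank behaviors just described can be modeled in a few lines of Python (a sketch of the semantics, not Qlik's implementation; the sample pairs are made up):

```python
# Model of FirstSortedValue's extras: a tie on the top weight yields None
# (QlikView's null), deduplicating first mimics Distinct, and `rank`
# picks the nth sorted value instead of the first.
def first_sorted_value(pairs, rank=1, distinct=False):
    """pairs: (value, weight) tuples, sorted by descending weight."""
    if distinct:
        pairs = list(set(pairs))  # disregard duplicate rows
    ordered = sorted(pairs, key=lambda p: p[1], reverse=True)
    if rank == 1 and len(ordered) > 1 and ordered[0][1] == ordered[1][1]:
        return None  # tie on the top weight → null by default
    return ordered[rank - 1][0] if rank <= len(ordered) else None

data = [("A", 10), ("A", 10), ("C", 5)]
print(first_sorted_value(data))                 # → None (tie on weight 10)
print(first_sorted_value(data, distinct=True))  # → 'A' (duplicate removed)
print(first_sorted_value(data, rank=3))         # → 'C' (third sorted value)
```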

Looking backwards

When loading data, it can be a very neat trick to look at data that we have loaded before. There are two great functions that we can use to do this.

Previous

The Previous function can look at a value in the previous input row. This can be really useful. In fact, we can nest multiple Previous calls together to look even further back!

Mostly, the function will be combined with an Order By clause. This gives us an expectation of what the previous record holds, which we can then test for.

As an example, let's look at this simple dataset:

Emps:
Load * Inline [
Employee, Grade, StartDate
Brian, 1, 2010-01-04
Jill, 1, 2011-07-19
Graham, 3, 2010-02-02
Miley, 2, 2011-08-20
Brian, 2, 2012-04-03
Jill, 3, 2013-11-01
Miley, 3, 2014-01-30
];

We can see that we have a list of employees with the grade that they are at and the date that they started at that grade. It would be useful to calculate the end date for each grade (which would be today for the latest grades) so that we can match these employees to events that happened on particular dates (using IntervalMatch).

If we sort the data by employee and then by the start date in descending order, we can check on each row whether we are dealing with the same employee as on the previous row. If we are, we can calculate the end date from the previous start date. If not, we just use today's date. Here is the code:

Employee:
Load
  Employee,
  Grade,
  StartDate,
  If(Previous(Employee)=Employee,
    Previous(StartDate)-1,
    Today()
  ) As EndDate
Resident
  Emps
Order By Employee, StartDate Desc;

Drop Table Emps;
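The same sort-then-compare logic can be sketched in Python (illustrative only, using the inline data above):

```python
# Sketch of the Previous() technique: sort by employee, then start date
# descending, and compare each row with the one processed just before it.
from datetime import date, timedelta

emps = [
    ("Brian", 1, date(2010, 1, 4)),
    ("Jill", 1, date(2011, 7, 19)),
    ("Graham", 3, date(2010, 2, 2)),
    ("Miley", 2, date(2011, 8, 20)),
    ("Brian", 2, date(2012, 4, 3)),
    ("Jill", 3, date(2013, 11, 1)),
    ("Miley", 3, date(2014, 1, 30)),
]

result = []
prev = None  # the previously processed row, like Previous()
# Order By Employee, StartDate Desc
for employee, grade, start in sorted(emps, key=lambda r: (r[0], -r[2].toordinal())):
    if prev is not None and prev[0] == employee:
        end = prev[2] - timedelta(days=1)  # day before the later grade began
    else:
        end = date.today()  # the latest grade runs until today
    result.append((employee, grade, start, end))
    prev = (employee, grade, start)

print(result[1])  # Brian's grade 1 ran until 2012-04-02, the day before grade 2
```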

Peek

Peek differs from the Previous function in that Peek looks at data that has already been loaded into memory, rather than at data that is being read from a source. From that point of view, it is available at any point in the script, because the data is simply there, whereas the Previous function can only operate during a load. This makes Peek very versatile for accessing data in the in-memory tables.

Peek takes up to three parameters:

Field name: The name of the field that you want to retrieve the value from. It is passed as a text literal, that is, in single quotes.

Row index: The row of the table from which you want to retrieve the field value. This index starts at 0 for row 1 (just to confuse us), and you can also pass a value of -1, which is the default, to retrieve the value from the last row loaded.

Table name: The name of the data table from which you want to retrieve the value. It is passed as a text literal.

If Peek is called with just the field name, it returns the value most recently loaded into that field, in whatever table. If the row index is passed, then the table name must also be passed, as an index makes no sense without it.
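Peek's addressing can be modeled with a tiny Python sketch (a simplification, not Qlik's API: here the one-argument form just defaults to a single table, whereas real Peek looks at wherever the field was last loaded):

```python
# Model of Peek's addressing: tables are lists of row dicts; index 0 is
# the first row loaded and -1, the default, is the most recent row.
tables = {
    "Employee": [
        {"Employee": "Brian", "Grade": 1},
        {"Employee": "Jill", "Grade": 1},
    ],
}

def peek(field, row=-1, table="Employee"):
    """Return a field value from a table that is already in memory."""
    return tables[table][row][field]

print(peek("Employee", 0, "Employee"))  # → 'Brian' (row index starts at 0)
print(peek("Grade"))                    # → 1 (default -1: last row loaded)
```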

As an example, let's use a loop to cycle through all the records in a table, extract the field values, and display them using a trace:

For i=0 to NoOfRows('Employee')-1

  Let vEmp=Peek('Employee', $(i), 'Employee');
  Let vGrade=Peek('Grade', $(i), 'Employee');
  Let vDate=Date(Peek('StartDate', $(i), 'Employee'), 'M/D/YYYY');
  Trace Employee, $(vEmp), started Grade $(vGrade) on $(vDate);
  
Next

As a more advanced example of using Peek, let's imagine that we have a sales file output from an ERP system that contains both header and line information in one file. Here is an example:

201A0000120140801
202PR0001000005000366
202PR0002000011001954
202PR0003000017000323
202PR0004000001009999
202PR0005000008003287
201A0000220140802
202PR0001000003000360
202PR0002000111000999

Lines beginning with 201 are the order header row. They contain the customer number and the order date. Lines beginning with 202 are order lines and they contain a product code, quantity, and price per unit.

We might imagine that we could deal with this using Table Files Wizard, as it is a fixed-width file. However, the problem here is that different line types have different field widths. This is a perfect place to use Peek! Let's have a look at how we build the code for this.

It can be useful to use the wizard to help us get started, especially if there are many fields. In fact, we can run it twice to help build up the script that we need:

LOAD @1:3 as LineType, 
     @4:9 as CustomerCode, 
     @10:n as OrderDate
     ...

The following script has to be run as well:

LOAD @1:3 As LineType, 
     @4:9 As ProductCode, 
     @10:15 As Quantity, 
     @16:n/100 As UnitPrice
     ...

Now, we can combine these. We will use Peek to move the CustomerCode and OrderDate values onto the order line rows:

SalesFile_Temp:
LOAD
  @1:3 As LineType, 
  If(@1:3=201, 
    @4:9,
    Peek('CustomerCode')
    ) As CustomerCode,
  If(@1:3=201,
    Date#(@10:n,'YYYYMMDD'),
    Peek('OrderDate')
    ) As OrderDate,
If(@1:3=202,@4:9,Null()) As ProductCode, 
If(@1:3=202,@10:15,Null()) As Quantity, 
If(@1:3=202,@16:n/100,Null()) As UnitPrice
FROM
[..\Sources\SalesFile.txt]
(fix, codepage is 1252);

Now, the table will contain a mix of row types, but we only need the ones that are type 202, because they have all the good data now:

SalesFile:
Load
  CustomerCode,
  OrderDate,
  ProductCode,
  Quantity,
  UnitPrice,
  Quantity*UnitPrice as LineValue
Resident
  SalesFile_Temp
Where LineType=202;

Drop Table SalesFile_Temp;
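The whole header-and-line technique can be sketched in Python (illustrative only, using the sample records above): remember the values from each 201 header row and carry them down onto the 202 line rows, which is the job Peek does in the script, then keep only the line rows.

```python
# Sketch of the Peek technique for mixed header/line files: a 201 record
# sets the current customer and order date; each 202 record reuses them.
lines = [
    "201A0000120140801",
    "202PR0001000005000366",
    "202PR0002000011001954",
    "201A0000220140802",
    "202PR0001000003000360",
]

orders = []
customer, order_date = None, None  # last header values seen, like Peek()
for rec in lines:
    line_type = rec[0:3]
    if line_type == "201":                   # header: remember its values
        customer, order_date = rec[3:9], rec[9:17]
    else:                                    # line: reuse the header values
        product = rec[3:9]
        quantity = int(rec[9:15])
        unit_price = int(rec[15:21]) / 100   # price stored in cents
        orders.append((customer, order_date, product, quantity,
                       quantity * unit_price))

print(orders[0][:4])  # → ('A00001', '20140801', 'PR0001', 5)
```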