Sometimes, we need to consider data from different directions. In this section, we will examine some advanced techniques for data manipulation that will make our lives easier.
We will often come across situations where we need to consider the earliest values in a dataset, either just the first number of rows or the first value in an ordered set. QlikView has functions for this.
The First statement precedes a load and states the number of records that should be loaded:
First 10 Load * From Data.qvd (qvd);
Sometimes, if we want to get just a fixed set of rows from a file, we can use the Header option in the Table Files Wizard to remove any rows preceding the rows we want and then use the First statement to grab the lines that we do want. This can be an effective strategy when there are several datasets in one worksheet in Excel.
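As a sketch of that strategy (the file name, sheet name, and row counts here are assumptions, not from the original example):

// Skip the 5 rows above the block we want, then take just the next 20 rows
BudgetRows:
First 20
Load *
From [Budget.xlsx]
(ooxml, embedded labels, header is 5 lines, table is Sheet1);

The header is 5 lines clause discards the leading rows, and First 20 stops the load before any later dataset on the same sheet is read.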
FirstSortedValue is a very advanced function that can be used both in the script and in charts. We use it to obtain the top value in a sorted list of values.
As an example, say that we want to retrieve the ProductID with the highest sales value for each order; the following code can be used:
TopProd:
Load
   OrderID,
   FirstSortedValue(ProductID, -LineValue) As TopProduct
Resident Fact
Group by OrderID;
We retrieve the top product based on the order of the LineValue field. The minus sign preceding the field name indicates that this is a descending sort, so the first product should correspond to the highest value.
We can pass some other parameters. For example, if more than one product has the same value, then the default option is to return null. If we specify Distinct before ProductID, then the duplicate situation will be ignored.
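Reusing the TopProd example from above, the tie-tolerant variant would look like this:

// Distinct disregards duplicates, so ties no longer force a null result
TopProd:
Load
   OrderID,
   FirstSortedValue(Distinct ProductID, -LineValue) As TopProduct
Resident Fact
Group by OrderID;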
We can also pass a third parameter after the sort weight to get the second, third, or, generally, the nth sorted value instead of the first.
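For instance, here is a sketch of pulling the runner-up product per order, again assuming the Fact table from the earlier example:

// The third parameter is the rank: 2 returns the second value in the sorted list
SecondProd:
Load
   OrderID,
   FirstSortedValue(ProductID, -LineValue, 2) As SecondProduct
Resident Fact
Group by OrderID;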
When loading data, it can be a very neat trick to look at data that we have loaded before. There are two great functions that we can use to do this.
The Previous function can look at a value in the previous input row. This can be really useful. In fact, we can nest multiple Previous calls to look even further back! Mostly, the function will be combined with an Order By clause. This means that we can have some kind of expectation of what the previous record held and, therefore, test for that.
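As a minimal sketch of nesting Previous, assuming a resident MonthlySales table with Month and Sales fields (not part of the original example):

// Order By guarantees row order, so Previous(Previous(...)) is the value two rows back
Trend:
Load
   Month,
   Sales,
   Sales - Previous(Previous(Sales)) As ChangeVsTwoMonthsAgo
Resident MonthlySales
Order By Month;

The first two rows will get null for the calculation, because there are not yet two prior rows to look back at.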
As an example, let's look at this simple dataset:
Emps:
Load * Inline [
Employee, Grade, StartDate
Brian, 1, 2010-01-04
Jill, 1, 2011-07-19
Graham, 3, 2010-02-02
Miley, 2, 2011-08-20
Brian, 2, 2012-04-03
Jill, 3, 2013-11-01
Miley, 3, 2014-01-30
];
We can see that we have a list of employees with the grade that they are at and the date that they started at that grade. It would be useful to calculate the end date for each grade (which would be today for the latest grades) so that we can match these employees to events that happened on particular dates (using IntervalMatch).
If we sort the data by employee and then by the start date in descending order, we can check on each row whether we are dealing with the same employee as on the previous row. If we are, we can calculate the end date from the previous date. If not, we just use today's date. Here is the code:
Employee:
Load
   Employee,
   Grade,
   StartDate,
   If(Previous(Employee)=Employee,
      Previous(StartDate)-1,
      Today()
   ) As EndDate
Resident Emps
Order By Employee, StartDate Desc;

Drop Table Emps;
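Once the intervals exist, the events lookup mentioned above can be sketched like this; the Events table and its contents are hypothetical, not part of the original example:

// Hypothetical events to slot into the grade date ranges
Events:
Load * Inline [
Employee, EventDate
Brian, 2011-06-15
Jill, 2013-12-24
];

// Extended IntervalMatch syntax: match each EventDate into that employee's own intervals
Inner Join (Events)
IntervalMatch (EventDate, Employee)
Load StartDate, EndDate, Employee
Resident Employee;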
Peek is the opposite of the Previous function in that Peek will look at data that has been loaded into memory rather than data that is being loaded from a source. From that point of view, it is always available because the data is just there, whereas the Previous function can only operate during a load. This makes Peek very versatile for accessing data from the in-memory tables.
Peek takes up to three parameters: the field name, the row index, and the table name.
If Peek is used with just the field name, then the most recently loaded value in that field, in whatever table, will be returned. If the row index is passed, then you must also pass the table name, as it doesn't make sense without it.
As an example, let's use a loop to cycle through all the records in a table, extract the field values, and display them using a trace:
For i=0 to NoOfRows('Employee')-1
   Let vEmp=Peek('Employee', $(i), 'Employee');
   Let vGrade=Peek('Grade', $(i), 'Employee');
   Let vDate=Date(Peek('StartDate', $(i), 'Employee'), 'M/D/YYYY');
   Trace Employee, $(vEmp), started Grade $(vGrade) on $(vDate);
Next
As a more advanced example of using Peek, let's imagine that we had a sales file output from an ERP system that contained both header and line information in the one file. Here is an example:
201A0000120140801
202PR0001000005000366
202PR0002000011001954
202PR0003000017000323
202PR0004000001009999
202PR0005000008003287
201A0000220140802
202PR0001000003000360
202PR0002000111000999
Lines beginning with 201 are order header rows; they contain the customer number and the order date. Lines beginning with 202 are order lines; they contain a product code, quantity, and price per unit.
Obviously, we might imagine that we could deal with this using the Table Files Wizard, as it is a fixed-width record. However, the problem here is that different lines have different field widths. This is a perfect place to use Peek! Let's have a look at how we build the code for this.
It can be useful to use the wizard to help us get started, especially if there are many fields. In fact, we can run it twice to help build up the script that we need:
LOAD
   @1:3 as LineType,
   @4:9 as CustomerCode,
   @10:n as OrderDate
...
Running the wizard a second time, for the order line rows, gives us the following:
LOAD
   @1:3 As LineType,
   @4:9 As ProductCode,
   @10:15 As Quantity,
   @16:n/100 As UnitPrice
...
Now, we can combine these. We will use Peek to move the CustomerCode and OrderDate values onto the order line rows:
SalesFile_Temp:
LOAD
   @1:3 As LineType,
   If(@1:3=201,
      @4:9,
      Peek('CustomerCode')
   ) As CustomerCode,
   If(@1:3=201,
      Date#(@10:n, 'YYYYMMDD'),
      Peek('OrderDate')
   ) As OrderDate,
   If(@1:3=202, @4:9, Null()) As ProductCode,
   If(@1:3=202, @10:15, Null()) As Quantity,
   If(@1:3=202, @16:n/100, Null()) As UnitPrice
FROM [..\Sources\SalesFile.txt]
(fix, codepage is 1252);
Now, the table will contain a mix of row types, but we only need the ones that are type 202, because they have all the good data now:
SalesFile:
Load
   CustomerCode,
   OrderDate,
   ProductCode,
   Quantity,
   UnitPrice,
   Quantity*UnitPrice as LineValue
Resident SalesFile_Temp
Where LineType=202;

Drop Table SalesFile_Temp;