The success of all businesses is at some point determined by how well they can sell their products and/or services. The large amount of time and money that companies spend on software that facilitates the sales process is testament to its importance. Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), and Point of Sales (PoS) software not only ease the sales process, but also gather a large amount of sales-related data. Therefore, it is not uncommon that a company's first QlikView application is designed to explore and discover sales data.
Before we begin to create data visualization and analysis for our sales perspective, let's review the data model that supports it. In the process, we will resolve data quality issues that can either distract users' attention away from a visualization's data or distort how they interpret it. Next, we'll introduce two common sales department user stories and build solutions to stratify customers and analyze customer churn. Finally, let's take our first look at QlikView extensions and overall application design.
In this chapter, let's review the following topics:
Let's get started and review the data model that we will use to create our sales perspective in QlikView.
Our company sells gadgets to customers throughout the United States and our sales perspective data model is based on data from an ERP system. The following figure shows the data model that we are going to work with throughout this chapter:
With the following steps, let's migrate the sales perspective container from the book's exercise files to where we've installed QDF on our computers and start to explore the data together:
1. From the Ch. 2 folder of the book's exercise files, copy the container called 1001.Sales_Perspective to the QDF folder that is located on your computer. By default, the QDF folder will be C:\Qlik\SourceData.
2. In the QDF folder, open the VariableEditor Shortcut in the 0.Administration container.
3. Open the Container Map Editor. If the button hangs, then enable the Open Databases in Read and Write mode option in the Settings tab of the Edit Script window and try again.
4. Go to the first empty row below 99.Shared_Folders, and under the Container Folder Name column, click the black arrow indicating that it is an input field.
5. Type the name of the container folder, 1001.Sales_Perspective, into the input field.
6. Define the Variable Prefix as Sales and the Container Comments as Container for Sales Perspective.

Now that we've finished migrating the container to our local QDF, let's open Sales_Perspective_Sandbox.qvw in the 1.Application folder of the 1001.Sales_Perspective container and explore the sales data in more detail.
The data model that we are using is a star schema and it includes a set of events common to many companies. In the fact table at the center of the model, we store the following events:
Sales invoices are the principal event of the data model. We don't use the general journal entries that the sales invoices often generate in an ERP system because they do not have the level of detail that a sales invoice does. For example, product details are often not included in the general journal entry.
However, it is important that the total sales amount from our sales invoices matches the total sales that we have in our financial reports. For that reason, it is important to consider any sales cancelation or other sales adjustment. In this data model, sales credit memos properly adjust our total sales amount to match the financial reports that we will see in Chapter 3, Financial Perspective.
Finally, we cannot analyze or judge our sales performance without comparing it with something. Basic sales analysis involves comparing current sales with either historical or planned sales. Therefore, we should aim to have at least two years of sales data or the sales budget in our data model. In this data model, we have both historical sales and planned sales data.
All of these events are discrete events. In other words, they only exist at a discrete point in time. The fact table that stores discrete events is called a transactional fact table. The date dimension in a transactional fact table holds the date when the event occurred.
Along with the date dimension, we use the 7Ws (who, what, where, when, how many, why, and how) in the following table to describe an example set of metrics and dimensions that we expect to find in a sales perspective data model:
For more information on data modeling, read Data Warehouse Toolkit by Ralph Kimball, and Agile Data Warehouse Design by Lawrence Corr.
Great data visualization and analysis starts with having a well-built data model that contains high-quality data. If this is our first data exploration and discovery project, one of the most important discoveries that we are going to make is that our data contains a great deal of garbage. One of the most noticeable data-quality issues is the absence of a value in a field.
For example, in Sales_Perspective_Sandbox.qvw, the Vendor
attribute in the Items
table does not always have a value. The absence of a value in a field is referred to as a null value. In QlikView, a user can't select a null value. However, we often want to select null values to know which items have missing attributes and send that list of items to whomever is responsible for the catalog's data quality.
In order to select items with missing vendor information, we replace all the null values in the Vendor field with the string N/A by inserting the following code in the load script before we load the Items table:
MappingNULL_NA:
Mapping LOAD
    NULL() as NULL,
    'N/A' as Mapped_Value
AutoGenerate (1);

MAP Vendor USING MappingNULL_NA;
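The effect of the mapping can be sketched outside QlikView. The following Python snippet is only an analogy of the load-script logic, with hypothetical item rows where None plays the role of a QlikView null: every missing Vendor value becomes the selectable string N/A.

```python
# Hypothetical item rows; None stands in for a QlikView null value.
items = [
    {"Item": "Bamdax 126", "Vendor": "Acme"},
    {"Item": "Gizmo 7", "Vendor": None},
]

# Replace every null Vendor with the selectable string 'N/A',
# mirroring the MAP ... USING statement in the load script.
for row in items:
    if row["Vendor"] is None:
        row["Vendor"] = "N/A"

print([row["Vendor"] for row in items])  # ['Acme', 'N/A']
```

With the nulls replaced, a user can now select N/A to list every item that lacks a vendor.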
The previous mapping will not get rid of all the null values that we see in our charts because what we perceive in QlikView to be a null value may in fact be a missing value. Unlike missing values, null values can be observed in the table where they reside. For example, we can go to the Table Viewer, preview the Items table, and see the null values in the Vendor field.
However, what if the fact table contains an item key that refers to an item that does not exist in the Items table? Or, what if the fact table is missing the item key for some transactions? Despite running our previous null value mapping, we will still see Vendor as null in QlikView because the item key that the fact table refers to does not exist in the Items table. It is a missing value.
The way to give users the ability to select missing item values is to replace incorrect and null item keys in the fact table with a key to a fictitious item. The key to the fictitious item is defined as negative one (-1). Our first step to replace incorrect and null item keys is to create a mapping table using the Items table where we map all the existing item keys to their own values:
MappingMissingIncorrectItemsKeys:
Mapping LOAD
    _KEY_ItemID,
    _KEY_ItemID as Mapped_ItemID
FROM $(vG.QVDPath)2.TransformItems.qvd (qvd);
The second step is to save the original value stored in _KEY_ItemID in another field and apply this map to the _KEY_ItemID field when we load the Facts table:
Facts:
LOAD
    [Document ID],
    _KEY_ItemID as Original_ItemID,
    ApplyMap('MappingMissingIncorrectItemsKeys', _KEY_ItemID, -1) as _KEY_ItemID,
    _KEY_Date,
    ...
FROM $(vG.QVDPath)2.TransformFacts.qvd (qvd);
Our final step is to create a fictitious item called 'Missing'
with an item key of negative one (-1
) in the Items
table:
Concatenate (Items)
LOAD
    -1 as _KEY_ItemID,
    'Missing' as [Item ID],
    'Missing' as Item,
    'Missing' as [Item Source],
    'Missing' as [Item Group],
    ...
AutoGenerate (1);
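The three steps can be sketched as a Python analogy of the ApplyMap-with-default pattern. The table and field names mirror the load script, but the rows themselves are hypothetical: known keys map to themselves, unknown or null keys default to -1, and a fictitious Missing item is appended.

```python
# Hypothetical miniature versions of the Items and Facts tables.
items = [
    {"_KEY_ItemID": 10, "Item": "Bamdax 126"},
    {"_KEY_ItemID": 11, "Item": "Gizmo 7"},
]
facts = [
    {"Document ID": "A1", "_KEY_ItemID": 10},
    {"Document ID": "A2", "_KEY_ItemID": 99},   # key with no matching item
    {"Document ID": "A3", "_KEY_ItemID": None}, # null key
]

# Step 1: mapping table where every existing key maps to itself.
key_map = {row["_KEY_ItemID"]: row["_KEY_ItemID"] for row in items}

# Step 2: keep the original key and default unmatched keys to -1,
# like ApplyMap(..., _KEY_ItemID, -1).
for fact in facts:
    fact["Original_ItemID"] = fact["_KEY_ItemID"]
    fact["_KEY_ItemID"] = key_map.get(fact["_KEY_ItemID"], -1)

# Step 3: concatenate a fictitious 'Missing' item with key -1.
items.append({"_KEY_ItemID": -1, "Item": "Missing"})

print([f["_KEY_ItemID"] for f in facts])  # [10, -1, -1]
```

Every fact row now joins to some item, so users can select Missing to find the transactions with broken item keys, while Original_ItemID preserves the evidence of what the source system actually sent.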
After the previous two adjustments, we will still encounter some missing values in QlikView. For example, do you notice anything missing from the following chart that shows the monthly net sales for the item Bamdax 126 in Sales_Perspective_Sandbox.qvw?
If you noticed that various months do not appear on the horizontal axis, then you are correct. As Bamdax 126 is not sold during every month, there is no relation between Bamdax 126 and the months when the item was not sold. The values are missing, and these missing values distort the line chart.
In order to completely resolve this issue, we would have to complement the fact table with the Cartesian product of any or all dimension key sets, and in effect, measure nil events. However, we should take into account that this may cause a severe degradation of our QlikView application's performance. Therefore, we should apply this solution pragmatically to solve specific analytical needs.
In this case, we specifically want to see a more accurate net sales trend for Bamdax 126 that includes the months that we did not sell the item. We do this by adding the following code to our load script after loading the Facts
table. The code creates a Cartesian product of the Product
and Date
dimension key sets and adds it to our Facts
table:
Missing_Facts_Tmp:
LOAD distinct
    makedate(Year(_KEY_Date), Month(_KEY_Date)) as _KEY_Date,
    1 as _ActualFlag
Resident Facts;

Left Join (Missing_Facts_Tmp)
LOAD distinct
    _KEY_ItemID
FROM $(vG.QVDPath)2.TransformItems.qvd (qvd);

Concatenate (Facts)
LOAD *
Resident Missing_Facts_Tmp;

DROP Table Missing_Facts_Tmp;
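The idea behind the script can be sketched in Python with hypothetical miniature tables: collect the distinct months present in the fact table, cross-join them with every item key, and append the resulting rows so that each item relates to every month, sold or not.

```python
from itertools import product

# Hypothetical fact rows: Bamdax-style item 10 sells in January and
# March only, and item 11 never sells at all.
facts = [
    {"_KEY_Date": "2016-01-01", "_KEY_ItemID": 10, "Net Sales": 100},
    {"_KEY_Date": "2016-03-01", "_KEY_ItemID": 10, "Net Sales": 80},
]
item_keys = [10, 11]

# Distinct month-start dates that appear in the facts.
months = sorted({f["_KEY_Date"] for f in facts})

# Cartesian product of months and item keys, concatenated to the facts
# (the sales measure is simply absent on these placeholder rows).
missing_facts = [
    {"_KEY_Date": d, "_KEY_ItemID": k}
    for d, k in product(months, item_keys)
]
facts.extend(missing_facts)

print(len(facts))  # 2 original rows + 2 months x 2 items = 6
```

Because every item now has a row for every month, a chart grouped by month shows the full time axis instead of silently skipping the months with no sales.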
Finally, we untick the Suppress Zero-Values checkbox in the Presentation tab of the line chart in order to see the correct net sales trend for Bamdax 126. You will notice that the following line chart shows that Bamdax 126 is purchased almost every two months. It is difficult to make this observation in the previous chart.
These steps to eliminate null and missing values in the data model will help improve our data analysis and visualization. However, we will most likely not use all the fields in the data model, so we shouldn't waste time cleaning every field or creating every missing value until they've proven their business value.
While QlikView is not data-cleansing software, it does allow us to implement some formatting and standardization that makes it easier to visualize data. As a best practice, we perform these actions in the data model load script. However, we can also use the same QlikView functions directly in any QlikView object.
We read by identifying the overall shape of words. If we use text values with all uppercase letters, then all the words have the same block shape, which makes them harder to identify and reduces readability. Also, all uppercase text values tend to be less aesthetically appealing.
An even worse scenario is when a field has some text values in all uppercase and others in lowercase. This is common when we integrate two data sources, and it is an unnecessary distraction when we visualize data.
First, we use the capitalize()
function when the field is a proper noun, such as customer name, employee name, or city. The function will return a mixed-case text value with the first letter of every word being a capital letter. Secondly, we use the upper()
function to standardize text fields that are abbreviations, such as state or units of measurement. Lastly, we use the lower()
function to standardize all other text fields.
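As a rough Python parallel of these three rules (str.title() behaves much like QlikView's capitalize() for simple names, though apostrophes and initials may need extra care), a small helper might look like this:

```python
def standardize(value: str, kind: str) -> str:
    """Illustrative sketch of the three-rule case standardization."""
    # Proper nouns (customer, employee, city): capitalize each word.
    if kind == "proper":
        return value.title()
    # Abbreviations (state codes, units of measurement): all uppercase.
    if kind == "abbreviation":
        return value.upper()
    # Everything else: all lowercase.
    return value.lower()

print(standardize("JOHN SMITH", "proper"))      # John Smith
print(standardize("tx", "abbreviation"))        # TX
print(standardize("Office Supplies", "other"))  # office supplies
```

Applying one rule per field keeps mixed-source data looking uniform in list boxes and chart dimensions.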
This solution is not perfect for some text values, such as a street address that contains both proper nouns and abbreviations. For example, Cedar St. NW requires a more nuanced approach. However, a street address is rarely used for analysis, and any extra effort to standardize this or any other field should be weighed against its business value.
Text values with strange characters can also be an unnecessary distraction. Characters, such as a number sign (#), an exclamation mark (!), a vertical bar (|), and so on, can sometimes find their way into text descriptions where they don't belong. We can eliminate them with the purgechar()
function or the replace()
function.
Also, extra spaces between words in a dimension value can make our charts look sloppy. QlikView tends to eliminate leading and trailing spaces, but it doesn't eliminate extra spaces between words. We can accomplish this using the following expression, preferably in our load
script:
replace(replace(replace(FieldName,' ','<>'),'><',''),'<>',' ')
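The trick works because every space becomes a `<>` marker, each `><` pair that forms between adjacent markers collapses away, and exactly one `<>` per gap survives to be turned back into a single space. A quick Python check of the same three-step expression:

```python
def collapse_spaces(value: str) -> str:
    # Same three nested replaces as the QlikView expression:
    # 1) turn every space into '<>',
    # 2) delete the '><' pairs that only form between adjacent spaces,
    # 3) turn the surviving '<>' markers back into single spaces.
    # Caveat: assumes the text itself contains no '<' or '>' characters.
    return value.replace(" ", "<>").replace("><", "").replace("<>", " ")

print(collapse_spaces("Acme    Gadget   Co"))  # Acme Gadget Co
```

Any run of spaces, however long, collapses to one, while single spaces pass through untouched.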
Hopefully, in the future, regular expressions will be native to QlikView, and we will have a greater ability to clean and standardize data. Barry Harmsen has created custom script functions that allow us to use regular expressions in the load script (http://www.qlikfix.com/2010/10/18/regular-expressions-in-the-load-script/). A third-party tool called QVSource also allows us to use regular expressions in the load script (http://wiki.qvsource.com/Using-Regular-Expressions-In-QlikView.ashx).
Finally, we make sure that all date fields have the same format. This is especially the case when we extract data from different data sources. We use the date()
or time()
function to change the format to the default date format that we define in the list of system variables at the beginning of the script.
When we create analysis that is intended for an international audience where some users use the MM/DD/YYYY format and others use the DD/MM/YYYY format, we should consider using the YYYY/MM/DD format. This format won't leave users guessing whether 11/1/2016 refers to November 1, 2016 or January 11, 2016.
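The ambiguity is easy to demonstrate. Here Python's strftime serves as a stand-in for QlikView's date() format codes, rendering the same calendar date three ways:

```python
from datetime import date

d = date(2016, 11, 1)  # November 1, 2016

print(d.strftime("%m/%d/%Y"))  # 11/01/2016 - a US reader sees November 1
print(d.strftime("%d/%m/%Y"))  # 01/11/2016 - a European reader sees November 1
print(d.strftime("%Y/%m/%d"))  # 2016/11/01 - unambiguous for both audiences
```

Only the year-first format reads the same way for every audience.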
Along with formatting field values, we also standardize whole dimension tables in order to facilitate analysis. Those that we reuse between different data models are called conformed dimensions. The date dimension is ubiquitous and serves as a great example for creating our first conformed dimension.
The range of dates that we use in each data model may change, so instead of using the exact same table for each data model, we create a master calendar reusing the same script. We call these reusable scripts subroutines, and in QDF we store script subroutines in the following file path:
C:\Qlik\SourceData\99.Shared_Folders\3.Include\4.Sub
Although QDF has a master calendar subroutine, we will use the master calendar subroutine that is available from QlikView Components (http://qlikviewcomponents.org). QlikView Components is a library of script subroutines and functions developed by Rob Wunderlich and Matt Fryer. We prefer this master calendar subroutine because it automatically creates several calendar-based set-analysis variables that we can use in our charts.
We can download the latest release of QlikView Components from GitHub (https://github.com/RobWunderlich/Qlikview-Components/releases). We then integrate it with our QDF by copying the Qvc.qvs
file that is found under the Qvc_Runtime
folder to C:\Qlik\SourceData\99.Shared_Folders\3.Include\4.Sub
. We choose to save it to 99.Shared_Folders
so that we can use these subroutines and functions in every container that we create.
In our load
script, we add the following code after initializing QDF:
$(Include=$(vG.SharedSubPath)Qvc.qvs);
We then add the following code to create the master calendar and the calendar-based set-analysis variables:
SET Qvc.Calendar.v.CreateSetVariables = 1;
CALL Qvc.CalendarFromField('_KEY_Date');
We finish the load script by running a subroutine that eliminates any temporary variables that were used to create the master calendar:
CALL Qvc.Cleanup;
After running our load
script, we now have the following master calendar:
Most of these columns look familiar. However, the columns that end with Serial
may be new to you. To those of us who have battled with defining date ranges with set analysis, the Serial
columns help make this an easier task.
For example, we can calculate year-to-date (YTD) sales easily with the following expression:
sum({$<Year={$(=max(Year))},Month=,_DateSerial={"<=$(=max(_DateSerial))"},_ActualFlag={1}>}[Net Sales])
However, instead of repeating this set analysis in every chart, we can use the calendar-based set-analysis variables to calculate YTD sales. We can improve the preceding expression using the set-analysis variable called vSetYTDModifier
:
sum({$<$(vSetYTDModifier),_ActualFlag={1}>} [Net Sales])
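The YTD logic that both expressions encode can be sketched outside QlikView. The following Python snippet uses hypothetical fact rows as a stand-in for the set modifier: keep the actual rows (flag 1) that fall in the latest year, at or before the latest date.

```python
from datetime import date

# Hypothetical fact rows: (date, net_sales, actual_flag).
facts = [
    (date(2015, 12, 10), 500, 1),
    (date(2016, 1, 15), 100, 1),
    (date(2016, 3, 20), 200, 1),
    (date(2016, 3, 25), 300, 0),  # budget row, excluded by _ActualFlag={1}
]

# max(_DateSerial) equivalent: the latest date in the data.
max_date = max(d for d, _, _ in facts)

# Year={max year}, _DateSerial <= max serial, _ActualFlag = 1.
ytd_sales = sum(
    sales
    for d, sales, actual in facts
    if actual == 1 and d.year == max_date.year and d <= max_date
)
print(ytd_sales)  # 300
```

The prior-year row and the budget row are filtered out, leaving only the current year's actual sales through the latest available date.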
We can review all of the available calendar-based set-analysis variables in Settings | Variable Overview.
Now that we've reviewed the sales perspective data model and the load script methods that help it support cleaner data visualization and analysis, let's look at our first user story.