Chapter 2. Sales Perspective

The success of all businesses is at some point determined by how well they can sell their products and/or services. The large amount of time and money that companies spend on software that facilitates the sales process is a testament to its importance. Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), and Point of Sale (POS) software not only ease the sales process, but also gather a large amount of sales-related data. Therefore, it is not uncommon for a company's first QlikView application to be designed to explore and discover sales data.

Before we begin to create data visualization and analysis for our sales perspective, let's review the data model that supports it. In the process, we will resolve data quality issues that can either distract users' attention away from a visualization's data or distort how they interpret it. Next, we'll introduce two common sales department user stories and build solutions to stratify customers and analyze customer churn. Finally, let's take our first look at QlikView extensions and overall application design.

In this chapter, let's review the following topics:

  • The data model for the sales perspective
  • Common data quality issues
  • Customer stratification and churn analysis
  • QlikView extensions and the cycle plot chart
  • QlikView design templates

Let's get started and review the data model that we will use to create our sales perspective in QlikView.

Sales perspective data model

Our company sells gadgets to customers throughout the United States and our sales perspective data model is based on data from an ERP system. The following figure shows the data model that we are going to work with throughout this chapter:

Sales perspective data model

Exercise 2.1

With the following steps, let's migrate the sales perspective container from the book's exercise files to where we've installed QDF on our computers and start to explore the data together:

  1. In the Ch. 2 folder of the book's exercise files, copy the container called 1001.Sales_Perspective to the QDF folder that is located on your computer. By default, the QDF folder will be C:\QlikSourceData.
  2. In the QDF folder, open the VariableEditor Shortcut in the 0.Administration container.
  3. Click Container Map Editor. If the button hangs, then enable the Open Databases in Read and Write mode option in the Settings tab of the Edit Script window and try again.
  4. In the container map table, go to the empty line after 99.Shared_Folders, and under the Container Folder Name column, click the black arrow indicating that it is an input field.
  5. Enter the name of the new container that we just copied, 1001.Sales_Perspective, into the input field.
  6. Continue along the row and enter the Variable Prefix as Sales and the Container Comments as Container for Sales Perspective.
  7. Click the Update Map and create Containers button that is located in the top-left of the container map table, and when prompted, click Update Container Map.
  8. Save the QlikView file.

Now that we've finished migrating the container to our local QDF, let's open Sales_Perspective_Sandbox.qvw in the 1.Application folder of the 1001.Sales_Perspective container and explore the sales data in more detail.

The data model that we are using is a star schema and it includes a set of events common to many companies. In the fact table at the center of the model, we store the following events:

  • Sales invoices
  • Sales credit memos
  • Sales budget

Tip

The sales budget may not come from our ERP. It may exist in Excel or in the database of dedicated planning software.

Sales invoices are the principal event of the data model. We don't use the general journal entries that the sales invoices often generate in an ERP system because they do not have the level of detail that a sales invoice does. For example, product details are often not included in the general journal entry.

However, it is important that the total sales amount from our sales invoices matches the total sales that we have in our financial reports. For that reason, it is important to consider any sales cancelation or other sales adjustment. In this data model, sales credit memos properly adjust our total sales amount to match the financial reports that we will see in Chapter 3, Financial Perspective.

Finally, we cannot analyze or judge our sales performance without comparing it with something. Basic sales analysis involves comparing current sales with either historical or planned sales. Therefore, we should aim to have at least two years of sales data or the sales budget in our data model. In this data model, we have both historical sales and planned sales data.

Tip

Planned sales can be either a sales budget, a sales forecast, or both.

All of these events are discrete events. In other words, they only exist at a discrete point in time. The fact table that stores discrete events is called a transactional fact table. The date dimension in a transactional fact table holds the date when the event occurred.

Along with the date dimension, we use the 7Ws (who, what, where, when, how many, why, and how) to describe an example set of metrics and dimensions that we expect to find in a sales perspective data model:

Dimensions

  • Who (Customer): Sometimes, customers are only identifiable by the sales ticket number from a POS system. Otherwise, we hope to have a rich set of attributes that describe our customers, as in the case of our data model.
  • Who (Sales Person): In our data model, the sales person is defined at the invoice level. This might also be an attribute of a customer, a product, or an office. We should also include any sales structure hierarchy if it exists.
  • What (Item): Whether it be a product or a service, we should describe what we sell to a customer in a detailed dimension table.
  • Where (Billing Address, Shipping Address): The location can be related to the customer, the sales office, or the store where the sale took place.
  • When (Date): Here, we record the exact date of the sales invoices and credit memos. We don't usually make daily sales budgets, so we assign our monthly budget to the first day of the month.
  • Why (Promotion Description): Giving a possible reason for sales variation versus historical or planned sales is a part of the analytical process. Therefore, we should include any element that is intended to cause variation, such as sales offers and promotions, in the data model.
  • How (_OnlineOrderFlag): We should also record whether we sell our products face to face, online, telephonically, or through any other sales channel.

Metrics

  • How many (Net Sales): The net sales field records an invoice's sales dollar amount after discount. It also stores the net sales budget, so we use the _ActualFlag or _BudgetFlag fields to determine whether the amount is actual or budget.
  • How many (Quantity): Sales quantity helps us understand sales in a manner that is independent of any change to the sales price. Quantity can be based on different units of measurement. For example, we can measure hours, kilograms, or pieces.
  • How many (Gross Profit): Although gross profit is not always easy to calculate and might not be available, it is vital to understand the effectiveness of our sales. Like net sales, the amount can be either actual or budget.
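Because actual and budget amounts share the same Net Sales field, every chart expression needs to state which one it wants. As a minimal set-analysis sketch, assuming the flag fields hold 1 for the matching fact rows as described above:

// Actual net sales only
sum({$<_ActualFlag={1}>} [Net Sales])

// Budgeted net sales only
sum({$<_BudgetFlag={1}>} [Net Sales])

Keeping both measures in one fact table and switching between them with a flag is what lets us compare actual and planned sales in the same chart.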

For more information on data modeling, read The Data Warehouse Toolkit by Ralph Kimball and Agile Data Warehouse Design by Lawrence Corr.

Data quality issues

Great data visualization and analysis starts with having a well-built data model that contains high-quality data. If this is our first data exploration and discovery project, one of the most important discoveries that we are going to make is that our data contains a great deal of garbage. One of the most noticeable data-quality issues is the absence of a value in a field.

For example, in Sales_Perspective_Sandbox.qvw, the Vendor attribute in the Items table does not always have a value. The absence of a value in a field is referred to as a null value. In QlikView, a user can't select a null value. However, we often want to select null values to know which items have missing attributes and send that list of items to whomever is responsible for the catalog's data quality.

In order to select items with missing vendor information, we replace all the null values in the Vendor field with the string N/A by inserting the following code into the load script before we load the Items table:

MappingNULL_NA:
Mapping
LOAD NULL() as NULL,
  'N/A' as Mapped_Value
AutoGenerate (1);
MAP Vendor USING MappingNULL_NA;

Tip

Although we have the option to suppress null values in the Dimensions tab of a QlikView object, we never use this option unless we understand why the dimension values are null. These null values may indicate a larger problem with our data or the data model.

Missing dimension values

The previous mapping will not get rid of all the null values that we see in our charts because what we perceive in QlikView to be a null value may in fact be a missing value. Unlike missing values, null values can be observed in the table where they reside. For example, we can go to the Table Viewer, preview the Items table, and see the null values in the Vendor field.

However, what if the fact table contains an item key that refers to an item that does not exist in the Items table? Or, what if the fact table is missing the item key for some transactions? Despite running our previous null value mapping, we will still see Vendor as null in QlikView because the item key that the fact table refers to does not exist in the Items table. It is a missing value.

The way to give users the ability to select missing item values is to replace incorrect and null item keys in the fact table with a key to a fictitious item. The key to the fictitious item is defined as negative one (-1). Our first step is to create a mapping table using the Items table, where we map all the existing item keys to their own values:

MappingMissingIncorrectItemsKeys:
Mapping
LOAD _KEY_ItemID,
  _KEY_ItemID as _KEY_ItemID_Mapped
FROM
$(vG.QVDPath)2.TransformItems.qvd
(qvd);

The second step is to save the original value stored in _KEY_ItemID in another field and apply this map to the _KEY_ItemID field when we load the Facts table:

Facts:
LOAD [Document ID],
_KEY_ItemID as Original_ItemID,
     applymap('MappingMissingIncorrectItemsKeys',_KEY_ItemID,-1) as _KEY_ItemID,
     _KEY_Date,
     ...
FROM
$(vG.QVDPath)2.TransformFacts.qvd
(qvd);

Our final step is to create a fictitious item called 'Missing' with an item key of negative one (-1) in the Items table:

Concatenate (Items)
LOAD -1 as _KEY_ItemID,
     'Missing' as [Item ID],
     'Missing' as Item,
     'Missing' as [Item Source],
     'Missing' as [Item Group],
     ...
AutoGenerate (1);

Missing fact values

After the previous two adjustments, we will still encounter some missing values in QlikView. For example, do you notice anything missing from the following chart that shows the monthly net sales for the item Bamdax 126 in Sales_Perspective_Sandbox.qvw?

Missing fact values

If you noticed that various months do not appear on the horizontal axis, then you are correct. As Bamdax 126 is not sold during every month, there is no relation between Bamdax 126 and the months when the item was not sold. The values are missing, and these missing values distort the line chart.

In order to completely resolve this issue, we would have to complement the fact table with the Cartesian product of any or all dimension key sets, and in effect, measure nil events. However, we should take into account that this may cause a severe degradation of our QlikView application's performance. Therefore, we should apply this solution pragmatically to solve specific analytical needs.

In this case, we specifically want to see a more accurate net sales trend for Bamdax 126 that includes the months that we did not sell the item. We do this by adding the following code to our load script after loading the Facts table. The code creates a Cartesian product of the Product and Date dimension key sets and adds it to our Facts table:

Missing_Facts_Tmp:
LOAD Distinct makedate(Year(_KEY_Date), Month(_KEY_Date)) as _KEY_Date,
  1 as _ActualFlag
Resident Facts;

Left Join (Missing_Facts_Tmp)
LOAD Distinct _KEY_ItemID
FROM
$(vG.QVDPath)2.TransformItems.qvd
(qvd);

Concatenate (Facts)
LOAD *
Resident Missing_Facts_Tmp;

DROP Table Missing_Facts_Tmp;

Tip

In order to reduce the number of rows in the Cartesian product, we only use the month and year of the date. We could have optimized it further by using the exists() function to concatenate only the dimension combinations that don't already exist in the Facts table.
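That exists() optimization might be sketched as follows. It assumes a hypothetical helper field, _KEY_Exists, built while loading the Facts table to record every month/item combination that already exists, and it would replace the simple Concatenate step shown above:

// Assumed extra line in the Facts LOAD (not in the original script):
//   makedate(Year(_KEY_Date), Month(_KEY_Date)) & '|' & _KEY_ItemID as _KEY_Exists

// Concatenate only the month/item combinations that are not already in Facts
Concatenate (Facts)
LOAD *
Resident Missing_Facts_Tmp
Where not exists(_KEY_Exists, _KEY_Date & '|' & _KEY_ItemID);

// Remove the helper field once the Cartesian product has been filtered
DROP Field _KEY_Exists;

Filtering before concatenating keeps the fact table from growing with rows that duplicate combinations it already contains.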

Finally, we untick the Suppress Zero-Values checkbox in the Presentation tab of the line chart in order to see the correct net sales trend for Bamdax 126. You will notice that the following line chart shows that Bamdax 126 is purchased almost every two months. It is difficult to make this observation in the previous chart.

Tip

Again, be very careful when creating a Cartesian product in QlikView. We create a Cartesian product by joining two or more tables that do not have a field in common. If the tables are large, then this may cause QlikView to use all the available RAM memory and freeze the computer.

Missing fact values

These steps to eliminate null and missing values in the data model will help improve our data analysis and visualization. However, we will most likely not use all the fields in the data model, so we shouldn't waste time cleaning every field or creating every missing value until it has proven its business value.

Data formatting and standardization

While QlikView is not data-cleansing software, it does allow us to implement some formatting and standardization that makes it easier to visualize data. We perform these actions in the data model load script as best practice. However, we can also use the same QlikView functions directly in any QlikView object.

Case

We read by identifying the overall shape of words. If we use text values with all uppercase letters, then all the words have the same block shape, which makes them harder to identify and reduces readability. Also, all uppercase text values tend to be less aesthetically appealing.

Tip

A quick Google search reveals that some people have begun to challenge this belief. Hopefully, future scientific studies will allow us to confirm how best to optimize text readability.

An even worse scenario is when a field has some text values in all uppercase and others in lowercase. This is common when we integrate two data sources, and it is an unnecessary distraction when we visualize data.

First, we use the capitalize() function when the field is a proper noun, such as customer name, employee name, or city. The function will return a mixed-case text value with the first letter of every word being a capital letter. Secondly, we use the upper() function to standardize text fields that are abbreviations, such as state or units of measurement. Lastly, we use the lower() function to standardize all other text fields.

Tip

This solution is not perfect for some text values, such as a street address that contains both proper nouns and abbreviations. For example, Cedar St. NW requires a more nuanced approach. However, a street address is rarely used for analysis, and any extra effort to standardize this or any other field should be weighed against its business value.
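As a sketch, the three case functions might be applied while loading a dimension table. The field names here are illustrative, not taken from the exercise data model:

LOAD
  capitalize([Customer Name]) as [Customer Name],  // proper nouns: john smith becomes John Smith
  upper([State Code]) as [State Code],             // abbreviations: ny becomes NY
  lower([Payment Terms]) as [Payment Terms],       // all other text fields
  ...

Applying the functions once in the load script means every chart inherits the standardized values without repeating the expressions.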

Unwanted characters

Text values with strange characters can also be an unnecessary distraction. Characters, such as a number sign (#), an exclamation mark (!), a vertical bar (|), and so on, can sometimes find their way into text descriptions where they don't belong. We can eliminate them with the purgechar() function or the replace() function.
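For example, either function can be applied in the load script. The field name and character list here are illustrative:

// Strip a set of unwanted characters entirely
purgechar([Item Description], '#!|') as [Item Description],

// Or swap one specific string for another
replace([Item Description], '|', '-') as [Item Description],

purgechar() removes every occurrence of each listed character, while replace() substitutes one exact string for another, so we choose based on whether the character should disappear or be translated.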

Also, extra spaces between words in a dimension value can make our charts look sloppy. QlikView tends to eliminate leading and trailing spaces, but it doesn't eliminate extra spaces between words. We can accomplish this using the following expression, preferably in our load script:

replace(replace(replace(FieldName,' ','<>'),'><',''),'<>',' ')

The first replace() turns every space into <>, so a run of spaces becomes a run of <> tokens; the second removes every >< pair, collapsing each run to a single <>; and the third turns the remaining <> back into a single space.

Tip

Hopefully, in the future, regular expressions will be native to QlikView, and we will have a greater ability to clean and standardize data. Barry Harmsen has created custom script functions that allow us to use regular expressions in the load script (http://www.qlikfix.com/2010/10/18/regular-expressions-in-the-load-script/). A third-party tool called QVSource also allows us to use regular expressions in the load script (http://wiki.qvsource.com/Using-Regular-Expressions-In-QlikView.ashx).

Dates and time

Finally, we make sure that all date fields have the same format. This is especially the case when we extract data from different data sources. We use the date() or time() function to change the format to the default date format that we define in the list of system variables at the beginning of the script.

When we create analysis that is intended for an international audience where some users use the MM/DD/YYYY format and others use the DD/MM/YYYY format, we should consider using the YYYY/MM/DD format. This format won't leave users guessing whether 11/1/2016 refers to November 1, 2016 or January 11, 2016.
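As a sketch, assuming a source system delivers the date as text in DD/MM/YYYY format (the field name is illustrative), we first interpret the text with date#() and then apply the display format with date():

// date#() parses the text into a date; date() sets the display format
date(date#([Invoice Date], 'DD/MM/YYYY'), 'YYYY/MM/DD') as [Invoice Date],

Separating interpretation from formatting this way avoids the silent misreads that occur when QlikView guesses the wrong source format.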

Master calendar

Along with formatting field values, we also standardize whole dimension tables in order to facilitate analysis. Dimension tables that we reuse between different data models are called conformed dimensions. The date dimension is ubiquitous and serves as a great example for creating our first conformed dimension.

The range of dates that we use in each data model may change, so instead of using the exact same table for each data model, we create a master calendar reusing the same script. We call these reusable scripts subroutines, and in QDF we store script subroutines in the following file path:

C:\QlikSourceData\99.Shared_Folders\3.Include\4.Sub

Although QDF has a master calendar subroutine, we will use the master calendar subroutine that is available from QlikView Components (http://qlikviewcomponents.org). QlikView Components is a library of script subroutines and functions developed by Rob Wunderlich and Matt Fryer. We prefer this master calendar subroutine because it automatically creates several calendar-based set-analysis variables that we can use in our charts.

Tip

QDF is not the end but rather the means. It is designed to be flexible so that we can adapt it to our needs. We can create, import, and modify any reusable component that best fits our business requirements.

We can download the latest release of QlikView Components from GitHub (https://github.com/RobWunderlich/Qlikview-Components/releases). We then integrate it with our QDF by copying the Qvc.qvs file that is found under the Qvc_Runtime folder to C:\QlikSourceData\99.Shared_Folders\3.Include\4.Sub. We choose to save it to 99.Shared_Folders so that we can use these subroutines and functions in every container that we create.

In our load script, we add the following code after initializing QDF:

$(Include=$(vG.SharedSubPath)Qvc.qvs);

We then add the following code to create the master calendar and the calendar-based set-analysis variables:

SET Qvc.Calendar.v.CreateSetVariables = 1;
call Qvc.CalendarFromField('_KEY_Date');

Tip

Every QlikView Components release contains working examples of all its subroutines. We can use these examples to learn the possible parameters and results of each subroutine.

We finish the load script by running a subroutine that eliminates any temporary variables that were used to create the master calendar:

CALL Qvc.Cleanup;

After running our load script, we now have the following master calendar:

Master calendar

Most of these columns look familiar. However, the columns that end with Serial may be new to you. For those of us who have battled with defining date ranges in set analysis, the Serial columns make this task much easier.

For example, we can calculate year-to-date (YTD) sales easily with the following expression:

sum({$<Year={$(=max(Year))},Month=,_DateSerial={"<=$(=max(_DateSerial))"},_ActualFlag={1}>}[Net Sales])

However, instead of repeating this set analysis in every chart, we can use the calendar-based set-analysis variables to calculate YTD sales. We can improve the preceding expression using the set-analysis variable called vSetYTDModifier:

sum({$<$(vSetYTDModifier),_ActualFlag={1}>} [Net Sales]) 

We can review all of the available calendar-based set-analysis variables in Settings | Variable Overview.

Now that we've reviewed the sales perspective data model and the load script methods that help it support cleaner data visualization and analysis, let's look at our first user story.
