Customer churn

Customer churn is a measure of the company's tendency to lose customers. Our user story speaks of the need to detect at-risk customers and prevent them from becoming a lost customer.

Note

As a sales representative, I want to see which customers I'm at risk of losing so that I can take action today to prevent their leaving.

Surely, there are many variables that we may use to predict customer churn. In this case we expect customers to consistently make a purchase every so many days, so we will use a variable called customer purchase frequency to detect those that we are at risk of losing.

We could calculate the average number of days between purchases and warn sales representatives when the number of days since a customer's last purchase exceeds that average.

However, a simple average may not always be an accurate measure of a customer's true purchasing behavior. If we assume that their purchase frequency is normally distributed then we use the t-test to determine within what range the average is likely to fall. Moreover, we prefer the t-test because it can be used for customers that have made less than thirty or so purchases.

If we want our model to be sensitive to customer inactivity then we send an alert when the days since their last purchase exceeds the average's lower limit. Otherwise, if we don't want to overwhelm the sales representatives with alerts then we use the average's upper limit to determine whether we are at risk of losing a customer. We'll apply the later case in the following example.

Before we calculate the upper limit of a t-distribution, we need to add a table to the data model that contains the number of days that elapse between field the purchases each customer makes. We add the Customer Purchase Frequency with the following code that we add to the load script after having loaded the Facts table:

[Customer Purchase Frequency Tmp]:
Load distinct _KEY_Date as [Customer Purchase Date], 
     _KEY_Customer
Resident Facts
Where _ActualFlag = 1
  and [Net Sales] > 0;

[Customer Purchase Frequency]:
Load [Customer Purchase Date],
   _KEY_Customer,
     if(_KEY_Customer <> peek(_KEY_Customer),0,[Customer Purchase Date] - Peek([Customer Purchase Date])) as [Days Since Last Purchase]
Resident [Customer Purchase Frequency Tmp]
Order by _KEY_Customer,[Customer Purchase Date];
DROP Table [Customer Purchase Frequency Tmp]; 

The previous script will produce the following table:

Customer churn

This is a great opportunity to use a histogram to understand the distribution of a customer's purchasing frequency. We can also compare the distribution to a normal or a t-distributions in the same chart. Let's use the following properties to create our histogram:

Exercise 2.4

Chart Properties

Value

General / Chart Type

Choose to create a combo chart.

Dimensions /Used Dimensions

Use the following code to create a calculated dimension called Days Since Last Purchase:

=ValueLoop($(=min([Days Since Last Purchase])),$(=max([Days Since Last Purchase])),1) 

Expressions

Use the following code to create a expression called Number of Purchases:

sum(if([Days Since Last Purchase]=ValueLoop($(=min([Days Since Last Purchase])),$(=max([Days Since Last Purchase])),1),1))
/count([Days Since Last Purchase]) 

Choose to display the expression as a bar.

Expressions

Use the following code to create a expression called Normal Distribution:

NORMDIST(ValueLoop($(=min([Days Since Last Purchase])),$(=max([Days Since Last Purchase])),1)
,avg([Days Since Last Purchase]),stdev([Days Since Last Purchase]),0)

Choose to display the expression as a line.

Expressions

Use the following code to create a expression called t-Distribution:

TDIST((fabs(ValueLoop($(=min([Days Since Last Purchase])),$(=max([Days Since Last Purchase])),1) -avg([Days Since Last Purchase])))
 /
 (Stdev([Days Since Last Purchase]) / sqrt(count([Days Since Last Purchase]))) ,count([Days Since Last Purchase]),1)

Choose to display the expression as a smooth line.

Axes / Dimension Axis

Choose to Continuous option in the Dimension Axis section.

Presentation /Reference Lines

Use the following code to create a reference line called Mean Days Since Last Purchase:

=Avg([Days Since Last Purchase]) 

We set the following location:

Choose the option to Show Label in Chart and the option to locate it on the Continuous X axis.

Presentation /Reference Lines

Use the following code to create a reference line called Upper Limit (95%):

=TTest1_Upper([Days Since Last Purchase]-0,(1-(95)/100)/2) 

Choose the option to locate it on the Continuous X axis.

Presentation /Reference Lines

Use the following code to create a reference line called Lower Limit (95%):

=TTest1_Lower([Days Since Last Purchase]-0,(1-(95)/100)/2) 

Choose the option to locate it on the Continuous X axis.

After additional adjustments to the presentation, we have the following chart. This particular chart compares the actual purchasing frequency distribution for customer Gevee. with both a normal and a t-distribution curve:

Exercise 2.4

If we alert the sales representatives any time that a customer waits more than the mean number of days, then we could be sending too many false alarms, or in other words false positives. However, if we define at-risk customers as those who wait longer than the upper limit of the 95% confidence level, we have a higher probability of alerting the sales representative about customers that are really at-risk, or true positives.

Let's also keep in mind that not all lost customers have the same effect on the company, so let's combine the stratification that we performed earlier in the chapter with our churn-prediction analysis. In this way, sales representatives know to focus their attention on A customers that are at-risk, and not invest too much time to follow-up on D customers. The following table shows what this analysis may look like:

Exercise 2.4

We add the following expression to the customer-stratification table that we created in a previous exercise. The background color expression calculates the days since the last purchase and compares it with the upper limit of the 95% confidence level. Refer the following table for a clear view:

Exercise 2.5

Expressions

Expression for an at-risk customer

='' 

We set the Background Color as follows:

if(max({$<_ActualFlag={1},Year=,Month=,_DateSerial={"<=$(=max(_DateSerial))"}>} Total _KEY_Date) 
- max({$<_ActualFlag={1},Year=,Month=,_DateSerial={"<=$(=max(_DateSerial))"}>} _KEY_Date)>
TTest1_Upper({$<_ActualFlag={1},Year=,Month=,_DateSerial={"<=$(=max(_DateSerial))"}>} [Days Since Last Purchase]-0,(1-(95)/100)/2), red(100)) 

Customer stratification together with customer-churn prediction is a very powerful business tool. Now, let's take our first look at QlikView extensions and introduce the cycle plot.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset