Business applications of predictive analysis

Forecasting is using data trends and cycles to predict future values. For example, claims, sales, and expenses data have seasonality and trends. When we count their cycles and trends, we can predict their future values reasonably well. There are two commonly used forecasting techniques: regression analysis and time series analysis. In terms of regression analysis, MicroStrategy supports linear regression, exponential regression, tree regression, and logistic regression. Logistic regression generally applies to predicting categorical target variables, while other regressions models generally apply to predicting numerical target variables.

In this section, we will demonstrate MicroStrategy's predictive analysis capabilities with several business applications. You will see that MicroStrategy has simplified the work to the extent that using different types of data mining model is simply a single mouse click away.

Forecasting quarterly revenues using linear regression and exponential regression with seasonality

Suppose we have a company's quarterly revenue data from 2012 Q1 to 2014 Q4. Based on these data, we want to forecast its quarterly revenues in 2015. Values for Revenue variable are given. Quarter Index, and Quarter of the Year metrics can be created:

The Quarter Index metric is defined as:

((Max(Quarter@ID) {~} - (10 * Max(Year@ID)) {~} ) + ((Max(Year@ID) {~} - 2012) * 4))

The Quarter of the Year metric is defined as:

(Max(Quarter@ID) {~} - (10 * Max(Year@ID) {~} ))

Forecasting quarterly revenues using linear regression and exponential regression with seasonality

Create a training metric Linear Regression Revenue Prediction Analysis. Left-click Tools, select Training Metric Wizard, click OK. The following window shows up, presenting the available models we can use in MicroStrategy 10:

  1. Choose Linear Regression, then click Next.
  2. Set Dependent Metric to Revenue, and Independent Metrics to Quarter Index and Quarter of the Year.
  3. Chose Predicted Value for Predictor Type. Choose Sum for the Aggregation function.

Following similar steps, we can create the training metric Exponential Regression Revenue Prediction Analysis. The only difference is to choose Exponential Regression for Forecasting numeric values:

Forecasting quarterly revenues using linear regression and exponential regression with seasonality

Now create a new report. Include the Quarter attribute, Revenue, Quarter Index, Quarter of the Year, Linear Regression Revenue Prediction Analysis, and Exponential Regression Revenue Prediction Analysis metrics. Run the report. Now we get the forecasted quarterly revenue for 2015, from both the linear regression model and the exponential regression model:

Forecasting quarterly revenues using linear regression and exponential regression with seasonality

Forecasting quarterly revenues for different regions using tree regression analysis

Tree regression in MicroStrategy 10 enables us to estimate different sets of coefficients for different subsets of data, even when we specified the same model. For example, when forecasting quarterly revenues, tree regression enables us to estimate different sets of model coefficient for different regions. Here is how to do it.

Create a segmentation metric, Region:

Max(Region@DESC) {~}

Change the Level to Report Level, and Region:

Forecasting quarterly revenues for different regions using tree regression analysis

When building the training metric Tree Regression Revenue Prediction Analysis, add the Region metric as a segmentation metric, then we change from Linear Regression to Tree Regression:

Forecasting quarterly revenues for different regions using tree regression analysis

Check Show advanced options, change the Type of Quarter Index from Default to Continuous:

Forecasting quarterly revenues for different regions using tree regression analysis

Click OK and follow the on-screen instructions to finish creating the Tree Regression Revenue Prediction Analysis metric. Use this metric to create a new report, and add the Region attribute as Paged By. Then we get the tree regression segmented by region report.

Tree Regression could offer better predictive results than pooled linear regression, because it offers tailored sets of coefficients for different regions:

Forecasting quarterly revenues for different regions using tree regression analysis

Forecasting monthly revenues using time series analysis

A time series refers to a sequence of values at regular intervals over a given period, for example, annual revenues over a decade. What makes time series special is that the observations have autocorrelation; that is, the value of x(t) is related to its value in the previous period, x(t-1). Autocorrelation needs special econometric treatment, for example, GLS instead of OLS.

There are different approaches to time series analysis:

  • Regression: linear and exponential regression
  • Averages: simple, moving, centered, single double average
  • Exponential smoothing: single, double triple Holt winter's smoothing
  • Autocorrelation model: Box-Jenkins, auto-regression, moving average, ARIMA
  • Spectral: discrete fourier, discrete cosine, fast Fourier, Chebyshev, polynomials

In MicroStrategy 10, exponential smoothing is used to conduct time series analysis. The single exponential smoothing can be written as:

Forecasting monthly revenues using time series analysis

Here, xt is the smoothed series, y t  is the observed series, and A is the smoothing constant between 0 and 1.

Let's learn how to use time series analysis in MicroStrategy via an example. Let's say that monthly revenues from January 2012 to December 2014 are known; we want to forecast future monthly revenues in 2015.

Create Month  Index using the following formula:

(Max([Month of Year]) {~} + (12 * (Max(Year) {~} - 2009)))

Create a training metric Time Series Revenue Predictor. Select the type of analysis as Time Series, select Dependent Metric as Revenue, Independent Metrics as Month Index, and the number of verification records to include with the model as 12:

Forecasting monthly revenues using time series analysis

Create a new report containing a Month attribute, the Revenue metric, the Month Index metric, and the Time Series Revenue Predictor Training metric. Click Data and select Report Data Options. Change the MicroStrategy default inner join to Outer, to allow for viewing predictive values for the Revenue metric in 2014:

Forecasting monthly revenues using time series analysis

Run the report. Adjust the graph preference by clicking Graph, Preference, Maximum number of categories; increase this to 58 to view all the data:

Forecasting monthly revenues using time series analysis

Click OK. Now we have our predicted monthly revenue values, along with a graph:

Forecasting monthly revenues using time series analysis

Telco churn analysis using classification

Classification is a data mining technique to classify data points into different groups. For example, in a hospital emergency room, a triage team classifies patients' different groups according to urgency; a bank classifies its loan applicants into high-risk or low-risk groups. There are two commonly used classification techniques: decision tree and logistic regression.

Decision tree is easy to understand, but less accurate due to its simplified model structure. Logistic regression is more accurate, but its estimation is computationally intensive.

Telco churn analysis using decision tree

A Telco corporation wants to estimate how likely it is that its customers will churn, based on customers' demographic information, usage data, and churn data.

Suppose we have the following data: average minutes during off-peak times, average minutes during peak times, dropped calls, helpdesk calls, renewals, age range, gender, household count, marital status, and income bracket:

Metric Name

Metric Definition

AvgMinOffPeak

Sum([Phone Usage]@[Average Minutes Off-peak]){~}

AvgMinPeak

Sum([Phone Usage]@[Average Minutes Peak]){~}

Dropped Calls

Sum([Phone Usage]@[Dropped Calls]){~}

Helpdesk calls

Sum([Phone Usage]@[Helpdesk Calls]){~}

Renewals

Sum([Phone Usage]@[Renewals]){~}

Marital Status

Max([Marital Status]@DESC){Customer}

Income Bracket

Max([Income Bracket]@DESC){Customer}

Telco Churn

Sum([Phone Usage]@[Churn])

Create a training metric Telco Churn Predictor (Tree) Training Metric, with Telco Churn as a target variable and other metrics as explanatory variables. In Training Metric Wizard, Classifying categorical values, choose Decision Tree. In Predicted Value, Aggregation function, choose Sum:

Telco churn analysis using decision tree

Create a report with Telco Churn Predictor (Tree) Training Metric, and explanatory variables:

Telco churn analysis using decision tree

Create a new report and set a filter with a metric qualification: Telco Churn Predictor - Predicted Value Exactly 1:

Telco churn analysis using decision tree

Run the report; we get a list of customers who are most likely to churn. Save it as Telco Churn Predictor - Prediction Result (Decision Tree):

Telco churn analysis using decision tree

Telco churn analysis using logistic regression

We can follow steps similar to those in the previous section to do churn analysis using logistic regression. We will see at the end of this section that the model prediction accuracy improved, at the cost, however, of increased computation time.

Follow the same steps to create target variables and explanatory variables as in the previous section. The only difference is, when creating the training metric Telco Churn Predictor (Logit) Training Metric, and classifying categorical values, choose Logistic Regression:

Telco churn analysis using logistic regression

Then follow the relevant steps in the previous section to create a training report and to deploy the model. We may find that prediction accuracy is slightly improved by using logistic regression, for example, for a churned customer, Tabitha Barker, the logistic regression predictor made the correct prediction, but the decision tree predictor did not:

Telco churn analysis using logistic regression

For our dataset, the prediction accuracy for logistic regression is 85.7%, compared with the 85.4% prediction accuracy for decision tree:

Telco churn analysis using logistic regression

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset