Forecasting is using data trends and cycles to predict future values. For example, claims, sales, and expenses data have seasonality and trends. When we count their cycles and trends, we can predict their future values reasonably well. There are two commonly used forecasting techniques: regression analysis and time series analysis. In terms of regression analysis, MicroStrategy supports linear regression, exponential regression, tree regression, and logistic regression. Logistic regression generally applies to predicting categorical target variables, while other regressions models generally apply to predicting numerical target variables.
In this section, we will demonstrate MicroStrategy's predictive analysis capabilities with several business applications. You will see that MicroStrategy has simplified the work to the extent that using different types of data mining model is simply a single mouse click away.
Suppose we have a company's quarterly revenue data from 2012 Q1 to 2014 Q4. Based on these data, we want to forecast its quarterly revenues in 2015. Values for Revenue
variable are given. Quarter Index
, and Quarter of the Year
metrics can be created:
The Quarter Index
metric is defined as:
((Max(Quarter@ID) {~} - (10 * Max(Year@ID)) {~} ) + ((Max(Year@ID) {~} - 2012) * 4))
The Quarter of the Year
metric is defined as:
(Max(Quarter@ID) {~} - (10 * Max(Year@ID) {~} ))
Create a training metric Linear Regression Revenue Prediction Analysis. Left-click Tools, select Training Metric Wizard, click OK. The following window shows up, presenting the available models we can use in MicroStrategy 10:
Revenue
, and Independent Metrics to Quarter Index
and Quarter of the Year
.Sum
for the Aggregation function.Following similar steps, we can create the training metric Exponential Regression Revenue Prediction Analysis. The only difference is to choose Exponential Regression for Forecasting numeric values:
Now create a new report. Include the Quarter
attribute, Revenue
, Quarter Index
, Quarter of the Year
, Linear Regression Revenue Prediction Analysis
, and Exponential Regression Revenue Prediction Analysis
metrics. Run the report. Now we get the forecasted quarterly revenue for 2015, from both the linear regression model and the exponential regression model:
Tree regression in MicroStrategy 10 enables us to estimate different sets of coefficients for different subsets of data, even when we specified the same model. For example, when forecasting quarterly revenues, tree regression enables us to estimate different sets of model coefficient for different regions. Here is how to do it.
Create a segmentation metric, Region:
Max(Region@DESC) {~}
Change the Level to Report Level, and Region:
When building the training metric Tree Regression Revenue Prediction Analysis
, add the Region
metric as a segmentation metric, then we change from Linear Regression
to Tree Regression
:
Check Show advanced options, change the Type of Quarter Index
from Default to Continuous:
Click OK and follow the on-screen instructions to finish creating the Tree Regression Revenue Prediction Analysis
metric. Use this metric to create a new report, and add the Region
attribute as Paged By
. Then we get the tree regression segmented by region report.
Tree Regression
could offer better predictive results than pooled linear regression, because it offers tailored sets of coefficients for different regions:
A time series refers to a sequence of values at regular intervals over a given period, for example, annual revenues over a decade. What makes time series special is that the observations have autocorrelation; that is, the value of x(t) is related to its value in the previous period, x(t-1). Autocorrelation needs special econometric treatment, for example, GLS instead of OLS.
There are different approaches to time series analysis:
In MicroStrategy 10, exponential smoothing is used to conduct time series analysis. The single exponential smoothing can be written as:
Here, xt is the smoothed series, y t is the observed series, and A is the smoothing constant between 0 and 1.
Let's learn how to use time series analysis in MicroStrategy via an example. Let's say that monthly revenues from January 2012 to December 2014 are known; we want to forecast future monthly revenues in 2015.
Create Month
Index
using the following formula:
(Max([Month of Year]) {~} + (12 * (Max(Year) {~} - 2009)))
Create a training metric Time Series Revenue Predictor. Select the type of analysis as Time Series, select Dependent Metric as Revenue
, Independent Metrics as Month Index
, and the number of verification records to include with the model as 12:
Create a new report containing a Month
attribute, the Revenue
metric, the Month Index
metric, and the Time Series Revenue Predictor Training
metric. Click Data and select Report Data Options. Change the MicroStrategy default inner join to Outer
, to allow for viewing predictive values for the Revenue
metric in 2014:
Run the report. Adjust the graph preference by clicking Graph, Preference, Maximum number of categories; increase this to 58
to view all the data:
Click OK. Now we have our predicted monthly revenue values, along with a graph:
Classification is a data mining technique to classify data points into different groups. For example, in a hospital emergency room, a triage team classifies patients' different groups according to urgency; a bank classifies its loan applicants into high-risk or low-risk groups. There are two commonly used classification techniques: decision tree and logistic regression.
Decision tree is easy to understand, but less accurate due to its simplified model structure. Logistic regression is more accurate, but its estimation is computationally intensive.
A Telco corporation wants to estimate how likely it is that its customers will churn, based on customers' demographic information, usage data, and churn data.
Suppose we have the following data: average minutes during off-peak times, average minutes during peak times, dropped calls, helpdesk calls, renewals, age range, gender, household count, marital status, and income bracket:
Metric Name |
Metric Definition |
|
Sum([Phone Usage]@[Average Minutes Off-peak]){~} |
|
Sum([Phone Usage]@[Average Minutes Peak]){~} |
|
Sum([Phone Usage]@[Dropped Calls]){~} |
|
Sum([Phone Usage]@[Helpdesk Calls]){~} |
|
Sum([Phone Usage]@[Renewals]){~} |
|
Max([Marital Status]@DESC){Customer} |
|
Max([Income Bracket]@DESC){Customer} |
|
Sum([Phone Usage]@[Churn]) |
Create a training metric Telco Churn Predictor (Tree) Training Metric
, with Telco Churn
as a target variable and other metrics as explanatory variables. In Training Metric Wizard, Classifying categorical values, choose Decision Tree. In Predicted Value, Aggregation function, choose Sum
:
Create a report with Telco Churn Predictor (Tree) Training Metric
, and explanatory variables:
Create a new report and set a filter with a metric qualification: Telco Churn Predictor - Predicted Value Exactly 1
:
Run the report; we get a list of customers who are most likely to churn. Save it as Telco Churn Predictor - Prediction Result (Decision Tree)
:
We can follow steps similar to those in the previous section to do churn analysis using logistic regression. We will see at the end of this section that the model prediction accuracy improved, at the cost, however, of increased computation time.
Follow the same steps to create target variables and explanatory variables as in the previous section. The only difference is, when creating the training metric Telco Churn Predictor (Logit) Training Metric
, and classifying categorical values, choose Logistic Regression:
Then follow the relevant steps in the previous section to create a training report and to deploy the model. We may find that prediction accuracy is slightly improved by using logistic regression, for example, for a churned customer, Tabitha Barker
, the logistic regression predictor made the correct prediction, but the decision tree predictor did not:
For our dataset, the prediction accuracy for logistic regression is 85.7%, compared with the 85.4% prediction accuracy for decision tree: