Computational analytics

The Spotfire platform provides a set of computational analytics features, which are distributed among several statistical tools.

One of these features is predictive modeling, the most commonly used form of data mining, which forecasts probabilities and trends in order to predict future outcomes. Predictive models are widely used in IT; for instance, e-mail spam filters use them to estimate the probability that a message is spam.

In Spotfire, predictive modeling consists of three distinct steps:

  • Fitting the model: In this step, the model options are chosen, such as the statistical method to use, the source data table, the column to predict, the predictor columns, and so on. The output of this step is a model page.
  • Evaluating the model: The model is evaluated against new data, and the results are presented on an evaluation page.
  • Predicting from the model: The model produces prediction data, which can be added to the source data table.

Two types of model are supported for predictive modeling:

  • Regression modeling: It offers both the linear regression method and the regression tree method.
  • Classification modeling: It offers both the logistic regression method and the classification tree method.

Each model type has its own tool, which can be started from the Tools menu.

This book will focus on regression modeling using the linear regression method. For more information about predictive modeling and the other models, refer to the Predictive Modeling page in Spotfire Professional's Help Topics documentation.

Regression modeling/the linear regression model

In this model, the scores on one variable are predicted from the scores on a second variable. The variable we are predicting is known as the Response, and the variable on which we base our predictions is known as the Predictor. When there is only one predictor variable, the prediction method is called Simple regression. When there are two or more predictor variables, the prediction method is called Multiple regression.
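To make the idea of simple regression concrete, the following is a minimal sketch of an ordinary least-squares fit with one predictor. It is not Spotfire's implementation; the function name fit_simple_regression and the sample values are hypothetical and serve only as an illustration.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of the predictor.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all predictor values are identical")
    # Sum of cross-deviations between predictor and response.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: the predictor is a purchase count, the response a sales total.
purchases = [1, 2, 3, 4, 5]
totals = [12.0, 19.0, 31.0, 42.0, 48.0]
intercept, slope = fit_simple_regression(purchases, totals)
print(f"Total = {intercept:.2f} + {slope:.2f} * purchases")
```

Multiple regression follows the same principle, but estimates one coefficient per predictor column instead of a single slope.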

Before attempting to use a linear regression model on the existing data, the user should first determine whether or not there is a relationship between the response and the predictor variables.

In the visualization of the page StoreSales - Total Sales, we created a scatter plot that correlates the StoreSales variable Number of purchases with the variable Total. Both the visualization's correlation value of 68 percent and the Straight Line Fit direction indicate that there is in fact a positive correlation between the variables.
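As a rough illustration of what such a correlation value measures, the sketch below computes Pearson's correlation coefficient for two columns. It assumes that the value reported in the visualization is Pearson's r, which the text does not state explicitly; the function name pearson_r and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient between two equally sized samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)


# Hypothetical values standing in for Number of purchases and Total.
purchases = [1, 2, 3, 4, 5]
totals = [12.0, 19.0, 31.0, 42.0, 48.0]
r = pearson_r(purchases, totals)
print(f"correlation: {r:.2f}")
```

A value close to 1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.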

The StoreSales data is therefore suitable for an example of the linear regression model. However, since only one data set is available, we need to split it in two: we will separate the data into two files according to the year in the Recency column, 1994 or 1995. This can be done using MS Excel or any other tool that makes it easy to locate and delete rows with certain content. In the new data files, the columns Customer age, Date joined, First buy, and Recency can also be removed, as they are not needed for our example.
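If you prefer a script to a spreadsheet, the split could be sketched as follows. This is only an outline under two assumptions that the text does not confirm: that the data is available as CSV, and that each Recency value contains a four-digit year. The function name split_by_recency_year and the sample rows are hypothetical.

```python
import csv
import io
import re

# Columns that the example does not need after the split.
DROP_COLUMNS = {"Customer age", "Date joined", "First buy", "Recency"}


def split_by_recency_year(rows, years=("1994", "1995")):
    """Group rows by the year found in 'Recency' and drop the unneeded columns."""
    groups = {year: [] for year in years}
    for row in rows:
        match = re.search(r"\b(19\d{2}|20\d{2})\b", row["Recency"])
        if match and match.group(0) in groups:
            kept = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
            groups[match.group(0)].append(kept)
    return groups


# Hypothetical sample; the real StoreSales file has more columns and rows.
sample_csv = """Number of purchases,Total,Recency
3,120.50,1994-11-02
5,310.00,1995-03-17
2,75.25,1994-06-30
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
groups = split_by_recency_year(rows)
for year, subset in groups.items():
    print(f"StoreSales_{year}: {len(subset)} rows")
```

Each group can then be written to its own file, for example with csv.DictWriter, before importing it into Spotfire.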

Import both files and create two data tables named StoreSales_1994 (for the 1994 file) and StoreSales_1995 (for the 1995 file). If you kept the columns Customer age, Date joined, First buy, and Recency, pay attention to the date-time formats while importing the files. When done, you can close the scatter plots that were created by default and delete their respective pages.

Our objective with this predictive model will be to predict the sales Total, based on the Number of purchases. This will be a simple linear regression.

According to the three steps of predictive modeling, we must first fit the model. To do so, we will access the Tools menu and select Regression Modeling. This action will open the regression modeling configuration tool.

Our model will be named StoreSalesRegressionModel, and the remaining configuration should be as follows:

  • Comment: Predict Sales Total based on Number of purchases
  • Model Method: Linear Regression
  • Data table: StoreSales_1994
  • The General tab | Response column: Total
  • The General tab | Predictor column: Number of purchases

Click on the Add button.

The filled configuration should look similar to the following screenshot:

Click on OK. After a few seconds, a page named StoreSalesRegressionModel will be generated, containing the information about our model. The following screenshot shows an example of such a page:

The generated model page contains four areas:

  • Model Summary: It lists the name of the model, the model type, and the model formula
  • Table of Coefficients: It lists the model coefficients for the regression model
  • Available Diagnostic Visualizations: It lists the available visualizations of the predictive model, which help to assess its validity
  • Visualizations area: It is the area where the diagnostic visualizations are presented

Proceeding to the second step of predictive modeling, we will now evaluate the model. To do so, click on the Evaluate model button in the Model Summary's toolbar. Refer to the following screenshot for details:

A dialog box will be presented, where we can specify the data table on which we want to run the model. We will run it on StoreSales_1995.

Make sure that you have the configuration as shown in the following screenshot:

Click on OK. After a few seconds, a page named Evaluation (StoreSalesRegressionModel) will be generated, containing the information about our evaluation. The following screenshot shows an example of such a page:

The generated evaluation page contains three areas:

  • Evaluation Summary: It lists the name of the model, the data table used in the evaluation, and the model formula
  • Available Diagnostic Visualizations: It lists the available visualizations of the predictive model, which help to assess its validity
  • Visualizations area: It is the area where the diagnostic visualizations are presented
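Outside Spotfire, evaluating a fitted line against new data can be sketched as follows. The example computes two common measures, the root mean squared error and R squared; the diagnostics that Spotfire presents on the evaluation page may differ. The function name evaluate_linear_model, the coefficients, and the data are hypothetical.

```python
from math import sqrt


def evaluate_linear_model(intercept, slope, xs, ys):
    """Return (rmse, r_squared) of y = intercept + slope * x on the given data."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equally sized, non-empty samples")
    predictions = [intercept + slope * x for x in xs]
    # Squared differences between observed and predicted responses.
    residual_ss = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    y_bar = sum(ys) / len(ys)
    total_ss = sum((y - y_bar) ** 2 for y in ys)
    rmse = sqrt(residual_ss / len(ys))
    # R squared is undefined when the response is constant.
    r_squared = 1 - residual_ss / total_ss if total_ss else float("nan")
    return rmse, r_squared


# Hypothetical coefficients from a model fitted on one year of data ...
intercept, slope = 2.0, 10.0
# ... evaluated against hypothetical data from the following year.
new_purchases = [1, 2, 3, 4]
new_totals = [13.0, 21.0, 33.0, 41.0]
rmse, r_squared = evaluate_linear_model(intercept, slope, new_purchases, new_totals)
print(f"RMSE: {rmse:.2f}, R squared: {r_squared:.3f}")
```

Evaluating on data that was not used for fitting, as done here with the second table, gives a more honest picture of how the model behaves on unseen records.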

As the final step of predictive modeling, we will add the predicted data to the StoreSales_1994 data table. To do so, click on the Predict from model button in the Model Summary's toolbar. Refer to the following screenshot for details:

A dialog box will be presented, where we can specify the data table to which the predicted data will be added. We will choose StoreSales_1994. Also, make sure that your configuration matches the one shown in the following screenshot:

A new column named predicted will be added to the StoreSales_1994 data table.
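Conceptually, the new column holds the value of the model formula evaluated for each row. The sketch below illustrates this with hypothetical coefficients and rows; the function name add_predicted_column is an assumption made for the example, and the actual values depend on the coefficients listed on your model page.

```python
def add_predicted_column(rows, intercept, slope,
                         predictor="Number of purchases", name="predicted"):
    """Return copies of the rows with an extra column holding the model's prediction."""
    return [
        {**row, name: intercept + slope * float(row[predictor])}
        for row in rows
    ]


# Hypothetical rows and coefficients, for illustration only.
table = [
    {"Number of purchases": 3, "Total": 35.0},
    {"Number of purchases": 5, "Total": 49.5},
]
scored = add_predicted_column(table, intercept=2.0, slope=10.0)
for row in scored:
    print(row)
```

Comparing the predicted column with the Total column row by row shows where the model overestimates or underestimates sales.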

Please save the analysis project.

Information designer

Information Designer is a Spotfire tool for creating information links. These links are database queries built from simple elements (columns, filters, procedures, and joins), which are composed into complex queries. Information links are later used as data sources for data tables. This tool can be found under the Tools menu.

After starting Information Designer, the first configuration step is to define a data source. This can be done by selecting the Setup Data Source option (or by using the New drop-down menu in the top-left corner). The configuration shown in the following screenshot creates a connection to our installed database, with the user hr.

After configuring it, please save it by clicking on the Save As button.

After creating a data source, and going back to the Start tab, users can then define the Elements (columns, filters, and so on) required for the creation of the information link.

Our objective will be to create two Multiple Columns elements: one with the columns DEPARTMENT_ID and DEPARTMENT_NAME from the HR table DEPARTMENTS, and a second one with the columns DEPARTMENT_ID, FIRST_NAME, and LAST_NAME from the HR table EMPLOYEES. The following screenshot shows the configuration for the first element:

To start the creation procedure, select the Multiple Columns option under Create Elements in the Start tab. Then click on the Create Columns button.

Since the columns will be stored as individual elements, it is a good idea to create a folder structure to store them, as shown in the following screenshot:

This can be achieved using the New drop-down menu.

After creating both Multiple Columns elements, the result should be similar to the following screenshot:

Next, we will create a join between the two DEPARTMENT_ID columns (one in the DEPARTMENTS table and one in the EMPLOYEES table). The Create Join option of the Start tab opens the configuration of joins between the data source's tables.

To create such a join, both DEPARTMENT_ID columns must be added in the Join configuration. The resulting join should be saved under the Information Links folder.

Our last step will be the creation of the Information Link itself. This procedure can be started by clicking on the Create Information Link button in the tool's Start tab (or by using the New drop-down menu). The link's elements will be EMPLOYEES | FIRST_NAME, EMPLOYEES | LAST_NAME, EMPLOYEES | DEPARTMENT_ID, and DEPARTMENTS | DEPARTMENT_NAME, and the Join path will be the created join DEPARTMENTS - EMPLOYEES.
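Since an information link is ultimately a database query, it can help to see roughly what this link corresponds to. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the HR schema; the sample rows are invented, an inner join is assumed, and the SQL that Spotfire actually generates may look different.

```python
import sqlite3

# In-memory database standing in for the HR schema (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENTS (DEPARTMENT_ID INTEGER PRIMARY KEY, DEPARTMENT_NAME TEXT);
CREATE TABLE EMPLOYEES (FIRST_NAME TEXT, LAST_NAME TEXT, DEPARTMENT_ID INTEGER);
INSERT INTO DEPARTMENTS VALUES (10, 'Administration'), (20, 'Marketing');
INSERT INTO EMPLOYEES VALUES ('Ana', 'Silva', 10), ('Ben', 'Ortiz', 20), ('Cy', 'Lee', 20);
""")

# The four link elements, combined through the join on DEPARTMENT_ID.
QUERY = """
SELECT e.FIRST_NAME, e.LAST_NAME, e.DEPARTMENT_ID, d.DEPARTMENT_NAME
FROM EMPLOYEES e
JOIN DEPARTMENTS d ON d.DEPARTMENT_ID = e.DEPARTMENT_ID
ORDER BY e.LAST_NAME
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```

Each selected column maps to one element of the link, and the ON clause plays the role of the Join path.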

At this point, the Information Link's configuration should look similar to the following screenshot:

Please save it as demoIL under the Information Links folder, and close the Information Designer window.

At this point, a data table can be created with the demoIL Information Link as its source. Please do so, and feel free to create a visualization with this main data table.

You should also be aware that all the configured Elements, Joins, and Information Links will be available for sharing in the library.
