Taking Data Mining to the Cloud with Predixion Software

In the previous section, we learned that the SQL Server Data Mining Add-In for Excel is, in the simplest terms, an interface for creating data mining models in SSAS, transporting the data to the SSAS model, and viewing the results of the processed model. At least one company, Predixion Software, is taking this architecture to the next logical step. If you can mine data using a server on your network, why not securely use a server, or a group of servers, available anywhere via the Internet? Its products, Insight Analytics and Insight Now, are Microsoft Excel add-ins that deliver data mining as a service.

The benefits of the Predixion products include the ability to consume PowerPivot data directly. There is no need to define a PivotTable only to convert it for consumption by the Data Mining Add-In. This approach means larger data volumes can be used, because the time spent converting a PivotTable to OLAP formulas is avoided completely. Finally, Predixion has already created a 64-bit version of its tools, so there is no need to install 32-bit Excel, as there is in order to use the Data Mining Add-In.

The best way to contrast data mining with Predixion against data mining with the Data Mining Add-In for Excel is to tackle the identical example using Predixion's data mining as a service.

Setting Up the Predixion Add-In

To make the comparison fair, this section covers the steps required to set up the Predixion software. To begin, create an account with Predixion Software at www.predixionsoftware.com. This example uses the free evaluation account.

After registering, follow the link to download Predixion Insight, available in the Predixion Products area, as illustrated in Figure 10-34.

Figure 10-34. Download page for Predixion Insight

The download page shows another benefit of Predixion Insight: a download is available for Excel 2007. Just like PowerPivot for Excel, Predixion Insight has separate installers for the 32-bit and 64-bit versions of Microsoft Excel. As we are using the 64-bit version of Excel, the illustrations here are for the corresponding 64-bit version of Predixion Insight.

Executing the installer from within Internet Explorer, you may observe the security warning illustrated in Figure 10-35. Click the Run button to continue installation.

Figure 10-35. Internet Explorer Security Warning dialog

The Predixion Insight installer will proceed to elegantly check for and, if necessary, install required software components. Your specific installation path may vary, based on existing software installations. The next step in the installation is the verification and installation of the Visual Studio Tools for Office Runtime. Acknowledgment and agreement to the license terms will be required, as illustrated in Figure 10-36.

Figure 10-36. Visual Studio Tools for Office Runtime license

Clicking the Accept button will advance the installer to the next step, the Setup Wizard, as illustrated in Figure 10-37.

Figure 10-37. Predixion Insight Setup Wizard dialog

Clicking the Next button will continue the installation process. The next step is reading and accepting the Predixion Software license agreement, as illustrated in Figure 10-38.

Figure 10-38. Predixion Insight subscription and license

Verify that the I Agree radio button is selected, and click the Next button. This will continue the process at the installation location dialog, illustrated in Figure 10-39.

Figure 10-39. Installation location

Select a suitable location on your local machine for the installer to copy the Predixion Insight software. When you have selected the folder, click the Next button to continue to the next step, the installation confirmation illustrated in Figure 10-40.

Figure 10-40. Confirm Installation dialog

This dialog is the final opportunity to change any of the installation values. Click the Back button if you need to verify or alter your installation settings. Otherwise, clicking the Next button will advance the process to the dependency check, illustrated in Figure 10-41.

Figure 10-41. Prerequisite check

The dependencies check will verify the required software already exists on the target workstation. A failure at this step will likely require the installation of a software component to continue. Clicking the OK button will install the software, and the Installation Complete dialog, illustrated in Figure 10-42, will appear.

Figure 10-42. Installation Complete dialog

Click the Close button to complete the installation wizard.

When starting Microsoft Excel for the first time, after the Predixion Insight installation, you may receive a warning dialog similar to Figure 10-43. In order to use Predixion Insight, click the Install button.

Figure 10-43. Customization warning

After advancing through the customization warning dialog, Predixion Insight will produce a banner similar to Figure 10-44, within the Excel window. The banner contains links to helpful tutorials, sample datasets, and support resources at Predixion Software. Close the banner in order to begin our example using Predixion Insight.

Figure 10-44. Predixion Insight banner

Predicting Airline Delays

To continue the airline delay example using Predixion Insight, it is necessary to understand a little about Insight Analytics and Insight Now. Much like the Data Mining Add-In for Excel, Insight Now is a task-oriented interface for creating and using data mining models for prediction, forecasting, and classification. Predixion Software also includes Insight Analytics, which is less task-oriented and more closely resembles an integrated development environment (IDE) hosted within Microsoft Excel. Because Insight Now resembles the features and functions of the Data Mining Add-In for Excel, we will focus on re-creating our example with Insight Now. Unlike the context-sensitive Data Mining Add-In for Excel, both Insight Analytics and Insight Now are available from the Excel ribbon at all times. Figure 10-45 illustrates the menu items within Insight Now. Compare the selections available within the Insights group of the Insight Now ribbon with the Table Analysis Tools illustrated in Figure 10-31. The options are identical, but the interface is very different.

Figure 10-45. Predixion Insight Now ribbon

To see how different, we will create a model similar to the airline delay model developed in the first half of this chapter. To begin, open the On Time Performance.xlsx workbook from the example files. This should be the same file used in the first half of the chapter. Select the Insight Now ribbon, as illustrated in Figure 10-45, and choose Analyze Key Influencers from the Insights group. The Analyze Key Influencers selection will open a dialog similar to Figure 10-46.

Figure 10-46. Analyze Key Influencers Input Source dialog

Notice the option to use PowerPivot data in the Select Input Type drop-down? This is one of the key differences between Predixion Insight and the Data Mining Add-Ins. Predixion Insight has the ability to natively utilize PowerPivot data. The manipulation required to go from PowerPivot to Excel tables is eliminated! Direct use of PowerPivot as a data source also permits large volumes of data to be fed into Predixion's SSAS servers in the Internet “cloud.”

As illustrated in Figure 10-46, select PowerPivot Data as the Input Type. The PowerPivot table should be set to On_Time_On_Time_Performance_2010. Finally, filter the dataset to February of 2010. When finished, your settings should be identical to Figure 10-46. Click the OK button to continue to the target column selection, illustrated in Figure 10-47.

Figure 10-47. Target column selection

The meaning of the target column has not changed. However, because Predixion Insight can consume PowerPivot data directly, we can use the individual values of WeatherDelay, instead of aggregating the measure by carrier, origin, destination, etc. After setting the target column to WeatherDelay, use the “Choose columns to be used for analysis” link to filter columns from the PowerPivot table. Following the link will produce a dialog similar to the one illustrated in Figure 10-48.

Figure 10-48. Source column selection

Verify that only the Carrier, Origin, Dest, Month, DayofWeek, and DistanceGroup columns are selected. Then click the OK button to return to the Target Column dialog. From the Target Column dialog, click the Run button to begin the analysis and produce the progress dialog similar to Figure 10-49.

Figure 10-49. Analyze Key Influencers progress
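
The filter and column selections made in the preceding dialogs can be pictured outside of Excel as a simple row filter followed by a column projection. The following is only a minimal sketch in Python, not anything Predixion Insight exposes: the column names for analysis and the target come from the dialogs above, while the Year and TailNum columns, the sample values, and the select_input helper are assumptions made for illustration.

```python
# Columns chosen in the source column selection dialog, plus the target column.
ANALYSIS_COLUMNS = ["Carrier", "Origin", "Dest", "Month", "DayofWeek", "DistanceGroup"]
TARGET_COLUMN = "WeatherDelay"


def select_input(rows, year, month):
    """Keep rows for one month, then keep only the analysis and target columns.

    Assumes each row is a dict that has a "Year" and a "Month" key
    (the "Year" column is an assumption for this sketch).
    """
    keep = ANALYSIS_COLUMNS + [TARGET_COLUMN]
    return [
        {name: row[name] for name in keep}
        for row in rows
        if row["Year"] == year and row["Month"] == month
    ]


# Hypothetical sample rows; the values are made up for illustration.
sample_rows = [
    {"Year": 2010, "Month": 2, "DayofWeek": 1, "Carrier": "OH", "Origin": "JFK",
     "Dest": "BOS", "DistanceGroup": 1, "WeatherDelay": 35, "TailNum": "X1"},
    {"Year": 2010, "Month": 3, "DayofWeek": 2, "Carrier": "DL", "Origin": "ATL",
     "Dest": "JFK", "DistanceGroup": 4, "WeatherDelay": 0, "TailNum": "X2"},
    {"Year": 2010, "Month": 2, "DayofWeek": 5, "Carrier": "DL", "Origin": "ATL",
     "Dest": "MCO", "DistanceGroup": 2, "WeatherDelay": 0, "TailNum": "X3"},
]

february = select_input(sample_rows, 2010, 2)
print(len(february), sorted(february[0]))
```

Only the filtered, projected rows would need to leave the workstation, which is one way to picture why the selection is made before the Run button is clicked.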

After the Run button is clicked, the Insight Now software uploads the selected portion of the PowerPivot table to the Predixion Software servers. The data mining model, defined by the dialogs of the Analyze Key Influencers selection, is uploaded to the servers as well. The Predixion server then executes the model against the data, generating a report similar to Figure 10-50.

Figure 10-50. Key Influencers report

What a difference additional detailed data makes. The classification of WeatherDelay values was different, with the worst level being approximately 35 minutes. Evidently February 2010 was a bad month to be flying Comair (carrier code OH) or through New York's Kennedy Airport.
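
To get a feel for what a key influencers report expresses, the idea can be approximated with a simple lift calculation: for each column value, compare the share of high-delay flights among rows with that value to the share across all rows. This is a rough sketch under stated assumptions and is not the algorithm Predixion Insight runs. The 15-minute threshold, the delay_bucket and key_influencers helpers, and the sample rows are all hypothetical.

```python
from collections import Counter


def delay_bucket(minutes, threshold=15):
    """Label a delay 'high' or 'low'; the 15-minute threshold is an arbitrary assumption."""
    return "high" if minutes >= threshold else "low"


def key_influencers(rows, columns, target="WeatherDelay", bucket="high"):
    """Rank (column, value) pairs by lift: P(bucket | value) / P(bucket)."""
    in_bucket = [r for r in rows if delay_bucket(r[target]) == bucket]
    if not rows or not in_bucket:
        return []  # no data, or no rows in the bucket, so lift is undefined
    base_rate = len(in_bucket) / len(rows)
    scores = []
    for col in columns:
        value_counts = Counter(r[col] for r in rows)
        bucket_counts = Counter(r[col] for r in in_bucket)
        for value, count in value_counts.items():
            rate = bucket_counts[value] / count
            scores.append((col, value, rate / base_rate))
    # Highest lift first; ties keep their original order.
    return sorted(scores, key=lambda s: s[2], reverse=True)


# Hypothetical rows; the values are made up for illustration.
rows = [
    {"Carrier": "OH", "Origin": "JFK", "WeatherDelay": 35},
    {"Carrier": "OH", "Origin": "JFK", "WeatherDelay": 20},
    {"Carrier": "OH", "Origin": "ATL", "WeatherDelay": 0},
    {"Carrier": "DL", "Origin": "ATL", "WeatherDelay": 0},
    {"Carrier": "DL", "Origin": "ATL", "WeatherDelay": 0},
    {"Carrier": "DL", "Origin": "JFK", "WeatherDelay": 0},
]

ranked = key_influencers(rows, ["Carrier", "Origin"])
for column, value, lift in ranked:
    print(f"{column}={value}: lift {lift:.2f}")
```

A lift above 1 means the value appears in the high-delay bucket more often than average. Working on individual flights rather than pre-aggregated measures gives such a calculation far more rows to draw on.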

An additional observation is the speed at which the process was able to move, from software setup to final report. The Predixion Insight process has the additional advantage of being more flexible, should the data analyst want to include other data from the PowerPivot database in the model.
