Mining Some Data

Data mining techniques are often used to better understand the characteristics of events. To illustrate the value of data mining, we are going to use data from the United States Department of Transportation's Bureau of Transportation Statistics. The sample file contains approximately 30,000 records summarizing flights by carrier, origin, destination, day of week, and distance group. Each summary row contains the total weather delay for that combination of characteristics.
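To make the shape of the summarized data concrete, here is a minimal Python sketch of how rows like these could be produced from individual flight records. The records, the `summarize` helper, and the column names Origin, Dest, and DayOfWeek are assumptions made for illustration; only Carrier, DistanceGroup, and Sum of WeatherDelay appear in the workbook described here.

```python
from collections import defaultdict

# Hypothetical raw flight records; values and most column names are
# illustrative assumptions, not taken from the sample file.
flights = [
    {"Carrier": "B6", "Origin": "BOS", "Dest": "JFK", "DayOfWeek": 3, "DistanceGroup": 1, "WeatherDelay": 120},
    {"Carrier": "B6", "Origin": "BOS", "Dest": "JFK", "DayOfWeek": 3, "DistanceGroup": 1, "WeatherDelay": 95},
    {"Carrier": "DL", "Origin": "ATL", "Dest": "JFK", "DayOfWeek": 4, "DistanceGroup": 4, "WeatherDelay": 0},
    {"Carrier": "DL", "Origin": "ATL", "Dest": "JFK", "DayOfWeek": 4, "DistanceGroup": 4, "WeatherDelay": 15},
    {"Carrier": "DL", "Origin": "ATL", "Dest": "BOS", "DayOfWeek": 4, "DistanceGroup": 4, "WeatherDelay": 30},
]

GROUP_COLUMNS = ("Carrier", "Origin", "Dest", "DayOfWeek", "DistanceGroup")


def summarize(records):
    """Sum WeatherDelay for each distinct combination of the group columns."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[column] for column in GROUP_COLUMNS)
        totals[key] += record["WeatherDelay"]
    # One output row per combination, in first-seen order.
    return [
        {**dict(zip(GROUP_COLUMNS, key)), "Sum of WeatherDelay": total}
        for key, total in totals.items()
    ]


summary = summarize(flights)
for row in summary:
    print(row)
```

Each output row corresponds to one summary row in the Excel table: a unique combination of characteristics plus a single total.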

After converting the PivotTable to an Excel table as detailed in the previous section, select a cell within the table. A table cell must be selected for the context-sensitive Table Tools ribbon to become visible. Then select the Analyze Key Influencers task, as illustrated in Figure 10-31.

Figure 10-31. Analyze Key Influencers task

The Data Mining Add-In begins the analysis by prompting for a target column, as shown in Figure 10-32.

Figure 10-32. Choosing a target column

When analyzing key influencers, the target column holds the data values for which key predictors (or influencers) are being sought. Our target is the Sum of WeatherDelay column. We could optionally use the “Choose columns to be used for analysis” option to exclude some of the remaining, non-target columns. For this example, however, we have already reduced the on-time arrival data to a few discriminating columns, so we can use all of the Carrier through DistanceGroup columns from the Excel table. Clicking the Run button creates, trains, and processes the data mining model. The end result is a Key Influencers report similar to Figure 10-33.
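The idea of searching the input columns for predictors of a target can be sketched in a few lines of Python. This is a simplified stand-in, not the add-in's actual algorithm: it scores each column value by lift, the ratio of a target class's share among rows with that value to its share overall. The `key_influencers` function, the DelayBand column, and the sample rows are all hypothetical.

```python
from collections import Counter

# Hypothetical rows: two input columns and a target that has already
# been grouped into bands. All values are illustrative assumptions.
rows = [
    {"Carrier": "B6", "DistanceGroup": 1, "DelayBand": "High"},
    {"Carrier": "B6", "DistanceGroup": 2, "DelayBand": "High"},
    {"Carrier": "B6", "DistanceGroup": 1, "DelayBand": "Low"},
    {"Carrier": "DL", "DistanceGroup": 1, "DelayBand": "Low"},
    {"Carrier": "DL", "DistanceGroup": 2, "DelayBand": "Low"},
    {"Carrier": "DL", "DistanceGroup": 2, "DelayBand": "Low"},
]


def key_influencers(rows, target, inputs):
    """Rank (column, value, target class) triples by lift, highest first."""
    n = len(rows)
    class_counts = Counter(row[target] for row in rows)
    results = []
    for column in inputs:
        value_counts = Counter(row[column] for row in rows)
        pair_counts = Counter((row[column], row[target]) for row in rows)
        for (value, cls), count in pair_counts.items():
            share_given_value = count / value_counts[value]
            share_overall = class_counts[cls] / n
            lift = share_given_value / share_overall
            results.append((column, value, cls, round(lift, 2)))
    return sorted(results, key=lambda item: item[3], reverse=True)


ranked = key_influencers(rows, target="DelayBand", inputs=["Carrier", "DistanceGroup"])
for column, value, cls, lift in ranked:
    print(f"{column}={value} favors {cls} (lift {lift})")
```

A lift above 1 means the value makes that target class more likely than it is overall. In this toy data only Carrier separates the bands, while every DistanceGroup value has a lift of 1.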

Figure 10-33. Key Influencers report

Because of the small set of data columns and the use of only February data, the Carrier column has been identified as the primary influencer for all groups of Sum of WeatherDelay. The Key Influencers report has grouped statistically similar values of Sum of WeatherDelay into five classifications. Each classification has a color code that matches its bar in the Relative Impact column. For example, the report tells the reader that carrier code B6 (JetBlue) had a large relative impact on values of Sum of WeatherDelay greater than 654 minutes. In fact, a blizzard on February 10–11, 2010 affected Boston and New York and caused massive weather delays.
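Grouping a continuous column such as Sum of WeatherDelay into five classifications can be approximated with equal-frequency bins. The sketch below assumes quantile cut points as a stand-in, since how the add-in chooses its ranges is not described here. The delay values and the `bucket_of` helper are hypothetical.

```python
import bisect
import statistics

# Hypothetical Sum of WeatherDelay values in minutes (illustrative only).
delays = [0, 5, 12, 30, 45, 60, 90, 150, 300, 700]

# Four cut points split the values into five equal-frequency groups.
cuts = statistics.quantiles(delays, n=5)


def bucket_of(value):
    """Return the 0-based index (0 to 4) of the group a value falls into."""
    return bisect.bisect_right(cuts, value)


for delay in delays:
    print(delay, "-> group", bucket_of(delay))
```

Once each row carries a group index, the groups can serve as the target classes when scoring which column values are associated with unusually high or low delays.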
