Windowing data

Windowing is typically used to turn time series data into example sets containing examples with multiple attributes corresponding to sequential points. These example sets can then be used for model building, classification, or predictive analysis. Windows can also be used to visualize data.

The best way to explain this operator is to use real data to illustrate its main features. Some real sunspot data is shown in the following screenshot (the process to recreate this is available in readSunspot.xml, which accompanies this book):

[Screenshot: Windowing data]

This data is a yearly count of the number of dark spots visible on the sun. It follows several cycles, of which the 11-year cycle is the best known. Applying the Windowing operator with a window size of 11 and a step size of 11 to an example set containing a single attribute called sunspots yields the data shown in the following screenshot (the year attribute has been made into an ID using the Generate Id operator):

[Screenshot: Windowing data]

The parameters set for the Windowing operator to achieve this are shown in the following screenshot:

[Screenshot: Windowing data]

The parameters window size and step size are both set to 11 in this case. The window size value has the effect of creating 11 new attributes with names ranging from sunspots-10 to sunspots-0. The step size value dictates how many values to step forward to start each new window.

The first row corresponds to the first 11 entries in the input example set, which in this case are the years 1700 to 1710 (inclusive). Close examination of the data reveals that the year 1700 had five sunspots, and this corresponds to the value for sunspots-10 in row 1. The second row corresponds to the years 1711 to 1721 (inclusive). The year 1711 had zero sunspots, and this corresponds to the value for sunspots-10 in row 2.
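The row layout described above can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the idea, not the operator's implementation: the function name window is hypothetical, and the input values are placeholders (the years themselves stand in for sunspot counts so that it is easy to see which year lands in which column).

```python
def window(values, window_size, step_size, name="sunspots"):
    """Slice a sequence into fixed-size windows, returning one dict per window."""
    rows = []
    # The last valid start index still leaves room for a complete window.
    for start in range(0, len(values) - window_size + 1, step_size):
        chunk = values[start:start + window_size]
        # The oldest value gets the largest suffix (name-10 for a window of 11),
        # the newest value gets name-0.
        row = {f"{name}-{window_size - 1 - i}": v for i, v in enumerate(chunk)}
        rows.append(row)
    return rows


# Placeholder input: 33 values, one per year from 1700 to 1732.
# These are NOT real sunspot counts; each value is simply its own year.
years = list(range(1700, 1733))
rows = window(years, window_size=11, step_size=11)

for row in rows:
    print(row["sunspots-10"], "to", row["sunspots-0"])
```

With a step size equal to the window size, the windows do not overlap, so each row covers a distinct block of 11 years. A step size of 1 would instead produce one row per year, each sharing 10 values with its predecessor.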

A label attribute can be added to the windows by checking the create label check box. This allows the value of another attribute to be used as a label, and the horizon parameter that appears in this case allows the value of the label to be chosen from an arbitrary point in the future.
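As a rough sketch of how a label and horizon fit together, the following Python assumes that the horizon counts steps beyond the last value in each window, so a horizon of 1 picks the value immediately after the window. That interpretation, the function name window_with_label, and the placeholder data are all assumptions made for illustration; check the operator's documentation for its exact behavior.

```python
def window_with_label(values, labels, window_size, step_size, horizon):
    """Return (window, label) pairs; the label lies `horizon` steps past the window."""
    rows = []
    # A window is only usable if its label index still falls inside the data.
    last_start = len(values) - window_size - horizon
    for start in range(0, last_start + 1, step_size):
        end = start + window_size              # index just past the window
        features = values[start:end]
        label = labels[end - 1 + horizon]      # assumed meaning of horizon
        rows.append((features, label))
    return rows


# Placeholder input: 40 values, one per year from 1700 to 1739.
# These are NOT real sunspot counts; each value is simply its own year.
series = list(range(1700, 1740))

# Use the series itself as the label source, looking one step ahead.
pairs = window_with_label(series, series, window_size=11, step_size=11, horizon=1)

for features, label in pairs:
    print(features[0], "to", features[-1], "-> label", label)
```

Here the label attribute is the same series as the windowed attribute, which is the usual setup for forecasting. Passing a different list as labels would correspond to using another attribute as the label.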

The end result of this activity is an example set with examples corresponding (in this case) to the 11-year solar cycle. The process that does this is manipulateSunspots.xml and is available with the files that accompany this book.
