Time series data

Real data often takes the form of a time series, which makes visualization difficult when there are many attributes over an extended time period. A number of plotters, as well as some summarizing techniques, can help, and the following sections give some examples. One of the sections requires the Series Processing extension, which must be downloaded and installed from the RapidMiner Marketplace. To do this from the RapidMiner Studio GUI, navigate to Help | Updates and Extensions, type Series Extension in the search box, select the matching entry once the results are returned, and follow the onscreen instructions.

Plotting series

The series plotter and the series multiple plotter simply plot the series data. Again using the readDataToVisualize.xml process, the following graph shows att15 plotted as a function of time. There is some structure as a function of time, but the graph is difficult to interpret because there are so many data points:

[Figure: series plot of att15 as a function of time]

One way to simplify this is to smooth the data with the Moving Average operator. This operator calculates a moving average of an attribute over a given window size and adds the result to the example set as a new attribute. The following screenshot shows the result of using the Moving Average operator with a window size of 200:

[Figure: moving average of att15 with a window size of 200]

The MovingAveragePlotter.xml process provided with this book generates the result shown in the previous screenshot. It generates moving averages for all the attributes in the data and performs some tidying steps so that the names of the generated attributes are easy to understand.
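The smoothing performed by the Moving Average operator can be sketched in a few lines of Python. This is a minimal illustration, not RapidMiner's implementation: it assumes a trailing window (each output is the mean of the current value and the window - 1 values before it) and leaves the first window - 1 positions empty (None), which may differ from how the operator treats the start of a series. The attribute name att15_ma is hypothetical.

```python
from typing import List, Optional, Sequence


def moving_average(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average (assumed behavior, not RapidMiner's code).

    Position i holds the mean of values[i - window + 1 : i + 1]. The first
    window - 1 positions have no complete window and are set to None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result: List[Optional[float]] = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            running_sum -= values[i - window]  # drop the value leaving the window
        result.append(running_sum / window if i >= window - 1 else None)
    return result


# Add the smoothed series as a new (hypothetically named) attribute.
example_set = {"att15": [1.0, 3.0, 5.0, 7.0, 9.0]}
example_set["att15_ma"] = moving_average(example_set["att15"], window=3)
print(example_set["att15_ma"])  # [None, None, 3.0, 5.0, 7.0]
```

The running sum keeps the cost linear in the length of the series, which matters for a window of 200 over many examples; for very long series, recomputing the sum from scratch now and then would limit floating-point drift.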

Both series can be displayed on a single graph using the series multiple plotter. Recall from previous results that attributes 13 and 15 showed evidence of correlation; the following graph shows the moving averages of these two attributes:

[Figure: series multiple plot of the moving averages of att13 and att15]

This graph provides evidence of a time-dependent variation that is correlated between the two attributes, although the correlation is not exact. It illustrates the importance of good visualization when trying to understand data.
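The correlation that the graph suggests by eye can also be quantified. The following sketch computes the Pearson correlation coefficient of two series, skipping positions where either value is missing (such as the start of a moving average). The series values are hypothetical stand-ins for the smoothed att13 and att15, not data from the book.

```python
import math
from typing import Optional, Sequence


def pearson(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson correlation over positions where both series have a value.

    None entries (for example, the start of a moving average) are skipped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two paired values")
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical smoothed values; None marks positions without a full window.
ma13 = [None, 2.0, 4.0, 6.0, 8.0]
ma15 = [None, 1.0, 2.0, 3.0, 4.0]
print(pearson(ma13, ma15))  # 1.0
```

Because smoothing removes short-term noise, the coefficient for the moving averages will generally differ from that of the raw attributes, so it is worth computing both.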

Using the survey plotter

When the number of series to plot becomes very large, it can become unwieldy to use the series plotter. In this case, the survey plotter can be useful.

The best way to understand how this plotter works is to look at an example, like the one shown in the following screenshot:

[Figure: portion of a survey plot with date as the first column]

This plot can be recreated using the MovingAveragePlotter.xml process. For reasons of space and readability, the previous screenshot shows only a small portion of the survey plot. The plotter's first column is set to date, and its color column is set to the color attribute generated by the process.

Each vertical plot shows how one attribute varies as a function of another. If the data contains a date attribute, sorting by date produces a time series view of each of the other attributes, which is what the previous screenshot shows. Each vertical strip represents the time series of a different attribute, with time increasing downwards. Starting from the left, the attributes are date followed by att1 to att15 inclusive. Think of this display as a time series plot rotated 90 degrees clockwise.
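The layout described above can be imitated in plain text to show how the data is arranged. This sketch assumes the classic survey plot convention, in which each value is drawn as a horizontal bar whose length is proportional to the value after scaling each attribute to its own minimum and maximum; RapidMiner's plotter may differ in detail. The function name survey_rows and the examples are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List, Sequence


def survey_rows(examples: Sequence[Dict[str, Any]], sort_by: str,
                attributes: Sequence[str], width: int = 8) -> List[str]:
    """Render a text survey plot: one row per example, one column per attribute."""
    # Sorting by the date attribute makes time increase downwards.
    ordered = sorted(examples, key=lambda e: e[sort_by])
    # Scale each attribute to its own range so that columns are comparable.
    low = {a: min(e[a] for e in examples) for a in attributes}
    high = {a: max(e[a] for e in examples) for a in attributes}
    rows = []
    for e in ordered:
        cells = []
        for a in attributes:
            span = high[a] - low[a]
            frac = (e[a] - low[a]) / span if span else 0.0
            # Bar length is proportional to the scaled value, centered in the column.
            cells.append(("#" * round(frac * width)).center(width))
        rows.append("|".join(cells))
    return rows


# Hypothetical examples, deliberately out of date order.
examples = [
    {"date": date(2020, 1, 3), "att1": 30.0, "att2": 1.0},
    {"date": date(2020, 1, 1), "att1": 10.0, "att2": 5.0},
    {"date": date(2020, 1, 2), "att1": 20.0, "att2": 3.0},
]
for row in survey_rows(examples, "date", ["att1", "att2"], width=4):
    print(row)
```

Reading down a column traces one attribute through time; here att1 grows while att2 shrinks, the kind of opposing pattern that is easy to spot when the strips sit side by side.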

The survey plot brings out the relationships between attributes. It is clear which attributes correlate with one another, and in the context of exploratory data analysis this raises questions that, once answered, help the data to be understood better. In the previous screenshot, att1, att2, att3, att4, att6, att11, and att12 appear to be correlated. Similarly, att9 and att10 appear to be correlated with each other.

Setting the color column of the survey plot to an attribute colors the series according to the value of that attribute, which makes correlations between that attribute and the others visible.

The end result of using this plotter is a better understanding of the time series, more detail about how a multivariate time series behaves, and possible insight into how the attributes relate to one another.

The relation between attributes is one aspect of understanding through visualization. Another aspect is how examples relate to one another and this is covered in the next section.
