Chapter 3. Visualizing Data

Large amounts of data are difficult to understand; this chapter covers visualization techniques that help make sense of them. Visualization is vital at every stage of the exploration and mining process, and you will use these techniques time and again. Sometimes the structure of the data must be changed before it can be plotted effectively, which requires RapidMiner operators that generate new attributes or pivot the data. Some of these techniques are a preview of what we will cover in the next chapter.

Getting started

Whenever a RapidMiner process runs, the results are presented in the results perspective. This perspective is shown by selecting the appropriate option from the GUI menu or by pressing F9. Each result that can be viewed, such as an example set, a model, a log entry, or a set of weights, is displayed as a tab.

Selecting an example set presents a number of possible detailed views, including the Data View, Statistics View, and Charts View. The Data View gives a simple table summary of the example set. The Statistics View gives a statistical summary of the example set, with details of the attributes and the size of the data. The Charts View provides a large number of plotters, which are selected from the drop-down list at the top left of the view. In RapidMiner Studio, the Charts View displays a thumbnail graphic for each chart type, which makes it easy to select the desired chart.
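To make the idea of a statistical summary concrete, here is a minimal Python sketch written outside RapidMiner. It assumes a small, hypothetical example set held as a list of dictionaries; the attribute names and values are invented for illustration, and the summary only approximates the kind of information a statistics view reports. It is not how RapidMiner computes its statistics.

```python
import statistics

# Hypothetical example set: each dict is one example (row), each key an attribute.
# None marks a missing value. All names and values are invented for illustration.
example_set = [
    {"age": 34, "income": 52000.0, "segment": "A"},
    {"age": 45, "income": None, "segment": "B"},
    {"age": 29, "income": 41000.0, "segment": "A"},
    {"age": 51, "income": 78000.0, "segment": "C"},
]


def summarize(examples):
    """Return a per-attribute summary: missing count plus range/mean or value counts."""
    summary = {}
    attributes = list(examples[0].keys()) if examples else []
    for name in attributes:
        values = [row[name] for row in examples]
        present = [v for v in values if v is not None]
        info = {"missing": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric attribute: report the range and the mean.
            info["type"] = "numeric"
            info["min"] = min(present)
            info["max"] = max(present)
            info["mean"] = statistics.mean(present)
        else:
            # Anything else is treated as nominal: count each distinct value.
            counts = {}
            for v in present:
                counts[v] = counts.get(v, 0) + 1
            info["type"] = "nominal"
            info["counts"] = counts
        summary[name] = info
    return summary


stats = summarize(example_set)
print("number of examples:", len(example_set))
for name, info in stats.items():
    print(name, info)
```

Treating every non-numeric attribute as nominal is a simplification chosen to keep the sketch short; a real tool distinguishes more attribute types.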

The simplest plot is the scatter plot, which appears as the first option in the drop-down list. This plot shows the examples on a two-dimensional grid, with the x and y axes determined by attributes from the example set. Individual points can be colored based on an attribute. Despite its simplicity, this plot is very useful for getting an initial feel for the data because it helps answer questions such as: Are there obvious relations between attributes? Are there any points that look like outliers?
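The mechanics of such a plot can be sketched in a few lines of Python, independent of RapidMiner. The sketch below assumes a hypothetical list of examples with invented attribute names (`x`, `y`, `label`) and draws a coarse text grid, using the first character of the chosen attribute as the point marker in place of color. It is only an illustration of mapping two attributes to axes and a third to the marker, not a description of RapidMiner's plotter.

```python
def ascii_scatter(examples, x_attr, y_attr, color_attr, width=40, height=12):
    """Render a coarse text scatter plot; the marker character stands in for color."""
    xs = [row[x_attr] for row in examples]
    ys = [row[y_attr] for row in examples]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when an attribute is constant.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0

    grid = [[" "] * width for _ in range(height)]
    for row in examples:
        col = round((row[x_attr] - x_min) / x_span * (width - 1))
        line = round((row[y_attr] - y_min) / y_span * (height - 1))
        # Row 0 of the grid is the top of the plot, so flip the y axis.
        grid[height - 1 - line][col] = str(row[color_attr])[0]
    return "\n".join("".join(cells) for cells in grid)


# Hypothetical data: a roughly linear trend plus one point far above it.
points = [
    {"x": 1.0, "y": 1.0, "label": "a"},
    {"x": 2.0, "y": 2.1, "label": "a"},
    {"x": 3.0, "y": 9.0, "label": "b"},  # stands out from the trend
    {"x": 4.0, "y": 4.0, "label": "b"},
    {"x": 5.0, "y": 5.2, "label": "b"},
]

print(ascii_scatter(points, "x", "y", "label"))
```

Even in this crude form, the rising trend and the single point sitting well above it are visible at a glance, which is the kind of first impression the scatter plot is meant to give.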

Many other plotters are available, and the RapidMiner Studio GUI selects one that is appropriate for the example set depending on its characteristics. It is always worth trying other chart types to see whether they give a better understanding of the data. The following sections describe some specific techniques that can help.
