Chapter 14

Summary

The single biggest problem in communication is the illusion that it has taken place.

George Bernard Shaw

14.1 Data analysis and graphics

Good graphics are informative, effective and flexible. They can be attractive and encourage discussion, and they can be more insightful and convincing than text. They are fairly easy to explain to others, often easier than the results of statistical analyses. There is every reason to use them more in data analysis.

With the benefit of hindsight, Graphical Data Analysis is simple: you just show the information contained in the data. In practice that is less straightforward than it sounds, because you first have to find out what information is there. The approach described in this book recommends using a collection of common graphic displays to help you uncover information. This is not about looking for a single optimal display and showing how to draw the plot as attractively and precisely as possible. Rather it is about drawing many displays and aiming to find an optimal set. Here ‘optimal’ means that, taken as a group, the set of graphics shows a lot, if not all, of the information that is in the data.

Different types of graphics highlight different features in datasets. It is worth drawing a few to check what you can see. A group of graphics of the same type, but with different formatting and scaling, may also pick out a range of different aspects of data. It is useful to experiment with different options: growing or shrinking plots, changing their aspect ratios, reordering categories, and trying out different binwidths or bandwidths. Of course, it is all too easy to get lost in large numbers of variables and piles of output. Successful analyses, particularly of big datasets, depend on good organisation. Ideally you need a manager for your analyses like the housekeeper played by Helen Mirren in the film “Gosford Park”, who knew in advance what the guests needed, before they knew it themselves.
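The advice to try out different binwidths can be illustrated with a minimal sketch. It is not taken from the book: the helper names `histogram_counts` and `text_histogram`, the simulated two-group data, and the chosen binwidths are all assumptions made for illustration, and plain-text bars stand in for a real plotting library.

```python
import math
import random


def histogram_counts(values, binwidth, origin=0.0):
    """Count values in bins [origin + k*binwidth, origin + (k+1)*binwidth)."""
    counts = {}
    for v in values:
        k = math.floor((v - origin) / binwidth)  # index of the bin holding v
        counts[k] = counts.get(k, 0) + 1
    return counts


def text_histogram(values, binwidth, origin=0.0):
    """Return a crude text histogram, one line per bin (empty bins included)."""
    counts = histogram_counts(values, binwidth, origin)
    lines = []
    for k in range(min(counts), max(counts) + 1):
        left = origin + k * binwidth  # left edge of bin k
        lines.append(f"{left:7.2f} | " + "#" * counts.get(k, 0))
    return "\n".join(lines)


# Hypothetical data: two simulated groups, so the distribution is bimodal.
random.seed(1)
data = ([random.gauss(0, 1) for _ in range(60)]
        + [random.gauss(4, 0.5) for _ in range(40)])

# Draw the same variable several times, changing only the binwidth.
for bw in (0.25, 1.0, 4.0):
    print(f"binwidth = {bw}")
    print(text_histogram(data, bw))
    print()
```

With data of this kind, a very wide binwidth will tend to merge the two groups into a few bars, while a very narrow one shows a lot of noise. Drawing several versions side by side is a quick way to see which features persist across settings.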

14.2 Key features of GDA

GDA entails drawing and interpreting graphics; the drawing alone is not enough. “Every Picture Tells a Story” (Rod Stewart) and it is an essential part of GDA to look for the story in every picture. A graphic with no message is no use.

GDA requires a strategic approach. It is about using many graphics at once and being prepared to consider several different lines of thought in parallel. Who can tell which of the features identified in graphics may turn out to be the most interesting? It is important to have overall goals and, at the same time, to be flexible in pursuing the general aim of discovering new information.

Graphics may look different because of how they are drawn: the formatting, scaling, and colouring. Using many graphics guards against the misleading effects that such choices can produce in individual displays. On the other hand, the same graphic may be interpreted differently because of the size of the underlying dataset or the subject matter context. Any such difference is intrinsic and should be emphasised, not avoided.

GDA is about the generation of ideas, not just data description, and not the testing of ideas; this is the contrast Tukey was referring to in writing of detectives and judges. GDA should be used in close association with statistical modelling, as the two approaches complement each other well. Testing tells you if a null hypothesis should be rejected but not necessarily why. Graphics show you what might be ‘wrong’, but give no guidance on how strong the evidence is. Statistical tests are useful for checking whether an interesting graphical feature is significant. Graphics are useful for asking whether a statistically significant result is really of interest.

You can't prove anything with statistics, but you can disprove some things, and the same goes for graphics. Drawing many graphics quickly and informatively for exploratory purposes is also different from drawing a few graphics attractively and precisely for presentation purposes. Tukey might have suggested the contrast of detectives and designers.

14.3 Strengths and weaknesses of GDA

Every approach has its pluses and minuses. In the case of GDA there can be downsides due to badly chosen graphics, overloaded graphics, poorly organised groups of graphics, over-interpretation of graphics, and apophenia (seeing patterns where there aren't any). GDA is, of course, also affected if data are inadequate, a problem for all statistical and data analyses.

The strengths of GDA lie in its flexibility and ease of communication. It is also rather robust. If you make an error or draw the wrong plot, you can readily see what has happened and fix the problem. GDA is good for data cleaning, exploring data structure, detecting outliers and unusual groups, identifying trends and clusters, spotting local patterns, evaluating modelling output, and presenting results. It is about generating ideas, while statistics is more about evaluating ideas.

14.4 Recommendations for GDA

A number of recommendations recur in this book:

  • Use graphics to discover information that is difficult to investigate statistically.
  • Draw many graphics and vary the graphics options.
  • Gain experience in interpreting graphics and use the graphics types you know well.
  • Consider reformatting datasets before drawing graphics.
  • Make appropriate comparisons and choose comparable scales.
  • Check any graphical result you find with statistical models, where possible. Statistics and graphics complement one another.
  • Always remember how important context is in interpreting results.

The same statistics and graphics might be interpreted quite differently for different applications. That is why this book has concentrated on using ‘real’ datasets. Finally, interactive graphics have occasionally been mentioned in this book as a tool with potential for GDA, and that is an important topic for the future.
