Weaknesses of summary EDA - the Anscombe quartet

In the meantime, I have found my notes about the Anscombe quartet. Look at these four plots:

By Anscombe.svg: Schutz Derivative works of this file:(label using subscripts): Avenue (Anscombe.svg) [GPL (http://www.gnu.org/licenses/gpl.html), CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons

Those are four different plots, aren't they? Nevertheless, as Francis Anscombe showed in 1973, all four datasets share nearly identical values for the following summary statistics:

  • Mean of x and y
  • Variance of x and y
  • Pearson correlation between x and y

This came as quite a shock to some of the paper's first readers, but it proved a very effective way to show how misleading summary statistics can be. It was an equally powerful demonstration of how important it is to look at a graphical representation of the available data. Plotting was not so common at the time; it was actually regarded as an activity for people not skilled enough to compute and understand other kinds of analysis.
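To see this for yourself, the summary statistics can be recomputed from the raw numbers. The following is a minimal sketch in Python using only the standard library. The data values are transcribed from Anscombe's published 1973 dataset, while the names `QUARTET`, `pearson`, and `summarize` are illustrative choices made for this example and do not come from the text.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet, transcribed from the published 1973 dataset.
# Datasets I, II and III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def summarize(x, y):
    """Return the summary statistics listed above for one dataset."""
    return {
        "mean_x": mean(x),
        "mean_y": mean(y),
        "var_x": variance(x),  # sample variance (n - 1 denominator)
        "var_y": variance(y),
        "pearson_r": pearson(x, y),
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

# Print each dataset's statistics rounded to two decimals for comparison.
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four printed rows should be practically indistinguishable, even though the plots look nothing alike. That is exactly the point: the numbers alone give no hint of the differences.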

OK, following the path Anscombe opened with his quartet, let's perform some graphical exploratory data analysis and see whether the reason for the drop finally reveals itself.
