A Visualization Tool for Mining Large Correlation Tables 85
As for particulars of the scatterplot matrix shown in Figure 6.8, the visually most striking
features concern marginal distributions, not associations: The first variable is capped at the
maximal value +90, and the fourth variable is binary. Otherwise the associations look simply
monotone and seem well summarized by correlations.
6.3.6 Variations on Blockplots
Blockplots are not the most common visualizations of correlation tables. As a Google search
for "correlation plot" reveals, the most frequent visual rendering of correlation tables is the
heatmap, in which square cells are always filled and numeric values are coded on a gray or
color scale. An example is shown in the left frame of Figure 6.9; for comparison, the right
frame shows the corresponding blockplot. Here are a few observations about the two types
of plots:
• Color or gray scale is generally a weaker visual cue than size. This argument favors
blockplots as long as the blocks are not too small, that is, as long as the view is not
zoomed out too much. The superiority of blockplots over heatmaps is also noted by
Wickham et al. (2006, Figure 2).
• In heatmaps, color fuses adjacent cells when they are close in value. This may or may
not be a problem for the trained eye, but there is a loss of identity of the rows and
columns in heatmaps.
• Heatmaps do not permit markup with background color because they fill the square or
rectangular cells completely. This problem can be overcome by shrinking the heatmap
cells somewhat to allow some surrounding space to be freed up that can be filled with
background color for markup, as shown in the center frame of Figure 6.9. This method
of rendering, however, seems to further decrease the crispness of heatmaps.
• Heatmaps perform nicely when the view is heavily zoomed out, in which case the
individual blocks are so small that size is no longer visually functional as a cue. In this
case, color coding works well and gives an accurate impression of global structure. We
solve this problem for blockplots by showing only 10,000 or so of the largest correlations
when heavily zoomed out. Thinning the table in this manner works well even when the
visible table is so large that each cell is strictly speaking below the pixel resolution of a
raster screen.
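The thinning step described in the last item can be sketched in a few lines. This is a minimal illustration, not the AN's implementation: the function name thin_correlations, the list-of-lists table layout, and the decision to rank off-diagonal entries by absolute value are all assumptions made here for the sake of the example.

```python
import heapq


def thin_correlations(corr, k=10_000):
    """Return the k off-diagonal entries with the largest |correlation|.

    Hypothetical helper: `corr` is a square, symmetric table given as a
    list of lists. Each result is a tuple (i, j, r) with i < j.
    """
    n = len(corr)
    # Scan only the upper triangle, since corr[i][j] == corr[j][i].
    entries = ((i, j, corr[i][j]) for i in range(n) for j in range(i + 1, n))
    # nlargest returns the k best entries, sorted by decreasing |r|.
    return heapq.nlargest(k, entries, key=lambda e: abs(e[2]))


corr = [
    [1.0, 0.9, -0.2, 0.1],
    [0.9, 1.0, -0.7, 0.05],
    [-0.2, -0.7, 1.0, 0.3],
    [0.1, 0.05, 0.3, 1.0],
]
top = thin_correlations(corr, k=2)  # the two strongest pairs: (0, 1) and (1, 2)
```

Only the surviving entries would then be drawn, each large enough to register on screen; a symmetric display would plot every retained pair (i, j) at both (i, j) and (j, i).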
Because neither of the two types of plots, blockplots or heatmaps, is uniformly superior
at all scales, the AN provides both, and with one keystroke one can toggle between the two
rendering methods. Varying block size allows for the mixed variant shown in the center of
Figure 6.9.
Visualization of correlation tables has a small literature in statistics. An early reference
that addresses large correlation tables is Hills (1969), who applies half-normal plots to
tell statistically significant from insignificant correlations and clusters variables visually
in two-dimensional projections. Closer to the present work are articles by Murdoch and
Chow (1996) and Friendly (2002). Both propose relatively complex renderings of correlations
with ellipses or augmented circles that may not scale up to the sizes of tables we have in
mind but may be useful for conveying richer information for tables that are smaller, yet too
large for numeric table display. Blockplot coding, which uses squares, has the advantage
that squares can completely fill their cells to represent extremal correlations, because they
are geometrically similar to the containing cells (at least if the default
aspect ratio of the blockplot is maintained), whereas all other shapes leave residual space
even when maximally expanded.
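The geometric point can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that a glyph's linear size (the side of a square, the diameter of a circle) is proportional to |r|; the text does not specify the scaling used by blockplots, and the function names are hypothetical.

```python
import math


def square_fill_fraction(r):
    """Share of a square cell covered by a square of side |r| (cell side = 1)."""
    return abs(r) ** 2


def circle_fill_fraction(r):
    """Share of a square cell covered by a circle of diameter |r| (cell side = 1)."""
    return math.pi / 4 * abs(r) ** 2


# At an extremal correlation the square fills its cell completely, whereas
# an inscribed circle covers only pi/4 of it, roughly 79 percent.
full_square = square_fill_fraction(-1.0)
full_circle = circle_fill_fraction(-1.0)
```

Under this assumption, a maximally expanded circle always leaves about a fifth of its cell empty, which is the residual space referred to above.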
What we prefer to call, descriptively, blockplots, possibly contracted to blots, has
previously been named fluctuation diagrams (Hofmann 2000). Under this term, one can find
a static implementation in the R-package extracat on the CRAN site authored by Pilhoefer