Using pairs.panels() to visualize correlations between variables

Within the R ecosystem, there are different packages offering ways to represent correlations between variables in a dataset.

In a way, the powerful plot() function, as seen in the previous recipe, can also be useful for correlation spotting, particularly when plotting all variables against one another (refer to the previous recipe for more details).

Nevertheless, among the different alternatives, the one that I think gives you the quickest and deepest understanding of the relationships among your variables is the pairs.panels() function provided by William Revelle's psych package.

Getting ready

In order to use the pairs.panels() function, we first need to install and load the psych package:

install.packages("psych")
library(psych)

To test the pairs.panels() functionality, we will use the Iris dataset.

The Iris dataset is one of the most used datasets in R tutorials and learning sessions. It is derived from a 1936 paper by Ronald Fisher, The use of multiple measurements in taxonomic problems.

Data was collected on 50 samples from each of three species of the iris flower:

  • Iris setosa
  • Iris virginica
  • Iris versicolor

On each sample, four features were recorded:

  • length of the sepals
  • width of the sepals
  • length of the petals
  • width of the petals

In the following example, we will look for correlations between these variables.
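
Before plotting, it can help to confirm that the built-in iris data frame matches this description. The following is a minimal sketch using only base R; it assumes nothing beyond the iris object that ships with R's datasets package.

```r
# iris ships with base R (datasets package), so no extra install is needed
data(iris)

# Dimensions: one row per flower, four measurements plus the species label
dim(iris)

# Column names and types: four numeric features and one factor
str(iris)

# Number of samples recorded for each species
table(iris$Species)
```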

How to do it...

  1. Visualize your dataset using pairs.panels():
    pairs.panels(iris, hist.col = "white", ellipses = FALSE)
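
As a possible variation, the plot can be restricted to the four numeric measurements, with points colored by species. This is a sketch rather than part of the recipe: the column selection, the measurements and species_colors names, and the bg and pch values are illustrative choices.

```r
library(psych)

# Keep only the four numeric measurements (columns 1 to 4)
measurements <- iris[, 1:4]

# One fill color per flower, looked up through the Species factor
species_colors <- c("red", "yellow", "blue")[iris$Species]

pairs.panels(measurements,
             bg = species_colors,  # fill color for each point
             pch = 21,             # filled circle, so bg takes effect
             hist.col = "white",
             ellipses = FALSE)
```

Leaving out the Species column keeps every panel focused on an actual measurement, since Species is a label rather than a quantity.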

How it works…

The pairs.panels() function produces quite a comprehensive plot, showing in one picture the following things:

  • The correlation coefficients for each pair of variables (the numbers in the upper-right panels of the plot) let you understand whether a linear correlation is present between your variables
  • The frequency distributions (the histograms on the diagonal) let you quickly visualize the typical values of your data and the general distribution shapes of your variables
  • The scatterplots for each pair of variables (the lower-left panels of the plot) let you visually find non-linear relationships
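
The coefficients printed in the upper panels can also be reproduced numerically. A minimal sketch, assuming the four numeric columns of iris and the default Pearson method of base R's cor() function; the cor_matrix name is only illustrative:

```r
# Pearson correlation matrix for the four numeric measurements
cor_matrix <- cor(iris[, 1:4])

# Round to two decimals for easier comparison with the plot
round(cor_matrix, 2)

# A single pair, for example petal length against petal width
cor(iris$Petal.Length, iris$Petal.Width)
```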

    Note

    Some aesthetic parameters are set within the function call; these control the color of the histogram bars and the plotting of correlation ellipses. You can refer to the There's more… section of this recipe for more details on these arguments.

There's more…

The pairs.panels() function allows you to customize the output; some customizations are purely aesthetic, while others affect the computations that happen behind the panel visualization.

The hist.col argument belongs to the first group: it sets the color of the distribution plots produced by the function.

It is also possible to change the method used to compute the correlations through the method argument.

The following methods are available:

  • Pearson
  • Spearman
  • Kendall
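
For instance, a rank-based coefficient can be requested as follows. This is a sketch under the assumption that the method argument takes the lowercase names used by base R's cor() function ("pearson", "spearman", "kendall"); the base R call is included only to show the equivalent numbers, and spearman_matrix is an illustrative name.

```r
library(psych)

# Spearman rank correlations in the upper panels instead of Pearson
pairs.panels(iris[, 1:4], method = "spearman",
             hist.col = "white", ellipses = FALSE)

# The same kind of coefficients computed directly with base R
spearman_matrix <- cor(iris[, 1:4], method = "spearman")
round(spearman_matrix, 2)
```

Rank-based methods such as Spearman and Kendall measure monotonic rather than strictly linear association, so they can be a reasonable choice when the scatterplots suggest a curved relationship.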

We can also specify whether correlation ellipses, also called confidence or error ellipses, should be added to the plot through the ellipses argument.
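
The two aesthetic arguments discussed here can be combined in one call. A minimal sketch; the color name is an arbitrary choice, and any color accepted by R's graphics functions should work in its place:

```r
library(psych)

# Draw correlation ellipses and fill the histograms in light blue
pairs.panels(iris[, 1:4],
             hist.col = "lightblue",
             ellipses = TRUE)
```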
