Using pairs.panels() to visualize correlations between variables

Within the R ecosystem, there are different packages offering ways to represent correlations between variables in a dataset.

In a way, the powerful plot() function, as seen in the previous recipe, can also be useful for correlation spotting, particularly when plotting all variables against one another (refer to the previous recipe for more details).

Nevertheless, among the alternatives, the one I think gives you the quickest and deepest understanding of the relationships in your data is the pairs.panels() function from the psych package by William Revelle.

Getting ready

In order to use the pairs.panels() function, we first need to install and load the psych package:

install.packages("psych")
library(psych)

To test the pairs.panels() functionality, we will use the Iris dataset.

The Iris dataset is one of the most used datasets in R tutorials and learning sessions. It is derived from a 1936 paper by Ronald Fisher, titled The use of multiple measurements in taxonomic problems.

Data was collected on 50 samples from each of three species of the iris flower (150 samples in total):

Iris setosa
Iris virginica
Iris versicolor

For each sample, four features were recorded:

length of the sepals
width of the sepals
length of the petals
width of the petals
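
As a quick check of this structure, here is a minimal sketch using only base R. It assumes the standard iris data frame that ships with R in the datasets package; the variable names n_rows, species_counts, and measure_names are just illustrative.

```r
# iris ships with base R (datasets package)
data(iris)

# Total number of samples
n_rows <- nrow(iris)

# Number of samples recorded for each species
species_counts <- table(iris$Species)

# Names of the numeric measurement columns (Species is a factor)
measure_names <- names(iris)[sapply(iris, is.numeric)]

print(n_rows)
print(species_counts)
print(measure_names)
```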

In the following example, we will look for correlations between these variables.

How to do it...

Visualize your dataset using pairs.panels():

pairs.panels(iris, hist.col = "white", ellipses = FALSE)

How it works…

The pairs.panels() function produces quite a comprehensive plot, showing in one picture the following things:

The correlation coefficient between all variables (numbers on the upper-right side of the plot) lets you understand whether a linear correlation is present between your variables
The frequency distribution (the histograms on the diagonal) lets you quickly visualize the typical values of your data and the general distribution shapes of your variables
The scatterplots between pairs of variables (lower-left side of the plot) let you visually spot non-linear relationships
Note
Some aesthetic parameters are set within the function call; these control the color of the histogram bars and whether correlation ellipses are drawn. You can refer to the There's more… section in this recipe for more details on these arguments.
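
The correlation coefficients shown in the plot can also be inspected as plain numbers. The following is a sketch using base R's cor() on the four numeric columns only (Species is dropped here because it is a factor); rounding to two decimals is an assumption made to keep the output compact, and iris_num and cor_matrix are illustrative names.

```r
# Keep only the four numeric measurements
iris_num <- iris[, 1:4]

# Pearson correlation matrix (cor() uses Pearson by default)
cor_matrix <- round(cor(iris_num), 2)
print(cor_matrix)
```

A value close to 1 or -1 points to a strong linear relationship, while a value near 0 suggests that no linear relationship is present, which is where the scatterplots become useful.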

There's more…

The pairs.panels() function allows you to customize the output; some customizations are purely aesthetic, while others affect the computations behind the panels.

The hist.col argument belongs to the first group: it sets the color of the histograms produced by the function.

It is also possible to change the method used to compute the correlations through the method argument.

The following methods are available, each passed as a lowercase string:

Pearson ("pearson", the default)
Spearman ("spearman")
Kendall ("kendall")
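
As an illustration, the sketch below computes rank-based coefficients with base R's cor() and then passes the same choice to pairs.panels(). The requireNamespace() guard is an addition of this sketch, so that the script still runs when psych is not installed; iris_num, cor_spearman, and cor_kendall are illustrative names.

```r
# Four numeric measurements only
iris_num <- iris[, 1:4]

# Base R: rank-based alternatives to the default Pearson coefficient
cor_spearman <- cor(iris_num, method = "spearman")
cor_kendall  <- cor(iris_num, method = "kendall")
print(round(cor_spearman, 2))
print(round(cor_kendall, 2))

# Same choice passed to pairs.panels(); skipped if psych is missing
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(iris_num, method = "spearman",
                      hist.col = "white", ellipses = FALSE)
}
```

Spearman and Kendall work on ranks rather than raw values, so they can be a reasonable choice when a relationship is monotonic but not linear.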

We can also specify whether correlation ellipses, also called confidence or error ellipses, should be added to the plot through the ellipses argument.
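
For comparison with the call used in the recipe, here is a sketch that turns the ellipses on and changes the histogram color. The choice of "grey" and the restriction to the four numeric columns are assumptions of this example, and the requireNamespace() guard is only there so the script still runs when psych is not installed.

```r
# Four numeric measurements only
iris_num <- iris[, 1:4]

# Draw the panels with correlation ellipses and grey histograms
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(iris_num, hist.col = "grey", ellipses = TRUE)
}
```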
