Data visualization

One of the powerful features of R is its set of functions for generating high-quality plots and visualizing data. The graphics functions in R can be divided into three groups:

  • High-level plotting functions to create new plots, add axes, labels, and titles.
  • Low-level plotting functions to add more information to an existing plot. This includes adding extra points, lines, and labels.
  • Interactive graphics functions to interactively add information to, or extract information from, an existing plot.

Base R itself contains several graphics functions. For more advanced graphing applications, one can use packages such as ggplot2, grid, or lattice. In particular, ggplot2 is very useful for generating visually appealing, multilayered graphs. It is based on the concept of the grammar of graphics. Due to lack of space, we are not covering these packages in this book. Interested readers should consult the book by Hadley Wickham (reference 4 in the References section of this chapter).

High-level plotting functions

Let us start with the most basic plotting functions in R as follows:

  • plot( ): This is the most common plotting function in R. It is a generic function where the output depends on the type of the first argument.
  • plot(x, y): This produces a scatter plot of y versus x.
  • plot(x): If x is a numeric vector, the output is a plot of the values of x on the Y axis against their index on the X axis. If x is a complex vector, it plots the imaginary part versus the real part.
  • plot(f, y): Here, f is a factor object and y is a numeric vector. The function produces box plots of y for each level of f.
  • plot(y ~ expr): Here, y is any object and expr is a list of object names separated by + (for example, p + q + r). The function plots y against every object named in expr.
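
As a rough illustration of how the output of plot( ) depends on its arguments, the following sketch calls it with each of the argument types listed above. The data and the variable names (x, y, f, z) are made up for this example:

```r
# Illustrative data; all variable names are hypothetical.
set.seed(1)
x <- 1:20
y <- 2 * x + rnorm(20)
f <- factor(rep(c("a", "b"), each = 10))
z <- complex(real = rnorm(20), imaginary = rnorm(20))

plot(x, y)    # scatter plot of y versus x
plot(y)       # values of y against their index 1, 2, ..., 20
plot(z)       # imaginary part versus real part
plot(f, y)    # one box plot of y for each level of f
plot(y ~ x)   # formula interface: y against x
```

Each call starts a new plot, so in an interactive session only the last one stays visible unless the device keeps a plot history.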

There are two useful functions in R for visualizing multivariate data:

  • pairs(X): If X is a data frame containing numeric data, then this function produces a pair-wise scatter plot matrix of the variables defined by the columns of X.
  • coplot(y ~ x | z): If y and x are numeric vectors and z is a factor object, then this function plots y versus x for every level of z.
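
A minimal sketch of both functions, assuming the built-in iris data frame (four numeric columns followed by the factor Species):

```r
data(iris)

# Scatter plot matrix of the four numeric columns
# (column 5 is the Species factor, so it is left out here).
pairs(iris[, 1:4])

# Petal.Length versus Petal.Width, one panel per level of Species.
coplot(Petal.Length ~ Petal.Width | Species, data = iris)
```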

For plotting distributions of data, one can use the following functions:

  • hist(x): This produces a histogram of the numeric vector x.
  • qqplot(x, y): This plots the quantiles of x versus the quantiles of y to compare their respective distributions.
  • qqnorm(x): This plots the numeric vector x against the expected normal order scores.
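
The three functions can be tried on simulated data as follows. The samples x and y are made up for this sketch, and qqline( ), which is not discussed above, is added only to draw a reference line on the normal Q-Q plot:

```r
set.seed(42)
x <- rnorm(200)            # sample from a standard normal distribution
y <- rexp(200, rate = 1)   # sample from an exponential distribution, for contrast

h <- hist(x)    # histogram of x; the bin counts are returned invisibly
qqplot(x, y)    # quantiles of x against quantiles of y
qqnorm(x)       # x against the theoretical normal quantiles
qqline(x)       # reference line added to the normal Q-Q plot
```

If the two samples in qqplot( ) came from the same distribution, the points would fall close to a straight line; here they are expected to curve away from one.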

Low-level plotting commands

To add points and lines to a plot, the following commands can be used:

  • points(x, y): This adds points at the coordinates (x, y) to the current plot.
  • lines(x, y): This adds line segments connecting the points (x, y) to the current plot.
  • abline(a, b): This adds a line with intercept a and slope b to the current plot.
  • polygon(x, y, …): This draws a polygon whose ordered vertices are given by (x, y).
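
The following sketch builds up a single plot from these four commands. The data and the colors are arbitrary choices made for illustration:

```r
x <- seq(0, 10, by = 0.5)

plot(x, x^2, type = "n")               # high-level call: empty axes only
points(x, x^2, col = "blue")           # add the points (x, x^2)
lines(x, x^2 / 2, col = "darkgreen")   # add a connected line through (x, x^2 / 2)
abline(a = 0, b = 5, col = "red")      # straight line with intercept 0 and slope 5
polygon(c(2, 4, 3), c(60, 60, 90), col = "grey")   # filled triangle
```

Low-level commands only draw on an existing plot, which is why the sketch begins with a call to plot( ).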

To add text to a plot, use the following functions:

  • text(x, y, labels): This adds text to the current plot at point (x, y).
  • legend(x, y, legend): This adds a legend to the current plot at point (x, y).
  • title(main, sub): This adds a title main at the top of the current plot in a large font and a subtitle sub at the bottom in a smaller font.
  • axis(side, …): This adds an axis to the current plot on the side given by the first argument. The side can take values from 1 to 4 counting clockwise from the bottom.
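
As a small sketch of these annotation functions, the plot below is drawn without its default axes and labels, which are then added one by one. The data and label text are made up for this example:

```r
x <- 1:10
y <- x^2

plot(x, y, axes = FALSE, xlab = "", ylab = "")   # suppress default axes and labels
axis(1)                                    # axis on the bottom side
axis(2)                                    # axis on the left side
title(main = "Squares", sub = "y = x^2")   # title on top, subtitle at the bottom
text(5, 25, labels = "(5, 25)", pos = 4)   # pos = 4 puts the text to the right of the point
legend(1, 100, legend = "y = x^2", pch = 1)   # legend with its top-left corner at (1, 100)
```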

The following example shows how to draw a scatter plot and add a trend line. For this, we will use the famous Iris dataset, made famous by R. A. Fisher, which is available in R itself:

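data(iris)   # load the built-in Iris dataset
str(iris)    # inspect its structure
# Scatter plot of petal length (Y) against petal width (X)
plot(iris$Petal.Width, iris$Petal.Length, col = "blue", xlab = "X", ylab = "Y")
title(main = "Plot of Iris Data", sub = "Petal Length (Y) Vs Petal Width (X)")
# Fit a linear model and draw the fitted line
fitlm <- lm(Petal.Length ~ Petal.Width, data = iris)
abline(fitlm, col = "red")   # abline() takes the intercept and slope from the fit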

Interactive graphics functions

There are functions in R that enable users to add or extract information from a plot using the mouse in an interactive manner:

  • locator(n, type): This waits for the user to select n locations on the current plot using the left mouse button. Here, type is one of "n", "p", "l", or "o" and determines whether nothing, points, lines, or both are plotted at these locations. For example, to place the label "Outlier" near an outlier point, use the following code:
    text(locator(1), "Outlier", adj = 0)
  • identify(x, y, labels): This allows the user to mark any of the points defined by x and y by clicking near them with the left mouse button; the corresponding entry of labels is then drawn next to each selected point.
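
A sketch of how the two functions might be combined on the Iris scatter plot is given below. Both need an interactive graphics device, so the calls are wrapped in a check with interactive( ); how a selection is ended (for example, a right-click or the Esc key) depends on the device. The variable name picked is made up for this example:

```r
data(iris)
plot(iris$Petal.Width, iris$Petal.Length)

if (interactive()) {
  # Click near points to label them with their row names;
  # identify() returns the indices of the selected points.
  picked <- identify(iris$Petal.Width, iris$Petal.Length,
                     labels = rownames(iris))

  # Click once to choose where the annotation is placed.
  text(locator(1), "Outlier", adj = 0)
}
```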