Hour 13. Graphics


What You’ll Learn in This Hour:

How to use graphics devices

High-level graphics functions

Low-level graphics functions

Graphical parameters

How to control the device layout


After all the manipulations to our data, we want to be able to start to do something with it. In this hour, we look at how to create graphics using the base graphics functionality. You may be aware that there are other packages for creating graphics, including ggplot2 and lattice, which we will look at in the next two hours. Here, however, we look at some of the basics, including how to send graphics to devices such as a PDF and the standard graphics functions. Finally, we look at how to control the layout of graphics on the page.

Graphics Devices and Colors

Before we start to create graphics, we need to think about where we will create them and how we will color them. In this section, you learn how to control the device that is used to create the graphic, whether this is the default plot device or a specific file type. You will also see the options for defining color in R graphics.

Devices

Whenever we create a graphic in R, it is sent to a device. This may be the RStudio Plot tab or it may be a file, such as a PDF. A number of graphics devices are available, including PDF, PNG, JPEG, and bitmap. If we do not specify a device, the default device is opened; in RStudio this is the Plot tab.

If we want to create a graphic in a specific device, we do so by first creating that device. We create devices with a series of functions named after the file type (for instance, pdf or png). This opens a connection between R and the device, and any graphics we now create will be written to that file. A vital step is to then close the device using the function dev.off. As an example, let’s create a graphic in a PDF file that we will name myFirstGraphic.pdf:

> pdf("myFirstGraphic.pdf")
> hist(rnorm(100))
> dev.off()  # remember to close the device!

In our current working directory we will now have the PDF file myFirstGraphic.pdf. We can, of course, give the full file path to save the file in an alternative location. Attributes of the device, such as width, height, and resolution, can all be set in the specific device functions.
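As a brief sketch of setting device attributes, the following writes a histogram to a PDF with explicit dimensions. The file name sizedGraphic.pdf is hypothetical, and a temporary directory is used only so that the example does not clutter the working directory.

```r
# Hypothetical output file in a temporary directory
outFile <- file.path(tempdir(), "sizedGraphic.pdf")

pdf(outFile, width = 8, height = 4)  # for pdf, width and height are in inches
hist(rnorm(100))
dev.off()                            # close the device to finish writing the file

file.exists(outFile)                 # check that the file was created
```

The png function takes width and height in the same way, although its default unit is pixels, and it has an additional res argument for the resolution.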


Tip: Closing Graphics Devices

When you start to create graphics in devices in this way, you may find that you have unintentionally opened a number of devices and you are not certain where the graphic is being written to anymore. If this happens, try using the function graphics.off, with no arguments. This will close all active devices and allow you to start again with creating your graphic.
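A minimal sketch of this situation is shown here. We deliberately open two PDF devices (the file names are hypothetical), inspect what is open with dev.list and dev.cur, and then close everything:

```r
# Open two devices without closing the first (hypothetical file names)
pdf(file.path(tempdir(), "first.pdf"))
pdf(file.path(tempdir(), "second.pdf"))

dev.list()       # all devices currently open
dev.cur()        # the device that would receive the next graphic

graphics.off()   # close every open device
openDevices <- dev.list()
openDevices      # NULL once nothing is open
```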


Colors

When it comes to specifying colors in R, we have a few options. The easiest is to simply name the color. To know what colors we can name in this way, we can use a function in R called colors (or colours) that will return a vector of all the colors that R recognizes by name. Here’s an example:

> sample(colors(), 10)
[1] "wheat3"     "lightblue1" "wheat"      "olivedrab1" "lightblue4" "grey11"
[7] "peru"       "grey39"     "firebrick2" "peachpuff4"

Alternatively, we can provide the exact hexadecimal value for the color we want to use. For instance, #FF0000 is the hexadecimal value for red. If you are not certain of the hexadecimal value but do know the red, green, and blue color values, you can use the rgb function to help you out. For example, here’s how to find the hexadecimal value for green:

> rgb(0, 255, 0, maxColorValue = 255)
[1] "#00FF00"
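Going the other way can also be useful. As a small sketch, the col2rgb function returns the red, green, and blue values for a named color, which we can pass back to rgb to recover the hexadecimal value and then use anywhere a color is expected:

```r
# Red, green, and blue components of a named color (a 3 x 1 matrix)
redValues <- col2rgb("red")
redValues

# Convert the components back to a hexadecimal string
redHex <- rgb(redValues[1], redValues[2], redValues[3], maxColorValue = 255)
redHex

# The hexadecimal value can be used wherever a color name is accepted
hist(rnorm(100), col = redHex)
```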

High-Level Graphics Functions

Graphics functions in the base graphics package are split into two types. High-level functions are those that allow us to create the graphic. Low-level functions allow us to add content, such as points and lines, to an existing graphic. In this section, we look at the high-level functions available to us. These have been split into univariate graphics and the plot function. We also look at how to control aesthetics and the type of plot we create.

Univariate Graphics

In this section, we look at graphics that we may create with a single variable. This includes histograms, boxplots, and bar charts, as well as QQ plots. Throughout this section we use simple vectors of simulated values to plot.

To start with, let’s look at histograms and QQ plots. Both are very simply created by passing a vector of data to the appropriate function, hist or qqnorm. In the case of the QQ plot, if we want to add a QQ line, we need to additionally use the function qqline.

> x <- rnorm(100)
> hist(x, col = "lightblue")
> qqnorm(x)
> qqline(x)

In all these functions there is an argument, col, that allows us to set the color, as can be seen in the preceding hist example. The graphics that these calls generate can be seen in Figure 13.1.

FIGURE 13.1 Examples of the default histogram and QQ plot, with corresponding QQ line

For boxplots, again we can simply provide a vector of the data we want to plot. Here’s how:

> boxplot(x)

If, however, we want to plot the data split by another variable, we would need to provide a formula for that representation. As an example, we will create a new vector that is simply a random sampling of values from "F" and "M" to assign a gender to each value in the vector x. We then want to plot the data x split by the corresponding gender we have sampled.

> gender <- sample(c("F", "M"), size = 100, replace = TRUE)
> boxplot(x ~ gender)

The two graphics generated here can be seen in Figure 13.2. In the case where we have the data stored in a data frame, we can simply provide the variable names and then specify the dataset with the data argument. Here’s an example:

> genderData <- data.frame(gender = gender, value = x)
> boxplot(value~gender, data = genderData)

FIGURE 13.2 A simple univariate boxplot and boxplot split by a second variable, in this case gender

The final example to consider is the barplot function. This allows us to create a bar chart where the heights of the bars are based on the values given by the vector input. Consider this simple example of a vector of just three elements:

> barplot(c(3, 9, 5))

This bar chart is shown in Figure 13.3. There are additional options for giving names to each of the bars, for instance, and for coloring the bars, as you have seen for other plots. This function also works well with the table function you saw in Hour 6, “Common R Utility Functions.” Consider the gender vector that we created. Suppose we want to count the number of cases of each gender and generate a bar chart showing these counts:

> genderCount <- table(gender)
> barplot(genderCount)

FIGURE 13.3 Bar charts created from a single vector and a named vector, the output of the table function

This is also shown in Figure 13.3. You will notice that in this case the bars are already named. This is because the output from the table function is a named vector, so the names of the categories in the data are passed through to the barplot function to label the bars.
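When the input is a plain vector without names, we can supply the labels ourselves. The following sketch uses the names.arg argument of barplot; the labels "A", "B", and "C" are made up for illustration.

```r
counts <- c(3, 9, 5)

# names.arg labels the bars; col colors them as in the other plots
barMids <- barplot(counts, names.arg = c("A", "B", "C"), col = "lightblue")

# barplot invisibly returns the X positions of the bar midpoints
barMids
```

The returned midpoints can be handy later if we want to add text or points above particular bars.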

The plot Function

The main function you will use for generating graphics is the plot function. As you will see, this is a very versatile function and can be used to easily generate diagnostic plots for models. In this hour we use it only to plot vectors of data.

Let’s start with just a single vector of data. In this case, just as with the preceding univariate graphics, we can simply pass the vector to the plot function:

> plot(x[1:10])

This plot is shown in Figure 13.4, where you can see that in this instance the values of the vector are plotted on the Y axis. On the X axis we have the index of each element, that is, its position in the vector.

FIGURE 13.4 Using plot for a single vector. Here, the values in the vector are plotted against their index, or position in the vector

When it comes to plotting two variables, we need to give the X and Y axis variables in that order. So the first argument to plot is the vector of values on the X axis, and the second is the vector of values on the Y axis. Therefore, let’s create a plot using the airquality data. In this instance, we are going to plot Ozone against Wind, so we want the Wind vector on the X axis and Ozone on the Y axis:

> plot(airquality$Wind, airquality$Ozone, pch = 4)

In this example, the result of which can be seen in Figure 13.5, we have also changed the plotting symbol, which you will see in more detail in the next section. You will notice that this has, by default, added axis labels that are simply the names of the objects we passed and that there is no title. All of these things, which contribute to the appearance of the plot, we will look at in the next section.

FIGURE 13.5 Using plot to create a bivariate scatterplot. Here, we have also changed the plotting symbol

Aesthetics

For all of the plotting functions that we have looked at in this hour, there are a number of arguments we can use to change the way that the plot looks. This could be adding a title, changing the point styles, or adding the correct axis labels. In this section, we discuss how to do all these things.

Titles and Axis Labels

We need three arguments to change the main title of the plot along with the X and Y axis labels:

main, for controlling the plot title

xlab, for setting the X axis label

ylab, for setting the Y axis label

We can use these arguments in all the plotting functions from this hour:

> hist(x, main = "Histogram of Random Normal Data", xlab = "Simulated Normal Data")
> library(mangoTraining)  # provides the pkData dataset
> plot(pkData$Time, pkData$Conc,
+      main = "Concentration against Time", xlab = "Time",
+      ylab = "Concentration")

The plots for these examples are shown in Figure 13.6, where you can see we now have more appropriate titles and axis labels.

FIGURE 13.6 Changing titles and axis labels in both histograms and scatterplots


Tip: Including Special Characters

If you want to include special characters, such as Greek letters, in your titles and axis labels, you will need to use the expression function. As an example, the axis label may become this:

ylab = expression("Concentration ("*mu*"g/ml)")

Here, we are using the asterisk (*) to combine strings with the Greek character mu.
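To see the label in context, here is a small self-contained sketch. The time and concentration values are invented purely for illustration and are not taken from pkData.

```r
# Made-up values, used only to have something to plot
time <- c(0, 1, 2, 4, 8)
conc <- c(0, 120, 95, 60, 20)

# Build the label once so it can be reused across plots
yLabel <- expression("Concentration (" * mu * "g/ml)")

plot(time, conc, xlab = "Time (hours)", ylab = yLabel)
```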


Axis Limits

The default behavior of the plot function is to set the range of the plot limits to cover the range of the data. In some instances this is sufficient; however, often this will not be suitable for the data in question—for instance, if the axis limits need to extend to zero. In this case, we need to make use of the arguments xlim and ylim.

Both of these arguments are provided in the same way. We need to give a single vector of length two. The first element of this vector is the minimum value for the axis and the second value is the maximum value for the axis. As an example, suppose we want to extend the maximum value of both axes in the Concentration against Time plot:

> plot(pkData$Time, pkData$Conc, xlim = c(0, 50), ylim = c(0, 3000))

The plot that is created by this code is shown in Figure 13.7. This functionality is particularly useful if we want to plot a subset of the data across the range of the full dataset. For instance, suppose we want to plot the Dose 25 data from the pkData dataset but with the axes based on the complete data:

> plot(pkData$Time[pkData$Dose == 25], pkData$Conc[pkData$Dose == 25],
+        ylim = range(pkData$Conc))

FIGURE 13.7 Changing axis limits

This plot can also be seen in Figure 13.7, and you can see how we have used the range function from Hour 6 to determine the minimum and maximum values of the Y axis of the plot.
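The same idea can be sketched with the built-in airquality data, which avoids the need for pkData. One assumption to note: the Ozone column contains missing values, so we pass na.rm = TRUE to range.

```r
# Data for a single month (May)
may <- airquality[airquality$Month == 5, ]

# Limits based on the complete dataset, not just the subset
windRange <- range(airquality$Wind)
ozoneRange <- range(airquality$Ozone, na.rm = TRUE)  # Ozone contains NA values

plot(may$Wind, may$Ozone, xlim = windRange, ylim = ozoneRange)
```

Using the same limits for every month makes plots of different subsets directly comparable.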

Plotting Symbols

In the graphics that we have created so far, we have mostly left the plotting symbol as the default, black, unfilled circle, although Figure 13.5 showed that we can change the symbol itself using the argument pch, and Figure 13.1 showed we can change color using the col argument.

You can change the plotting symbol by providing a numeric value to indicate the symbol you want to use. Figure 13.8 shows symbols 0 to 20. Additionally, a series of other symbols takes values in the region 21 to 25 (see Figure 13.9). The difference with these symbols is that, in addition to being able to set the color, we can also set the fill. The fill of the shapes is actually set with the argument bg, but just like with the argument col, we can give any color value.

FIGURE 13.8 Plotting symbols and their values

FIGURE 13.9 Plotting symbols 21 to 25 with just the col argument set (bottom) and with col and bg set (top)

As well as setting the color and shape of the symbols, we can also set the size. We do this with the argument cex. This argument is simply a numeric value indicating how many times bigger (or smaller) than the usual size we want our points. The default is 1.
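If you want to see the symbols for yourself, something like the following sketch draws each value of pch from 0 to 25 in a simple grid. The grid layout (nine symbols per row) is an arbitrary choice, and the text function used for the labels is covered later in this hour.

```r
pchValues <- 0:25

# Arrange the symbols in rows of nine
xPos <- pchValues %% 9
yPos <- pchValues %/% 9

# bg only has an effect for symbols 21 to 25
plot(xPos, yPos, pch = pchValues, col = "navyblue", bg = "yellow",
     cex = 2, xlab = "", ylab = "", axes = FALSE, ylim = c(-0.5, 3))

# Label each symbol with its pch value, just below the symbol
text(xPos, yPos, labels = pchValues, pos = 1)
```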

The following example shows how we can create a graphic where all these arguments are set. Notice that we are using the plotting symbol 24, which allows us to use the bg argument:

> plot(pkData$Time, pkData$Conc,
+      main = "Concentration against Time", xlab = "Time",
+      ylab = "Concentration", pch = 24, col = "navyblue",
+      bg = "yellow", cex = 2)

You can see the graphic that is created from this code in Figure 13.10.

FIGURE 13.10 Updating the plotting symbol and its attributes

Plot Types

Clearly it is very simple to create scatterplots of our data, but what about alternative plot types? You haven’t yet seen a line plot or step plot. How about lines and points? We can switch our plot to any of these graphics by using the type argument. We pass to the type argument one of a series of letters. The default is p, to indicate points, but we can also have l, o, and s, to name a few. The complete set of options is given in Table 13.1, and a series of graphics showing different types when plotting the same random 10 points is shown in Figure 13.11. Generating graphics of this type would look something like this:

> x <- rnorm(10)
> plot(x, type = "l", main = 'type = "l"')

TABLE 13.1 Available Plot Types

FIGURE 13.11 Setting the plot type

It is probably worth noting that just as we can style the points, as you saw in the previous section, we can also style lines. The argument lty lets us set the line type and again takes integer values. The argument lwd allows us to set the line width in the same way that we set the point size using cex. We will look at examples of setting line types in the next section.
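As a rough sketch of how a figure like this could be produced, the following loops over a handful of the values accepted by the type argument and then draws a single styled line. It uses the mfrow option of par, which is covered later in this hour, purely to show the plots side by side.

```r
y <- rnorm(10)

# A selection of values accepted by the type argument
plotTypes <- c("p", "l", "b", "o", "h", "s")

par(mfrow = c(2, 3))                 # 2 x 3 grid, explained later in this hour
for (plotType in plotTypes) {
  plot(y, type = plotType, main = paste0('type = "', plotType, '"'))
}
par(mfrow = c(1, 1))                 # back to a single plot per device

# A dashed line (lty = 2), three times the default width (lwd = 3)
plot(y, type = "l", lty = 2, lwd = 3)
```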

Low-Level Graphics Functions

So far you have seen only the high-level graphics functions available in the base graphics package. These functions have allowed us to create an entire plot. Often we will want to add a component to the graphic, such as lines showing the mean and confidence intervals, or text to identify an outlier. For this we need the low-level graphics functions. All the functions you will see in this section add a component to the existing plot rather than creating a new one. This is where the type = "n" option from the previous section is particularly useful, because it sets up the axes without drawing any of the data.

Points and Lines

We will start by adding simple points and lines to our graphics. For this we will use the functions points and lines. Just as with the plot function, these functions add points at the X and Y locations specified, or join the locations together in the case of lines. Just as with the plot function, therefore, the first two arguments are the vector of x values and the vector of y values. As an example, let’s take the first and second subjects from the pkData. On a single plot we will add the points to show subject 1 and a line to show subject 2:

> subject1 <- pkData[pkData$Subject == 1, ]
> subject2 <- pkData[pkData$Subject == 2, ]
> plot(pkData$Time, pkData$Conc, type = "n")
> points(subject1$Time, subject1$Conc, pch = 16)
> lines(subject2$Time, subject2$Conc)

The resulting plot is shown in Figure 13.12. The lines function shown here has simply connected the supplied X and Y points. What if we wanted to add a straight line that shows the median concentration value, or the time when the maximum occurs, or even some form of trend? In this case, we would use the function abline. The default behavior of this function is to add a line based on an intercept and slope. However, we can also use the arguments h and v to add horizontal and vertical lines. So, here’s how to add the median concentration and the time of the maximum concentration:

> abline(h = median(pkData$Conc), lty = 2)
> abline(v = pkData$Time[pkData$Conc == max(pkData$Conc)], lty = 3)

FIGURE 13.12 Adding points and lines to a plot
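The intercept and slope form of abline is the one to use for a trend line. The sketch below uses the built-in airquality data and takes the intercept and slope from a simple linear model fitted with lm; the linear model is just one possible choice of trend, assumed here for illustration.

```r
plot(airquality$Wind, airquality$Ozone)

# Fit a straight line; rows with missing Ozone are dropped by lm
trend <- lm(Ozone ~ Wind, data = airquality)
trendCoef <- coef(trend)

# a is the intercept and b is the slope
abline(a = trendCoef[1], b = trendCoef[2], lwd = 2)
```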

Text

The ability to add text to a graphic is incredibly useful. You may want to use text as the plotting symbol itself, but more often you will simply want to label a particular point, typically an outlier. We perform all of these tasks with the text function. As another low-level function, it adds information to an existing plot, and it doesn’t matter whether that plot was created using only a high-level function or a combination of high-level and low-level functions, as you saw in the last section.

To start with, we will use the text function to add all of the content of our plot, using text as the plotting symbol. Just as with the other plotting functions, the first two arguments are the vectors of the X and Y locations for the points. The third argument to this function is the text that we want at each location. This is typically a vector with one value for each X, Y pair. So if we were to create the Concentration against Time plot of the pkData, using the Dose as the text to plot, it might look something like this:

> plot(pkData$Time, pkData$Conc, type = "n")
> text(pkData$Time, pkData$Conc, pkData$Dose)

This graphic is shown in Figure 13.13, and as you can see the doses appear as text on the plot. A more effective use of this function is to label specific points. We can use the text function in a very similar way, giving the X and Y location along with the text, but by default this centers the text on the location. If there is also a point at that location, this is a problem because the text and the point will overlap. You can, of course, manually adjust the X or Y location to handle this, but the text function includes a number of arguments for controlling the positioning. One argument, adj, lets us specify an X and Y adjustment for the text. We can also use the arguments pos and offset. The pos argument controls which side of the point the text is placed on and takes a value from 1 to 4, with 1 being below, 2 to the left, 3 above, and 4 to the right. The offset argument is used in conjunction with pos to determine how far from the point the text is placed, measured in character widths.

FIGURE 13.13 Using the text function to plot text or add text labels
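A quick way to get a feel for pos and offset is to draw a single point and place a label on each side of it, as in this small sketch:

```r
# One point in the middle of an otherwise empty plot
plot(1, 1, pch = 16, xlim = c(0, 2), ylim = c(0, 2))

# Place a label at each of the four positions around the point
positions <- 1:4
for (p in positions) {
  text(1, 1, labels = paste("pos =", p), pos = p, offset = 1)
}
```

Increasing the offset value moves each label farther from the point.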

As an example of using text in this way, we can consider labeling the maximum value at each time point, except 0, with the Subject number. Here, we are using the dplyr package to retain only the rows of data that correspond to the maximum concentration, and then we are using the text function to plot the Subject label to the right of the corresponding points. This graphic can be seen in Figure 13.13.

> library(dplyr)
> maxData <- filter(group_by(pkData, Time), Conc == max(Conc), Time != 0)
> plot(pkData$Time, pkData$Conc, pch = 16)
> text(maxData$Time, maxData$Conc, maxData$Subject, pos = 4, offset = 0.5)

Legends

Adding a legend to a graphic created with any of the base graphics functions requires us to use the low-level legend function. It can initially seem like a confusing function to work with, but it becomes manageable if you always give the values for each group in the same order as the text in the legend itself.

The first argument to this function is either an X and Y location for the position of the top-left corner of the legend or a single string of the form "topright" or "bottomleft", among others. A full list is available in the help file for the legend function.

We then need to specify the legend text. To the argument legend we pass a vector of character strings that will appear as the labels on the legend—for instance, legend = c("Subject 1", "Subject 2"). We can give the text in any order we want the groups to appear. The only thing we need to remember is that when we specify colors, points, and so on, we need to maintain this ordering.

In addition to the location and the legend text, we can then provide vectors of the values for any parameters we want to change. For instance, if we have set the color for each group, we may want to pass a vector of colors to the col argument. If we have changed the plotting symbol for each group, we may want to pass a vector of the plotting symbols—again, remembering for each to maintain the ordering we gave in the text.

As an example, suppose we want to add a legend to the pkData plot, where subject 1 is plotted with blue filled circles and subject 2 is plotted with red, unfilled squares:

> subj1 <- pkData[pkData$Subject == 1, ]
> subj2 <- pkData[pkData$Subject == 2, ]
> plot(subj1$Time, subj1$Conc, pch = 16, col = "blue")
> points(subj2$Time, subj2$Conc, pch = 0, col = "red")
> legend("topright", legend = c("Subject 1", "Subject 2"),
+     pch = c(16, 0), col = c("blue", "red"))

This graphic is shown in Figure 13.14, and you can see that in this case the legend has been pushed into the very top-right corner and sized appropriately based on the legend text provided.

FIGURE 13.14 Adding a legend to a graphic


Note: Arguments to the legend Function

You will have noticed in the example that the arguments used were the same as those in the plot and points functions. This is true for many of the graphics parameters. However, take care because some, such as cex, behave differently: in the legend function, cex scales the legend text rather than the plotting symbols. You can still change the size of the points in the legend, but you will need the argument pt.cex instead. Much more information is available in the help file.
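As a short sketch of the difference, the following enlarges the points in both the plot and the legend with pt.cex while shrinking the legend text with cex. The airquality data and the label "Ozone reading" are used only for illustration.

```r
plot(airquality$Wind, airquality$Ozone, pch = 16, col = "blue", cex = 1.5)

# pt.cex matches the point size used in the plot; cex shrinks the legend text
legendInfo <- legend("topright", legend = "Ozone reading",
                     pch = 16, col = "blue", pt.cex = 1.5, cex = 0.8)
```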


Other Low-Level Functions

In addition to the low-level functions you have seen in this section, a few others are available. We will not go through them all here, but Table 13.2 lists many of the functions you may be interested in. This includes functions for controlling the title, text in the margins, and the axes.

TABLE 13.2 Low-Level Graphics Functions

Graphical Parameters

In the graphics we created in this hour, we have set any parameters related to the graphics in the plotting functions. We can also set these inside a function called par. The par function actually returns a list that contains the settings for graphics parameters. This not only includes arguments such as col and pch, but also mar for setting the margins and xpd, which allows us to add graphics content outside of the plot region.

When it comes to setting margins for our graphic, it is useful to know how a graphics device in R is split. Figure 13.15 shows the sub-regions of a device, including the outer margins and the figure region. You will notice that the par function includes arguments for the outer margin, such as oma. You may want to alter this when you have multiple graphics in one device, as you will see in the next section, because they all share an outer margin.

FIGURE 13.15 Regions in a graphics device

For all the options that can be set in the par function, their usage, and their default values, the help documentation is an invaluable resource.
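One pattern worth sketching is saving the old settings when we change them. When par is used to set parameters, it returns the previous values, so we can store these and restore them afterward. The particular values chosen below are arbitrary.

```r
# Set new margins and defaults, keeping the previous values
oldPar <- par(mar = c(4, 4, 2, 1), pch = 16, col = "navyblue")

plot(rnorm(20), main = "Custom margins and defaults")

# Restore the settings that were in place before
par(oldPar)
par("mar")
```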

Controlling the Layout

Once we are able to create all the graphics we are interested in, we typically want to think about how we present that information. A PDF device places each new plot on its own page, so a single file can hold a multipage document of all our individual plots. In this section, we look at options for creating a single page containing multiple graphics.

Grid Layouts

The simplest layout of our graphics is a grid-like structure, where we have a specified number of rows and columns of graphics. We can set up a graphics device to have this format by using the mfrow option of the par function. This argument takes a vector of the number of rows and columns into which our device should be split. When we then create graphics, they will be entered into the device across the rows, starting in the top left of the grid.

As an example, suppose that we have some random data that we want to plot as a histogram, boxplot, QQ plot, and against its index. We may want to set this up as a 2×2 plot area, like so:

> par(mfrow = c(2, 2))
> x <- rnorm(100)
> hist(x)
> boxplot(x)
> qqnorm(x)
> plot(x)

The graphic that this generates can be seen in Figure 13.16. Once set, this layout of graphics will be maintained. We can revert to the default by setting the mfrow argument to c(1, 1).

FIGURE 13.16 Splitting up the plot region using mfrow
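Because graphics in a grid share the outer margin, a grid is a natural place to use it. The following sketch reserves two lines of outer margin at the top with oma and writes a shared title there with the low-level mtext function. The title text is made up, and the old settings are restored at the end rather than resetting mfrow by hand.

```r
# One row, two columns, with two lines of outer margin at the top
oldPar <- par(mfrow = c(1, 2), oma = c(0, 0, 2, 0))

x <- rnorm(100)
hist(x)
boxplot(x)

# outer = TRUE writes into the shared outer margin
mtext("Simulated Normal Data", outer = TRUE, cex = 1.5)

par(oldPar)   # restore the previous layout and margins
```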

The layout Function

For much finer control of the layout of our graphics we can use the layout function. As well as being able to control the width of each column and the height of each row in our graphics device, we have much finer control of which regions a graphic appears in.

The main argument for this function is a matrix that specifies the locations for each graphic. Each graphic is represented by an integer value and appears in the grid in all regions where that value appears. As an example, suppose we want to plot four graphics, as in the previous section, but we want the histogram to take up the entire first row and the other three graphics to appear underneath in one row. In that case, we would create the following matrix:

> mat <- rbind(1, 2:4)
> mat
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    3    4

Thus, the first graphic would fill all cells containing the value 1—in this case, the entire first row. The second graphic would appear in the position of the 2, and so on. To set this as our layout, we pass it to the layout function, followed by the graphics in order:

> layout(mat)
> x <- rnorm(100)
> hist(x)
> boxplot(x)
> qqnorm(x)
> plot(x)

The result is shown in Figure 13.17. Clearly this gives us a large amount of flexibility over which graphics appear where and their size. If you don’t want a region to include a graphic, you can set the value in the matrix to 0. To see the layout you have specified, use the layout.show function, passing the number of regions to display. This will generate a graphic showing the outline of each region and its number.

FIGURE 13.17 Splitting up the plot region using layout


Tip: Finer Control of the Layout

We can control the appearance of the layout further by using the widths and heights arguments to the layout function. We simply need to provide a vector the same length as the number of columns (for widths) or rows (for heights) specifying the sizes.
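As a sketch, using the same matrix as before, the following makes the first column twice as wide as the other two and the second row twice as tall as the first. The specific sizes are arbitrary; they are relative values.

```r
mat <- rbind(1, 2:4)

# Three columns need three widths; two rows need two heights
nFigures <- layout(mat, widths = c(2, 1, 1), heights = c(1, 2))

# Draw the outline of all the regions to check the result
layout.show(nFigures)

# Return to a single plot per device
layout(1)
```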


Summary

In this hour, you saw how to create graphics using the base R functionality. Functions for graphics are split into two types: The high-level functions create a whole plot, and the low-level functions allow us to add components to an existing graphic. The base graphics package is not the only option for graphics, and in the next two hours you will see how to create graphics using the ggplot2 and lattice packages.

Q&A

Q. Why isn’t my plot appearing in the Plot tab?

A. This is usually because you have an open connection to a graphics device other than the default Plot tab in RStudio. In that case, your graphics are being written to an alternative graphics device. You can use the function dev.off to close the current connection, but if you are not sure how many graphics devices you have open, try graphics.off. This will close all active devices, and you can start again.

Q. The argument bg isn’t changing anything in my graphic. What am I doing wrong?

A. What plotting symbol are you using? The argument bg is only compatible with plotting symbols in the range 21 to 25. If you are using any other symbol, this argument won’t change anything about your graphic.

Q. How can I remove lines or points after I have added them with the low-level functions?

A. The approach taken by R in drawing graphics with the base graphics functions is similar to a pen-and-paper approach. If you want to remove a component, you will need to run the code again, excluding the component you don’t want anymore.

Q. I changed the layout of my device and now I just want to see one plot. How can I change it back?

A. You can change the layout back to the default (one row, one column) by setting the argument mfrow of the par function to c(1, 1).

Q. Can I put the legend outside of the plot region?

A. Yes, you can. You will need to extend the margins and set the argument xpd (in the par function) to NA to allow you to draw in the margins.
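One way this might look is sketched below. The right margin is widened with mar, xpd is set to NA, and a negative inset value pushes the legend beyond the right edge of the plot region. The size of the margin and the inset are assumptions that usually need adjusting by eye for a given device size.

```r
# Widen the right margin and allow drawing outside the plot region
oldPar <- par(mar = c(5, 4, 4, 8), xpd = NA)

plot(airquality$Wind, airquality$Ozone, pch = 16, col = "blue")

# A negative horizontal inset moves the legend into the right margin
legend("topright", inset = c(-0.3, 0), legend = "Ozone reading",
       pch = 16, col = "blue")

par(oldPar)   # restore the previous margins and clipping
```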

Workshop

The workshop contains quiz questions and exercises to help you solidify your understanding of the material covered. Try to answer all questions before looking at the “Answers” section that follows.

Quiz

1. What is a device and why do you need to set one?

2. Which functions allow you to create the following graphics?

A. A QQ plot with corresponding line

B. A bar chart of counts

C. A plot of a variable against another

D. A histogram

3. What effect would setting pch = 6 have on a scatterplot?

4. Which low-level graphics function can you use to add text to the margins?

5. When would you use the mfrow argument of the par function and when would you use the layout function?

Answers

1. A device is what your graphic is created in. This could be the default RStudio device or a specific file type, such as PDF or PNG. If you want to use a device that is not the default device, you need to set it. You use a function such as pdf or png to set the device and dev.off to close the connection.

2. You would need the following functions:

A. qqnorm and qqline

B. barplot

C. plot

D. hist

3. It would change the plotting symbol to an upside-down triangle.

4. To add text in the margins, you would need to use the mtext function.

5. You would use both to change the layout of a device to include multiple graphics in a single device. The mfrow argument is sufficient if you want the graphics to be in a grid layout with a specified number of rows and columns. The layout function gives you much more control over exactly where graphics should appear and the widths and heights of rows and columns.

Activities

1. Sample 100 values from a Normal distribution. Create a histogram of this data.

2. For each month in the airquality data, create a plot of Ozone against Wind. Ensure that all the plots use the same axis limits and include a suitable title that indicates the month—for example, “Ozone against Wind for Month X.”

3. Create a five-page PDF document from the graphics in the previous exercise.

4. Create a single-page PNG file that includes all five graphics created in Activity 2. Choose a suitable layout to show the data.

5. Create a single graphic of Wind against Day, where each month is a single line, each in a different color. Add a legend to the graphic.
