ggplot2 – first steps

ggplot2 (an acronym for Grammar of Graphics plot) is one of the most popular graphics packages in R. Its popularity is due mainly to the fact that almost anything can be drawn with it. This flexibility, however, comes at the cost of more complexity when drawing.

The underlying concept of ggplot2 is an empty canvas. Instead of specifying the type of plot and the data to be visualized, the ggplot2 functions expect vectors that denote positions, widths, sizes, and so on. Conceptually, this is very similar to an HTML document: an empty space that is filled with different objects, each with its own characteristics. The following example is the equivalent of plot(iris$Sepal.Length):

library(ggplot2)
ggplot.graph <- ggplot(data=iris)
ggplot.graph <- ggplot.graph + geom_point(aes(1:150,Sepal.Length))

plot(ggplot.graph)

This example is typical ggplot code. As the reader might have already realized, its construction differs significantly from that of the other packages seen so far. In this section, the logic underlying ggplot2 will be explained and the functionality of the different elements in the preceding example will be clarified.

At this stage, the reader only needs to understand that when the points are specified with geom_point(aes(1:150, Sepal.Length)), both the X and Y coordinates have to be given. In this example, the position along the horizontal axis is defined as a vector from 1 to 150 (the number of cases in the iris dataset) while the position along the vertical axis is defined by the Sepal.Length vector.
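The mapping of positions can be checked by naming the arguments explicitly and inspecting the data that ggplot2 computes for the layer. The following is a minimal sketch; the names index.graph and built are illustrative, and layer_data() is used here only to look at the computed coordinates:

```r
library(ggplot2)

# Same plot as above, with the x and y aesthetics named explicitly.
# seq_len(nrow(iris)) yields 1:150 without hard-coding the number of cases.
index.graph <- ggplot(data = iris) +
  geom_point(aes(x = seq_len(nrow(iris)), y = Sepal.Length))

# layer_data() returns the data computed for the first (and only) layer
built <- layer_data(index.graph, 1)
head(built[, c("x", "y")])
```

Calling plot(index.graph) should display the same graph as the original example.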

ggplot's main logic – layers and aesthetics

Layers and aesthetics are two key elements within ggplot. The package cannot be used to its full potential without a solid understanding of these two aspects.

Layers

The ggplot objects were previously described as empty canvases. This definition is, however, incomplete; the ggplot objects are empty canvases that are filled by layers. It is, in fact, the superposition of layers that makes ggplot so flexible. Let's examine the previous example again:

ggplot.graph <- ggplot(data=iris)
ggplot.graph <- ggplot.graph + geom_point(aes(1:150,Sepal.Length))

plot(ggplot.graph)

Firstly, a ggplot object is initialized. It is important to take into account that in order to create visualizations in ggplot2, a ggplot object must be created first. After this, layers can be added. In this example, only one layer consisting of points (the geom_point() function) is added to the object. Layers are usually added to a ggplot object with the + operator (there are other options too, but they are rarely used).

At this stage, it is important to understand the difference between a ggplot object and a layer: the former is the canvas and the latter contains the elements displayed on it. Consequently, their behaviors are different. The following points describe some of these behaviors and what can and cannot be done with each of them:

  • The addition can be performed only between the ggplot objects and the layer objects, that is, the following operations are not permitted:
    • ggplot() + ggplot()
    • geom_point() + geom_point()
  • The layer objects cannot be plotted. That is, operations such as plot(geom_point(aes(1:150,1:10))) are not permitted.
  • The addition of a ggplot object and a layer object is a ggplot object with a layer. This means that more than one layer can be added to the same ggplot object, or intermediate ggplot objects can be built with successive layers:
    ggplot.graph0 <- ggplot(data=iris)
    # Each layer receives the rows it draws, so every aesthetic has one value per row
    ggplot.graph1 <- ggplot.graph0 + geom_point(data=iris[1:50,], aes(1:50,Sepal.Length,colour="red"))
    ggplot.graph2 <- ggplot.graph1 + geom_point(data=iris[51:100,], aes(51:100,Sepal.Length,colour="blue"))
    
    plot(ggplot.graph2)
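The behaviors listed above can be checked in the console. This is a minimal sketch with illustrative object names; it assumes that the layers of a ggplot object can be reached through its layers element (an internal detail of the object) and uses tryCatch() to capture the errors raised by the operations that are not permitted:

```r
library(ggplot2)

canvas <- ggplot(data = iris)
one.layer <- canvas + geom_point(aes(x = 1:150, y = Sepal.Length))
two.layers <- one.layer + geom_line(aes(x = 1:150, y = Sepal.Length))

# Each addition returns a ggplot object with one more layer
inherits(two.layers, "ggplot")
length(canvas$layers)
length(one.layer$layers)
length(two.layers$layers)

# Adding two ggplot objects, or two layers, raises an error,
# which tryCatch() returns as a condition object instead of stopping
canvas.sum <- tryCatch(ggplot() + ggplot(), error = function(e) e)
layer.sum <- tryCatch(geom_point() + geom_point(), error = function(e) e)
inherits(canvas.sum, "error")
inherits(layer.sum, "error")
```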
    

Aesthetics

Aesthetics in ggplot2 are where the parameters of the elements to be displayed are specified, such as position, color, and size. The most commonly used function to build them is aes(), but in some cases, aes_string() or aes_q() might be more useful (note that both have been deprecated since ggplot2 3.0.0).

These functions can be called either in ggplot() or in any of the layers. It is important to consider that some of the arguments that can be passed to aes() might not apply to the layer that receives them. For example, specifying linetype in aes() for the geom_point() function does not have any effect. The documentation of each layer function lists the aesthetics that it understands. However, it is important to take into account that these mistakes will not throw an error (recent versions of ggplot2 issue only a warning).

When layers are added, the parameters passed to aes() in the ggplot object are inherited by each layer unless the layer specifies them itself. The exact behavior depends on the addition type used. In the most common case (the + operator), the parameters specified in the layer override those of the ggplot object. However, this override is only valid in the layer where it is specified. This means that a new layer will not inherit the aes characteristics of the previous layer, only those of the ggplot object. Let's examine the following example:

library(ggplot2)
ggplot.graph <- ggplot(data=iris)
# Each layer receives the rows it draws, so every aesthetic has one value per row
ggplot.graph <- ggplot.graph + geom_point(data=iris[1:50,], aes(1:50,Sepal.Length,colour="red"))
ggplot.graph <- ggplot.graph + geom_point(data=iris[51:100,], aes(51:100,Sepal.Length))

plot(ggplot.graph)

In this example, the data argument is defined in the ggplot call, and each layer overrides it with the subset of rows that it draws (recent versions of ggplot2 require every aesthetic to have either length 1 or one value per row of the layer's data). Every variable referred to inside aes() is bound to the dataset of the layer. This is the reason why the y position is written simply as Sepal.Length. However, as the color in the second layer is not specified, the default (black) is taken; that is, the layer does not inherit the color from the previous layer.
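The inheritance rule can also be seen by comparing the aesthetics stored in the ggplot object with those stored in each layer. The following is a sketch under the assumption that the mapping and layers elements of the object are accessible (they are internal details); the object names are illustrative:

```r
library(ggplot2)

# x and y are set in ggplot(), so every layer inherits them
base.graph <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))

# colour is set only in the first layer; the second layer does not receive it
layered.graph <- base.graph +
  geom_point(aes(colour = Species)) +
  geom_point(shape = 1, size = 4)

names(layered.graph$mapping)               # aesthetics of the ggplot object
names(layered.graph$layers[[1]]$mapping)   # aesthetics added by layer 1
length(layered.graph$layers[[2]]$mapping)  # layer 2 adds none of its own

# Colours actually used by each layer once the plot is built
length(unique(layer_data(layered.graph, 1)$colour))
length(unique(layer_data(layered.graph, 2)$colour))
```

Both layers are drawn at the positions inherited from the ggplot object, but only the first one is colored by Species; plot(layered.graph) displays the result.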

Some arguments can be passed either inside or outside aes(). In fact, the preceding example does not use the coloring attribute properly; the arguments passed to aes() are intended to have a meaning in the data. This is the reason why they are normally not a constant but a variable in the dataset. In fact, every aesthetic passed to aes() that refers to some visual characteristic of the plot will be automatically included in the legend.

If the aesthetic change does not carry any meaningful information about the data being displayed (for example, changing the color of the points in a geom_point() graph for design reasons), it is always better to pass it outside aes(), as this will not create meaningless extra entries in the legend.
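The difference between the two options can be sketched as follows. The object names are illustrative, and layer_data() is used only to inspect the colors that each plot ends up using:

```r
library(ggplot2)

# Constant passed outside aes(): the points are drawn in red, no legend entry
fixed.colour <- ggplot(data = iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width), colour = "red")

# Variable passed inside aes(): colour carries information and gets a legend
mapped.colour <- ggplot(data = iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, colour = Species))

unique(layer_data(fixed.colour, 1)$colour)   # a single colour value
unique(layer_data(mapped.colour, 1)$colour)  # one colour per species
```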

Some graphical tools in ggplot2

In this section, only a few layers will be explained, just to illustrate concrete examples of ggplot2. However, readers who want to include these visualizations in their applications are strongly advised to explore the possibilities of ggplot2 in more depth. The following functions are covered in the next subsections:

  • geom_point
  • geom_line
  • geom_bar

geom_point

geom_point() draws points whose positions are specified by the x and y arguments inside aes(). In the following example, the positions are defined by Sepal.Length and Sepal.Width. Additionally, color is specified by Species:

library(ggplot2)
points.graph <- ggplot(data=iris)
points.graph <- points.graph + geom_point(aes(x=Sepal.Length,y=Sepal.Width, colour=Species))
plot(points.graph)

This is the expected outcome:

[Figure: scatter plot produced by geom_point]

geom_line

geom_line() draws a continuous line that connects the coordinate pairs given by x and y in aes(), in order of their x values. If group is specified, one line is drawn per group. In the following example, the x coordinate is defined by Sepal.Length while the y coordinate is given by Sepal.Width. As the grouping is already specified as Species in the ggplot call, three lines (one per group) are drawn. The color is also defined by Species:

library(ggplot2)
line.graph <- ggplot(data=iris, aes(group=Species))
line.graph <- line.graph + geom_line(aes(x=Sepal.Length,y=Sepal.Width, colour=Species))
plot(line.graph)

This is the expected outcome:

[Figure: line plot produced by geom_line]

geom_bar

geom_bar() builds a layer consisting of bars. Its only mandatory argument in aes() is x. Unlike geom_point() or geom_line(), the x argument of geom_bar() typically expects a factor (or a variable that can be coerced to one). The function calculates the frequency per category and returns the corresponding bar plot. In the following example, a bar plot is built for the variable class in the mpg dataset:

library(ggplot2)
bar.graph <- ggplot(data=mpg)
bar.graph <- bar.graph + geom_bar(width=0.3, fill="red", aes(x=class))
plot(bar.graph)

This is the expected outcome:

[Figure: bar plot produced by geom_bar]
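The frequency calculation can be verified by comparing the bar heights with a frequency table computed directly. This is a minimal sketch; the object names are illustrative, and it assumes that the built layer data exposes the frequencies in a count column and that both results are ordered alphabetically by class:

```r
library(ggplot2)

count.graph <- ggplot(data = mpg) + geom_bar(aes(x = class))

# Heights computed by geom_bar() (the count column of the built layer data)
bar.heights <- layer_data(count.graph, 1)$count

# The same frequencies computed directly from the data
class.freq <- table(mpg$class)

# Side-by-side comparison, assuming both are in alphabetical order of class
data.frame(class = names(class.freq),
           table.count = as.vector(class.freq),
           bar.height = bar.heights)
```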

An applied example with multiple layers

One of the most interesting capabilities of ggplot2 is the possibility of combining layers. In the following code, a very simple plot is created that consists of two layers: a geom_point() that draws all the points corresponding to the second anscombe series (the x2 and y2 variables respectively), and a geom_segment() that draws a line between the first and the last observation.

Note

Anscombe's quartet is an artificial dataset originally conceived by Francis Anscombe where each x-y set has (nearly) the same mean and variance for both variables, the same correlation coefficient, and the same regression equation. However, by plotting them, it becomes clear that they are all very different. In R, this is available by default as anscombe.
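The properties described in this note can be checked with a few lines of base R. This is only an illustrative sketch; pairs.summary is a name introduced here for the example:

```r
data(anscombe)

# Means and variances of the four x and four y series
round(sapply(anscombe, mean), 2)
round(sapply(anscombe, var), 2)

# Correlation and regression coefficients of each x-y pair
pairs.summary <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(correlation = cor(x, y), coef(lm(y ~ x)))
})
round(pairs.summary, 2)
```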

The data is sorted beforehand so that the first and last observations match the first and last values of x in the plot:

library(ggplot2)
data(anscombe)
sorted.anscombe <- anscombe[order(anscombe$x2),]
anscombe.graph <- ggplot(data=sorted.anscombe)
anscombe.graph <- anscombe.graph + geom_point(colour = "blue",aes(x=x2,y=y2))
anscombe.graph <- anscombe.graph + geom_segment(colour = "red",aes(x=x2[1],xend=x2[nrow(anscombe)],
y=y2[1],yend=y2[nrow(anscombe)]))

plot(anscombe.graph)

This is the expected outcome:

[Figure: applied example with multiple layers]

ggplot and Shiny

When integrating ggplot2 in Shiny, there is probably one main issue to consider: aes() accepts expressions (that is, unquoted variable names) as arguments, but there is no widget in Shiny that can produce an input value of that class. For this reason, in most cases, it is advisable to use aes_string() instead, which can receive character arguments. Note that aes_string() has been deprecated since ggplot2 3.0.0 in favor of tidy evaluation, so recent versions may emit a deprecation warning when it is called.

The following server.R file is the ggplot equivalent of the server.R file of the graphics Shiny example in the Including a plot in a Shiny application section. As ggplot already draws the legends and readjusts the scales automatically, the code ends up being much clearer than in the first case. As already explained, aes_string() is used instead of aes() because input$xvar and input$yvar are character values. ggtitle() is, as expected, a layer that adds a title to the ggplot.

By replacing server.R from the first example with the following code, an equivalent visualization will be obtained with the sole difference that the plot will have a ggplot look and feel:

library(shiny)
library(ggplot2)

#initialization of server.R
shinyServer(function(input, output) {

  iris.sset <- reactive(subset(iris,Species %in% input$species))
  
  #Plot generation
  output$custom.plot <- renderPlot({
    iris.ggplot <- ggplot(data=iris.sset())
    iris.ggplot <- iris.ggplot + geom_point(aes_string(input$xvar, input$yvar, colour="Species"))
    iris.ggplot <- iris.ggplot + ggtitle(paste0(input$xvar,"/",input$yvar," dispersion graph"))
    plot(iris.ggplot)
    
  })
  
})
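As an alternative to aes_string(), character values can be used inside aes() through the .data pronoun that ggplot2 exports. The following is a sketch rather than a replacement for the preceding server.R; dispersion.plot is a hypothetical helper introduced only for this example, and it assumes that the data contains a Species column:

```r
library(ggplot2)

# Hypothetical helper: builds the same scatter plot from column names
# given as character strings, as input$xvar and input$yvar would be.
dispersion.plot <- function(data, xvar, yvar) {
  ggplot(data = data) +
    geom_point(aes(x = .data[[xvar]], y = .data[[yvar]], colour = Species)) +
    labs(x = xvar, y = yvar) +
    ggtitle(paste0(xvar, "/", yvar, " dispersion graph"))
}

# Outside Shiny, the helper can be tried with plain strings
test.plot <- dispersion.plot(iris, "Sepal.Length", "Petal.Width")
```

Inside renderPlot(), the call would then be plot(dispersion.plot(iris.sset(), input$xvar, input$yvar)).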