Axis scales

Together with axis labels, you will often need to change the axis scales. For instance, you may want to change the scale to log values or modify the default range of values shown on the axis when the plot is created. In this section, we will take a look at exactly how to make these modifications. The axis scales, as well as the legends, are derived from the scales used in the aesthetic mappings, so, in many cases, if you want to manipulate such values, you will need to use the scale function that matches your specific situation. For this reason, we will treat plots with discrete scales separately from those with continuous scales.

The discrete axis

You may have a plot with discrete scales, for instance, when you represent data grouped in categories along one of the axes. As an example, we will use the dataset with the four different normal distributions that we created in the previous chapter. In this case, we will directly define the grouping variable as a factor so that we don't need to convert numbers to factors later on. The following command shows this:

dist <- data.frame(value=rnorm(10000, 1:4), group=factor(1:4))
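This single command yields four distributions because R recycles both the vector of means and the four factor levels along the 10,000 rows. The following is a minimal sketch to check this; the seed is an assumption added only for reproducibility and is not part of the original command:

```r
# Seed added only so that the check is reproducible (an assumption of this sketch)
set.seed(42)
dist <- data.frame(value = rnorm(10000, 1:4), group = factor(1:4))

# Each factor level is recycled evenly, so every group holds 2,500 values
group_sizes <- table(dist$group)

# The mean of each group should be close to its level (1, 2, 3, and 4)
group_means <- tapply(dist$value, dist$group, mean)

print(group_sizes)
print(round(group_means, 1))
```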

Our dist dataset will contain four different normal distributions, and we will visualize them as boxplots with the different groups defined by the variable group along the x axis, as follows:

myBoxplot <- ggplot(dist, aes(x=group, y=value, fill=group)) + geom_boxplot()

The resulting plot is represented in Figure 5.2(A). In this case, we have an x axis composed of discrete data, so if we, for instance, want to change the order of such data, we can do that using the scale function for discrete data applied to the x axis aesthetic, that is, the scale_x_discrete() function. In the upcoming examples, we will consider scale transformations of the x axis, but the same transformations can be applied to the y axis by replacing x with y in the function or argument names. The following command changes the order of the groups:

myBoxplot + scale_x_discrete(limits=c("1","3","2","4"))

As illustrated in Figure 5.2(B), we have changed the order of the data to the order we have specified.

Tip

Reversing the order of discrete variables

We have seen how to manually set the order of discrete variables. You can also use this approach to reverse the order of the variables, but if you are working on a dataset with many levels in the grouping variable, it may be handy to use the rev() function. This base R function simply reverses the order of the elements of a vector, so using it in the limits argument will make the command shorter and easier to read. The following command shows how you can apply this method to reverse the order of your variables; just remember that you need to specify the dataset explicitly since, in this case, you are using the vector of levels of your grouping variable directly:

myBoxplot + scale_x_discrete(limits = rev(levels(dist$group)))

In our previous examples, we specified the order of the discrete variables using the limits argument of the scale function. There are also other arguments that you can use in this function to change the default values of the scale. These arguments are very similar across scale functions, so we will not list them for every scale function we mention in the next pages. You can always check them on the help page of the specific function you are interested in. For the scale_x_discrete() function, the commonly available arguments are the following:

  • name: This defines the name of the scale and so the label of the axis. For aesthetics that are shown in a legend, it defines the title of the legend.
  • breaks: This controls the breaks in the guide and so which values appear on the axis or legend. The value is NULL for no breaks.
  • labels: This defines the labels that should appear on the breakpoints defined by breaks; the value is NULL for no labels.
  • na.value: This defines how missing values should be displayed, for instance, by providing the value that should be used as a replacement.
  • limits: This defines the limits of the data range and the order in which the values are displayed.
  • guide: This is used to control the legend.
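As a rough illustration of some of these arguments, the following sketch sets name, breaks, and labels on the boxplot from the earlier examples. The label text and the seed are assumptions made for this example only:

```r
library(ggplot2)

# Seed added only for reproducibility (an assumption of this sketch)
set.seed(1)
dist <- data.frame(value = rnorm(10000, 1:4), group = factor(1:4))
myBoxplot <- ggplot(dist, aes(x = group, y = value, fill = group)) + geom_boxplot()

# name sets the axis label, breaks selects which groups get a tick,
# and labels gives the text shown at those ticks (illustrative values)
relabelled <- myBoxplot + scale_x_discrete(
  name = "Distribution",
  breaks = c("1", "4"),
  labels = c("Mean 1", "Mean 4")
)
relabelled
```

With these settings, only the first and the last group should be marked on the x axis, while all four boxplots are still drawn.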

In most cases, the arguments of the specific scale functions are passed to a more general function, which then actually constructs the scale. For this reason, you will not find a description of such arguments on the help page of the scale function but on the help page of the constructor function. The help page of the scale function specifies which scale constructor is used. In our previous examples, for instance, the scale_x_discrete() and scale_y_discrete() scale functions use the discrete_scale() scale constructor.

Figure 5.2: A boxplot of distributions in the dataset dist with default settings (A) and with the changed order of the discrete variable on the x axis (B)

The continuous axis

When dealing with continuous scales, two very common adjustments are modifying the default data range represented in the plot and reversing the direction of the axis. As we have seen in the previous section, the scale functions provide the limits argument, which allows us to set the limits of the axis. We can, for instance, change the limits of the y axis in the boxplot we just created so that it extends from -10 to 10, as shown in the following command:

myBoxplot + scale_y_continuous(limits=c(-10,10))

As an alternative, you can also use the xlim and ylim functions if you only need to change the range, so, for instance, the following command will produce the same result:

myBoxplot + ylim(-10,10)

The resulting plot is shown in Figure 5.3(A).
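One way to check that the two commands set the same limits is to inspect the y scale of each built plot. This is a minimal sketch that assumes the layer_scales() helper of ggplot2 is available in your installed version; the seed is added only for reproducibility:

```r
library(ggplot2)

# Seed added only for reproducibility (an assumption of this sketch)
set.seed(1)
dist <- data.frame(value = rnorm(10000, 1:4), group = factor(1:4))
myBoxplot <- ggplot(dist, aes(x = group, y = value, fill = group)) + geom_boxplot()

with_scale <- myBoxplot + scale_y_continuous(limits = c(-10, 10))
with_ylim <- myBoxplot + ylim(-10, 10)

# layer_scales() returns the scales of the built plot;
# get_limits() reads the limits of the y scale
limits_scale <- layer_scales(with_scale)$y$get_limits()
limits_ylim <- layer_scales(with_ylim)$y$get_limits()

print(limits_scale)
print(limits_ylim)
```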

If you want to make sure that a given value is included in your plot, you can also use the expand_limits() function. This function extends the plotted range so that the specified values are included. For instance, if we want to make sure that the value -10 is represented in our plot, we can use the following command; the resulting plot is represented in Figure 5.3(B):

myBoxplot + expand_limits(y=-10)

This function can be very handy since you only specify the values that should be included, so if you change the data in your plot or reuse part of your code, the limits are also applied to the new plot. On the other hand, keep in mind that the expand_limits() function cannot be used to shrink the range represented. Take the following command as an example:

myBoxplot + expand_limits(y=0)

The preceding command will not produce any change in our original plot since 0 is already included in the range plotted.
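The behavior of expand_limits() can be checked by comparing the y range of each built plot. The following is a sketch under stated assumptions: y_range() is a hypothetical helper written for this example, and it relies on the layer_scales() function of ggplot2 being available in your installed version:

```r
library(ggplot2)

# Seed added only for reproducibility (an assumption of this sketch)
set.seed(1)
dist <- data.frame(value = rnorm(10000, 1:4), group = factor(1:4))
myBoxplot <- ggplot(dist, aes(x = group, y = value, fill = group)) + geom_boxplot()

# Hypothetical helper: the y range on which the scale of the built plot was trained
y_range <- function(p) layer_scales(p)$y$range$range

range_default <- y_range(myBoxplot)
range_minus10 <- y_range(myBoxplot + expand_limits(y = -10))
range_zero <- y_range(myBoxplot + expand_limits(y = 0))

print(range_default)  # range of the data only
print(range_minus10)  # lower end extended to -10
print(range_zero)     # unchanged, since 0 already lies within the data range
```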

Figure 5.3: A boxplot of distributions in the dataset dist with the y axis range from -10 to 10 (A) and with the range expanded to include the value -10 (B)

Axis transformations

By default, scales in plots are linear, but you have the option to replace this with a transformed scale for your axis. This can be done in several different ways, but the two main options are to transform the axis by changing the scale or changing the coordinate system. The result of such methods is slightly different since the transformation is applied at different points.

For our examples, we will use the cont dataset that we generated in the previous chapter, which contains three series of data values. The following command creates it:

cont <- data.frame(y=c(1:20,(1:20)^1.5,(1:20)^2), x=1:20, group=rep(c(1,2,3),each=20))

We will first create a scatterplot with these values (Figure 5.4(A)), and then we will transform our y axis into log10 values, as shown in the following command:

myScatter <- ggplot(data=cont, aes(x=x, y=y, col=factor(group))) + geom_point()
myScatter + scale_y_log10()

In the preceding command, we used a scale function to transform the axis; scale_y_log10() corresponds directly to the scale_y_continuous() function but with a log10 transformation of the data. Other similar functions that are also available are scale_x_reverse(), which reverses the direction of the axis, and scale_x_sqrt(), which applies a square root transformation. You can see the resulting picture from our transformation in Figure 5.4(B). Since, in this case, we have used the scale function, the transformation was applied when creating the scale, that is, before properties such as the breaks and the range of the data were determined. This means that the scale is built on the log-transformed data. As mentioned earlier, we can also use a coordinate transformation, but in this case, the transformation is applied after the scale has been defined, which means that the scale built on the original values is then represented on a log axis.
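The other scale helpers mentioned above can be used in the same way as scale_y_log10(). The following minimal sketch applies a reversed x axis and a square root y axis to the same scatterplot, and it uses layer_data() to look at the values after the scale transformation; the object names are chosen for this example only:

```r
library(ggplot2)

cont <- data.frame(y = c(1:20, (1:20)^1.5, (1:20)^2), x = 1:20,
                   group = rep(c(1, 2, 3), each = 20))
myScatter <- ggplot(data = cont, aes(x = x, y = y, col = factor(group))) + geom_point()

# Reverse the direction of the x axis
reversed <- myScatter + scale_x_reverse()

# Apply a square root transformation to the y axis
rooted <- myScatter + scale_y_sqrt()

# layer_data() returns the layer data after the scale transformation was applied
x_reversed <- layer_data(reversed)$x
y_rooted <- layer_data(rooted)$y

print(range(x_reversed))  # the x values are stored with a flipped sign
print(range(y_rooted))    # the y values are stored as square roots
```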

You can use coordinate transformation as shown in the following command and see the resulting plot in Figure 5.4(C).

myScatter + coord_trans(y="log10")

Figure 5.4: Example of scatterplots with default linear scales (A), a log-transformed y axis by changing the scale (B), and a log-transformed y axis by changing the coordinate system (C)

As illustrated, independent of the transformation method used, the data is represented in the same way in Figure 5.4(B) and Figure 5.4(C), but the y axis scale is different. When transforming the scale, the breaks on the axis are derived from the log-transformed data, while, when changing the coordinates, the breaks are the same as with the linear scale but are placed in a log-transformed coordinate system.
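The point at which each transformation is applied can be seen in the data of the built plot. The following is a sketch that assumes the layer_data() helper of ggplot2 is available in your installed version; with the scale transformation, the stored y values are already log-transformed, whereas with the coordinate transformation, they still hold the original values:

```r
library(ggplot2)

cont <- data.frame(y = c(1:20, (1:20)^1.5, (1:20)^2), x = 1:20,
                   group = rep(c(1, 2, 3), each = 20))
myScatter <- ggplot(data = cont, aes(x = x, y = y, col = factor(group))) + geom_point()

scaled <- myScatter + scale_y_log10()
coord_changed <- myScatter + coord_trans(y = "log10")

# y values stored in the built plot for each approach
y_scaled <- layer_data(scaled)$y
y_coord <- layer_data(coord_changed)$y

print(range(y_scaled))  # log10 of the original values
print(range(y_coord))   # original values, transformed only when drawn
```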

Tip

Removing axis tick marks

In some cases, you may want to remove the axis tick marks since they may be redundant in your plot. For instance, if we look again at our dataset dist and plot only one distribution, we would end up with two labels: one for the data and one for the axis. In these cases, it may be handy to remove the axis tick marks and use the axis label to describe the data, as shown in the following command:

myBoxplot2 <- ggplot(subset(dist,group=="1"), aes(x=group, y=value, fill=group)) + geom_boxplot()
myBoxplot2 + scale_x_discrete(breaks=NULL) + xlab("Distribution of variable 1")