Visualization of spatial data is vital both during intermediate analysis steps (to examine preliminary results and make sure we are on the right track) and as the final product (to present our results to colleagues or in a publication). In this chapter, we are going to visualize various datasets we created in the previous chapters, bringing closure to the previously presented case studies. While doing this, you will learn how to produce an elaborate and customized graphical output in R.
Most of this chapter is going to concentrate on the popular graphical package ggplot2
. We will begin by presenting the logic behind the special syntax this package follows. Afterwards, we will review, through examples, the way spatial and nonspatial plots can be produced and customized using this package. The ggmap
package, which automates downloading static maps from the Web and can be used to complement plots produced with ggplot2
, will then be presented. Finally, we will experiment a little bit with three-dimensional (3D) visualization using the lattice
package.
In this chapter, we'll cover the following topics:
ggplot2
to produce publication-quality plotsggmap
to add static maps as the backgroundlattice
to produce 3D plotsIn this section, you are going to learn how to use ggplot2
and ggmap
to visualize spatial data. The section is structured as follows. First, we will review the ggplot2
framework using a simple example of a time series plot since (as you will see right away) the syntax is quite different from that of other plotting methods we used until now. After that, we will practice a little bit more with producing ordinary, nonspatial plots. Next, we will see how the ggplot2
plots can be saved for subsequent use, both within the R environment and in external files. Finally, the last two sections will deal with the most important material from this book's perspective. In these two sections, we will see how spatial data can be incorporated into ggplot2
visualizations to produce maps, and how we can conveniently download reference background images for such maps using ggmap
.
Before going into the details of ggplot2
, it is important to state that to use this package, our input data needs to be contained in a data.frame
object and it needs to be tidy (see Chapter 7, Combining Vector and Raster Datasets, for more information). Therefore, in practice, there are two steps when plotting with ggplot2
:
In some of the following examples, when the data is already in the right shape, we will skip the first step and move on to creating the plot right away. Otherwise, we will first have to reshape our data (using methods already familiar to us from the previous chapters) and only then utilize ggplot2
to create the plot itself.
The best way to understand the underlying logic of ggplot2
is through examples. Our first example is going to reproduce the time series plot we created in Chapter 2, Working with Vectors and Time Series, when demonstrating the difference between the three graphics systems in R. As you remember, we used the following data.frame
object (although not knowing that it was a data.frame
object at the time) that represents a time series of temperature measurements. The time
column of this data.frame
object contains dates, while the tmax
column contains the daily temperature maxima:
> head(dat) time tmax 1 2006-01-01 13.3 2 2006-01-02 14.4 3 2006-01-03 15.6 4 2006-01-04 14.4 5 2006-01-05 10.6 6 2006-01-06 12.8
We used the following expression to produce the simplest possible line plot of this time series with ggplot2
:
> library(ggplot2) > ggplot(dat, aes(x = time, y = tmax)) + + geom_line()
Although the following screenshot already appeared, in Chapter 2, Working with Vectors and Time Series, it is provided again here since we are going to discuss the example for some time now:
We will now briefly go over the main characteristics of the ggplot2
syntax, referring to the previous example. Later in this chapter, we will produce additional plots using other datasets from the previous chapters. This way, by the end of this chapter, we'll have reviewed the most important concepts and methods of operation for visualization of spatial data with ggplot2
.
The ggplot2
package is extremely flexible. Due the abundance of functions and usage options, the variety of plots it is capable of producing is mainly limited by the users' knowledge and expertise. However, this abundance can be overwhelming for beginners. In addition, the ggplot2
syntax is hard to grasp at first, both conceptually and practically. Nevertheless, those who overcome the initial difficulties are greatly rewarded.
The purpose of this chapter is to present, through examples, some of the most important points to note when using ggplot2
to display spatial data. Obviously, we cannot cover all plot types and their optional modifications (even a standalone book cannot accomplish such a task). What we can do is provide the initial knowledge sufficient for orientation. The subsequent usage of ggplot2
will inevitably require trial and error, as well as looking for help online. In addition, the following sources of information on ggplot2
are highly recommended:
ggplot2
package is fortunately accompanied by a highly comprehensive collection of help pages (http://docs.ggplot2.org/), where all functions and arguments are reviewed—in most cases with code and graphical output examples.ggplot2
(although there are relatively few examples involving spatial data).ggplot2
can refer to the absorbing book by the package author Wickham, H. ggplot2: Elegant Graphics for Data Analysis, Springer, 2009. Some of the code sections in that book are slightly outdated (the package has evolved since the book was published), but this does not affects the book's utility.Each expression used to create a plot with ggplot2
is made up of several components as we shall see shortly. It starts with the ggplot
function call (such as ggplot(dat,aes(x=time,y=tmax))
), followed by additional layers' definitions and settings.
At the core of each plot created with ggplot2
are the layers, with each layer necessarily associated with a given geometry and the data used to draw the layer. A plot with no layers cannot be produced since there will be nothing to show:
> ggplot(dat, aes(x = time, y = tmax)) Error: No layers in plot
Layers are added to a plot using the +
operator, and the data used to create the layer, as well as the aesthetic mapping (the link between the data and layers' aesthetic appearance), are specified as arguments either in the ggplot
function (setting them as global arguments for this particular plot) or in the respective layer function (setting them as local arguments for the particular layer only).
The most straightforward way to create a layer is to use the layer
function, where we can (rather verbosely) specify all of the layer characteristics. The preceding plot, for instance, can also be produced with the following expression, which demonstrates that geom_line
is a layer with a predefined "line"
geometry and "identity"
statistical transformation (this means that there is no statistical transformation; the values are taken as is), among other default definitions that remain hidden (such as colour="black"
for line color):
> ggplot(data = dat, aes(x = time, y = tmax)) + + layer(geom = "line", stat = "identity")
However, in practice, layers are most often specified using predefined layer functions rather than the layer
function. These functions' names start with geom
(such as geom_line
) or stat
(such as stat_contour
). There is no conceptual difference between these two types of functions; the difference is just in their defaults—the latter emphasizes statistical transformations while the former does not; therefore, each type of function may be easier to use for a given purpose. However, it should be remembered that the settings of any given layer can be overridden, so in many cases the geom
and stat
layers are redundant in terms of the desired result. For example, exactly the same layer of contours can be produced with either geom_contour
or stat_contour
.
In general, each layer is composed of the following five components:
geom_line
)data.frame
object where the data come from (for example, dat
)aes
function (for example, aes(x=time,y=tmax)
) or the definition of appearance unrelated to the data, that is, aesthetic setting outside the aes
functionA function to create a layer (such as geom_line
) already encompasses defaults for the geom
(geometry) and stat
(statistical transformation) parameters; it is rarely necessary to override these in practice. For our purposes in this chapter, we will also not have to deal with position adjustments; these are most useful in nonspatial plots such as histograms and boxplots. Therefore, there are just two parameters we will usually modify in the ggplot2
layers—the data and aesthetic mappings (and, optionally, aesthetic settings). For example, the previous plot of tmax
as a function of time
has only one layer and the geometry type of that layer is "line"
. The function used to create the layer is named geom_line
, accordingly. The data.frame
object from where the time
and tmax
values come is dat
, with time
mapped to x
(the x axis aesthetic) and tmax
mapped to y
(the y axis aesthetic).
The following table lists the specific layer functions we will use in this chapter in order to produce several common types of layers. The table also shows the set of required and optional parameters of each function. The required parameters (or aesthetics) of a given layer control its geometry and so, a layer cannot be drawn without them. For example, the geom_line
function requires a set of x
and y
coordinates since no line can be drawn without these. As we shall see, in practice the required parameters are always mapped to variables in our data (in the form of an assignment within an aes
function call). On the other hand, the optional parameters can either be mapped to variables in our data, set at constant values or left unspecified (in which case, they are set to their default constant values, such as colour="black"
for geom_line
, giving the line its default black color).
Function name |
Required parameters |
Optional parameters |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
There are, at the time of writing, 37 geom
and 21 stat
functions (visit http://docs.ggplot2.org/ for the complete list). In this chapter, we will limit ourselves to just these nine geom
functions, which are highly relevant with respect to spatial data.
Most of the names of the geometries are self-explanatory. For example, geom_line
is used to draw lines, geom_histogram
is used to draw histograms, and geom_raster
is used to draw rasters. The geom_contour
and geom_density2d
functions are used to create contours based on raster values or point density, respectively, as we shall see later in this chapter. The difference between geom_line
and geom_path
requires clarification. While both functions are used to create a line layer, the series of (x,y) points is connected according to their order along the x axis in geom_line
. On the other hand, in geom_path
, the points are connected according to their original order in the source data.frame
object. Therefore, the first is useful when plotting time series or mathematical functions (such as the temperature time series), while the second is useful to plot spatial line layers (such as a GPS track).
In addition to layer functions and their settings, which we briefly reviewed, three other types of components are used to control plot appearances in ggplot2
:
scale
, for example, scale_x_date
)facet
, for example, facet_grid
)theme
, for example, theme_bw
)How are all of these components specified in practice? The first component, as we have already seen, is always a ggplot
function call that initializes a ggplot
object used to store all the necessary information to produce a plot. It usually takes the ggplot(data,aes(...))
form, where data
is the default data.frame
object to be taken into account when plotting, and aes(...)
is the default aesthetic mapping to be used. These are passed to all of the other layers, unless the respective layer definition specifies its own data and/or aesthetic mapping, in which case they override those within ggplot()
. Next, the layers, scales, faceting definitions, and themes are added with each two components separated by a +
symbol. In our last example, the ggplot(dat,aes(x=time,y=tmax))
part specified that the default dataset is dat
, and the default aesthetic mapping is to plot time
on the x axis and tmax
on the y axis. The geom_line()
part then added a line layer; no arguments were specified, therefore the layer used the default dataset and the default aesthetic mapping. Scales, faceting, and theme settings were also left unspecified.
It is important to understand that in order to produce a minimal plot, we need to specify only a dataset, a single layer, and mappings for the layers' required parameters (such as x
and y
in geom_line
). Other layer parameters (such as colour
or size
), scales, faceting, and themes are optional. For example, the following expression adds a colour
setting and two scale definitions, but it produces exactly the same plot as shown in the preceding screenshot, since all of the components we added were already specified by default:
> ggplot(dat, aes(x = time, y = tmax)) + + geom_line(colour = "black") + + scale_x_date() + + scale_y_continuous()
The two scale settings used in the preceding expression (scale_x_date()
and scale_y_continuous()
) had no effect since a continuous variable (such as tmax
) is by default plotted on a continuous scale, while a Date
variable (such as time
) is by default plotted on a dates scale. In addition, we have an aesthetic setting (colour="black"
) that had no effect either. As opposed to an aesthetic mapping (which is always encompassed in the aes
function), the aesthetic setting links the layer to a constant aesthetic value, irrespective of the data in a given column. In the preceding expression, we set the line color to black in the geom_line
layer, with colour="black"
. Again, since "black"
is the default value for the line color in ggplot2
, this setting does not affect the plot appearance.
The plot we produced still needs a little polishing if we would like to include it in a publication. In general, certain plot adjustments are not related to the data (for example, changing the background color), while others are indirectly related to the data (for example, changing the color scale of a raster). First of all, we would like to have proper axis titles (for example, "Time" instead of "time" for the x axis), preferably without changing the data column names themselves. In many cases a journal may require a cleaner appearance, so we may also wish to remove the gray background and grid lines. The latter two properties can be modified using the scales and themes specifications, respectively.
Through the theme
functions, we can control the general appearance of the plots' nondata elements, such as plot title, axis labels style, tick marks length, background color, and so on. This can be done in two ways, which are not mutually exclusive but additive:
ggplot2
currently include theme_grey
, theme_bw
, theme_linedraw
, theme_light
, theme_minimal
, and theme_classic
. You are welcome to replace theme_bw
with each of these in the following code section, to see what each theme looks like.theme
function, specifying the element names and the required arguments. For example, adding theme(panel.grid=element_blank()
)
eliminates grid lines. There are currently 50 theme elements (such as panel.grid
), which can be modified through the theme
function (see ?theme
for the complete list). Three important things to note about themes are:element_blank
, element_line
, element_rect
, or element_text
). For example, to set axis.title
, we will use element_text
, but to modify panel_grid
, we will use element_line
(unless we want to eliminate it altogether, in which case we will use element_blank
). Consult ?theme
to see which element function is appropriate in specific cases.element
function, we can supply a list of characteristics we want to modify for that particular element. For example, element_line(colour="red",linetype="dotted")
for panel.grid
will make the grid lines red and dotted. As always, all unspecified properties remain at their default values.panel.grid=element_blank()
, there is no point modifying the panel.grid.minor
and panel.grid.major
(minor and major grid lines, respectively) characteristics since the properties of both are overridden by panel.grid
.Functions to modify the scales start with scale
, followed by the aesthetic property and the scale type. For example, scale_x_date
sets the x axis position aesthetic property to the date type. Arguments within a scale
function control additional properties of the scale, other than scale type which is determined by the function's name. For example, the name
parameter determines the scale name that appears along the respective axis or legend. Each aesthetic we would like to set a scale for requires an individual scale function (for example, scale_x_date()
and scale_y_continuous()
).
As an example, in the following code we modify several theme and scale properties of the time series plot:
> ggplot(dat, aes(x = time, y = tmax)) + + geom_line() + + scale_x_date(name = "Time") + + scale_y_continuous( + name = + expression(paste("Maximum temperature (", degree, "C)"))) + + theme_bw() + + theme(panel.grid = element_blank())
As you may have noticed, the y axis label is specified in a special way in the preceding code with expression(paste("Maximum temperature (", degree, "C)"))
. The reason for doing this is to introduce the degree (º) symbol in the y axis title (see the following screenshot). The expression
function returns an object of class expression
, which can be used to print mathematical symbols and annotations in R. It is combined with paste
in order to include both regular characters and mathematical symbols. In this case, the text consists of the "Maximum temperature ("
and "C)"
parts, while the mathematical part is degree
, which stands for the º symbol. The complete list of possible annotations and their syntax is available on the ?plotmath
help page.
The following screenshot shows what the new version of our plot looks like:
We can see that the plot is now black and white (thanks to theme_bw
) with no grid lines (thanks to theme(panel.grid=element_blank())
) and it has appropriate axis labels (thanks to the scale
specifications).
At times, a touch of interactive fine-tuning is required to introduce nonstandard elements or formatting into the graphical output from R. For instance, we may wish to put an arrow mark (such as ) with annotation (such as this was an especially cold day) on the preceding plot. Although it is possible to accomplish almost everything through the ggplot2
syntax (including the latter example), in some situations it may be advantageous to make final adjustments by hand using a graphical editor such as Inkscape (freely available from http://www.inkscape.org/) or Adobe Illustrator. For this purpose, it is best to export the plot with ggsave
to a vector graphics file (such as PDF or SVG), and then work on it in a graphical editor.
Since our main focus is on plotting spatial data, in this section, we will go over just two examples of ordinary (nonspatial) plots. Most of the subsequent examples in this chapter will deal with producing spatial plots, or in other words, maps.
We will begin with a line plot, but a more elaborate one than in the previous example. As promised in Chapter 7, Combining Vector and Raster Datasets, we are going to compare, visually, the NDVI trajectory in two forests (Lahav and Kramim) according to two data sources (the Landsat and MODIS satellites). To do this, we will display the four relevant time series in a single plot. We will mark the data for each forest with a different color in order to distinguish them. In addition, we will use different geom
functions for the data from each of the two satellites—the data from MODIS (where we have 280 data points for each forest) is best displayed with geom_line
, while the data from Landsat (where we have only three data points for each forest) will be displayed with geom_point
.
In terms of the ggplot2
syntax, the data from our input table forests_ndvi
will be divided into two parts:
forests_ndvi[forests_ndvi$sat == "MODIS", ]
forests_ndvi[forests_ndvi$sat == "Landsat", ]
The first subset of forests_ndvi
will be passed as the data
argument to geom_line
, while the second will be passed as the data
argument to geom_point
. As for the aesthetic mapping (the assignment expressions within the aes
function), x
is going to be mapped to date
and y
to ndvi
, in both layers (NDVI is plotted as the function of time). In addition, the colour
(in geom_line
) and fill
(in geom_point
) are mapped to forest
since we want to display the time series for each forest with a different color. One instance of the aesthetic setting is used in the point layer (shape=21
), defining the points shape as 21
(which corresponds to a filled circle; see ?points
). In addition to these two layers, six ggplot
components are supplied to specify the various scale settings (just their names, in this case) and theme settings (no grid lines and "top"
for the legend position). Note that the scales for fill
and colour
are discrete (scale_fill_discrete
and scale_colour_discrete
), which is appropriate for categorical variables such as the forest name.
The assigned colors are generated automatically. There is also a way to explicitly specify the color each group will take, using a manual scale. We will not go into that here; interested readers can follow the examples on ?scale_colour_manual
to see how individual colors of choice can be specified.
The entire expression to produce the plot thus takes the following form:
> ggplot() + + geom_line( + data = forests_ndvi[forests_ndvi$sat == "MODIS", ], + aes(x = date, y = ndvi, colour = forest)) + + geom_point( + data = forests_ndvi[forests_ndvi$sat == "Landsat", ], + aes(x = date, y = ndvi, fill = forest), shape = 21) + + scale_x_date("Time") + + scale_y_continuous("NDVI") + + scale_fill_discrete("Forest") + + scale_colour_discrete("Forest") + + theme_bw() + + theme(panel.grid = element_blank(), + legend.position = "top")
The time series plot, as shown in the following screenshot, is produced as a result:
Notably, the same legend has been generated for the points fill
aesthetic and the lines colour
aesthetic (and so, the legend symbols are composed of both line and point geometries). The fact that the same color scale is generated by ggplot2
for both aesthetics makes things easier.
As for the NDVI pattern the plot portrays, we can clearly see that the Kramim forest has consistently lower NDVI than Lahav and that the data from both satellites is in agreement on this. The seasonal NDVI pattern, which we have already witnessed on several occasions in previous chapters, is also apparent in the MODIS data.
In the second nonspatial example, we will plot the distribution of topographic slopes in built versus natural areas in Haifa, based on the buildings_mask
and natural_mask
rasters we created in Chapter 7, Combining Vector and Raster Datasets. This time, the data come in the form of rasters. Since ggplot2
requires data.frame
objects, the first thing to do is bring the data into one. We will transfer the (non-NA
) values of each raster to a table with two columns: one containing the raster values and the other identifying the cover type ("Buildings"
or "Natural"
):
> build = data.frame(cover = "Buildings", + slope = buildings_mask[!is.na(buildings_mask)]) > nat = data.frame(cover = "Natural", + slope = natural_mask[!is.na(natural_mask)])
Then, we will bind both tables into a single one with rbind
:
> slopes = rbind(nat, build) > head(slopes) cover slope 1 Natural 0.3740864 2 Natural 0.3918563 3 Natural 0.4300925 4 Natural 0.4843213 5 Natural 0.5266151 6 Natural 0.3173897
The data is ready to be plotted. This time, what we are interested in is visually comparing distributions. The most obvious way of doing this is to plot histograms of the two variables side by side. In ggplot2
, a histogram layer can be created with geom_histogram
, specifying only a single aesthetic—x
. In order to produce two histograms in a single plot, we will also use faceting.
Faceting, as previously mentioned, is the generation of numerous plots of the same type for different subsets of the data, within a single graphical output. For example, in the present context we would like to create two histograms side by side: one for the "Buildings"
cover type and another for the "Natural"
cover type. The reason for using faceting—in addition to saving the trouble of running the same code several times—is having a similar appearance and common axes in all subplots and thereby, making the comparison easier. There are two functions to produce facets in ggplot2
; the difference between them is in the way the subplots are geometrically arranged:
facet_wrap
: Used to create a continuous ribbon of panels, while (optionally) specifying the number of rows or columns (with nrow
and ncol
). This is most useful to create numerous subplots of the same type.facet_grid
: Used to create a two-dimensional grid of panels with the rows and columns defining different levels of a given variable. This is most useful when subplots are defined by combinations of two variables.The most important parameter of both facet_wrap
and facet_grid
is the formula defining the grouping factor(s) to create the facets. With facet_wrap
, we can specify only a single factor (for example, facet_wrap(~group)
) since the facets form a one-dimensional ribbon. With facet_grid
, we can specify two factors: one for the rows and another one for the columns (for example, facet_grid(group_row~group_column)
). If we wish to create facets with facet_grid
according to a single factor, we can replace one of the variables with a dot (.
). For example, facet_grid(group_row~.)
will result in a vertical ribbon of facets (since there is no factor for the columns), while facet_grid(.~group_column)
will result in a horizontal ribbon of facets (since there are no factors for the rows).
To clarify things, it is best to show an example. The following code produces two histogram facets, according to the grouping variable cover
, using facet_grid
. Since we have only one grouping factor, we mark the column grouping as .
to obtain a vertical ribbon (later in this chapter, in the Spain temperature maps example, we will use facet_grid
with two grouping factors):
> ggplot(slopes, aes(x = slope)) + + geom_histogram() + + facet_grid(cover ~ .) + + scale_x_continuous("Slope (radians)") + + scale_y_continuous("Count") + + theme_bw()
The resulting plot is shown on the left in the following screenshot. Since in this case we have a single grouping variable, the same kind of plot can also be produced with facet_wrap
. Replacing facet_grid(cover~.)
with facet_wrap(~cover,ncol=1)
produces an identical plot, except for a slightly different facet labeling scheme; the latter plot is shown on the right in the following screenshot. Note that in both cases, the facets share a common x axis.
Looking at these histograms, we can see that most buildings are located on relatively flat terrain while most natural areas occupy steeper slopes, which makes sense.
Quantitatively comparing the properties of slopes' distributions is straightforward using tapply
and the slopes
table. For example, tapply(slopes$slope,slopes$cover,mean)
will show that the mean slopes for the "Natural"
and "Buildings"
cover types are 0.23
and 0.15
, respectively. Substituting mean
with other functions (such as min
, max
, and sd
) will yield other properties.
In addition to the usual method of saving a graphical output in a file (see Chapter 2, Working with Vectors and Time Series), there is a specialized function called ggsave
for saving ggplot2
plots. For example, the first image in this chapter was incorporated into this book from a PNG file, obtained with ggsave
as follows:
> ggplot(dat, aes(x = time, y = tmax)) + + geom_line() > ggsave("C:\Data\4367OS_09_01.png", width = 5.5, height = 3.5)
The ggsave
function, by default, saves the last ggplot
object that has been plotted in the graphical window. Therefore, the only mandatory parameter is the file path. The file extension provided in the path determines the file format (a PNG image in this case; see ?ggsave
for a list of possible formats). The figure dimensions are taken as those of the currently active graphical window, unless explicitly specified through the width
and height
parameters. For instance, in this case, the width
is equal to 5.5
inches (inches are the default unit, cm and mm are also possible) since this is a commonly used text width in the letter page format.
A ggplot
object can also be assigned and kept in memory:
> tmax_line = ggplot(dat, aes(x = time, y = tmax)) + + geom_line()
It is already apparent, from the last few examples, that the print
method applied to such an object induces drawing it in a graphical window. Assignment of a complete ggplot2
plot, or individual layers (which we will see later), may serve at least two useful purposes:
ggsave
providing the plot object as an argument so that we do not have to wait for the plot to be drawn in the graphical window for no good reason. This way, we also do not need to keep in mind which plot was produced last. For example, after the tmax_line
object has been created, we can save the plot it describes as follows:> ggsave(plot = tmax_line, + filename = "C:\Data\4367OS_09_01.png", + width = 5.5, + height = 3.5)
theme_bw
, also serve this purpose. For example, the following expression produces the same plot we saw in the first image in this chapter:> tmax1 = ggplot(dat, aes(x = time, y = tmax)) > tmax2 = geom_line() > tmax1 + tmax2
So far in this chapter, we introduced the ggplot2
package and briefly reviewed its syntax using a few simple, nonspatial examples. In the next two sections, we will move on to slightly more complicated examples, now involving spatial data. What you will see right away is that producing a map with ggplot2
is conceptually no different from producing any other kind of plot (because spatial data, in turn, is not conceptually different from any other type of data).
A map is, in fact, simply a two-dimensional plot with points, lines, polygons and/or rasters, with plot space corresponding to geographical space through a given CRS. Since plot space is tied to geographical space, a map has a specific aspect ratio between the x and y axes (1:1). In other words, we cannot stretch a map to be wider or narrower since this would distort the correct proportion between distances in the x and y directions (unlike the tmax
time series plot we saw earlier in this chapter, for example, which can be stretched any way we like and still remain meaningful). As you may have noticed, the plot
method, applied on spatial vector layers or rasters in previous chapters, also produces plots with a constant 1:1 aspect ratio. The difference is that in ggplot2
, the input data comes as a data.frame
object so the function has no way of knowing that the data is spatial. Therefore, we need to manually specify that the aspect ratio should be constant using the coord_equal
function.
To summarize these considerations, a map produced with ggplot2
is just like any other plot produced with this package, except for the following two characteristics:
coord_equal
)It should be noted that ggplot2
(along with ggmap
, which will be introduced in the next section) is not the best choice for all spatial data visualization tasks. Maps that require, for example, substantial annotation (such as labels, scale bars, and north arrows) or special display-optimization algorithms (such as reduced overlap between street name labels and other map elements) are easier to handle in traditional GIS software. The advantage of ggplot2
, with respect to map making, becomes decisive in fairly analogous situations to those when R itself is advantageous:
Now that the framework to use ggplot2
with spatial data is defined in general terms, let's proceed to the practical application. So far, we witnessed that spatial vector layers and rasters are best represented in R using special classes such as SpatialPolygonsDataFrame
or RasterLayer
. How can we convert these data structures to data.frame
objects to be passed to ggplot
? Point and raster layers are readily convertible to a data.frame
object using the as.data.frame
function (for example, we converted the spainT
raster to a data.frame
object in the previous chapter). As already mentioned, however, lines and polygons are more complex than points and rasters due to the fact that each line or polygon geometry is composed of a variable number of points. The order in which the points are connected is also an integral part of the data, responsible for the correct geometry drawing. Therefore, a data.frame
object representing a line or polygon layer must have x and y coordinate columns (as do data.frame
objects for points and rasters), but also a grouping column (to denote which set of points forms a given line or polygon) and an ordering column (to specify the order by which the points are connected when drawing the given line or polygon). Conveniently, a function called fortify
is already defined in ggplot2
to convert line and polygon layers into data.frame
objects while preserving these properties.
For example, at the end of Chapter 5, Working with Points, Lines, and Polygons, we created a SpatialPolygonsDataFrame
object named county
that contains the borders of US counties along with average population densities for each county in the attribute table (the density
column). In order to bring the data in county
to a data.frame
form, readily available to be used by ggplot
, we need to apply the fortify
function. The "regions"
parameter of fortify
is used to specify which attribute table column corresponds to individual features. Since the FIPS
column identifies unique counties in the attribute table, we can use it as the region
:
> county_f = fortify(county, region = "FIPS") > head(county_f) long lat order hole piece group id 1 1225972 -1274991 1 FALSE 1 01001.1 01001 2 1234371 -1274114 2 FALSE 1 01001.1 01001 3 1244907 -1272280 3 FALSE 1 01001.1 01001 4 1244132 -1267496 4 FALSE 1 01001.1 01001 5 1265116 -1263940 5 FALSE 1 01001.1 01001 6 1265318 -1263907 6 FALSE 1 01001.1 01001
This is a typical output of fortify
. The table contains processed spatial information from the county
layer plus the id
column corresponding to the region
variable (in this case, FIPS
). The important columns, for our purposes, are as follows:
long
and lat
: The spatial coordinates. Note that these are the default x and y coordinate column names, respectively, created by fortify
, even when the layer is not in a geographical CRS.group
: Individual geometries identifier.id
: Individual features identifier.Since all other attribute data columns are removed by fortify
, we need to append the density
column manually by joining the attribute table of county
back to county_f
. However, before that, we need to change the name of the id
column to "FIPS"
, to match the FIPS
column in the attribute table:
> colnames(county_f)[which(colnames(county_f) == "id")] = "FIPS" > county_f = join(county_f, county@data, "FIPS") > head(county_f) long lat order hole piece group FIPS NAME_1 NAME_2 1 1225972 -1274991 1 FALSE 1 01001.1 01001 Alabama Autauga 2 1234371 -1274114 2 FALSE 1 01001.1 01001 Alabama Autauga 3 1244907 -1272280 3 FALSE 1 01001.1 01001 Alabama Autauga 4 1244132 -1267496 4 FALSE 1 01001.1 01001 Alabama Autauga 5 1265116 -1263940 5 FALSE 1 01001.1 01001 Alabama Autauga 6 1265318 -1263907 6 FALSE 1 01001.1 01001 Alabama Autauga TYPE_2 area census2010pop density 1 County 1562.805 54571 34.91863 2 County 1562.805 54571 34.91863 3 County 1562.805 54571 34.91863 4 County 1562.805 54571 34.91863 5 County 1562.805 54571 34.91863 6 County 1562.805 54571 34.91863
Using this table, we are ready to create a map showing population densities per county. However, it would be nice to add state borders as well to aid in map apprehension. To obtain states borders layer, we will download it from the GADM database (see Chapter 5, Working with Points, Lines, and Polygons). We will then use fortify
on it as well:
> states = getData("GADM", country = "USA", level = 1) > states = states[!(states$NAME_1 %in% c("Alaska", "Hawaii")), ] > states = spTransform(states, CRS(proj4string(county))) > states_f = fortify(states, region = "NAME_1") > head(states_f) long lat order hole piece group id 1 1076104 -1034268 1 FALSE 1 Alabama.1 Alabama 2 1085410 -1033146 2 FALSE 1 Alabama.1 Alabama 3 1093749 -1031892 3 FALSE 1 Alabama.1 Alabama 4 1107308 -1030032 4 FALSE 1 Alabama.1 Alabama 5 1108666 -1029851 5 FALSE 1 Alabama.1 Alabama 6 1112841 -1029288 6 FALSE 1 Alabama.1 Alabama
To plot a spatial layer created with fortify
(either lines, as we shall see later, or polygons), we need to map the x
and y
aesthetics to the long
and lat
columns, respectively. In addition, we need to map the group
column to the group
aesthetic so that each geometry will be drawn separately. For example, the California feature is composed of several geometries (the mainland plus several islands in the Pacific Ocean), and ggplot
needs to know which points form separate polygons according to the group
column ("California.1"
, "California.2"
, and so on). If we want to give certain aesthetics to each state, we can do that using the id
column (where all California polygons share the "California"
label).
The group
aesthetic can also be implemented in nonspatial layers and in layers other than geom_polygon
. It is used to identify individual geometries in order to draw them separately, when no other aesthetic (such as colour
or fill
) does the job. For example, in the following screenshot, we do not want each geometry (such as each island of California) to have a unique appearance, but we still want these geometries to be drawn separately rather than with a line going through all of them one by one. Try removing the group=group
part from the following expression and you will see what a mess the result is. You can also try group=id
to see that it messes up only those states that are composed of more than one geometry, where grouping by the group
column is indeed essential.
Another helpful feature we will experiment with in the present example is to save a collection of the ggplot2
components in order to integrate them in several plots and not have to type the expressions each time. We will create an object named sp_minimal
, defining our own custom theme (based on theme_bw
, with the axis name, labels, and tick marks removed):
> sp_minimal = + theme_bw() + + theme(axis.text = element_blank(), + axis.title = element_blank(), + axis.ticks = element_blank())
Before we plot county densities, we will start with a simpler example—drawing just the states_f
layer—to see how spatial polygon layers are drawn in ggplot2
:
> ggplot() + + geom_polygon(data = states_f, + aes(x = long, y = lat, group = group), + colour = "black", fill = NA) + + coord_equal() + + sp_minimal
A map of the states is produced as follows:
As you can see in the previous syntax, the polygons are drawn using geom_polygon
according to the long
, lat
, and group
data columns (mapped to the x
, y
, and group
aesthetics, respectively), with the colour
and fill
polygons set to constant values ("black"
for colour
and NA
for fill
, respectively; this means empty polygons with black borders). The other two plot elements are coord_equal
(which makes sure the 1:1 aspect ratio is maintained) and the sp_minimal
theme we just defined.
Now, we will make things slightly more complicated by plotting both the states_f
and county_f
layers together. Let's first view the code and output. Afterwards, we will discuss the different components involved. Here's the code that produces a county population density map:
> ggplot() + + geom_polygon(data = county_f, + colour = NA, + aes(x = long, y = lat, group = group, fill = density)) + + geom_polygon(data = states_f, + colour = "white", size = 0.25, fill = NA, + aes(x = long, y = lat, group = group)) + + scale_fill_gradientn( + name = expression(paste("Density (",km^-2,")")), + colours = rev(rainbow(7)), + trans = "log10", + labels = as.character, + breaks = 10^(-1:5)) + + coord_equal() + + sp_minimal
The following screenshot shows how the resulting plot looks:
Reading the preceding code reveals, first of all, that the plot is made by combining the following five components:
geom_polygon
(with county_f
)geom_polygon
(with states_f
)scale_fill_gradientn
coord_equal
sp_minimal
The two geom_polygon
layers, representing "states"
and county
layers, are drawn differently according to our needs in this visualization. The first thing to note about these layers is their order of appearance: the county_f
layer is added before states_f
. This is not a coincidence; the layers are drawn on the screen in the same order in which they are provided in the code. Since state and county borders obviously overlap, and we wish to draw the states borders on top of the counties borders (otherwise the states borders will not be visible), it is important to specify the layers in the right order.
The states_f
layer arguments are identical to those in the previous example; except that, we made the polygon borders white (with colour="white"
) and thin (with size=0.25
). As for the county_f
layer, the color of the borders (rather than fill
) was set to NA
, resulting in borderless polygons, and fill
was mapped to density
. The latter is indeed the defining feature of the map we made: mapping the county_f
polygons fill
aesthetic to density
is what gives different population densities different colors, and that was the whole purpose of making this plot.
We mapped fill
to density
, but which color gets assigned to which density level? As already mentioned, the scale
function determines the way values in the data are matched with the aesthetic appearance. So unless we are happy with the default scale (which is frequently not the case), we have to set the scale characteristics ourselves. For example, we already used scale_colour_discrete
in the forests NDVI time series example (applicable for discrete variables). In the present example, the scale_fill_gradientn
function, as the name suggests, was used to set the fill
scale type to gradientn. The latter is just one out of several possible fill
scale types (applicable to the colour
aesthetic as well; see http://docs.ggplot2.org/ for a complete list). For continuous variables, ggplot2
offers three types of color scales, which can be specified using the following functions:
scale_fill_gradient
: A two-color gradient, with the gradient end colors specified with high
and low
(see the Haifa buildings density example later in this chapter)scale_fill_gradient2
: A three-color gradient, with the low
, med
, and high
colors specified (see the London buildings distance to river example later in this chapter)scale_fill_gradientn
: A custom n-color gradient, specified with a colours
vectorThere are several ways to create a vector of colors in R to be supplied to the colours
parameter of scale_colour_gradientn
as well as to other graphical functions. The simplest way, which we used in the present example, is to employ one of the predefined color palette functions that can take a numeric value as an argument and produce a series of colors (returned as RGB codes) at equal intervals. These functions can be used for many purposes:
rainbow
heat.colors
terrain.colors
topo.colors
cm.colors
In the examples in this chapter, we employ rainbow
and terrain.colors
. You are welcomed to experiment with the other three palettes to see what they look like. The color codes these functions return are rarely useful in their own right. Instead, they are passed as color palettes to graphical functions such as scale_colour_gradientn
.
> rainbow(3) [1] "#FF0000FF" "#00FF00FF" "#0000FFFF"
In cases when you have a different palette in mind which is not covered by these functions, you can use the RColorBrewer
package (which provides many more palettes) or the colorRampPalette
function available in base R (which can be used to create custom-made color palettes; see the last example in this chapter).
A short explanation of the arguments of the scale_fill_gradientn
function we used is in order:
name=expression(paste("Density (",km^-2"",")"))
: The name
argument specifies the scale name and the way it will appear on the plot legend. We once again used the expression
function, this time to add -2
in superscript (see the preceding screenshot).colours=rainbow(7)
: Specifies the vector of colors to create a color palette with. Here, specifying seven colors gives a good result, with the whole variability the rainbow
palette provides represented. It may require trial and error to find out how many colors are sufficient in each case.trans="log10"
: The trans
parameter can be passed to any continuous scale function in ggplot2
. It is used to specify a mathematical transformation for the data. In this particular example, we use a logarithmic scale ("log10"
) to highlight the differences in the population density between different counties, which is necessary since the distribution of county densities is highly nonhomogeneous, with many low-density counties and few high-density counties. You can try deleting the trans
, labels
, and breaks
arguments, and running the previous code once more to see why the default linear scale is inappropriate in this case. There are many useful built-in types of transformations, as well as methods to define custom transformations but this is beyond the scope of this book.labels=as.character
and breaks=10^(-1:5)
: The breaks
argument specifies the breakpoints where tick marks and labels are generated, while the labels
argument specifies the labels themselves or (as in this case) a function to format the labels with. In this case, by default, R prints the 10^(-1:5)
labels vector in mathematical notation (1e-01
, 1e+00
, 1e+01
, and so on). This is evaded by converting the numeric values to characters (using as.character
) to receive more comprehensive labels (0.1
, 1
, 10
, and so on).The last two components in the code section that produces the preceding plot—coord_equal
and sp_minimal
—were already used in the previous plot, so we will not repeat their meaning here.
Finally, referring to the plot itself, we can see the concentration of densely populated counties in the eastern half of the USA and along the coast of the Pacific Ocean. Further inland in the Western USA, population density is generally low.
Moving on to the second type of spatial data structure, a raster, we are going to experiment with different variations of the Haifa DEM map in the following few examples. As the data source, we will take the dem
and hill
rasters (see Chapter 6, Modifying Rasters and Analyzing Raster Time Series). Data from these two rasters will be transferred into a single data.frame
object with four columns: cell coordinates (x
and y
), elevation value (elev
), and hillshade value (hill
). Note that in this case, we use the data.frame
function to do this, supplying the coordinates and the vector of values from each raster separately (relying on the fact that the two rasters are geometrically identical, since hill
was derived from dem
). An equally good approach would have been to stack
the two rasters and then convert the result to a data.frame
object with as.data.frame(...,xy=TRUE)
:
> dem_hill = data.frame(coordinates(dem), + elev = dem[], + hill = hill[]) > dem_hill = dem_hill[complete.cases(dem_hill), ] > head(dem_hill) x y elev hill 2421 698971.3 3648863 19 0.3533017 2422 699061.3 3648863 20 0.3408402 2423 699151.3 3648863 19 0.3402018 2424 699241.3 3648863 19 0.3451085 2425 699331.3 3648863 20 0.3439193 2426 699421.3 3648863 21 0.3368811
Now that we have the dem_hill
table, we can plot the data using a geom_raster
layer. The geom_raster
function requires the x
, y
, and fill
aesthetics to draw a raster, as the following code and images demonstrate. In addition, since we are going to make several versions of the Haifa DEM plot, it would be best to save this basic plot as a ggplot
object (hereby named haifa_relief
) so that we can later update it by incorporating additional layers:
> haifa_relief = ggplot(dem_hill, aes(x = x, y = y)) + + geom_raster(aes(fill = elev)) + + scale_fill_gradientn("Elevation (m)", + colours = terrain.colors(10)) + + coord_equal() + + theme_bw() + + theme(axis.title = element_blank(), + axis.text.y = element_text(angle = 90, hjust = 0.5)) > haifa_relief
The syntax being used is analogous to the previous example, except that we use geom_raster
instead of geom_polygon; x
and y
are mapped to spatial coordinates, while fill
is mapped to raster values. Note that geom_raster
takes aes(fill=elev)
as an argument, while the missing parts (data
, x
, and y
) are passed, by default, from the ggplot
set of arguments. As we shall see right away, this mode of operation is very convenient when we would like to add another layer (in this case, the hillshade
) with the same arguments so that we do not have to type them once again. The theme
components this time specify that the y axis labels should be rotated by 90º (which is often used in maps). The resulting plot is shown on the left panel in the following screenshot. Note that the color scale used (terrain.colors
) is particularly useful to display the topography.
Now that we have our basic relief image, we can experiment with various additional components that we may wish to show on top of it. For instance, the hillshade calculation gave us a layer of theoretical shading degree for the given topography and sun position (see Chapter 6, Modifying Rasters and Analyzing Raster Time Series). We can add the shading values as an all-black raster with various levels of transparency (making the shaded areas less transparent and thus darker). This will create an illusion of a three-dimensional image, which is also known as a shaded relief in cartography. To create shading, we need to add another raster layer with black pixels whose transparency aesthetic is mapped to the hill
values. Transparency, in ggplot2
as well as in other graphical functions in R, is determined by a parameter named alpha
, ranging from 0
to 1
, with 0
being completely transparent and 1
being completely opaque. Limiting the maximum of the alpha scale to 0.5
, using the range
parameter makes sure we do not get completely black pixels in heavily shaded areas:
> haifa_relief_shade = haifa_relief + + geom_raster(aes(alpha = hill), fill = "black") + + scale_alpha_continuous("Hillshade", range = c(0.5, 0)) > haifa_relief_shade
The resulting plot is shown on the right panel in the following screenshot. Viewing the two images side by side demonstrates the effect that shading has.
Talking about topography, another feature we may wish to add to this plot is contour lines. We already calculated contour lines from rasters both indirectly (displaying them using levelplot
, in Chapter 4, Working with Rasters, for instance) and directly (creating a contour lines layer with rasterToContour
, in Chapter 7, Combining Vector and Raster Datasets). Here, we are looking at yet another indirect way to calculate contour lines based on a raster using ggplot2
. To add a contour lines layer, we can use the geom_contour
function, mapping its z
aesthetic to elevation so that contour lines are calculated according to it:
> haifa_relief_shade + + geom_contour(aes(z = elev), colour = "black", alpha = 0.3)
The following screenshot shows the graphical output that is produced, showing semi-transparent contour lines on top of the shaded relief of Haifa:
In the previous example, we generated a contour plot based on the values of an existing raster (the Haifa DEM in this case). Another common type of plot showing contours is one where contours denote the average density of events in a point pattern. Creating density contours is a common way to reduce the amount of information and make point pattern visualization more comprehensible. For example, when we have so many points that they overlap each other, it may be difficult to discern denser and sparser locations. Drawing contours is one possible solution to this problem.
A commonly used procedure to create density contours out of a point pattern is two-dimensional kernel density estimation, available through the base R function named kde2d
. The geom_density2d
function is a ggplot2
adaptation of the latter function, creating contours out of a density surface calculated with kde2d
.
As an example of geom_density2d
, we will draw contours showing the average density of buildings in Haifa.
It should be noted that, in practice, density estimation is only meaningful for mapped point patterns, that is, point patterns where all events in the studied area have been detected. When detection is incomplete, density estimates will obviously be biased to an unknown degree in each location. In our case, for example, if the OpenStreetMap layer of Haifa buildings is incomplete, then a density estimate based on it will not be very useful as it will encompass underestimation to an unknown and variable degree.
Since density estimation requires a point pattern, we need to convert the buildings layer to a point layer by finding building centroids:
> haifa_buildings_ctr = gCentroid(haifa_buildings, byid = TRUE) > haifa_buildings_ctr = as.data.frame(haifa_buildings_ctr) > head(haifa_buildings_ctr) x y 1 684013.4 3629679 2 688023.0 3629752 3 687645.0 3627410 4 685913.9 3631217 5 683144.4 3633649 6 683515.3 3628980
Taking the haifa_relief_shade
plot, we will now add both the buildings point pattern (using geom_point
) and the density contours based on that pattern (using geom_density2d
). Using the limits
parameter of scale_x_continuous
and scale_y_continuous
, we will also limit our scope to a window centered at haifa_buildings_ctr
. In addition, the colour
aesthetic of the contours is mapped to ..level..
, which corresponds to the contour breaks. The level
variable is surrounded by ..
, which is a special way to signal to ggplot2
that we are referring to a variable generated by the layer itself, rather than a variable present in data.frame
:
> haifa_relief_shade + + geom_contour(aes(z = elev), colour = "black", alpha = 0.3) + + geom_point(data = haifa_buildings_ctr, size = 0.5) + + geom_density2d(data = haifa_buildings_ctr, + aes(colour = ..level..)) + + scale_colour_gradient("Density", low = "blue", high = "red") + + scale_x_continuous( + limits = c(min(haifa_buildings_ctr$x)-2000, + max(haifa_buildings_ctr$x)+2000)) + + scale_y_continuous( + limits = c(min(haifa_buildings_ctr$y)-2000, + max(haifa_buildings_ctr$y)+2000))
The following screenshot shows the graphical output:
In the preceding screenshot, we see three different legends as there are three different scales: elevation, hillshade, and buildings density. According to this map (again, assuming that all buildings have been mapped), we can see that the highest density of individual buildings per unit area is found on the eastern slopes of Mount Carmel.
Our final example in this section will be to plot the time series of interpolated standardized temperature maps, which we created in the previous chapter. We already took the trouble of reshaping the data into a tidy table; therefore, we do not need to take any preliminary processing steps. Using the spainT
table and the following code section, we recreate a plot similar to the rasterVis
version from the previous chapter with ggplot2
:
> ggplot(spainT, aes(x = x, y = y, fill = value)) + + geom_raster() + + scale_fill_gradient2( + expression(paste("Value (", degree, "C)")), + low = "blue", high = "red", limits = c(-4,4)) + + geom_contour( + aes(z = value), + colour = "black", size = 0.1, breaks = c(-4:-1,1:4)) + + coord_equal() + + facet_grid(variable ~ year) + + sp_minimal
The following screenshot shows the graphical output:
Note that we have once again utilized our sp_minimal
custom-made theme to eliminate the axis annotation. The geom_contour
function has been used to generate contour lines, similar to the Haifa DEM elevation contours, only that now we specified the exact vector of breaks, specifying where contour lines will be generated. Using facet_grid(variable~year)
, we arranged the facets in rows according to variable
and in columns according to year
.
Nowadays, cartographers and spatial analysts are lucky to have a variety of online resources for map creation. We already used freely available satellite imagery, DEMs, and administrative border datasets, for example. However, to recover and process such datasets is not always a quick and straightforward task. We may need to figure out the spatial location in question, search for available resources, import them into R, and so on. On the other hand, several applications, such as Google Maps and OpenStreetMap, provide readily available maps of the entire earth with a variety of features that can be unreasonable to collect ourselves each time we want to create a background for a quick visualization (satellite and aerial imagery, roads, borders, town names, and so on).
The ggmap
package is an extension to ggplot2
, providing a simple interface to download static maps from such online resources and easily incorporate them as the background when creating maps within the ggplot2
framework. We already used the ggmap
package for geocoding (see Chapter 5, Working with Points, Lines, and Polygons), and mentioned the introductory paper on this package by Kahle and Wickham (2013).
By static maps, we refer to maps that cannot be manipulated by scrolling, zooming in and zooming out, and so on, as opposed to dynamic maps that respond to such user feedback. The whole ggplot2
framework, in that sense, may be referred to as static plotting. It should be noted that dynamic plots and maps (and even entire web applications; see the shiny
package) can also be produced in R. As for dynamic maps, for instance, we have already seen a short example of using the Google Earth software as a platform for dynamic spatial data display from R using the plotKML
package (see Chapter 4, Working with Rasters). Another example is the plotGoogleMaps
package, which can be used to display spatial data from R on top of interactive Google Maps in a web browser.
The static maps incorporation with ggmap
is quite simple in practice. Using the get_map
function of the ggmap
package, we first need to download the required background image. Then, we can use that image as a background layer with the ggmap
function of the same package. Any additional layers and components can be added to the plot initiated with ggmap
the same way as to a plot initiated with ggplot
.
Our first example featuring ggmap
will be to display the City of London buildings and their distances to the River Thames, as promised in Chapter 5, Working with Points, Lines, and Polygons. Since layers obtained with ggmap
are always defined in a geographical CRS, we must first reproject the two relevant layers that we will plot to such a CRS as well. The layers we are going to plot, in this case, are as follows:
buildings
: City of London buildingscity
: City of London boundary polygonThe following expressions reproject these layers and save them as new objects buildings_geo
and city_geo
, respectively:
> buildings_geo = spTransform(buildings, + CRS("+proj=longlat +datum=WGS84")) > city_geo = spTransform(city, + CRS("+proj=longlat +datum=WGS84"))
To specify the location for which we would like to download a map with get_map
, we can use the coordinates of the buildings layer centroid, obtained in the following manner:
> buildings_ctr = coordinates(gCentroid(buildings_geo)) > buildings_ctr x y 1 -0.09222641 51.51499
Next, we will fortify
the buildings_geo
layer, to make the data available in the form of a data.frame
object. The region in this case is the osm_id
attribute table column, which holds unique building identifiers:
> buildings_f = fortify(buildings_geo, region = "osm_id") > head(buildings_f) long lat order hole piece group id 1 -0.09962729 51.51428 1 FALSE 1 100684524.1 100684524 2 -0.09952849 51.51429 2 FALSE 1 100684524.1 100684524 3 -0.09942449 51.51429 3 FALSE 1 100684524.1 100684524 4 -0.09941299 51.51424 4 FALSE 1 100684524.1 100684524 5 -0.09951839 51.51423 5 FALSE 1 100684524.1 100684524 6 -0.09961579 51.51422 6 FALSE 1 100684524.1 100684524
Similar to what we saw in the county
example, fortify
removes the attribute table; therefore, we need to attach it back manually:
> colnames(buildings_f)[which(colnames(buildings_f) == "id")] = + "osm_id" > buildings_f = join(buildings_f, buildings@data, "osm_id") > head(buildings_f) long lat order hole piece group osm_id 1 -0.09962729 51.51428 1 FALSE 1 100684524.1 100684524 2 -0.09952849 51.51429 2 FALSE 1 100684524.1 100684524 3 -0.09942449 51.51429 3 FALSE 1 100684524.1 100684524 4 -0.09941299 51.51424 4 FALSE 1 100684524.1 100684524 5 -0.09951839 51.51423 5 FALSE 1 100684524.1 100684524 6 -0.09961579 51.51422 6 FALSE 1 100684524.1 100684524 name type dist_river 1 Temple Bar <NA> 378.7606 2 Temple Bar <NA> 378.7606 3 Temple Bar <NA> 378.7606 4 Temple Bar <NA> 378.7606 5 Temple Bar <NA> 378.7606 6 Temple Bar <NA> 378.7606
As the output shows, each building is now, once again, associated with a dist_river
value.
Finally, we will fortify
the city_geo
polygon as well. Since there is no attribute data of interest along with it, we do not have to join anything to the resulting data.frame
object in this case:
> city_f = fortify(city_geo, region = "CTYUA13NM") > head(city_f) long lat order hole piece group 1 -0.09671385 51.52319 1 FALSE 1 City of London.1 2 -0.09669776 51.52316 2 FALSE 1 City of London.1 3 -0.09668468 51.52317 3 FALSE 1 City of London.1 4 -0.09662369 51.52318 4 FALSE 1 City of London.1 5 -0.09646984 51.52282 5 FALSE 1 City of London.1 6 -0.09601742 51.52295 6 FALSE 1 City of London.1 id 1 City of London 2 City of London 3 City of London 4 City of London 5 City of London 6 City of London
Examining the group
column in this particular case will reveal that it has the same value ("City of London.1"
) in all rows since the city
layer is composed of a single polygon.
The layers are ready. What is left to be done is download the background and produce the map. To download the background, we will employ the get_map
function. In order to efficiently work with this function, we need to provide arguments for just these three parameters:
location
: The center of the map to be downloaded, which can be a longitude/latitude pair.maptype
: The type of map to be downloaded. This is source-specific, but for Google Maps (the default data source), "terrain"
, "satellite"
, "roadmap"
, and "hybrid"
are applicable. See the following screenshot for an example of a "hybrid"
map; for a "satellite"
example, see the screenshot after the following screenshot.The default data source, and the one we use here, is Google Maps. Note that the Google Static Maps API is used by get_map
to download the images in such cases, and that by using this function, you are agreeing to the Google Maps API Terms of Service (https://developers.google.com/maps/terms). Other data sources can be selected by modifying the source
parameter of get_map
, with possible sources being Google Maps ("google"
), OpenStreetMap ("osm"
), Stamen Maps ("stamen"
), or CloudMade maps ("cloudmade"
). See ?get_map
for details.
zoom
: The zoom level; an integer between 3 and 21 (the default value is 10). The best zoom value for a given map is determined by trying higher or lower magnifications and examining the result.In our case, to get a static map of the City of London composed of a photographic map with roads and other features marked on top (a "hybrid"
map), we can use the following expression:
> city_map = get_map(location = buildings_ctr, + maptype = "hybrid", + zoom = 14)
Now, plotting this map on its own (to inspect the zoom
argument's effect, for instance) can be done by simply typing ggmap(city_map)
, that returns a ggplot
object by default, plotted in the graphical window—as we have already seen earlier. However, to add our supplementary layers based on the buildings_f
and city_f
objects, we need to incorporate them using two geom_polygon
function calls. A few minor settings are also introduced in this particular example, which are as follows:
muted
function of the scales
package is being used"Distance to river (m)"
to be split over two lines is achieved using the new line symbol
labs
function is used to easily modify axis titles, instead of modifying the scale itself with the scale
functionsThe code to produce the City of London buildings map appears as follows:
> library(scales) > ggmap(city_map) + + geom_polygon(data = buildings_f, + aes(x = long, y = lat, group = group, + fill = dist_river), + size = 0.1, colour = "black") + + geom_polygon(data = city_f, + aes(x = long, y = lat, group = group), + colour = "yellow", fill = NA) + + scale_fill_gradient2("Distance to river (m)", + low = muted("blue"), high = muted("red"), + midpoint = 500) + + labs(x = "Longitude", y = "Latitude")
The resulting map looks like the following screenshot:
The map shows an up-to-date "hybrid"
map of London downloaded from Google Maps, with the City of London boundary (in yellow) and buildings (in blue to red colors, according to their distance to the River Thames) on top of it.
To practice using ggmap
some more, we will create another map with the following components encompassing all three vector layer types—points, lines, and polygons—along with a static map downloaded from the Web:
"satellite"
static map backgroundtowns
: A point layer of two towns' locations (see Chapter 7, Combining Vector and Raster Datasets)track
: A line layer of a GPS track (see Chapter 5, Working with Points, Lines, and Polygons)forests
: A polygonal layer of two planted forests (see Chapter 7, Combining Vector and Raster Datasets)As in the previous example, we first need to bring all of these layers to a geographic CRS:
> towns_geo = spTransform(towns, + CRS("+proj=longlat +datum=WGS84")) > track_geo = spTransform(track, + CRS("+proj=longlat +datum=WGS84")) > forests_geo = spTransform(forests, + CRS("+proj=longlat +datum=WGS84"))
Next, we need to use fortify
on the line and polygonal layers (track_geo
and forests_geo
) in order to bring them to a data.frame
form. Point layers, as previously mentioned, have a much simpler structure than lines and polygons, so no fortify
method exists for them. Instead, we can always manually construct a data.frame
object to represent a point layer. In the present case, for example, towns_geo
is a SpatialPoints
object and towns_names
is a character vector. Using the coordinates
functions and data.frame
, we can combine them into a data.frame
object:
> towns_f = data.frame(coordinates(towns_geo), name = towns_names) > towns_f lon lat name 1 34.87131 31.37936 Lahav Kibbutz 2 34.81695 31.37383 Lehavim > track_f = fortify(track_geo) > head(track_f) long lat order piece group id 1 34.85472 31.36520 1 1 0.1 0 2 34.85464 31.36540 2 1 0.1 0 3 34.85458 31.36559 3 1 0.1 0 4 34.85454 31.36519 4 1 0.1 0 5 34.85443 31.36639 5 1 0.1 0 6 34.85445 31.36733 6 1 0.1 0 > forests_f = fortify(forests_geo, region = "name") > head(forests_f) long lat order hole piece group id 1 34.87591 31.33830 1 FALSE 1 Kramim.1 Kramim 2 34.87559 31.33831 2 FALSE 1 Kramim.1 Kramim 3 34.87560 31.33858 3 FALSE 1 Kramim.1 Kramim 4 34.87528 31.33858 4 FALSE 1 Kramim.1 Kramim 5 34.87529 31.33885 5 FALSE 1 Kramim.1 Kramim 6 34.87529 31.33912 6 FALSE 1 Kramim.1 Kramim
The three data.frame
objects (towns_f
, track_f
, and forests_f
) are now in memory. The missing component is just the background static map, which can be downloaded with get_map
as follows:
> forests_ctr = coordinates(gCentroid(forests_geo)) > forests_map = get_map(location = forests_ctr, + maptype = "satellite", + zoom = 12)
The following code is used to combine all of these components into a single plot. Note that track_f
is introduced through a geom_path
layer (which is the appropriate geometry for spatial lines) and forests_f
is introduced using geom_polygon
. The point layer towns_f
is added as a geom_text
layer, rather than geom_point
, in order to display the towns' locations as name labels instead of points. The label
aesthetic of geom_text
controls the text that each point is associated with, which in our case is the name
column that holds town names.
> ggmap(forests_map) + + geom_polygon(data = forests_f, + aes(x = long, y = lat, group = group, colour = id), + fill = NA) + + geom_path(data = track_f, + aes(x = long, y = lat), + colour = "yellow") + + geom_text(data = towns_f, + aes(x = lon, y = lat, label = name), + colour = "white", size = 2.5, fontface = "bold") + + scale_colour_discrete("Forest") + + labs(x = "Longitude", y = "Latitude")
The resulting map is shown in the following screenshot:
This map shows the towns' locations (as white text labels), the GPS track (as a yellow line), and the forest borders (as polygons, with border colors according to the forest identity).