In this chapter, we will cover the basic usage of the second major type of spatial data—vector layers (points, lines, and polygons). In GIS terminology, these data are sometimes referred to as vectors, but we will use the term vector layers to distinguish them from vectors in R (see Chapter 2, Working with Vectors and Time Series). We will review the architecture of the vector layer classes defined in the sp
package. Examples of the most common operations involving vector layers will then be presented using the sp
, rgdal
, and rgeos
packages.
In this chapter, we'll cover the following topics:
sp
packageSpatial vector layers have two components: the geometry and the attribute table. The geometry component holds the spatial coordinates and information regarding their arrangement in separate features, while the attribute table holds additional information regarding each feature. For example, in a point layer of capital cities, the record for London may be composed of a geometric component (a point coordinate, such as 51.5072°N, 0.1275°W) and a row in an attribute table holding additional data regarding each city (for example, population size, built area, and so on).
The geometry part in a vector layer is obligatory and there are three types of geometries: points, lines, and polygons. The attribute table is optional. Classes for the six spatial vector layers types, constituting all possible combinations of these two properties, have been defined in the sp
package. They are summarized in the following table:
Geometry type |
Attribute table |
Class |
---|---|---|
Points |
No |
|
Yes |
| |
Lines |
No |
|
Yes |
| |
Polygons |
No |
|
Yes |
|
Together with the three raster classes from the raster
package (presented in the previous chapter), these constitute a commonly used set of classes suitable for a wide range of spatial analysis applications in R.
Additional classes for spatial vector layers (and rasters) do exist in R. They are often associated with packages intended for specialized data analysis tasks. Conveniently, however, there are usually established methods of converting objects to and from such classes, with respect to the highly popular classes in the sp
and raster
packages (which we've used throughout this book). For example, the ppp
class (the spatstat
package) is used to represent a point pattern (event locations and the window where the pattern has been observed) to employ statistical analysis tools for such patterns. A SpatialPoints
object can be converted to a ppp
object and vice versa using the maptools
package.
In this section, we will see an example of how to create a vector layer for each of the three types of geometries—points, lines, and polygons. In the next section, methods to examine spatial vector layers will be presented, while in the last three sections of this chapter, we will see how some more advanced operations on vector layers are performed. Additional information regarding the sp
package and its spatial vector classes can be found in the book Applied Spatial Data Analysis with R, Springer, whose second edition was published in 2013, by Roger Bivand, Edzer Pebesma, and Virgilio Gómez-Rubio.
Points are the simplest type of spatial objects since the geometrical component of a point is just a single (x,y) coordinate. A set of (x,y) coordinates, along with CRS information, constitutes a spatial point layer, which can be represented in R with a SpatialPoints
object. If we also have an attribute table where each row corresponds to a single point, the layer can be represented with a SpatialPointsDataFrame
object.
In our first example, we are going to create a SpatialPointsDataFrame
object by geocoding, the procedure to convert addresses to geographic coordinates. The intermediate steps we will go through are as follows:
data.frame
object of geographic coordinates for each address.data.frame
object to a SpatialPointsDataFrame
object.In the example, we will geocode the addresses of three airports in New Mexico:
For the geocoding step (step 2), we are going to use the Google Maps API, accessible from R through the ggmap
package. Additional functionality of the ggmap
package for visualization of spatial data will be presented in Chapter 9, Advanced Visualization of Spatial Data. For an overview of package capabilities, refer to the introductory paper on ggmap
(ggmap: Spatial Visualization with ggplot2) by Kahle, D. and Wickham, H. 2013.
Our first step is to create a character vector with the airports addresses. The addresses are ordered in accordance with the airport names listed earlier:
> addresses = c( + "2200 Sunport Blvd, Albuquerque, NM 87106, USA", + "7401 Paseo Del Volcan Northwest Albuquerque, NM 87121, USA", + "121 Aviation Dr, Santa Fe, NM 87507, USA")
Note that the Google Maps API (which we will use for geocoding) can also search for location names (such as the White House) or partial addresses, but the result will be less determinate (two different places can be go by the same name, but not the same address). The exact addresses are used here to ensure that the reader will obtain the same results as in the book's text.
We are ready for the second step and will now load the ggmap
package and geocode the addresses using the geocode
function. This function accepts a vector of addresses (such as addresses
that we just created) and returns a data.frame
object of the matched geographic coordinates:
> library(ggmap) > airports = geocode(addresses) > airports lon lat 1 -106.6168 35.04918 2 -106.7947 35.15559 3 -106.0731 35.62866
The geocode
function accesses the Google Maps API. Therefore, an Internet connection is required. Using the Google Maps API also implies agreement with the Google Maps API Terms of Service (https://developers.google.com/maps/terms) and is limited to 2,500 queries in a 24-hour time period (using the free service option). A promising alternative can be found in the geocodeHERE
package providing access to the HERE Geocoding API (https://developer.here.com/geocoder) that has a higher threshold of 10,000 queries per day.
We now have a data.frame
object named airports
that holds the longitudes (lon
) and latitudes (lat
) of the three airports.
In our third step, we will add the airport names in an additional column called name
in airports
as follows:
> airports$name = c("Albuquerque International", + "Double Eagle II", + "Santa Fe Municipal") > airports lon lat name 1 -106.6168 35.04918 Albuquerque International 2 -106.7947 35.15559 Double Eagle II 3 -106.0731 35.62866 Santa Fe Municipal
The resulting data.frame
object has everything we need to create a SpatialPointsDataFrame
object, where the geometry components will be the airports' coordinates, and the attribute table will have a single column that holds the airport names. This is our fourth step. It is performed using the coordinates
function of the sp
package by specifying the columns that hold the coordinates with a formula
object of the structure ~x_coord+y_coord
. In our case, the x coordinate is the longitude (lon
) and the y coordinate is the latitude (lat
), therefore the formula takes the following form:
> library(sp) > coordinates(airports) = ~ lon + lat
The last expression converts airports
from a data.frame
object to a SpatialPointsDataFrame
object. We can confirm the conversion took place using the class
function as follows:
> class(airports) [1] "SpatialPointsDataFrame" attr(,"package") [1] "sp"
Using the print
method on the object shows that it holds exactly the same information as before; just that the lon
and lat
columns are now identified as spatial coordinates, as shown in the following output:
> airports coordinates name 1 (-106.6168, 35.04918) Albuquerque International 2 (-106.7947, 35.15559) Double Eagle II 3 (-106.0731, 35.62866) Santa Fe Municipal
Our last step is to specify the CRS of airports, which is a geographic CRS:
> proj4string(airports) = CRS("+proj=longlat +datum=WGS84")
The airports
vector layer is now complete.
Writing of Spatial*DataFrame
objects (that is, the SpatialPointsDataFrame
, SpatialLinesDataFrame
, and SpatialPolygonsDataFrame
objects) can be done with the writeOGR
function from the rgdal
package. In the following example, as well as in most other examples in this book, we will use the popular ESRI Shapefile format for the input and output of vector layers.
We can export our SpatialPointsDataFrame
object airports
to a Shapefile on the disk as follows:
> library(rgdal) > writeOGR(airports, "C:\Data", "airports", "ESRI Shapefile")
The four arguments that were supplied to the first four parameters of the writeOGR
function are as follows:
obj
: The name of the object to be written (for example, airports
)dsn
: The path of the directory where the file(s) will be written (for example, "C:\Data"
)layer
: The layer name (for example, "airports"
)driver
: The driver name (for example, "ESRI Shapefile"
)Since Shapefile datasets are composed of several separate files (at least three: *.shp
, *.shx
, and *.dbf
), when reading and writing with the ESRI Shapefile driver, we specify the directory (dsn
) and filename without the extension (layer
). In the previous example, with dsn="C:\Data"
and layer="airports"
, four files were written to the C:Data
directory: airports.dbf
, airports.prj
, airports.shp
, and airports.shx
. The meaning of the dsn
and layer
parameters can be different with other drivers (see the next section and ?writeOGR
).
To read the file we just wrote back into a SpatialPointsDataFrame
object in R, we can use the readOGR
function (also available in the rgdal
package). When reading vector layers, we need to provide only the dsn
and layer
arguments (the path and layer name, respectively). In addition, we can specify that we do not want character values in the attribute table to be converted to factors by specifying stringsAsFactors=FALSE
(see Chapter 3, Working with Tables):
> airports = readOGR("C:\Data", + "airports", + stringsAsFactors = FALSE)
Obviously, all the airports.*
files that constitute the Shapefile need to be found in the same directory (in this case, in the C:Data
directory) for the layer to be properly read.
Line and polygon layers are more complex than point layers in two principal aspects. First, individual lines or polygons are defined with a set of points, rather than a single one. Specifically, a line is a set of points connected to each other from the first point to the last one. A polygon is defined similarly; just that the last point is equal to the first. The number of points in each line or polygon is not fixed, but dependable on the shape complexity (a more complex shape will be defined with more points). Secondly, a single line or polygon feature (corresponding to a single entry in an attribute table) can be composed of more than one individual line or more than one individual polygon. For example, in a layer of countries' boundaries, the USA feature will be composed of more than one polygon since the contiguous United States, Alaska, and each of the Hawaii Islands are separate from one another.
An additional complication, specific to polygonal layers, is the existence of holes. A single feature can be composed of an external boundary polygon and hole polygons. For example, in a layer of North America's land area, hole polygons may represent the Great Lakes. Hole polygons can once again have internal polygons (such as islands inside the Great Lakes) and so on.
Due to their complexity, it is much less common to manually define line and polygon layers from raw coordinates. Instead, they are usually either imported from an external source (for example, reading a Shapefile) or created from another object (for example, contour lines created based on a DEM raster).
As an example for SpatialLinesDataFrame
, we will read a GPS Exchange Format (GPX) file with a track record from a GPS device. With GPX files, unlike with Shapefiles, the dsn
argument is the file itself, while the layer
argument points to the data component we would like to read (such as "tracks"
or "track_points"
; see ?readOGR
). The particular file GPS_log.gpx
contains the GPS log, from driving around the Lahav and Dvira forests with the GPS device in recording mode. Let's read the file as follows:
> track = readOGR("C:\Data\GPS_log.gpx","tracks")
Using the class
function, we can demonstrate that track
is a SpatialLinesDataFrame
object:
> class(track) [1] "SpatialLinesDataFrame" attr(,"package") [1] "sp"
As an example of polygonal data, we are going to read another external file; this time it's a Shapefile named USA_2_GADM_fips
. The file is composed of polygons corresponding to the second level administrative division (counties) of the USA, with several attributes such as county names and Federal Information Processing Standards (FIPS) codes. We will assign the resulting SpatialPolygonsDataFrame
to an object named county
to use in the subsequent examples in this chapter:
> county = readOGR("C:\Data", "USA_2_GADM_fips", + stringsAsFactors = FALSE)
To summarize, so far we have witnessed two ways to create spatial vector layers in R: from a set of (x,y) coordinates (relevant to points) and from a file (relevant to points, lines, and polygons). A third way—deriving vector layers from raster data—will be presented in Chapter 7, Combining Vector and Raster Datasets.