This section examines the properties of spatial vector layers and shows how to subset them based on their attribute tables. Some of the procedures are analogous to those presented for rasters in the previous chapter (for example, plotting and querying CRS information), while others are generally relevant only to vector layers (for example, calculating areas and creating subsets according to the attribute table). As will quickly become apparent, many operations on the attribute tables of vector layers closely mirror operations on data.frame objects.
The summary function produces a useful textual summary of the properties of a vector layer, including its class, bounding box coordinates, CRS, and attribute table column types. For example, using summary on airports produces the following textual output:
> summary(airports)
Object of class SpatialPointsDataFrame
Coordinates:
           min        max
lon -106.79467 -106.07308
lat   35.04918   35.62866
Is projected: FALSE
proj4string : [+proj=longlat +datum=WGS84]
Number of points: 3
Data attributes:
   Length     Class      Mode
        3 character character
All of the properties listed in this output can also be accessed, and in some cases modified, using functions. For example, similar to what we already saw for rasters in the previous chapter, the proj4string function returns the CRS definition of a vector layer in the PROJ.4 format. Using proj4string on airports returns the definition of the WGS84 CRS:
> proj4string(airports)
[1] "+proj=longlat +datum=WGS84"
Referring to the geometry part, the length function returns the number of features the layer consists of. For example, airports contains three points (the three airports), as the following output shows:
> length(airports)
[1] 3
A spatial layer also always has row names that internally serve as ID variables to match the geometries with attribute table entries. The number of row names is thus equal to the number of features:
> row.names(airports)
[1] "1" "2" "3"
The dimensions function returns the number of spatial dimensions:
> dimensions(airports)
[1] 2
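The accessor functions above can be tried on any point layer. The following is a minimal sketch, assuming the sp package is installed; the pts object and its coordinates are made up for illustration and are not the airports layer used in the text.

```r
library(sp)

# Hypothetical toy data: three made-up points, not the airports layer
pts <- data.frame(
  name = c("A", "B", "C"),
  lon = c(-106.5, -106.7, -106.1),
  lat = c(35.1, 35.2, 35.6),
  stringsAsFactors = FALSE
)

# Promote the data.frame to a SpatialPointsDataFrame: the lon/lat columns
# become the geometry, the remaining column stays in the attribute table
coordinates(pts) <- ~ lon + lat
proj4string(pts) <- CRS("+proj=longlat +datum=WGS84")

n_features <- length(pts)     # number of features
ids <- row.names(pts)         # one ID per feature
n_dims <- dimensions(pts)     # number of spatial dimensions
crs_text <- proj4string(pts)  # CRS definition as a character string
```

The exact string returned by proj4string may differ slightly between sp versions (for example, extra PROJ parameters may be appended), so it is safer to inspect it than to compare it against a fixed value.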
The attribute table of a vector layer is, in fact, a data.frame object and some of the functions that work with data.frame objects have been defined to consistently work directly on vector layers as well. For example, the nrow, ncol, and dim functions applied to a vector layer refer to its attribute table to return its dimensions:
> nrow(county)
[1] 3145
> ncol(county)
[1] 4
> dim(county)
[1] 3145    4
We see that the attribute table of county has 3,145 rows (thus, the layer has 3,145 features) and four columns. The columns contain the following information:
NAME_1: The first-level name (for example, the state name)
NAME_2: The second-level name (for example, the county name)
TYPE_2: The feature type (for example, "County" or "Water body")
FIPS: The FIPS code
Individual columns of an attribute table, or subsets of these, can be accessed with the $ and [ operators. For example, the second-level names (held in the NAME_2 column) of the first 10 features in county can be obtained as follows:
> county$NAME_2[1:10]
 [1] "Litchfield" "Hartford"   "Tolland"    "Windham"
 [5] "Siskiyou"   "Del Norte"  "Modoc"      "New London"
 [9] "Fairfield"  "Middlesex"
As another example, we can check the types of features the county layer contains by listing the unique values in the TYPE_2 column:
> unique(county$TYPE_2)
 [1] "County"           "District"         "Borough"
 [4] "Census Area"      "Municipality"     "City And Borough"
 [7] "City And County"  "Water body"       "Parish"
[10] "Independent City"
The whole attribute table of a spatial vector layer can be accessed directly using the @ operator. The @ operator is used to extract a slot, by its name, from an object, using the notation object_name@slot_name.
More specifically, the @ operator is applicable to objects of the so-called S4 classes, which all raster and vector layers we deal with are, as opposed to S3 classes whose components are accessed with a different method (using the $ operator). The distinction between S3 and S4 concerns the internal class structure and is beyond the scope of this book. For more information, refer to Advanced R, Wickham, H., CRC Press, 2014 (http://adv-r.had.co.nz/OO-essentials.html).
The attribute table slot of spatial vector classes defined in the sp package is called data. Therefore, adding @data after a vector layer name will yield its attribute table (if it has one).
For example, the following expression returns the attribute table of airports:
> airports@data
                       name
1 Albuquerque International
2           Double Eagle II
3        Santa Fe Municipal
As another example, we can print the first few rows in the attribute table of county using the head function applied to county@data:
> head(county@data)
       NAME_1     NAME_2 TYPE_2  FIPS
0 Connecticut Litchfield County 09005
1 Connecticut   Hartford County 09003
2 Connecticut    Tolland County 09013
3 Connecticut    Windham County 09015
4  California   Siskiyou County 06093
5  California  Del Norte County 06015
As we shall see later in this chapter, the attribute table of a vector layer can also be modified using assignment, similar to a separate data.frame object. New attribute table columns can be created and populated using the $ operator, or the whole attribute table can be modified (for example, certain columns can be deleted or joined) and reassigned to the data slot.
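Both ways of modifying an attribute table can be sketched on a small layer. This is a minimal illustration, assuming the sp package is installed; the pts layer, its coordinates, and the elev_ft values are made up and do not come from the layers used in the text.

```r
library(sp)

# Hypothetical toy layer with made-up coordinates and elevations
pts <- data.frame(
  name = c("A", "B", "C"),
  elev_ft = c(1000, 2000, 3000),
  lon = c(-106.5, -106.7, -106.1),
  lat = c(35.1, 35.2, 35.6),
  stringsAsFactors = FALSE
)
coordinates(pts) <- ~ lon + lat

# 1. Create and populate a new column with the $ operator
pts$elev_m <- pts$elev_ft * 0.3048

# 2. Modify the whole table as an ordinary data.frame,
#    then reassign it to the data slot
tab <- pts@data
tab$elev_ft <- NULL  # delete a column
pts@data <- tab

col_names <- names(pts@data)
```

When reassigning a whole table, the number and order of rows should stay unchanged, since each row has to keep matching its geometry.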
All other components of spatial vector (and raster, for that matter) objects are also contained in slots and thus, are accessible with the @ operator. Using the str function, we can obtain a tree describing the object's structure. Let's take a look at the following example:
> str(airports)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 3 obs. of  1 variable:
  .. ..$ name: chr [1:3] "Albuquerque International" "Double Eagl$
  ..@ coords.nrs : int [1:2] 1 2
  ..@ coords     : num [1:3, 1:2] -106.6 -106.8 -106.1 35 35.2 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "lon" "lat"
  ..@ bbox       : num [1:2, 1:2] -106.8 35 -106.1 35.6
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "lon" "lat"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84"
Using such a tree, we can find our way to all the data components of an object. Then, why is the use of specialized functions (such as proj4string), rather than direct access to the relevant property (such as airports@proj4string@projargs), usually advocated in R? One reason is that working through functions makes our code more robust in the face of changes in class definition. In other words, if the internal architecture of a certain class changes in a future version of a given package (so that, for instance, the slot x is now named y), code that relies on functions will keep working, and the user may not even notice the change, since all the relevant functions operating on the class will be updated accordingly. Code that accesses the slot directly with @x, on the other hand, will no longer work. Accessing the attribute table of a vector layer (with @data) is the only direct slot access we use in this book. The exception is necessary since certain operations on an attribute table are unfeasible otherwise.
The attribute table of a vector layer can also be removed altogether, by converting a Spatial*DataFrame object into a Spatial* object. Such a conversion can be done with the as function, specifying the object name and the class we want to convert it to. For example, we can convert airports, a SpatialPointsDataFrame object, to a SpatialPoints object as follows:
> airports_sp = as(airports, "SpatialPoints")
Since a SpatialPoints object does not have a data slot, an error occurs when trying to access it:
> airports_sp@data
Error: no slot of name "data" for this object of class "SpatialPo$
We can also use the as function to perform the reverse conversion from a SpatialPoints object to a SpatialPointsDataFrame object. Naturally, the attribute table of the resulting object is going to be empty (since SpatialPoints objects do not have one):
> as(airports_sp, "SpatialPointsDataFrame")@data
data frame with 0 columns and 0 rows
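The effect of dropping the attribute table can be checked without triggering an error by listing slot names. This is a minimal sketch, assuming the sp package is installed; the pts layer and its coordinates are made up. As an alternative to the as conversion shown in the text, the last step reattaches a table with the SpatialPointsDataFrame constructor, which is a swapped-in technique for cases where a non-empty table is wanted.

```r
library(sp)

# Hypothetical toy layer with made-up coordinates
pts <- data.frame(
  name = c("A", "B", "C"),
  lon = c(-106.5, -106.7, -106.1),
  lat = c(35.1, 35.2, 35.6),
  stringsAsFactors = FALSE
)
coordinates(pts) <- ~ lon + lat

# Drop the attribute table, keeping the geometry only
geom_only <- as(pts, "SpatialPoints")

# Check for the data slot without provoking an error
has_data_before <- "data" %in% slotNames(pts)
has_data_after <- "data" %in% slotNames(geom_only)

# Alternative to as(): attach a new table with the constructor.
# match.ID = FALSE pairs rows with points by position.
restored <- SpatialPointsDataFrame(
  geom_only,
  data = data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE),
  match.ID = FALSE
)
```

Pairing by position is only safe when the rows of the new table are known to be in the same order as the points.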
We can subset a vector layer according to its attribute table using the same notation as in subsetting data.frame objects. Selecting which features to retain can be done by supplying a numeric or logical vector within the [ operator.
For example, to get a subset of only those county features that belong to the contiguous U.S., we need to exclude the features corresponding to the states of Alaska and Hawaii. This can be done by creating a logical vector (applying a condition to the county$NAME_1 column, which holds state names) and supplying that vector as the row index of county with the [ operator, as follows:
> county = county[
+   county$NAME_1 != "Alaska" &
+   county$NAME_1 != "Hawaii", ]
Similarly, we can retain only the land area by excluding water body polygons:
> county = county[county$TYPE_2 != "Water body", ]
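The same subsetting pattern can be tried on a small layer where the result is easy to verify by eye. This is a minimal sketch, assuming the sp package is installed; the pts layer, its state column, and its coordinates are made up and only mimic the structure of county.

```r
library(sp)

# Hypothetical toy layer: four made-up points with a state attribute
pts <- data.frame(
  name = c("A", "B", "C", "D"),
  state = c("Alaska", "Texas", "Hawaii", "Ohio"),
  lon = c(-150.0, -99.0, -157.0, -83.0),
  lat = c(64.0, 31.0, 21.0, 40.0),
  stringsAsFactors = FALSE
)
coordinates(pts) <- ~ lon + lat

# Logical vector: TRUE for the features to retain
keep <- pts$state != "Alaska" & pts$state != "Hawaii"
contiguous <- pts[keep, ]

# Numeric vector: retain features by position
first_two <- pts[1:2, ]

kept_names <- contiguous$name
```

Geometries and attribute rows are filtered together, so the result is still a complete layer. An equivalent condition could be written as !(pts$state %in% c("Alaska", "Hawaii")), which scales better when many values need to be excluded.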
Let's examine the resulting layer using the plot function. The expression plot(county) produces the graphical output as shown in the following screenshot:
As we can see, the plot function, by default, draws polygon borders using black lines. In subsequent examples, we will experiment with several parameters of this function to modify the appearance of the plot.