In this chapter, we move on to the realm of spatial data analysis in R. We begin by introducing the properties and usage principles of the classes used to store raster data in R. To that end, we first introduce the simpler (nonspatial) structures that are conceptually related to rasters: matrices and arrays. We then cover the more sophisticated classes defined in the raster package to represent spatial raster data. You will learn to create, subset, and save objects of these classes, as well as to query the characteristics of the rasters at hand. Afterwards, you will learn two basic operations involving rasters: overlay and reclassification. Along the way, we will see some examples of visualizing raster data in R to get a better understanding of the data we have.
A raster is essentially a matrix with spatial reference information. Similarly, a multiband raster is essentially a three-dimensional array with spatial reference information. Therefore, before proceeding with spatial rasters, this section covers some prerequisite material on working with these simpler objects: matrices and arrays. Moreover, as we shall see later, matrices and arrays are common data structures with many uses in R.
A matrix object is a two-dimensional collection of elements, all of the same type (as opposed to a data.frame object; see the previous chapter), where the number of elements in all rows (and, naturally, all columns) is identical. Matrix objects have many uses in R. For example, certain functions take matrices as their arguments (such as the focal function to filter rasters) or return matrices (such as the extract function to extract raster values; we will meet both these functions in the subsequent chapters).
A matrix object can be created with the matrix function by specifying its values (in the form of a vector) and dimensions as follows:
> matrix(1:6, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
The first four parameters of the matrix function are data, nrow, ncol, and byrow. The data parameter is the vector of values, and byrow determines whether the matrix is filled by rows rather than by columns (the default). The nrow and ncol parameters determine the number of rows and columns, respectively. We can specify either one of these two parameters, and the other will be calculated from the overall number of elements. Let's take a look at the following example:
> matrix(1:6, nrow = 3)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> matrix(1:6, nrow = 2)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
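The examples above all fill the matrix column by column, which is the default. As a minimal sketch of the alternative, the following compares the default with byrow = TRUE; the names by_col and by_row are chosen only for this illustration.

```r
# Fill the same six values by column (the default) and by row
by_col = matrix(1:6, nrow = 2)
by_row = matrix(1:6, nrow = 2, byrow = TRUE)

by_col[1, ]  # first row when the values are placed column-wise
by_row[1, ]  # first row when the values are placed row-wise
```

The two matrices hold the same values and have the same dimensions; only the placement of the values differs.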
Note that when the allocated number of cells is smaller or larger than the number of values in the vector used to populate the matrix, the vector is either truncated or recycled, respectively. Let's take a look at the following examples:
> matrix(12:1, ncol = 4, nrow = 2)
     [,1] [,2] [,3] [,4]
[1,]   12   10    8    6
[2,]   11    9    7    5
> matrix(12:1, ncol = 4, nrow = 4)
     [,1] [,2] [,3] [,4]
[1,]   12    8    4   12
[2,]   11    7    3   11
[3,]   10    6    2   10
[4,]    9    5    1    9
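Recycling is most often useful with a single value, and a mismatch between the number of values and the number of cells can be checked before the matrix is built. The following is only a sketch: the names zeros, vals, and n_cells are chosen for this illustration, and depending on the R version a warning may also be issued when the lengths differ.

```r
# A single value is recycled to fill every cell
zeros = matrix(0, nrow = 2, ncol = 3)
zeros

# Compare the number of values with the number of cells beforehand
vals = 12:1
n_cells = 2 * 4            # rows times columns of the intended matrix
length(vals) == n_cells    # FALSE here: four values would be dropped
```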
There are several useful functions to examine the properties of a matrix. You are familiar with them from Chapter 1, The R Environment, and Chapter 2, Working with Vectors and Time Series, since they are analogous to the functions we used with vectors and data.frame objects. For example, the length function returns the number of elements a matrix has as follows:
> x = matrix(7:12, ncol = 3, byrow = TRUE)
> x
     [,1] [,2] [,3]
[1,]    7    8    9
[2,]   10   11   12
> length(x)
[1] 6
The nrow and ncol functions return the number of rows and columns as follows:
> nrow(x)
[1] 2
> ncol(x)
[1] 3
The dim function returns both (the number of rows and columns) at the same time:
> dim(x)
[1] 2 3
Using the as.vector function, we can convert a matrix into a vector as follows (note that the values in the vector will always be ordered by columns):
> as.vector(x)
[1]  7 10  8 11  9 12
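Because as.vector always reads the values column by column, getting them in row order takes one extra step. A minimal sketch, assuming the same matrix x as above (the names by_columns, by_rows, and rebuilt are illustrative), transposes the matrix first with t and also shows that the column-ordered vector can be turned back into the original matrix:

```r
x = matrix(7:12, ncol = 3, byrow = TRUE)

by_columns = as.vector(x)     # column order, as in the example above
by_rows = as.vector(t(x))     # transpose first to obtain row order

# The column-ordered vector rebuilds the original matrix
rebuilt = matrix(by_columns, nrow = nrow(x))
identical(rebuilt, x)
```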
Similar to what we saw regarding data.frame objects, we can subset matrices using two-dimensional indices. For example, to get the values that occupy the first and third columns in matrix x, we will use the following expression:
> x[, c(1,3)]
     [,1] [,2]
[1,]    7    9
[2,]   10   12
To get the values that occupy the second row in matrix x, we will use the following expression:
> x[2, ]
[1] 10 11 12
The previous example demonstrates that the resulting object is simplified to a vector if the values are retrieved from a single row or column. Setting the drop parameter to FALSE will suppress this behavior, similar to what we saw for the data.frame objects (see the previous chapter):
> x[2, , drop = FALSE]
     [,1] [,2] [,3]
[1,]   10   11   12
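The difference between the two results is easiest to see through their dimensions. In this short sketch (the names simplified and kept are illustrative), the simplified result has no dimensions at all, while the drop = FALSE result is still a one-row matrix:

```r
x = matrix(7:12, ncol = 3, byrow = TRUE)

simplified = x[2, ]             # simplified to a plain vector
kept = x[2, , drop = FALSE]     # remains a matrix with one row

dim(simplified)   # NULL, since a vector has no dimensions
dim(kept)         # one row, three columns
is.matrix(kept)
```

Keeping the matrix structure matters when later code calls functions such as nrow or apply on the result.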
The assignment of new values to subsets of a given matrix is also possible using the assignment operator. For example, we can create an empty 3 x 3 matrix m and then populate some of its cells as follows:
> m = matrix(NA, ncol = 3, nrow = 3)
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
[3,]   NA   NA   NA
> m[2:3, 1:2] = matrix(1:4, nrow = 2)
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]    1    3   NA
[3,]    2    4   NA
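Assignment to a subset also accepts a single value, which is recycled over the selected cells, or a vector with one value per cell. The following sketch starts again from an empty 3 x 3 matrix; the particular values are arbitrary.

```r
m = matrix(NA, ncol = 3, nrow = 3)

m[1, ] = 0             # one value recycled across the whole first row
m[, 3] = c(7, 8, 9)    # one value for each cell of the third column
m
```

Note that the second assignment overwrites the cell in row 1, column 3, which the first assignment had set to 0.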
We can also use the apply function to make calculations on rows or columns of a matrix, in exactly the same way as with the data.frame objects (see the previous chapter). For example, we can calculate the means of all columns in matrix x as follows:
> apply(x, 2, mean)
[1]  8.5  9.5 10.5
In fact, there are two specialized functions named rowMeans and colMeans for the specific tasks of calculating row and column means, respectively. Thus, for example, the following expression gives exactly the same result as the previous one:
> colMeans(x)
[1]  8.5  9.5 10.5
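The same equivalence holds for rows. As a brief sketch using the matrix x from above (the variable names are illustrative), apply with a margin of 1 and rowMeans give the same row means:

```r
x = matrix(7:12, ncol = 3, byrow = TRUE)

row_means_apply = apply(x, 1, mean)   # margin 1 refers to rows
row_means_fast = rowMeans(x)

all.equal(row_means_apply, row_means_fast)
```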
While vectors are used to represent one-dimensional sets of elements (see Chapter 2, Working with Vectors and Time Series), and matrix is a specialized class to represent two-dimensional sets of elements (see the previous section), the array class is more general. It is used to represent sets of elements having any number of dimensions (including one and two).
We can create an array object (a three-dimensional one, for example) using the array function:
> y = array(1:24, c(2,2,3))
> y
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

, , 3

     [,1] [,2]
[1,]    9   11
[2,]   10   12
The first argument we entered (1:24) defined the values, while the second argument (c(2,2,3)) defined the number of dimensions and their lengths. Note that the array has only 2 x 2 x 3 = 12 cells, so only the first 12 of the 24 supplied values are used, as the printed output shows. Unlike the matrix function, where specifying a single dimension is enough, the array function requires us to explicitly specify the lengths of all dimensions (otherwise, a one-dimensional object is created by default). In the previous example, we were interested in having three dimensions: two rows, two columns, and three layers (using raster terminology; see the following section). Thus, we specified their lengths as (2,2,3) using a vector of length 3.
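The role of the dimensions argument can be checked directly with dim and length. This sketch recreates y and, for comparison, builds an array from the same values without specifying dimensions; the name flat is chosen only for this illustration.

```r
y = array(1:24, c(2, 2, 3))
dim(y)       # the three dimension lengths: rows, columns, layers
length(y)    # total number of cells, that is, 2 * 2 * 3

# Without a dimensions argument, the result is one-dimensional
flat = array(1:24)
dim(flat)
```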
Naturally, a three-dimensional array has a three-dimensional indexing system. For example, we can reach the (2,1,3) element in our array y as follows:
> y[2,1,3]
[1] 10
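As with matrices, an index can be left empty to select everything along that dimension. A minimal sketch, with the illustrative names layer3 and cell_profile, extracts an entire layer and then the values of a single cell across all layers:

```r
y = array(1:24, c(2, 2, 3))

layer3 = y[, , 3]          # all rows and columns of the third layer
cell_profile = y[2, 1, ]   # row 2, column 1 in every layer

layer3
cell_profile
```

The first result is simplified to a 2 x 2 matrix and the second to a vector of length 3, following the same simplification rule we saw for matrices.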
Working with arrays is very similar to working with vectors and matrices, and the application of many of the functions we have previously seen is intuitive. For example, we can use the apply function to find the means of all elements in each layer (or third dimension):
> apply(y, 3, mean)
[1]  2.5  6.5 10.5
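The margin passed to apply can also name more than one dimension. As a sketch of this, passing c(1, 2) keeps the rows and columns and averages over the layers, giving one mean per cell position; the name cell_means is illustrative.

```r
y = array(1:24, c(2, 2, 3))

# Keep dimensions 1 and 2 (rows and columns); average across layers
cell_means = apply(y, c(1, 2), mean)
cell_means
```

The result is a 2 x 2 matrix in which each element is the mean of the three values found at that row and column across the layers.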
We will see an example involving the rowMeans function and three-dimensional array objects in Chapter 6, Modifying Rasters and Analyzing Raster Time Series.