Spatio-temporal data, such as MODIS images, time series, or meteorological records from several stations (see Chapter 3, Working with Tables) pose a challenge for analysis and visualization due to their three-dimensional nature. One approach to simplify such data is to perform aggregation in spatial and/or temporal dimensions (another approach to simplify spatio-temporal data is, for example, cluster analysis).
In this section, we will experiment with two approaches to aggregate the data held in the multiband raster r
in order to get additional perspectives on the spatio-temporal behavior of NDVI within the geographic area this raster covers.
In our first example, we will aggregate r
along the temporal dimension (that is, the layers). In fact, the previous example where we mapped the minimum and maximum values for each pixel in r
using overlay
with the range
function (see Chapter 4, Working with Rasters, for more information) is also an example of aggregation. However, instead of aggregating all layers, what if you want to learn the properties of different time series portions? In such cases, you have to perform several overlay operations, and then combine the results into a multiband raster. That is exactly what we are going to do next.
Let's say we are interested in examining the seasonal NDVI averages across the studied area. In such cases, we need to perform an overlay of all layers for each season using the mean
function, and then combine the results into a four-band raster (where each layer corresponds to one of the four seasons). Conceptually, the simplest way of doing this is with a for
loop that goes through the four seasons, selecting the appropriate layers of r
each time, applying overlay
on those layers, and keeping the result to form a final multiband raster in the end.
As a preliminary step to construct the loop, we will define a character vector with season names (seasons
) and an empty RasterStack
object (season_means
). The seasons
vector will define loop iterations, while season_means
will be used to hold the results:
> seasons = c("winter", "spring", "summer", "fall") > season_means = stack()
Now, we are ready to execute the for
loop:
> for(i in seasons) { + season_means = stack(season_means, + overlay(r[[which(dates$season == i)]], + fun = function(x) mean(x, na.rm = TRUE))) + }
The loop code section is executed four times (since the number of elements in seasons
is four), each time assigning the next season name into i
. The code adds the mean of the relevant layers in r
(selected using the dates$season
vector that we previously created; refer to Chapter 3, Working with Tables) to season_means
. Therefore, in the first loop iteration, the mean of the "winter"
layers in r
is added to the empty season_means;
in the second iteration, the mean of "spring"
is added to season_means
(which already has one layer), and so on until all four seasons are covered.
Now that the result season_means
is ready, we can add season names to the respective layers for convenience:
> names(season_means) = seasons
We will view the result using the levelplot
function:
> levelplot(season_means, par.settings=RdBuTheme, contour=TRUE)
The following screenshot shows the graphical output:
The preceding screenshot shows average NDVI for each season. We can see that NDVI is generally high during winter and spring and low during summer and fall. The image also shows that certain areas (such as the north-west corner of the image) have much higher NDVI in the wet season than in the dry one, while other areas (such as the three patches in the center of the image) maintain relatively high NDVI values in the summer months as well. The former are, in fact, areas where agricultural crops and natural herbaceous vegetation proliferate due to the relatively more abundant rainfall in the wet season, while the latter are planted evergreen pine forests where vegetation activity is maintained even in the dry season.
Another way of aggregating raster layers, bypassing the necessity to construct a loop, is with the stackApply
function. Similar to the way in which tapply
can be used to apply a function on different portions of a vector (as we discussed in Chapter 3, Working with Tables), the function stackApply
has been defined to apply a function on different portions of layers in a multiband raster. Analogous to tapply
, the main three parameters of stackApply
are as follows:
x
)indices
)fun
)Unlike tapply
, the indices vector must be composed of consecutive integers, starting with 1
. An additional parameter of stackApply
is na.rm
, which controls whether NA
values are removed from calculations.
For example, we can calculate the monthly NDVI means as follows:
> month_means = stackApply(r, + indices = dates$month, + fun = mean, + na.rm = TRUE)
As in the previous example, we will name the layers of the result month_means
:
> names(month_means) = month.abb
We can visualize the result with levelplot
:
> levelplot(month_means, par.settings = RdBuTheme, contour = TRUE)
The following screenshot shows the graphical output:
The preceding screenshot shows NDVI dynamics with greater detail on a monthly scale rather than a seasonal scale. We can see that NDVI starts to increase around October-November and declines around February-March. During April-September (the dry season), NDVI is just about constant.
Using the monthly averages, we can derive other informative products. For example, with the following overlay
function call, we can create a raster that shows the month in which the lowest NDVI value is observed at each location:
> min_month = overlay(month_means, fun = which.min)
To save space, we will not plot the result here.
As mentioned earlier, the indices
parameter in stackApply
accepts a vector of integers starting with 1
, which the dates$months
vector conveniently was. However, how should we deal with other grouping vectors—character vectors or numeric vectors—that are not consecutive or that do not start with 1
? We must first encode these as consecutive integer vectors, for example, by converting them to a factor (with the factor
function) and then extracting the factor level indices (with the as.numeric
function). Let's take a look at the following example:
> dates$year[1:30] [1] 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 [13] 2000 2000 2000 2000 2000 2000 2000 2000 2001 2001 2001 2001 [25] 2001 2001 2001 2001 2001 2001 > as.numeric(factor(dates$year))[1:30] [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
Here, we reclassified the dates$year
vector (which contains 13 unique years) to a vector of integers from 1 to 13 (only the first 30 values were printed to save space).
In our second example, we will aggregate r
along the y axis spatial dimension. Doing this will allow us to observe the average NDVI at a given position on the north-south axis over time. In other words, we will reduce the number of dimensions from three to two by averaging all values of a given y axis/date position into a single value. The following schematic diagram describes our action plan:
As we can see in the preceding diagram, our goal is to create a new raster s
, where each column holds the vector of row averages from r
. In other words, each s
row is going to represent an r
row (or y axis positions in space), while each s
column is going to represent an r
layer (or positions in time). We will perform the operation in two steps: first, creating an empty raster s
(with the required dimensions) and second, populating it with values. We will also see two different ways of accomplishing the task: using a for
loop and using apply
.
As we can see in the preceding diagram, the number of rows in s
should be equal to the number of rows in r
, while the number of columns in s
should be equal to the number of layers in r
. In addition to setting row and column numbers, using parameters xmn
, xmx
, ymn
, and ymx
, we will set the minimum and maximum coordinates on the x and y axes (to avoid the default, where the extent is -180 to 180 on the x axis and -90 to 90 on the y axis). The coordinate ranges on the x and y axes are going to be 0 to 280 and 0 to 100, respectively; moreover, since the number of columns and rows is accordingly 280 and 100, the raster resolution will be 1
(an arbitrary unit) on both axes:
> s = raster(nrows = nrow(r), ncols = nlayers(r), + xmn = 0, xmx = nlayers(r), ymn = 0, ymx = nrow(r))
Now that raster s
is ready, we need to populate it with values. Following our first for
loop method, we will define a new function called raster_rowMeans
that, when provided with a raster (x
) and the layer index (layer
), returns a vector of row means in that layer. This function actually consists of a single expression as it simply applies the base function rowMeans
on the matrix of values from a given layer, obtained with as.matrix
:
> raster_rowMeans = function(x, layer) { + rowMeans(as.matrix(x[[layer]]), + na.rm = TRUE) + }
Now, all is left to be done is to go over the layers of r
, each time calculating the row means in the current layer (with the raster_rowMeans
function) and assigning the resulting vector to the appropriate column of s
(exactly as depicted in the preceding diagram):
> for(i in 1:nlayers(r)) { + s[ ,i] = raster_rowMeans(r, i) + }
Now s
is filled with the appropriate values, and we can use the levelplot
function to display it. The additional at
parameter of the levelplot
function sets the breaks between color levels and contours. We use it to slightly reduce the quantity of contours and make the image clearer:
> levelplot(s, + par.settings = RdBuTheme, + contour = TRUE, + margin = FALSE, + at = seq(0,0.6,0.05))
The graphical output is shown in the following screenshot:
The preceding screenshot shows the average NDVI at each point in time (x axis) and north-south position in space (y axis). Two main patterns are evident: first, along the x axis (time) we can see the periodical behavior of NDVI between the interchanging wet and dry seasons. However, there are also inter-annual differences—in certain years, NDVI is higher or lower than average—due to variation in climatic conditions and therefore in vegetation activity. Second, along the y axis (the north-south position), we can see the NDVI gradient from the relatively wet conditions, and thus a higher NDVI to the north, compared to drier conditions; consequently, we can also see a lower and more stable NDVI to the south.
The second method of obtaining the same result will be defined with less code, although it might be conceptually more difficult to grasp. The first step of the task, defining the empty raster s
, is identical:
> s = raster(nrows = nrow(r), ncols = nlayers(r), + xmn = 0, xmx = nlayers(r), ymn = 0, ymx = nrow(r))
However, the second step is different. Instead of going over the layers of r
with a for
loop, we will convert r
to a three-dimensional array and utilize the apply
function. The former is accomplished with the as.array
function:
> r_array = as.array(r)
The result, r_array
, is an array
object where the first dimension corresponds to r
rows, the second dimension to r
columns, and the third dimension to r
layers. We can therefore use apply
to utilize rowMeans
on the third dimension, obtaining the vector of row means for each third-dimension element (layer), which is exactly what we need. The values of the resulting matrix can be directly transferred to s
:
> s[] = apply(r_array, 3, rowMeans, na.rm = TRUE)
The raster s
we just created is identical to the one created (and plotted) earlier. Although the same results were produced, an important difference between the two methods is in the speed of execution. Accessing one raster layer at a time using a loop is slow compared to an apply
operation on an array, which is generally very fast. For example, on the computer this book is written on, calculating the values of s
using the first method is performed in about 1.8 minutes as compared to 2.6 seconds using the second method (that is ~40 times faster!). It is therefore often useful to transform a raster into a simpler object (matrix
or array
) before doing intensive calculations.