Chapter 3

Time on the Horizontal Axis

The most common way to visualize a time series places the time index on the horizontal axis. This chapter illustrates several variants for displaying multivariate time series: multiple time series with different scales, variables with the same scale, and stacked graphs.

3.1 Time Graph of Different Meteorological Variables

Many scientific studies examine the relationship among several meteorological variables. A suitable approach is to display the time evolution of all of them using a panel for each variable. Superposing variables with different characteristics is not very useful (unless their values have previously been rescaled), so this option is postponed to Section 3.2.

For this example we will use the 8 years of daily data from the SIAR meteorological station located at Aranjuez (Madrid). This multivariate time series can be displayed with the xyplot method of lattice for zoo objects with a panel for each variable (Figure 3.1).

Figure 3.1

Time plot of the collection of meteorological time series of the Aranjuez station (lattice version).

load('data/aranjuez.RData')
library(lattice) ## provides the xyplot generic
library(zoo)
## The layout argument arranges panels in rows
xyplot(aranjuez, layout=c(1, ncol(aranjuez)))

The package ggplot2 provides the generic method autoplot to automate the display of certain classes with a simple command. The package zoo provides an autoplot method for the zoo class with a result similar to that obtained with xyplot (Figure 3.2).

Figure 3.2

Time plot of the collection of meteorological time series of the Aranjuez station (ggplot2 version).

library(ggplot2)
autoplot(aranjuez) + facet_free()

3.1.1 Annotations to Enhance the Time Graph

These first attempts can be improved with a custom panel function that generates the content of each panel using the information processed by xyplot, or overlaying additional layers with autoplot. One of the main enhancements is to highlight certain time regions that fulfill certain conditions. The package latticeExtra provides a nice solution for xyplot with panel.xblocks. The result is displayed in Figure 3.3:

Figure 3.3

Enhanced time plot of the collection of meteorological time series of the Aranjuez station.

  • The label of each time series is displayed with text inside each panel instead of using the strips mechanism. panel.text prints the name of each variable with the aid of panel.number.
  • The alternation of years is displayed with blocks of gray and white color using the panel.xblocks function from latticeExtra. The year is extracted (as character) from the time index of the zoo object with format.
  • Values below the mean of each variable are highlighted with short red blocks at the bottom of each panel, again with the panel.xblocks function.
  • The maxima and minima are highlighted with small blue triangles.

Because the functions included in the panel function are executed consecutively, their order determines the superposition of graphical layers.

library(grid)
library(latticeExtra)

## Auxiliary function to extract the year value of a POSIXct time
## index
Year <- function(x)format(x, "%Y")

xyplot(aranjuez, layout=c(1, ncol(aranjuez)), strip=FALSE,
      scales=list(y=list(cex=0.6, rot=0)),
      panel=function(x, y, ...){
        ## Alternation of years
        panel.xblocks(x, Year,
                   col = c("lightgray", "white"),
                   border = "darkgray")
        ## Values under the average highlighted with red regions
        panel.xblocks(x, y<mean(y, na.rm=TRUE),
                   col = "indianred1",
                   height=unit(0.1, 'npc'))
        ## Time series
        panel.lines(x, y, col='royalblue4', lwd=0.5, ...)
        ## Label of each time series
        panel.text(x[1], min(y, na.rm=TRUE),
                 names(aranjuez)[panel.number()],
                 cex=0.6, adj=c(0, 0), srt=90, ...)
        ## Triangles to point the maxima and minima
        idxMax <- which.max(y)
        panel.points(x[idxMax], y[idxMax],
                  col='black', fill='lightblue', pch=24)
        idxMin <- which.min(y)
        panel.points(x[idxMin], y[idxMin],
                  col='black', fill='lightblue', pch=25)
       })

There is no equivalent panel.xblocks function that can be used with ggplot2. Therefore, the ggplot2 version must explicitly compute the corresponding bands (years and regions below the average values):

  • The first step in working with ggplot is to transform the zoo object into a data.frame in long format. fortify returns a data.frame with three columns: the time Index, a factor indicating the Series, and the corresponding Value.

    timeIdx <- index(aranjuez)
    long <- fortify(aranjuez, melt=TRUE)

  • The bands of values below the average can be easily extracted with scale because these regions are negative when the data.frame is centered.

    ## Values below mean are negative after being centered
    scaled <- fortify(scale(aranjuez, scale=FALSE), melt=TRUE)
    ## The 'scaled' column is the result of the centering.
    ## The new 'Value' column stores the original values.
    scaled <- transform(scaled, scaled=Value, Value=long$Value)
    underIdx <- which(scaled$scaled <= 0)
    ## 'under' is the subset of values below the average
    under <- scaled[underIdx,]
  • The years bands are defined with the function endpoints from the xts package:

    library(xts)
    ep <- endpoints(timeIdx, on='years')
    N <- length(ep[-1])
    ## 'tsp' is the start and 'tep' is the end of each band
    tep <- timeIdx[ep]
    tsp <- timeIdx[ep[-(N+1)]+1]
    ## 'cols' is a vector with the color of each band
    cols <- rep_len(c('gray', 'white'), N)
  • The minima and maxima points of each variable are extracted with apply:

    minIdx <- timeIdx[apply(aranjuez, 2, which.min)]
    minVals <- apply(aranjuez, 2, min, na.rm=TRUE)
    mins <- data.frame(Index=minIdx,
                       Value=minVals,
                       Series=names(aranjuez))

    maxIdx <- timeIdx[apply(aranjuez, 2, which.max)]
    maxVals <- apply(aranjuez, 2, max, na.rm=TRUE)
    maxs <- data.frame(Index=maxIdx,
                       Value=maxVals,
                       Series=names(aranjuez))
  • With ggplot we define the canvas, and the layers of information are added successively:

    ggplot(data=long, aes(Index, Value)) +
       ## Time series of each variable
       geom_line(colour = "royalblue4", lwd = 0.5) +
       ## Year bands
       annotate(geom='rect', ymin = -Inf, ymax = Inf,
               xmin=tsp, xmax=tep,
               fill = cols, alpha = 0.4) +
       ## Values below average
       geom_rug(data=under,
              sides='b', col='indianred1') +
       ## Minima
       geom_point(data=mins, pch=25) +
       ## Maxima
       geom_point(data=maxs, pch=24) +
       ## Axis labels and theme definition
       labs(x='Time', y=NULL) +
       theme_bw() +
       ## Each series is displayed in a different panel with an
       ## independent y scale
       facet_free()
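The year bands computed above with endpoints can be cross-checked without xts. The following is a minimal base R sketch, assuming a daily Date index; the synthetic timeIdx is a hypothetical stand-in for index(aranjuez), and the variable names first and last are illustrative only.

```r
## Hypothetical daily index covering three complete years
timeIdx <- seq(as.Date("2004-01-01"), as.Date("2006-12-31"), by = "day")
yr <- format(timeIdx, "%Y")

## Position of the first and last observation of each year
first <- which(!duplicated(yr))
last <- which(!duplicated(yr, fromLast = TRUE))

## Start and end of each band, and alternating band colors
tsp <- timeIdx[first]
tep <- timeIdx[last]
cols <- rep_len(c("gray", "white"), length(first))
```

Here last plays the role of ep[-1] in the xts version, so tsp and tep should hold the same dates as in the code of the list above.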

Some messages from Figure 3.3:

  • The radiation, temperature, and evapotranspiration are quasi-periodic and almost synchronized with each other. Their local maxima appear in the summer and the local minima in the winter. Obviously, the summer values are higher than the average.
  • The average humidity varies in opposition to the temperature and radiation cycle, with local maxima located during winter.
  • The average and maximum wind speed, and rainfall vary in a more erratic way and do not show the evident periodic behavior of the radiation and temperature.
  • The rainfall is different from year to year. The remaining variables do not show variations between years.
  • The fluctuations of solar radiation are more apparent than the temperature fluctuations. There is hardly any day with temperatures below the average value during summer, while it is not difficult to find days with radiation below the average during this season.

3.2 Time Series of Variables with the Same Scale

As an example of time series of variables with the same scale, we will use measurements of solar radiation from different meteorological stations.

The first attempt to display this multivariate time series makes use of the xyplot.zoo method. The objective of this graphic is to display the behavior of the collection as a whole: the series are superposed in the same panel (superpose=TRUE) without a legend (auto.key=FALSE), using thin lines and partial transparency (see footnote 1). Transparency softens overplotting problems and reveals density clusters because regions with more overlapping lines are darker. Figure 3.4 displays the variations around the time average (avRad).

Figure 3.4

Time plot of the variations around time average of solar radiation measurements from the meteorological stations of Navarra.

load('data/navarra.RData')
## Average over the stations for each day
avRad <- zoo(rowMeans(navarra, na.rm=TRUE), index(navarra))
pNavarra <- xyplot(navarra - avRad,
               superpose=TRUE, auto.key=FALSE,
               lwd=0.5, alpha=0.3, col='midnightblue')
pNavarra
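The quantity plotted in Figure 3.4 is the departure of each station from the daily average across stations. A minimal base R sketch of that arithmetic, assuming a small hypothetical matrix rad (rows are days, columns are stations) in place of the navarra object:

```r
## Hypothetical radiation values: 3 days (rows) x 3 stations (columns)
rad <- cbind(st1 = c(10, 20, 30),
             st2 = c(14, NA, 26),
             st3 = c(12, 22, 34))

## Average across stations for each day, ignoring missing values
avRad <- rowMeans(rad, na.rm = TRUE)

## The vector is recycled down the columns, so each row has its own
## mean subtracted
dev <- rad - avRad
```

By construction the deviations of each day sum to zero, which is why the lines of the figure spread around a horizontal reference.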

This result can be improved with different methods: the cut-and-stack method, the horizon graph with horizonplot, and dynamic labeling with the gridSVG package.

3.2.1 Aspect Ratio and Rate of Change

When a graphic is intended to inform about the rate of change, special attention must be paid to the aspect ratio of the graph, defined as the ratio of the height to the width of the graphical window. Cleveland analyzed the importance of the aspect ratio for judging rate of change. He concluded that we visually decode the information about the relative local rate of change of one variable with another by comparing the orientations of the local line segments that compose the polylines. The recommendation is to choose the aspect ratio so that the absolute values of the orientations of the segments are centered on 45° (banking to 45°).
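One simple way to approximate this recommendation is the median absolute slope criterion: rescale both axes to the unit interval and choose the aspect ratio that gives the median segment a 45° orientation. The sketch below is an illustration under that assumption, not necessarily the exact computation performed by lattice; bank_to_45 is a hypothetical helper name.

```r
## Aspect ratio (height/width) that banks the median segment to 45 degrees
bank_to_45 <- function(x, y) {
  ## Segment increments in units of the data range of each axis
  dx <- diff(x) / diff(range(x, na.rm = TRUE))
  dy <- diff(y) / diff(range(y, na.rm = TRUE))
  ok <- !is.na(dx) & !is.na(dy) & dx != 0 & dy != 0
  ## On a window of width w and height h the physical slope of a
  ## segment is (dy * h) / (dx * w); its median is 1 when
  ## h / w = 1 / median(|dy / dx|)
  1 / median(abs(dy[ok] / dx[ok]))
}

## A zigzag with four full oscillations
x <- 0:8
y <- rep(c(0, 1), length.out = 9)
zigzag <- bank_to_45(x, y)

## A straight line
straight <- bank_to_45(x, 2 * x)
```

The straight line yields a square window, while the zigzag demands a window eight times wider than tall. Oscillating series lead to such flat windows, which motivates the cut-and-stack method.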

The problem with banking to 45° is that the resulting aspect ratio is frequently too small. A suitable solution to minimize wasted space is the cut-and-stack method. The xyplot.ts method implements this solution with the combination of the arguments aspect and cut. The version of Figure 3.4 using banking to 45° and the cut-and-stack method (Figure 3.5) is produced with

xyplot(navarra - avRad,
      aspect='xy', cut=list(n=3, overlap=0.1),
      strip=FALSE,
      superpose=TRUE, auto.key=FALSE,
      lwd=0.5, alpha=0.3, col='midnightblue')

3.2.2 The Horizon Graph

The horizon graph is useful in examining how a large number of series change over time, and does so in a way that allows both comparisons between the individual time series and independent analysis of each series. Moreover, extraordinary behaviors and predominant patterns are easily distinguished (Heer, Kong, and Agrawala 2009; Few 2008).

This graph displays several stacked series collapsing the y-axis to free vertical space:

  • Positive and negative values share the same vertical space. Negative values are inverted and placed above the reference line. Sign is encoded using different hues (positive values in blue and negative values in red).

  • Differences in magnitude are displayed as differences in color intensity (darker colors for greater differences).

  • The color bands share the same baseline and are superposed, with darker bands in front of the lighter ones.

Figure 3.5

Cut-and-stack plot with banking to 45°.

Because the panels share the same design structure, once this technique is understood, it is easy to establish comparisons or spot extraordinary events. This method is what Tufte described as small multiples (Tufte 1990).
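The band construction described in the list above can be sketched in base R. This is an illustration of the idea only, not the internal code of horizonplot; horizon_bands and its arguments step and nbands are hypothetical names.

```r
## Split the departures from 'origin' into 'nbands' layers of height 'step'
horizon_bands <- function(v, origin = 0, step, nbands = 3) {
  d <- v - origin
  mag <- abs(d)  # negative values are mirrored above the baseline
  ## Band k holds the part of the magnitude between (k - 1) * step
  ## and k * step
  bands <- sapply(seq_len(nbands), function(k)
    pmin(pmax(mag - (k - 1) * step, 0), step))
  bands <- matrix(bands, nrow = length(v))
  colnames(bands) <- paste0("band", seq_len(nbands))
  ## 'sign' selects the hue; the band number selects the intensity
  data.frame(value = v, sign = sign(d), bands)
}

hb <- horizon_bands(c(0.5, -2.5, 1), step = 1)
```

In this example the value -2.5 fills the first two bands completely and half of the third one, so it would be drawn in the darkest red, while 0.5 only reaches half of the lightest blue band.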

Figure 3.6 displays the variations of solar radiation around the time average with a horizon graph using a row for each time series.

Figure 3.6

Horizon plot of variations around time average of solar radiation measurements from the meteorological stations of Navarra.

library(latticeExtra)

horizonplot(navarra-avRad,
          layout=c(1, ncol(navarra)),
          origin=0, colorkey=TRUE)

Figure 3.6 allows several questions to be answered:

  • Which stations consistently measure above and below the average?
  • Which stations resemble more closely the average time series?
  • Which stations show erratic behavior, and which show uniform behavior?
  • In each of the stations, is there any day with extraordinary measurements?
  • Which part of the year is associated with more intense absolute fluctuations across the set of stations?

3.2.3 Time Graph of the Differences between a Time Series and a Reference

The horizon graph is also useful in revealing the differences between a univariate time series and another reference. For example, we might be interested in the departure of the observed temperature from the long-term average, or in other words, the temperature change over time.

Let’s illustrate this approach with the time series of daily average temperatures measured at the meteorological station of Aranjuez. The reference is the long-term daily average calculated with ave.

Ta <- aranjuez$TempAvg
timeIndex <- index(aranjuez)
## Mean of all the observations sharing the same day of the year
longTa <- ave(Ta, format(timeIndex, '%j'))
diffTa <- (Ta - longTa)
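To see what ave does in the code above, consider a minimal base R sketch with synthetic data: two hypothetical years of constant temperature, 10 during the first year and 20 during the second. The names dates, temp, longTemp and diffTemp are illustrative only.

```r
## Two complete non-leap years of daily dates
dates <- as.Date("2001-01-01") + 0:729
temp <- c(rep(10, 365), rep(20, 365))

## ave replaces each value with the mean of its group; the groups
## are defined by the day of the year ('%j')
longTemp <- ave(temp, format(dates, "%j"))
diffTemp <- temp - longTemp
```

Every day of the year is observed twice, so the long-term average is 15 everywhere. The first year lies 5 units below the reference and the second year 5 units above it.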

The temperature time series, the long-term average, and the differences between them can be displayed with the xyplot method, now using screens to assign a separate panel to the differences time series (Figure 3.7).

Figure 3.7

Daily temperature time series, its long-term average and the differences between them.

xyplot(cbind(Ta, longTa, diffTa),
      col=c('darkgray', 'red', 'midnightblue'),
      superpose=TRUE, auto.key=list(space='right'),
      screens=c(rep('AverageTemperature', 2), 'Differences'))

The horizon graph is better suited for displaying the differences. The next code again uses the cut-and-stack method (Figure 3.5) to distinguish between years. Figure 3.8 shows that 2004 started clearly above the average while 2005 and 2009 did the opposite. Year 2007 was frequently below the long-term average, but 2011 was closer to that reference.

Figure 3.8

Horizon graph displaying differences between a daily temperature time series and its long-term average.

years <- unique(format(timeIndex, '%Y'))

horizonplot(diffTa, cut=list(n=8, overlap=0),
          colorkey=TRUE, layout=c(1, 8),
          scales=list(draw=FALSE, y=list(relation='same')),
          origin=0, strip.left=FALSE) +
   layer(grid.text(years[panel.number()], x = 0, y = 0.1,
                gp=gpar(cex=0.8),
                just = "left"))

A different approach to display this information is to produce a level plot displaying the time series using parts of its time index as independent and conditioning variables (see footnote 2). The following code displays the differences with the day of the month on the horizontal axis and the year on the vertical axis, with a different panel for each month number. Therefore, each cell of Figure 3.9 corresponds to a certain day of the time series. If you compare this figure with the horizon plot, you will find the same findings, now revealed in more detail. On the other hand, while the horizon plot of Figure 3.8 clearly displays the yearly evolution, the combination of variables of the level plot focuses on the comparison between years in a certain month.

library(RColorBrewer) ## provides brewer.pal

year <- function(x)as.numeric(format(x, '%Y'))
day <- function(x)as.numeric(format(x, '%d'))
month <- function(x)as.numeric(format(x, '%m'))
myTheme <- modifyList(custom.theme(region=brewer.pal(9, 'RdBu')),
                      list(
                        strip.background=list(col='gray'),
                        panel.background=list(col='gray')))
maxZ <- max(abs(diffTa))
levelplot(diffTa ~ day(timeIndex) * year(timeIndex) |
            factor(month(timeIndex)),
        at=pretty(c(-maxZ, maxZ), n=8),
        colorkey=list(height=0.3),
        layout=c(1, 12), strip=FALSE, strip.left=TRUE,
        xlab='Day', ylab='Month',
        par.settings=myTheme)
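The claim that each cell of the level plot corresponds to exactly one day can be checked with a short base R sketch. It assumes a hypothetical daily index of two years in place of the index of aranjuez; parts and cells are illustrative names.

```r
## Hypothetical daily index: 2004 (leap year) and 2005
timeIndex <- seq(as.Date("2004-01-01"), as.Date("2005-12-31"), by = "day")

## Components of the time index used as plotting variables
parts <- data.frame(day = as.numeric(format(timeIndex, "%d")),
                    month = as.numeric(format(timeIndex, "%m")),
                    year = as.numeric(format(timeIndex, "%Y")))

## Number of cells in each (year, month) combination
cells <- table(year = parts$year, month = parts$month)
```

No combination of day, month and year is repeated, so no two observations are drawn in the same cell. Months with fewer than 31 days simply leave empty cells at the right of their panel.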

3.2.4 Interaction with gridSVG

The gridSVG package provides functions to convert grid-based R graphics to an SVG format. It provides several functions to add dynamic and interactive capabilities to R graphics. In this section we will use grid.script, a function to add JavaScript code to a plot.

The first step is to specify which component of the scene will run the JavaScript code. The grid.ls function returns a listing of the names of grobs or viewports included in the graphic output: only the lines will be connected with the JavaScript code.

Figure 3.9

Level plot of differences between a daily temperature time series and its long-term average.

library(gridSVG)
## grobs in the graphical output
pNavarra
grobs <- grid.ls(print=FALSE)
## only interested in some of them
nms <- grobs$name[grobs$type == "grobListing"]
idxNames <- grep('lines', nms)
IDs <- nms[idxNames]

The second step is to modify each grob (graphical object) to add attributes that specify when it will call JavaScript code. For each line identified with the elements of the IDs vector and associated with a meteorological station, the navarra object is accessed to extract the annual mean value of the daily radiation and the abbreviated name of the corresponding station (info). The grid.garnish function adds attributes to the grob of each line so that when the mouse moves over a grob, the line is highlighted and colored in red (highlight). When the mouse leaves the grob, the hide function restores the default values of line width and transparency, but uses the green color to denote that this line has already been visited. In addition, because browsers display the content of the title attribute with a default tooltip, grid.garnish sets this attribute to info.

for (id in unique(IDs)){
  ## extract information from the data
  ## according to the ID value
  ## (the dot must be escaped because strsplit uses regular expressions)
  i <- strsplit(id, '\\.')
  i <- sapply(i, function(x)as.numeric(x[5]))
  ## Information to be attached to each line: annual mean of daily
  ## radiation and abbreviated name of the station
  dat <- round(mean(navarra[,i], na.rm=TRUE), 2)
  info <- paste(names(navarra)[i], paste(dat, collapse=','),
             sep=':')
  ## attach SVG attributes
  grid.garnish(id,
            onmouseover="highlight(evt)",
            onmouseout="hide(evt)",
            title=info)
}
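The index extraction inside the loop can be examined in isolation. In the sketch below the grob name is a hypothetical example that assumes dot-separated names with the group number in fifth position, as the loop does; the actual names returned by grid.ls should be inspected before relying on that position.

```r
## Hypothetical grob name of the line of the seventh series
id <- "plot_01.xyplot.lines.group.7.panel.1.1"

## strsplit interprets its second argument as a regular expression,
## so the dot has to be escaped...
parts <- strsplit(id, "\\.")[[1]]
## ...or matched literally with fixed=TRUE
partsFixed <- strsplit(id, ".", fixed = TRUE)[[1]]

## Column of the data corresponding to this line
i <- as.numeric(parts[5])
```

Both calls return the same pieces, and i can then be used to select the corresponding column of navarra.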

These JavaScript functions are included in a script file named highlight.js (available at the website of the book). It can be added as an additional object with grid.script.

grid.script(filename="highlight.js")

This script is easy to understand, even without previous JavaScript knowledge:

highlight = function(evt){
    evt.target.setAttribute('opacity', '1');
    evt.target.setAttribute('stroke', 'red');
    evt.target.setAttribute('stroke-width', '1');
}
hide = function(evt){
    evt.target.setAttribute('opacity', '0.3');
    evt.target.setAttribute('stroke', 'green');
    evt.target.setAttribute('stroke-width', '0.3');
}

Finally, grid.export exports the whole scene to SVG.

grid.export('figs/navarraRadiation.svg')

A snapshot of the result, as viewed in a browser with a line highlighted, is shown in Figure 3.10. Open the SVG file with your browser, explore it using the horizon graph (Figure 3.6) as a reference, and try to answer the questions raised with that graphic.

Figure 3.10

Snapshot of an SVG graphic produced with gridSVG.

3.3 Stacked Graphs

If the variables of a multivariate time series can be summed to produce a meaningful global variable, they may be better displayed with stacked graphs. For example, the information on unemployment in the United States provides data of unemployed persons by industry and class of workers, and can be summed to give a total unemployment time series.

load('data/unemployUSA.RData')

The time series of unemployment can be directly displayed with the xyplot.zoo method (Figure 3.11).

Figure 3.11

Time series of unemployment with xyplot using the default panel function.

xyplot(unemployUSA, superpose=TRUE, par.settings=custom.theme,
     auto.key=list(space='right'))

This graphical output is not very useful: the legend is confusing, with too many items; the vertical scale is dominated by the largest series, with several series buried in the lower part of the scale; the trend, variations and structure of the total and individual contributions cannot be deduced from this graph.

A suitable improvement is to display the multivariate time series as a set of stacked colored polygons to follow the macro/micro principle proposed by Tufte (Tufte 1990): Show a collection of individual time series and also display their sum. A traditional stacked graph is easily obtained with geom_area:

library(scales) ## scale_x_yearmon needs scales::pretty_breaks
autoplot(unemployUSA, facets=NULL, geom='area') +
   geom_area(aes(fill=Series)) +
   scale_x_yearmon()

Traditional stacked graphs have their bottom on the x-axis, which makes the overall height at each point easy to estimate. On the other hand, with this layout, individual layers may be difficult to distinguish. The ThemeRiver (Havre et al. 2002), also named streamgraph by Byron and Wattenberg (2008), provides an innovative layout method in which layers are arranged symmetrically around the x-axis. At a glance, one can perceive the pattern of the global sum and of the individual variables, their contribution to the global sum, and the interrelation between variables.

Figure 3.12

Time series of unemployment with stacked areas using geom_area.

I have defined panel and prepanel functions (see footnote 3) to implement a ThemeRiver with xyplot. The result is displayed in Figure 3.13 with a vertical line to indicate one of the main milestones of the financial crisis, whose effect on overall unemployment is clearly evident.

Figure 3.13

ThemeRiver of unemployment in the United States.

library(colorspace)
## We will use a qualitative palette from colorspace
nCols <- ncol(unemployUSA)
pal <- rainbow_hcl(nCols, c=70, l=75, start=30, end=300)
myTheme <- custom.theme(fill=pal, lwd=0.2)

sep2008 <- as.numeric(as.yearmon('2008-09'))

xyplot(unemployUSA, superpose=TRUE, auto.key=FALSE,
      panel=panel.flow, prepanel=prepanel.flow,
      origin='themeRiver', scales=list(y=list(draw=FALSE)),
      par.settings=myTheme) +
   layer(panel.abline(v=sep2008, col='gray', lwd=0.7))

This figure can help answer several questions. For example:

  • What is the industry or class of worker with the lowest/highest unemployment figures during this time period?
  • What is the industry or class of worker with the lowest/highest unemployment increases due to the financial crisis?
  • There are a number of local maxima and minima of the total unemployment numbers. Are all the classes contributing to the maxima/minima? Do all the classes exhibit the same fluctuation behavior as the global evolution?

More questions and answers can be found in the “Current Employment Statistics” reports from the Bureau of Labor Statistics (see footnote 4).

3.3.1 Panel and Prepanel Functions to Implement the ThemeRiver with xyplot

The xyplot function displays information according to the class of its first argument (methods) and to the panel function. We will use the xyplot.zoo method (equivalent to the xyplot.ts method) with a new custom panel function. This new panel function has four main arguments, three of them calculated by xyplot (x, y and groups) and a new one, origin. Of course, it includes the ... argument to provide additional arguments.

The first step is to create a data.frame with coordinates and with the groups factor. The value and number of the levels will be used in the main step of this panel function. With this data.frame we have to calculate the y and x coordinates for each group to get a stacked set of polygons.

This data.frame is in the long format, with a row for each observation, and where the group column identifies the variable. Thus, it must be transformed to the wide format, with a column for each variable. With the unstack function, a new data.frame is produced whose columns are defined according to the formula y ~ groups and with a row for each time position. The stack of polygons is the result of the cumulative sum of each row (apply(yWide, 1, cumsum)). The origin of this sum is defined with the corresponding origin argument: with themeRiver, the polygons are arranged in a symmetric way.

Each column of this matrix of cumulative sums defines the y coordinate of each variable (where origin is now the first variable). The polygon of each variable is between this curve (iCol+1) and the one of the previous variable (iCol). In order to get a closed polygon, the coordinates of the inferior limit are in reverse order. This new data.frame (Y) is in the wide format, but xyplot requires the information in the long format: the y coordinates of the polygons are extracted from the values column of the long version of this data.frame.

The x coordinates are produced in an easier way. Again, unstack produces a data.frame with a column for each variable and a row for each time position, but now, because the x coordinates are the same for the set of polygons, the corresponding vector is constructed directly using a combination of concatenation and repetition.

Finally, the groups vector is produced, repeating each element of the columns of the original data.frame (dat$groups) twice to account for the forward and reverse curves of the corresponding polygon.
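The construction of the y coordinates can be followed with a tiny worked example. It is a sketch that assumes a hypothetical wide data.frame with three variables (a, b, c) and three time positions, already in the form returned by unstack; the name polyB is illustrative.

```r
## Hypothetical data in wide format: one column per variable
yWide <- data.frame(a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 1, 1))

## ThemeRiver origin: minus half of the total at each time position
origin <- -1/2 * rowSums(yWide)

## Cumulative sum along each row, starting at the origin
yCumSum <- t(apply(cbind(origin = origin, yWide), 1, cumsum))

## Closed polygon of 'b': upper curve forward, lower curve reversed
polyB <- c(yCumSum[, "b"], rev(yCumSum[, "a"]))
```

The first and last columns of yCumSum are mirror images of each other, which is the symmetric layout of the ThemeRiver, and the vertical distance between two consecutive columns recovers the original values of the corresponding variable.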

The final step before displaying the polygons is to acquire the graphical settings. The information retrieved with trellis.par.get is transferred to the corresponding arguments of panel.polygon.

Everything is ready for constructing the polygons. With a for loop, the coordinates of the corresponding group are extracted from the x and y vectors, and a polygon is displayed with panel.polygon. The labels of each polygon (the levels of the original groups variable, groupLevels) are printed inside the polygon if there is enough room for the text (hChar >= 1), or at the right if the polygon is too small or if it is the first or last variable of the set. Labels printed at the right share the color of their polygon (col[i]), while labels printed inside a polygon are drawn in white.

panel.flow <- function(x, y, groups, origin, ...){
  dat <- data.frame(x=x, y=y, groups=groups)
  nVars <- nlevels(groups)
  groupLevels <- levels(groups)

  ## From long to wide
  yWide <- unstack(dat, y~groups)
  ## Where are the maxima of each variable located? We will use
  ## them to position labels.
  idxMaxes <- apply(yWide, 2, which.max)
  ## Origin calculated following Havre et al. (2002)
  if (origin=='themeRiver') origin= -1/2*rowSums(yWide)
  else origin=0
  yWide <- cbind(origin=origin, yWide)
  ## Cumulative sums to define the polygon
  yCumSum <- t(apply(yWide, 1, cumsum))
  Y <- as.data.frame(sapply(seq_len(nVars),
                       function(iCol)c(yCumSum[,iCol+1],
                                    rev(yCumSum[,iCol]))))
  names(Y) <- levels(groups)
  ## Back to long format, since xyplot works that way
  y <- stack(Y)$values

  ## Similar but easier for x
  xWide <- unstack(dat, x~groups)
  x <- rep(c(xWide[,1], rev(xWide[,1])), nVars)
  ## Groups repeated twice (upper and lower limits of the polygon)
  groups <- rep(groups, each=2)

  ## Graphical parameters
  superpose.polygon <- trellis.par.get("superpose.polygon")
  col = superpose.polygon$col
  border = superpose.polygon$border
  lwd = superpose.polygon$lwd

  ## Draw polygons
  for (i in seq_len(nVars)){
    xi <- x[groups==groupLevels[i]]
    yi <- y[groups==groupLevels[i]]
    panel.polygon(xi, yi, border=border,
               lwd=lwd, col=col[i])
  }

  ## Print labels
  for (i in seq_len(nVars)){
    xi <- x[groups==groupLevels[i]]
    yi <- y[groups==groupLevels[i]]
    N <- length(xi)/2
    ## Height available for the label
    h <- unit(yi[idxMaxes[i]], 'native') -
      unit(yi[idxMaxes[i] + 2*(N-idxMaxes[i]) +1], 'native')
    ##...converted to "char" units
    hChar <- convertHeight(h, 'char', TRUE)
    ## If there is enough space and we are not at the first or
    ## last variable, then the label is printed inside the polygon.
    if((hChar >= 1) && !(i %in% c(1, nVars))){
      grid.text(groupLevels[i],
              xi[idxMaxes[i]],
              (yi[idxMaxes[i]] +
               yi[idxMaxes[i] + 2*(N-idxMaxes[i]) +1])/2,
              gp = gpar(col='white', alpha=0.7, cex=0.7),
              default.units='native')
    } else {
      ## Otherwise, the label is printed outside
      grid.text(groupLevels[i],
              xi[N],
              (yi[N] + yi[N+1])/2,
              gp=gpar(col=col[i], cex=0.7),
              just='left', default.units='native')
    }
    }
  }
}

With this panel function, xyplot displays a set of stacked polygons corresponding to the multivariate time series (Figure 3.14). However, the graphical window is not large enough, and part of the polygons fall out of it. Why?

Figure 3.14

First attempt of ThemeRiver.

xyplot(unemployUSA, superpose=TRUE, auto.key=FALSE,
      panel=panel.flow, origin='themeRiver',
      par.settings=myTheme, cex=0.4, offset=0,
      scales=list(y=list(draw=FALSE)))

The problem is that lattice makes a preliminary estimate of the window size using a default prepanel function that is unaware of the internal calculations of our new panel.flow function. The solution is to define a new prepanel.flow function.

The input arguments and first lines are the same as in panel.flow. The output is a list whose elements are the limits for each axis (xlim and ylim), and the sequence of differences (dx and dy) that can be used for the aspect and banking calculations.

The limits of the x-axis are defined with the range of the time index, while the limits of the y-axis are calculated with the minimum of the first column of yCumSum (the origin line) and with the maximum of its last column (the upper line of the cumulative sum).
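The mismatch between the two estimates of the vertical limits can be reproduced with a small base R sketch, assuming a hypothetical wide data.frame with three variables; naiveLim and flowLim are illustrative names.

```r
## Hypothetical data in wide format: one column per variable
yWide <- data.frame(a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 1, 1))

## Limits based only on the raw values, as a prepanel function
## unaware of the stacking would compute them
naiveLim <- range(unlist(yWide))

## Limits needed by the stacked polygons
origin <- -1/2 * rowSums(yWide)
yCumSum <- t(apply(cbind(origin = origin, yWide), 1, cumsum))
flowLim <- c(min(yCumSum[, 1]), max(yCumSum[, ncol(yCumSum)]))
```

With these values the raw data span the interval from 1 to 3, whereas the polygons extend from -3 to 3, so a window sized from the raw values would clip most of the graphic.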

prepanel.flow <- function(x, y, groups, origin,...){
  dat <- data.frame(x=x, y=y, groups=groups)
  nVars <- nlevels(groups)
  groupLevels <- levels(groups)
  yWide <- unstack(dat, y~groups)
  if (origin=='themeRiver') origin= -1/2*rowSums(yWide)
  else origin=0
  yWide <- cbind(origin=origin, yWide)
  yCumSum <- t(apply(yWide, 1, cumsum))

  list(xlim=range(x),
      ylim=c(min(yCumSum[,1]), max(yCumSum[,nVars+1])),
      dx=diff(x),
      dy=diff(c(yCumSum[,-1])))
}

1 A similar result can be obtained with autoplot using facets=NULL.

2 This approach was inspired by the strip function of the metvurst package (http://metvurst.blogspot.com.es/2012/11/plotting-large-amounts-of-atmospheric_4.html).

3 The code of these panel and prepanel functions is explained in Section 3.3.1.

4 The March 2012 highlights report is available at http://www.bls.gov/ces/highlights032012.pdf.
