Research doesn’t end when the last statistical analysis or graph is completed. We need to include the results in a report that effectively communicates these findings to a teacher, supervisor, client, government agency, or journal editor. Although R creates state-of-the-art graphics, its text output is woefully retro—tables of monospaced text with columns lined up using spaces.
There are two common approaches to creating publication quality reports in R: Sweave and odfWeave. The Sweave package allows you to embed R code and output in LaTeX documents, in order to produce high-end typeset reports in PDF, PostScript, and DVI formats. Sweave is an elegant, precise, and highly flexible system, but it requires the author to be conversant with LaTeX coding.
In a similar fashion, the odfWeave package provides a mechanism for embedding R code and output in documents that follow the OpenDocument Format (ODF). These reports can be further edited in an ODF word processor, such as OpenOffice Writer, and saved in either ODF or Microsoft Word format. The process is not as flexible as the Sweave approach, but it eliminates the need to learn LaTeX. We’ll look at each approach in turn.
LaTeX is a document preparation system for high-quality typesetting (http://www.latex-project.org) that’s freely available for Windows, Mac, and Linux platforms. An author creates a text document that includes markup code for formatting the content. The document is then processed through a LaTeX compiler, producing a finished document in PDF, PostScript, or DVI format.
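The Sweave package allows you to embed R code and output (including graphs) within the LaTeX document. This is a multistep process:
1. A special document called a noweb file (typically with the extension .Rnw) is created using any text editor. The file contains the written content, LaTeX markup code, and R code chunks. Each R code chunk starts with the delimiter <<>>= and ends with the delimiter @.
2. The noweb file is processed by the Sweave() function, either from within R or from the command line. The result is a LaTeX file (.tex) in which each R code chunk has been replaced by its code and/or output, wrapped in LaTeX markup.
3. The LaTeX file is processed through a LaTeX compiler, producing a PDF, PostScript, or DVI file.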
Within R, the format is
Sweave("infile.Rnw")
By default, Sweave("example.Rnw") would input the file example.Rnw from the current working directory and output the file example.tex to the same directory. Alternatively, you can use
Sweave("infile.Rnw", syntax="SweaveSyntaxNoweb")
Specifying this syntax option can help avoid some common parsing errors, as well as conflicts with the R2HTML package.
Execution from the command line depends on the operating system. For example, on a Linux system, this might look like
$ R CMD Sweave infile.Rnw
The complete process is outlined in figure D.1.
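To see what Sweave() actually produces, the following sketch writes a tiny noweb file, weaves it from within R, and reads back the generated .tex file. The file name minimal.Rnw, its contents, and the use of a temporary directory are assumptions made for illustration. Compiling the result requires a separate LaTeX installation, so that step is only shown as a comment (tools::texi2pdf() is one way to run it from R).

```r
# Hypothetical noweb file: LaTeX text, one inline \Sexpr{}, and one R code chunk.
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of 1 to 10 is \\Sexpr{mean(1:10)}.",
  "<<echo=TRUE>>=",
  "summary(c(2, 4, 6, 8))",
  "@",
  "\\end{document}"
)

# Work in a scratch directory, because Sweave() writes the .tex file
# to the current working directory.
work_dir <- tempfile("sweave-demo-")
dir.create(work_dir)
old_dir <- setwd(work_dir)
writeLines(rnw_lines, "minimal.Rnw")

utils::Sweave("minimal.Rnw")          # writes minimal.tex
tex_lines <- readLines("minimal.tex")

# Compiling to PDF needs a LaTeX installation, so it is left commented out:
# tools::texi2pdf("minimal.tex")

setwd(old_dir)
```

In minimal.tex, the \Sexpr{} call has been replaced by its value (5.5), and the code chunk has been wrapped in LaTeX environments (Schunk, Sinput, Soutput) defined by the Sweave style file.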
As indicated earlier, each chunk of R code is surrounded by <<>>= and @. You can add options to each <<>>= delimiter to control the processing of the corresponding R code chunk. For example,
<<echo=TRUE, results=hide>>=
summary(lm(Y~X, data=mydata))
@
would output the code but not the results, whereas
<<echo=FALSE, fig=TRUE>>=
plot(A)
@
wouldn’t print the code but would include the graph in the output. Common delimiter options are described in table D.1.
Option | Description |
---|---|
echo | Include the code in the output (echo=TRUE) or not (echo=FALSE). The default is TRUE. |
eval | Use eval=FALSE to keep the code from being evaluated/executed. The default is TRUE. |
fig | Use fig=TRUE when the output is a graph. The default is FALSE. |
results | Include R code output (results=verbatim), suppress the output (results=hide), or include the output and assume that it contains LaTeX markup (results=tex). The default is verbatim. Use results=tex when the output is generated by the xtable() function in the xtable package or the latex() function in the Hmisc package. |
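A quick way to check how these options behave is to weave a small file and look at what lands in the .tex output. The chunks and the file name options.Rnw below are invented for illustration; the expected values in the comments follow from the option descriptions in table D.1.

```r
# Three hypothetical chunks, each exercising a different option.
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<echo=TRUE, results=hide>>=",   # code shown, output hidden
  "x <- c(3, 1, 4, 1, 5)",
  "print(sum(x))",
  "@",
  "<<echo=FALSE>>=",                # output shown, code hidden
  "cat('total:', sum(x), '\\n')",
  "@",
  "<<eval=FALSE>>=",                # code shown but never run
  "stop('never evaluated')",
  "@",
  "\\end{document}"
)

work_dir <- tempfile("sweave-options-")
dir.create(work_dir)
old_dir <- setwd(work_dir)
writeLines(rnw_lines, "options.Rnw")
utils::Sweave("options.Rnw")
tex <- paste(readLines("options.tex"), collapse = "\n")
setwd(old_dir)

grepl("print(sum(x))", tex, fixed = TRUE)  # TRUE:  echo=TRUE shows the code
grepl("[1] 14", tex, fixed = TRUE)         # FALSE: results=hide drops the output
grepl("total: 14", tex, fixed = TRUE)      # TRUE:  output of the echo=FALSE chunk
grepl("cat(", tex, fixed = TRUE)           # FALSE: echo=FALSE hides the code
```

Note that the eval=FALSE chunk does not stop the run: its stop() call is printed as code but never executed.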
By default, Sweave will add LaTeX markup code to attractively format data frames, matrices, and vectors. Additionally, R objects can be embedded inline using a \Sexpr{} statement. Note that lattice graphs must be embedded in a print() statement to be processed properly.
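Here is a minimal sketch of the print() requirement for lattice graphs, using the InsectSprays data that ships with R. The file name lattice.Rnw and the chunk label dotplot are illustrative; by default, Sweave() names the graphics file after the noweb file and the chunk label. Without the print() call, the lattice object would be created but not drawn in the figure.

```r
# Hypothetical noweb file with one labeled figure chunk.
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<dotplot, fig=TRUE, echo=FALSE>>=",
  "library(lattice)",
  "print(dotplot(count ~ spray, data = InsectSprays))",  # print() is required
  "@",
  "\\end{document}"
)

work_dir <- tempfile("sweave-lattice-")
dir.create(work_dir)
old_dir <- setwd(work_dir)
writeLines(rnw_lines, "lattice.Rnw")
utils::Sweave("lattice.Rnw")
tex <- readLines("lattice.tex")
figures <- list.files(pattern = "\\.pdf$")   # graphics files written during weaving
setwd(old_dir)

figures   # should include a file named after the noweb file and the chunk label
```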
The xtable() function in the xtable package can be used to format data frames and matrices more precisely. It can also format other R objects, including those produced by lm(), glm(), aov(), table(), ts(), and coxph(). Use methods(xtable) to view a comprehensive list. When formatting R output with xtable(), be sure to include the results=tex option in the code chunk delimiter.
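The following sketch shows the kind of markup a results=tex chunk passes through to LaTeX. It assumes the xtable package is installed and uses a small made-up data frame rather than the cholesterol data; print.results = FALSE makes print() return the LaTeX code as a string so that it can be inspected instead of printed.

```r
library(xtable)   # install.packages("xtable") if needed

# Made-up summary table, standing in for descTable in listing D.1.
groups <- data.frame(
  Treatment = c("A", "B", "C"),
  N         = c(10L, 10L, 10L),
  Mean      = c(5.78, 9.23, 12.38)
)

latex_code <- print(
  xtable(groups, caption = "Group means", label = "table:demo"),
  caption.placement = "top",
  include.rownames  = FALSE,
  print.results     = FALSE   # return the markup instead of printing it
)
cat(latex_code)   # a table environment wrapping a tabular environment
```

Inside a noweb file, the same print(xtable(...)) call would sit in a chunk marked results=tex, so Sweave copies this markup into the .tex file unchanged.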
It’s easier to see how this all works with an example. Consider the noweb file in listing D.1. This is a reworking of the one-way ANOVA example in section 8.3. LaTeX markup code begins with a backslash (\). The exception is \Sexpr{}, which is a Sweave addition. R-related code is presented in bold italics.
\documentclass[12pt]{article}
\title{Sample Report}
\author{Robert I. Kabacoff, Ph.D.}
\date{}
\begin{document}
\maketitle
<<echo=false, results=hide>>=
library(multcomp)
library(xtable)
attach(cholesterol)
@
\section{Results}
Cholesterol reduction was assessed in a study that randomized \Sexpr{nrow(cholesterol)} patients to one of \Sexpr{length(unique(trt))} treatments. Summary statistics are provided in Table \ref{table:descriptives}.
<<echo = false, results = tex>>=
descTable <- data.frame("Treatment" = sort(unique(trt)),
                        "N" = as.vector(table(trt)),
                        "Mean" = tapply(response, list(trt), mean, na.rm=TRUE),
                        "SD" = tapply(response, list(trt), sd, na.rm=TRUE)
                       )
print(xtable(descTable, caption = "Descriptive statistics for each treatment group",
             label = "table:descriptives"),
      caption.placement = "top", include.rownames = FALSE)
@
The analysis of variance is provided in Table \ref{table:anova}.
<<echo=false, results=tex>>=
fit <- aov(response ~ trt)
print(xtable(fit, caption = "Analysis of variance", label = "table:anova"),
      caption.placement = "top")
@
\noindent and group differences are plotted in Figure \ref{figure:tukey}.
\begin{figure}
\begin{center}
<<fig=TRUE,echo=FALSE>>=
par(mar=c(5,4,6,2))
tuk <- glht(fit, linfct=mcp(trt="Tukey"))
plot(cld(tuk, level=.05),col="lightgrey",xlab="Treatment", ylab="Response")
box("figure")
@
\caption{Distribution of response times and pairwise comparisons.}
\label{figure:tukey}
\end{center}
\end{figure}
\end{document}
After processing the noweb file through the Sweave() function in R and processing the resulting TeX file through a LaTeX compiler, the PDF document in figures D.2 and D.3 is generated.
To learn more about Sweave, visit the Sweave home page (www.stat.uni-muenchen.de/~leisch/Sweave/). An excellent presentation is also provided by Theresa Scott (http://biostat.mc.vanderbilt.edu/TheresaScott). To learn more about LaTeX, check out the article “The Not So Short Introduction to LaTeX 2e,” available on the LaTeX home page (www.latex-project.org).
Sweave provides a means of embedding R code and output in a LaTeX document that’s compiled into a PDF, PostScript, or DVI file. Although beautiful, the final document isn’t editable. Additionally, many recipients require reports in a format such as Word.
odfWeave provides a mechanism for embedding R code and output in OpenOffice documents. Instead of placing R code chunks in a LaTeX document, the user places them in an OpenOffice ODT file (see figure D.4). An advantage is that the ODT file can be created with a WYSIWYG editor such as OpenOffice Writer (www.OpenOffice.org); there’s no need to learn a markup language.
Once the noweb document is created as an ODT file, you process it through the odfWeave() function in the odfWeave package. Unlike Sweave, odfWeave must be downloaded and installed before first use (install.packages("odfWeave")) and then loaded in each session in which it will be used. For example,
library(odfWeave)
infile <- "example.odt"
outfile <- "example-out.odt"
odfWeave(infile, outfile)
will take the example.odt file displayed in figure D.4 and produce the example-out.odt file displayed in figure D.5. Adding options(SweaveSyntax="SweaveSyntaxNoweb") before the odfWeave() statement may help reduce parsing errors on some platforms.
My Sample Report
Robert I. Kabacoff, Ph.D.
<<echo=false, results=hide>>=
library(multcomp)
library(xtable)
attach(cholesterol)
@
1 Results
Cholesterol reduction was assessed in a study that randomized \Sexpr{nrow(cholesterol)} patients to one of \Sexpr{length(unique(trt))} treatments. Summary statistics are provided in Table 1.
Table 1. Descriptive Statistics for each treatment group
<<echo = false, results = xml>>=
descTable <- data.frame("Treatment" = sort(unique(trt)),
                        "N" = as.vector(table(trt)),
                        "Mean" = tapply(response, list(trt), mean, na.rm=TRUE),
                        "SD" = tapply(response, list(trt), sd, na.rm=TRUE)
                       )
odfTable(descTable)
@
The analysis of variance is provided Table 2.
Table 2. Analysis of Variance
<<echo=false>>=
fit <- aov(response ~ trt)
summary(fit)
@
and group differences are plotted in Figure 1.
<<fig=TRUE,echo=FALSE>>=
par(mar=c(5,4,6,2))
tuk <- glht(fit, linfct=mcp(trt="Tukey"))
plot(cld(tuk, level=.05),col="lightgrey",xlab="Treatment", ylab="Response")
box("figure")
@
Figure 1. Distribution of response times and pair-wise comparisons.
My Sample Report
Robert I. Kabacoff, Ph.D.
1 Results
Cholesterol reduction was assessed in a study that randomized 50 patients to one of 5 treatments. Summary statistics are provided in Table 1.
Table 1. Descriptive Statistics for each treatment group

| | Treatment | N | Mean | SD |
|---|---|---|---|---|
| 1time | 1time | 10 | 5.782 | 2.878 |
| 2times | 2times | 10 | 9.225 | 3.483 |
| 4times | 4times | 10 | 12.375 | 2.923 |
| drugD | drugD | 10 | 15.361 | 3.455 |
| drugE | drugE | 10 | 20.948 | 3.345 |

The analysis of variance is provided Table 2.
Table 2. Analysis of Variance

                Df  Sum Sq Mean Sq F value    Pr(>F)
    trt          4 1351.37  337.84  32.433 9.819e-13 ***
    Residuals   45  468.75   10.42
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and group differences are plotted in Figure 1.
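There are several differences between Sweave and odfWeave. Comparing listing D.1 with figure D.4, for example, odfWeave chunks that produce formatted tables use results=xml rather than results=tex, and data frames are formatted with odfTable() instead of xtable().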
If you look at figure D.5, you’ll note that the ANOVA table isn’t attractively formatted (as it was in Sweave). Rather, the table is in the standard monospaced font produced by R. This is because odfWeave doesn’t have a formatting function for the objects returned by lm(), glm(), and so forth. To format these results properly, we’d have to pull the components out of the object in question (fit in this case) and arrange them in a matrix or data frame.
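One way to do that extraction is sketched below. It uses the PlantGrowth data that ships with R in place of the cholesterol data, and the column names in anova_df are arbitrary choices. The resulting data frame could then be passed to odfTable() in a results=xml chunk, as was done for the descriptive statistics.

```r
# Fit a one-way ANOVA on a built-in data set (a stand-in for the cholesterol data).
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list whose first element is the ANOVA table as a data
# frame; its row names may carry padding spaces, hence trimws().
anova_tab <- summary(fit)[[1]]

anova_df <- data.frame(
  Source = trimws(rownames(anova_tab)),
  Df     = anova_tab[["Df"]],
  SumSq  = round(anova_tab[["Sum Sq"]], 2),
  MeanSq = round(anova_tab[["Mean Sq"]], 2),
  F      = round(anova_tab[["F value"]], 3),
  p      = signif(anova_tab[["Pr(>F)"]], 3),
  row.names = NULL,
  stringsAsFactors = FALSE
)
anova_df

# Inside an odfWeave chunk with results=xml, the table would be rendered with:
# odfTable(anova_df)
```

The F and p entries for the Residuals row are NA, just as they are blank in R's printed ANOVA table.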
Once you have your report in ODF format, you can continue to edit it, tighten up the formatting, and save the results to an ODT, HTML, DOC, or DOCX file format. To learn more, read the odfWeave manual and vignette.
There are several advantages to the Sweave and odfWeave approaches described here. By embedding the code needed to perform the statistical analyses directly into the final report, you document exactly how the results were calculated. Six months from now, you can easily see what was done. You can also modify the statistical analyses or add new data and immediately regenerate the report with minimal effort. Additionally, you avoid the need to cut, paste, and reformat the results.
Unfortunately, you gain these advantages by putting in significantly more work up front. There are other disadvantages as well. In the case of LaTeX, you need to learn a typesetting language. In the case of ODF, you need to use a program like OpenOffice that may not be standard in your work environment.
For good or ill, Microsoft Word and PowerPoint are the current report and presentation standards in the business world. The packages R2wd and R2PPT can be used to dynamically create Word and PowerPoint documents with inserted R output, but they are in their formative stages of development. I’m looking forward to seeing fully developed implementations.