Appendix D. Creating publication-quality output

Research doesn’t end when the last statistical analysis or graph is completed. We need to include the results in a report that effectively communicates these findings to a teacher, supervisor, client, government agency, or journal editor. Although R creates state-of-the-art graphics, its text output is woefully retro—tables of monospaced text with columns lined up using spaces.

There are two common approaches to creating publication-quality reports in R: Sweave and odfWeave. The Sweave package allows you to embed R code and output in LaTeX documents, in order to produce high-end typeset reports in PDF, PostScript, and DVI formats. Sweave is an elegant, precise, and highly flexible system, but it requires the author to be conversant with LaTeX coding.

In a similar fashion, the odfWeave package provides a mechanism for embedding R code and output in documents that follow the OpenDocument Format (ODF). These reports can be further edited via an ODF word processor, such as OpenOffice Writer, and saved in either ODF or Microsoft Word format. The process is not as flexible as the Sweave approach, but it eliminates the need to learn LaTeX. We’ll look at each approach in turn.

D.1. High-quality typesetting with Sweave (R + LaTeX)

LaTeX is a document preparation system for high-quality typesetting (http://www.latex-project.org) that’s freely available for Windows, Mac, and Linux platforms. An author creates a text document that includes markup code for formatting the content. The document is then processed through a LaTeX compiler, producing a finished document in PDF, PostScript, or DVI format.

The Sweave package allows you to embed R code and output (including graphs) within the LaTeX document. This is a multistep process:

  1. A special document called a noweb file (typically with the extension .Rnw) is created using any text editor. The file contains the written content, LaTeX markup code, and R code chunks. Each R code chunk starts with the delimiter <<>>= and ends with the delimiter @.
  2. The Sweave() function processes the noweb file and generates a LaTeX file. During this step, the R code chunks are processed, and depending on options, replaced with LaTeX-formatted R code and output. This step can be accomplished from within R or from the command line.

Within R, the format is

Sweave("infile.Rnw")

By default, Sweave("example.Rnw") would input the file example.Rnw from the current working directory and output the file example.tex to the same directory. Alternatively, you can use

Sweave("infile.Rnw", syntax="SweaveSyntaxNoweb")

Specifying this syntax option can help avoid some common parsing errors, as well as conflicts with the R2HTML package.

Execution from the command line will depend on the operating system. For example, on a Linux system, this might look like

$ R CMD Sweave infile.Rnw

  3. The LaTeX file is then run through a LaTeX compiler, creating a PDF, PostScript, or DVI file. Popular LaTeX compilers include TeX Live for Linux, MacTeX for Mac, and proTeXt for Windows.

The complete process is outlined in figure D.1.

Figure D.1. Process for generating a publication-quality report using Sweave
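
The three steps can be sketched from within R as follows. This is a minimal illustration rather than a definitive recipe: the file name example.Rnw, the tiny document body, and the compile_pdf flag are assumptions made for the example, and the PDF step is switched off by default because it requires a working LaTeX installation.

```r
# Step 1: create a small noweb file in a temporary directory
# (the file name and contents are illustrative only)
workdir <- tempdir()
rnw <- file.path(workdir, "example.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<echo=TRUE>>=",
  "x <- c(2, 4, 6)",
  "mean(x)",
  "@",
  "\\end{document}"
), rnw)

# Step 2: process the code chunks; Sweave() writes example.tex
# to the current working directory, so switch to workdir first
oldwd <- setwd(workdir)
Sweave(rnw)
setwd(oldwd)
tex <- readLines(file.path(workdir, "example.tex"))

# Step 3: compile the LaTeX file to PDF
# (assumption: set compile_pdf to TRUE only if LaTeX is installed)
compile_pdf <- FALSE
if (compile_pdf) {
  oldwd <- setwd(workdir)
  tools::texi2pdf("example.tex", clean = TRUE)
  setwd(oldwd)
}
```

Keeping the noweb file, the generated .tex file, and the PDF in one directory makes it easier to rerun the whole chain when the data or analysis changes.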

As indicated earlier, each chunk of R code is surrounded by <<>>= and @. You can add options to each <<>>= delimiter in order to control the processing of the corresponding R code chunk. For example

<<echo=TRUE, results=hide>>=
summary(lm(Y~X, data=mydata))
@

would output the code, but not the results, whereas

<<echo=FALSE, fig=TRUE>>=
plot(A)
@

wouldn’t print the code but would include the graph in the output. Common delimiter options are described in table D.1.

Table D.1. Common options for R code chunks

Option    Description

echo      Include the code in the output (echo=TRUE) or not (echo=FALSE). The default is TRUE.
eval      Use eval=FALSE to keep the code from being evaluated/executed. The default is TRUE.
fig       Use fig=TRUE when the output is a graph. The default is FALSE.
results   Include R code output (results=verbatim), suppress the output (results=hide), or include the output and assume that it contains LaTeX markup (results=tex). The default is verbatim. Use results=tex when the output is generated by the xtable() function in the xtable package or the latex() function in the Hmisc package.
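
One way to check what these options do is to run Sweave() on a small test file and inspect the generated LaTeX. The sketch below assumes a hypothetical file named options-demo.Rnw with three chunks; the variable names inside the chunks are made up for the illustration.

```r
# Build a noweb file with three chunks, each using different options
workdir <- tempdir()
rnw <- file.path(workdir, "options-demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<echo=TRUE, results=hide>>=",   # show the code, suppress the output
  "shown_code <- 1 + 1",
  "print(shown_code)",
  "@",
  "<<echo=FALSE>>=",                # hide the code, keep the output
  "hidden_code <- 6 * 7",
  "print(hidden_code)",
  "@",
  "<<eval=FALSE>>=",                # show the code, but never run it
  "stop('never run')",
  "@",
  "\\end{document}"
), rnw)

# Sweave() writes its output to the current working directory
oldwd <- setwd(workdir)
Sweave(rnw)
setwd(oldwd)
tex <- readLines(file.path(workdir, "options-demo.tex"))

# Inspect which pieces made it into the LaTeX file
code_shown    <- any(grepl("shown_code", tex, fixed = TRUE))
output_hidden <- !any(grepl("[1] 2", tex, fixed = TRUE))
code_hidden   <- !any(grepl("hidden_code", tex, fixed = TRUE))
output_shown  <- any(grepl("[1] 42", tex, fixed = TRUE))
unrun_shown   <- any(grepl("never run", tex, fixed = TRUE))
```

If the options behave as described in table D.1, all five logical values should be TRUE.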

By default, Sweave will add LaTeX markup code to attractively format data frames, matrices, and vectors. Additionally, R objects can be embedded inline using a \Sexpr{} statement. Note that lattice graphs must be embedded in a print() statement to be processed properly.
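
To see inline embedding at work, the following sketch places a \Sexpr{} statement in a sentence and checks the LaTeX that Sweave() generates. The file name sexpr-demo.Rnw and the use of the built-in mtcars data frame are assumptions chosen for the illustration.

```r
# A noweb file whose only R content is an inline \Sexpr{} expression
workdir <- tempdir()
rnw <- file.path(workdir, "sexpr-demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The data set contains \\Sexpr{nrow(mtcars)} cars.",
  "\\end{document}"
), rnw)

# Sweave() evaluates the expression and substitutes its value
oldwd <- setwd(workdir)
Sweave(rnw)
setwd(oldwd)
tex <- readLines(file.path(workdir, "sexpr-demo.tex"))

# The sentence as it appears in the generated LaTeX file
sentence <- grep("The data set contains", tex, value = TRUE)
print(sentence)
```

Because the value is computed when the document is processed, the sentence stays correct if the underlying data changes.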

The xtable() function in the xtable package can be used to format data frames and matrices more precisely. In addition, it can be used to format other R objects, including those produced by lm(), glm(), aov(), table(), ts(), and coxph(). Use methods(xtable) to view a comprehensive list. When formatting R output using xtable(), be sure to include the results=tex option in the code chunk delimiter.
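
Outside of a noweb file, you can preview the LaTeX that xtable() produces directly at the console. The sketch below is illustrative only: it assumes the xtable package is installed, and it uses the built-in mtcars data with a made-up caption and label.

```r
# Fit a simple model using a data set that ships with R
fit <- lm(mpg ~ wt, data = mtcars)

if (requireNamespace("xtable", quietly = TRUE)) {
  # Convert the model summary to an xtable object
  tab <- xtable::xtable(fit, caption = "Regression of mpg on weight",
                        label = "table:regression")
  # Capture the LaTeX markup that would be written into the document
  latex_lines <- capture.output(print(tab, caption.placement = "top"))
  cat(latex_lines, sep = "\n")
} else {
  latex_lines <- character(0)
  message("Install xtable with install.packages(\"xtable\") to run this sketch.")
}
```

Inside a noweb file, the same print(xtable(...)) call would go in a chunk that uses results=tex, so that the markup is passed through to LaTeX rather than shown verbatim.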

It’s easier to see how this all works with an example. Consider the noweb file in listing D.1. This is a reworking of the one-way ANOVA example in section 8.3. LaTeX markup code begins with a backslash (\). The exception is \Sexpr{}, which is a Sweave addition. R-related code is presented in bold italics.

Listing D.1. A sample noweb file (example.Rnw)
\documentclass[12pt]{article}
\title{Sample Report}
\author{Robert I. Kabacoff, Ph.D.}
\date{}
\begin{document}
\maketitle

<<echo=false, results=hide>>=
library(multcomp)
library(xtable)
attach(cholesterol)
@

\section{Results}

Cholesterol reduction was assessed in a study
that randomized \Sexpr{nrow(cholesterol)} patients
to one of \Sexpr{length(unique(trt))} treatments.
Summary statistics are provided in
Table \ref{table:descriptives}.

<<echo = false, results = tex>>=
descTable <- data.frame("Treatment" = sort(unique(trt)),
   "N"    = as.vector(table(trt)),
   "Mean" = tapply(response, list(trt), mean, na.rm=TRUE),
   "SD"   = tapply(response, list(trt), sd, na.rm=TRUE)
)
print(xtable(descTable, caption = "Descriptive statistics
for each treatment group", label = "table:descriptives"),
caption.placement = "top", include.rownames = FALSE)
@

The analysis of variance is provided in Table \ref{table:anova}.

<<echo=false, results=tex>>=
fit <- aov(response ~ trt)
print(xtable(fit, caption = "Analysis of variance",
    label = "table:anova"), caption.placement = "top")
@


\noindent and group differences are plotted in Figure \ref{figure:tukey}.

\begin{figure}
\begin{center}

<<fig=TRUE,echo=FALSE>>=
par(mar=c(5,4,6,2))
tuk <- glht(fit, linfct=mcp(trt="Tukey"))
plot(cld(tuk, level=.05),col="lightgrey",xlab="Treatment", ylab="Response")
box("figure")
@

\caption{Distribution of response times and pairwise comparisons.}
\label{figure:tukey}
\end{center}
\end{figure}
\end{document}

Figure D.2. Page 1 of the report created from the sample noweb file in listing D.1. The noweb file was processed through the Sweave() function in R and the resulting TeX file was processed through a LaTeX compiler to produce a PDF document.

After processing the noweb file through the Sweave() function in R and processing the resulting TeX file through a LaTeX compiler, the PDF document in figures D.2 and D.3 is generated.

Figure D.3. Page 2 of the report created from the sample noweb file in listing D.1.

To learn more about Sweave, visit the Sweave home page (www.stat.uni-muenchen.de/~leisch/Sweave/). An excellent presentation is also provided by Theresa Scott (http://biostat.mc.vanderbilt.edu/TheresaScott). To learn more about LaTeX, check out the article “The Not So Short Introduction to LaTeX 2e,” available on the LaTeX home page (www.latex-project.org).

D.2. Joining forces with OpenOffice using odfWeave

Sweave provides a means of embedding R code and output in a LaTeX document that’s compiled into a PDF, PostScript, or DVI file. Although beautiful, the final document isn’t editable. Additionally, many recipients require reports in a format such as Word.

odfWeave provides a mechanism for embedding R code and output in OpenOffice documents. Instead of placing R code chunks in a LaTeX document, the user places R code chunks in an OpenOffice ODT file (see figure D.4). An advantage is that the ODT file can be created with a WYSIWYG editor such as OpenOffice Writer (www.OpenOffice.org); there’s no need to learn a markup language.

Once the noweb document is created as an ODT file, you process it through the odfWeave() function in the odfWeave package. Unlike Sweave, odfWeave has to be downloaded, installed before first use (install.packages("odfWeave")), and loaded in each session in which it will be used. For example,

library(odfWeave)
infile <- "example.odt"
outfile <- "example-out.odt"
odfWeave(infile, outfile)

will take the example.odt file displayed in figure D.4 and produce the example-out.odt file displayed in figure D.5. Adding options(SweaveSyntax="SweaveSyntaxNoweb") before the odfWeave() statement may help reduce parsing errors on some platforms.

Figure D.4. Initial noweb file (example.odt) to be processed through odfWeave

My Sample Report

Robert I. Kabacoff, Ph.D.

<<echo=false, results=hide>>=
library(multcomp)
library(xtable)
attach(cholesterol)
@
1 Results
Cholesterol reduction was assessed in a study that randomized \Sexpr{nrow(cholesterol)} patients to one of \Sexpr{length(unique(trt))} treatments. Summary statistics are provided in Table 1.
Table 1. Descriptive Statistics for each treatment group
<<echo = false, results = xml>>=
descTable <- data.frame("Treatment" = sort(unique(trt)),
   "N"    = as.vector(table(trt)),
   "Mean" = tapply(response, list(trt), mean, na.rm=TRUE),
   "SD"   = tapply(response, list(trt), sd, na.rm=TRUE)
)
odfTable(descTable)
@
The analysis of variance is provided in Table 2.
Table 2. Analysis of Variance
<<echo=false>>=
fit <- aov(response ~ trt)
summary(fit)
@
and group differences are plotted in Figure 1.
<<fig=TRUE,echo=FALSE>>=
par(mar=c(5,4,6,2))
tuk <- glht(fit, linfct=mcp(trt="Tukey"))
plot(cld(tuk, level=.05),col="lightgrey",xlab="Treatment", ylab="Response")
box("figure")
@
Figure 1. Distribution of response times and pair-wise comparisons.

Figure D.5. Final report in ODF format (example-out.odt). Page 2 is similar to the second page of the Sweave output in figure D.3 and is omitted to save space.

My Sample Report

Robert I. Kabacoff, Ph.D.

1 Results        
Cholesterol reduction was assessed in a study that randomized 50 patients to one of 5 treatments. Summary statistics are provided in Table 1.
Table 1. Descriptive Statistics for each treatment group
        Treatment   N    Mean     SD
1time   1time      10   5.782  2.878
2times  2times     10   9.225  3.483
4times  4times     10  12.375  2.923
drugD   drugD      10  15.361  3.455
drugE   drugE      10  20.948  3.345
The analysis of variance is provided in Table 2.
Table 2. Analysis of Variance
           Df  Sum Sq Mean Sq F value    Pr(>F)
trt         4 1351.37  337.84  32.433 9.819e-13 ***
Residuals  45  468.75   10.42
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
and group differences are plotted in Figure 1.

There are several differences between Sweave and odfWeave:

  • The xtable() function doesn’t work with odfWeave. By default, odfWeave will render data frames, matrices, and vectors in an attractive format. Optionally, the odfTable() function can be used to format these objects with a high degree of control.
  • ODF documents use XML markup rather than LaTeX. Therefore, the code chunk option results=tex should never be used. Use results=xml for code chunks that use odfTable().
  • The infile and outfile names should be different. Unlike Sweave, odfWeave("example.odt") would overwrite the noweb document with the final report.
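
A small helper can guard against accidentally overwriting the noweb document. The function below is a hypothetical convenience, not part of odfWeave: it derives an output name from the input name and stops if the two would be the same file.

```r
# Hypothetical helper: build an output file name that differs from the input
make_outfile <- function(infile, suffix = "-out") {
  ext  <- tools::file_ext(infile)            # e.g., "odt"
  stem <- tools::file_path_sans_ext(infile)  # e.g., "example"
  outfile <- paste0(stem, suffix, ".", ext)
  # Refuse to continue if the names collide (for example, an empty suffix)
  if (identical(outfile, infile)) {
    stop("outfile would overwrite infile: ", infile)
  }
  outfile
}

infile  <- "example.odt"
outfile <- make_outfile(infile)
print(outfile)
```

The resulting pair of names can then be passed to odfWeave(infile, outfile) as in the earlier example.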

If you look at figure D.5, you’ll note that the ANOVA table isn’t attractively formatted (as it was in Sweave). Rather, the table is in the standard monospaced font produced by R. This is because odfWeave doesn’t have a formatting function for the objects returned by lm(), glm(), and so forth. To properly format these results, we’d have to pull the components out of the object in question (fit in this case), and arrange them in a matrix or data frame.
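
Pulling the components out of an aov object might look like the following. This is a sketch under stated assumptions: it uses the built-in PlantGrowth data rather than the cholesterol data, and the column names in anova_df are chosen for the illustration.

```r
# One-way ANOVA on a data set that ships with R
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list; its first element holds the ANOVA table
aov_tab <- summary(fit)[[1]]

# Arrange the pieces in a plain data frame with tidy column names
anova_df <- data.frame(
  Source = trimws(rownames(aov_tab)),   # row names are padded with spaces
  Df     = aov_tab[["Df"]],
  SumSq  = round(aov_tab[["Sum Sq"]], 2),
  MeanSq = round(aov_tab[["Mean Sq"]], 2),
  F      = round(aov_tab[["F value"]], 3),
  p      = signif(aov_tab[["Pr(>F)"]], 3)
)
print(anova_df)
```

In an odfWeave document, a data frame like anova_df could then be passed to odfTable() in a chunk that uses results=xml, in the same way as the descriptive statistics table.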

Once you have your report in ODF format, you can continue to edit it, tighten up the formatting, and save the results to an ODT, HTML, DOC, or DOCX file format. To learn more, read the odfWeave manual and vignette.

D.3. Comments

There are several advantages to the Sweave and odfWeave approaches described here. By embedding the code needed to perform the statistical analyses directly into the final report, you document exactly how the results were calculated. Six months from now, you can easily see what was done. You can also modify the statistical analyses or add new data and immediately regenerate the report with minimum effort. Additionally, you avoid the need to cut and paste and reformat the results.

Unfortunately, you gain these advantages by putting in significantly more work at the front end. There are other disadvantages as well. In the case of LaTeX, you need to learn a typesetting language. In the case of ODF, you need to use a program like OpenOffice that may not be standard in your work environment.

For good or ill, Microsoft Word and PowerPoint are the current report and presentation standards in the business world. The packages R2wd and R2PPT can be used to dynamically create Word and PowerPoint documents with inserted R output, but they are in their formative stages of development. I’m looking forward to seeing fully developed implementations.
