16

Other Tools

Besides knitr, there are many other tools for dynamic documents. Some are R packages, and others are written in other languages such as Python and awk. In this chapter we give a brief overview of these tools and compare them with knitr; for Sweave users in particular, we explain the differences between Sweave and knitr.

16.1    Sweave

The knitr package was largely motivated by Sweave (Leisch, 2002), which has been a longstanding prominent tool for dynamic documents in R, and is a part of base R (in the utils package as the Sweave() function). Sweave primarily deals with Rnw documents, although it also has a modular design that allows it to be extended to other document formats. A number of extensions based on Sweave exist on CRAN, and we will introduce them in the next section.

There are two ways to run Sweave. We can call it in an interactive R session (you do not need to load the utils package):

[Code listing: calling Sweave() in an R session]

In addition, we can use the command line, too:

[Code listing: running Sweave from the command line]
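Both ways of running Sweave can be sketched as follows. The input is assumed to be Sweave-test-1.Rnw, an example file that ships with the utils package; any other Rnw file would do, and the temporary working directory is only a convenience chosen for this illustration.

```r
# Example Rnw file shipped with the utils package (assumed input)
testfile <- system.file("Sweave", "Sweave-test-1.Rnw", package = "utils")

# Weave in a temporary directory so that the .tex file and the figures
# are not written into the current project
old_wd <- setwd(tempdir())

# Way 1: call Sweave() in an R session; utils is attached by default
tex_file <- Sweave(testfile)

# Way 2: the command-line equivalent, to be typed in a terminal:
#   R CMD Sweave Sweave-test-1.Rnw

setwd(old_wd)
```

The resulting .tex file still has to be compiled with LaTeX to obtain a PDF document.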

Since Sweave is part of base R, its development has almost plateaued in recent years. Another major problem is that its modular design is not modular enough, so its extensions may become incompatible as Sweave gets updated in base R. As far as we know, a few R packages based on Sweave copied a large amount of core code from Sweave, and are no longer synchronized with the development of Sweave.

A lot of knitr’s chunk options were borrowed from Sweave, such as eval, echo, results and so on, but the design is different, so there are several differences between them. Before version 1.0, knitr tried to be compatible with Sweave — knitr was able to compile Sweave documents because of some internal functions to fix the differences automatically. The compatibility has been dropped since v1.0, with a conversion function Sweave2knitr() provided to convert Sweave documents to knitr manually. Below is an example of converting the Rnw document in the utils package and showing the differences after conversion (< shows the original document, and > shows the converted file):

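[Code listing: Sweave2knitr() applied to the Rnw example in the utils package, followed by the differences between the two files]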
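A conversion of this kind might look like the sketch below. It assumes that knitr is installed and uses Sweave-test-1.Rnw from the utils package as the input; the output filename is arbitrary, and setdiff() is only a rough stand-in for a real diff tool.

```r
library(knitr)

# Example Sweave document from the utils package (assumed input)
testfile <- system.file("Sweave", "Sweave-test-1.Rnw", package = "utils")

# Write the converted document to a temporary file instead of next to the input
converted <- file.path(tempdir(), "Sweave-test-knitr.Rnw")
Sweave2knitr(testfile, output = converted)

original_lines  <- readLines(testfile)
converted_lines <- readLines(converted)

# Lines that occur only in the original (the "<" side of a diff)
setdiff(original_lines, converted_lines)

# Lines that occur only in the converted file (the ">" side of a diff)
setdiff(converted_lines, original_lines)
```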

16.1.1  Syntax

By default, knitr uses a new type of syntax to parse chunk options, which is similar to R function arguments. This gives us much more power than the traditional Sweave syntax. We can use arbitrary objects in chunk options and make use of the full power of R.

Sweave treats chunk options as character strings and parses them by splitting on commas, whereas knitr uses R syntax: if an option takes a character value, we have to quote it just as we do in R, e.g., results = 'hide' (in Sweave we write results = hide). See Section 12.1.3 for an example of performing computations directly in chunk options. Below is another example, which shows how flexible the new syntax is (we can dynamically create a figure caption):

[Code listing: a chunk header with a dynamically created figure caption]
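The difference between the two parsing strategies can be illustrated in plain R. The two helper functions below are hypothetical and written only for this comparison; they are not the code that Sweave or knitr actually uses.

```r
# Sweave-like parsing (hypothetical helper): split on commas and "=",
# so every value stays a character string
parse_sweave_style <- function(opts) {
  pairs <- strsplit(strsplit(opts, ",", fixed = TRUE)[[1]], "=", fixed = TRUE)
  values <- vapply(pairs, function(p) trimws(p[2]), character(1))
  names(values) <- vapply(pairs, function(p) trimws(p[1]), character(1))
  as.list(values)
}

# knitr-like parsing (hypothetical helper): treat the options as the
# arguments of list() and let R evaluate them
parse_r_style <- function(opts, envir = parent.frame()) {
  force(envir)
  eval(parse(text = paste0("list(", opts, ")")), envir = envir)
}

# All values come back as strings such as "FALSE" and "hide"
parse_sweave_style("echo=FALSE, results=hide")

# Values are real R objects, and fig.cap is computed from the variable n
n <- 3
parse_r_style("echo = FALSE, results = 'hide', fig.cap = paste('Scatterplot', n)")
```

Because the second approach evaluates ordinary R expressions, any object in the workspace can be used in an option value, which is what makes a dynamically created caption possible.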

The other minor difference in syntax is that knitr does not recognize @ as the beginning of text chunks unless there is a chunk header before it. For example, knitr will keep the first @ in the example below but Sweave will remove it:

[Code listing: an Rnw fragment in which the first @ has no chunk header before it]

Sweave2knitr() can fix this problem automatically.

16.1.2  Options

Some options of Sweave were dropped in knitr and some were changed, including:

•  concordance was changed mainly to support RStudio; if the package option opts_knit$get('concordance') is TRUE, a file named input-concordance.tex will be written with output line numbers mapped to input line numbers; note that the implementation is less accurate than Sweave's

•  keep.source was merged into a more flexible option tidy

•  print was dropped: whether an R expression is going to be printed is consistent with your experience of using R (e.g., x <- 1 will not be printed, while 1:10 will; just imagine you are typing the commands in an R console); if you really want the output of an expression to be invisible, you may use the function invisible()

•  term was dropped (think term = TRUE)

•  prefix was dropped (think prefix = TRUE)

•  prefix.string was renamed fig.path and it is always used for figure filenames

•  eps, pdf and all logical options for graphics devices were dropped: use the new option dev instead, which is similar to grdevice in Sweave but has more than 20 predefined graphical devices; see Chapter 7

•  fig was dropped; now use fig.keep: fig.keep = 'high' in knitr is equivalent to fig = TRUE and fig.keep = 'none' is the same as fig = FALSE in Sweave

•  width, height were renamed fig.width and fig.height, respectively

Meanwhile, \SweaveOpts{} and \SweaveInput{} are deprecated; use opts_chunk$set() and the chunk option child to set global chunk options and include child documents, respectively.
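As a small sketch of the renamed options, the call below shows what a hypothetical \SweaveOpts{echo=FALSE, width=5, height=4, prefix.string=figure/plot} might become in knitr. The option values are made up for illustration, and knitr is assumed to be installed.

```r
library(knitr)

# Global chunk options, typically set in the first chunk of a document;
# width/height/prefix.string are now fig.width/fig.height/fig.path
opts_chunk$set(
  echo = FALSE,
  fig.width = 5,
  fig.height = 4,
  fig.path = "figure/plot-"
)

# Query a global option to confirm the setting
opts_chunk$get("fig.width")
```

A child document is then included through the child option of an otherwise empty chunk rather than through \SweaveInput{}.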

For logical options, only TRUE/FALSE/T/F are supported (the first two are recommended), and true/false will not work; e.g., eval = FALSE is OK, and eval = false is not (unless there is an R object named false that happens to take a logical value FALSE). Chunk reference using the <<label>> syntax is still available, and there are other approaches for reusing chunks, e.g., use the new option ref.label; chunk references can be recursive, as introduced in Chapter 9.

16.1.3  Problems

Some known problems and frequently asked questions in Sweave have been solved in knitr:

•  empty figure chunks give LaTeX errors in Sweave but not in knitr because figures will not be generated at all; knitr writes figures to LaTeX only when there are plots in a chunk

•  lattice (and ggplot2) graphics do not work in Sweave if you do not explicitly print() them, and they work in knitr just like in R console (if these plot objects appear in the top environment, you do not need to print them)

•  the width of figures in the output is set to .8\textwidth in Sweave by default via \setkeys{Gin}{width=.8\textwidth} defined in the LaTeX style Sweave.sty; this affects all figures in the document regardless of whether they are generated by Sweave, and there is no straightforward way to set individual widths for figures; this problem has been solved by the out.width option in knitr

•  multiple figures from one figure chunk do not work by default in Sweave and you have to write LaTeX code by yourself in this case; for knitr, it does not make any difference no matter how many plots there are in one chunk

•  it is possible to use output hooks to change the formatting of output in knitr, and we do not have to use hard-coded LaTeX environments such as Sinput/Soutput in Sweave; in fact, we can call render_sweave() to render the Sweave style from knitr

•  it is easy to produce HTML output with knitr (with either R HTML or R Markdown), whereas Sweave needs extensions such as R2HTML, which only deals with HTML

Sometimes we see a stray Rplots.pdf file after we run Sweave, and that is because R’s default graphical device is pdf() for non-interactive R sessions, which creates Rplots.pdf. In knitr, the default device is set to a null device (pdf(file = NULL)) so that no stray PDF files will be generated.
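The effect of a null device can be checked with base R alone. The sketch below is only an illustration of pdf(file = NULL), not knitr's internal code; the fresh temporary directory is an assumption made so that any stray file would be easy to spot.

```r
# Work in a fresh, empty directory
work_dir <- tempfile("null-device-")
dir.create(work_dir)
old_wd <- setwd(work_dir)

pdf(file = NULL)   # null PDF device: plots are drawn but no file is created
plot(1:10)
dev.off()

# List what was written to the directory; no Rplots.pdf should appear
stray_files <- list.files(work_dir)
setwd(old_wd)
```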

16.2    Other R Packages

Most features in Sweave and the R packages introduced below (except R2HTML) are covered by knitr, so this section is mainly for historical interest.

The highlight package (Francois, 2013) provides syntax highlighting for R code in Rnw documents. Like pgfSweave, cacheSweave, and R2HTML below, highlight was built as an extension of Sweave. In early versions (before v0.6), knitr depended on highlight to do syntax highlighting, but this dependency was removed later due to maintenance problems and the fact that it has additional dependencies (the Rcpp and parser packages). Now knitr uses its own syntax highlighting functions, which were based on regular expressions before R 3.0.0 and have relied on the function getParseData() in the utils package in base R since R 3.0.0. To achieve functionality similar to that of highlight, we just need to use the chunk option highlight = TRUE in knitr.

The cacheSweave package (Peng, 2012) added an important feature to Sweave: the cache system; the weaver package (Falcon, 2013) did a similar thing with a different implementation. Chunk options cache and dependson were added, having the same meaning as in knitr (see Chapter 8).

The pgfSweave package (Bracken and Sharpsteen, 2012) combined the features of highlight and cacheSweave, and added further support for graphics. Specifically, plots can be cached as well, and TikZ graphics via the tikzDevice package are also supported for the sake of font style consistency. The author of this book switched to pgfSweave from Sweave when it came out, and contributed the formatR support to it (the tidy option), but as time went by, it became more and more difficult to keep up with changes in Sweave. This package has been removed from the CRAN repository. At any rate, the design of knitr benefited a lot from the author’s experience with pgfSweave.

The brew package (Horner, 2011) is a light-weight templating framework, and its syntax is similar to PHP (<?php ?>). Basically it parses and executes R code inside the templating tag <% %>. You can think of this as the inline R code in Sweave and knitr. It has a cache system but does not have direct graphics support. The knitr package also has partial support for the brew syntax, which we did not mention in Chapter 5; below is an example that can be compiled through knitr:

[Code listing: a brew template that knitr can compile]
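The idea of evaluating R code inside a templating tag can be sketched in a few lines of base R. The function render_template() below is hypothetical: it is neither brew's nor knitr's implementation, and it only handles complete expressions in <%= %> tags.

```r
# Replace every <%= expr %> tag in a string with the value of expr
# (hypothetical helper for illustration only)
render_template <- function(text, envir = parent.frame()) {
  force(envir)
  pattern <- "<%=\\s*(.+?)\\s*%>"
  matches <- gregexpr(pattern, text, perl = TRUE)
  regmatches(text, matches) <- lapply(regmatches(text, matches), function(tags) {
    vapply(tags, function(tag) {
      code <- sub(pattern, "\\1", tag, perl = TRUE)   # strip the tag delimiters
      value <- eval(parse(text = code), envir = envir)
      paste(format(value), collapse = " ")
    }, character(1), USE.NAMES = FALSE)
  })
  text
}

render_template("The mean of 1:10 is <%= mean(1:10) %>.")
# returns "The mean of 1:10 is 5.5."
```

Since each tag is parsed and evaluated on its own, a template that spreads one R statement over several tags cannot be handled by this kind of approach.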

If an input file has the extension *.brew, knitr will use the brew syntax automatically. Note that brew actually supports incomplete code fragments spread over several inline expressions, which makes it really similar to PHP. Here is an example taken from brew that knitr will not be able to compile:

[Code listing: a brew template with incomplete code fragments]

The R2HTML package (Lecoutre, 2014) contains a large number of functions to export R objects to HTML. The main function is an S3 generic function HTML(), which can be applied to a variety of R objects such as data frames, tables, lm objects (returned by lm()) and so on. Below is a subset of the iris data converted to an HTML table:

[Code listing: HTML() applied to a subset of the iris data, and the resulting HTML table]

We can make use of R2HTML inside knitr for R HTML documents, with the chunk option results = 'asis' to write raw HTML code into the output.

The other major contribution of R2HTML is the Sweave extension, which allows one to write an HTML report based on Sweave.

There is a task view on CRAN about reproducible research: http://cran.r-project.org/web/views/ReproducibleResearch.html, where we can find more packages on this topic.

16.3    Python Packages

In this section we introduce three packages based on Python for dynamic documents: Dexy, PythonTeX, and IPython.

16.3.1  Dexy

Dexy (http://www.dexy.it) is a free Python package that features a very general design. According to its website:

Dexy is a free-form literate documentation tool for writing any kind of technical document incorporating code. Dexy helps you write correct documents, and to easily maintain them over time as your code changes.

The four major features are:

1.  any language (source code)

2.  any markup (output)

3.  any template

4.  any API (programming)

There are apparently some similarities between Dexy and knitr, such as the multi-language support. An important concept of Dexy is the “filter”: the filter takes an input file and converts it to an output file, which is similar to the pipe | in shell scripts. The filters in Dexy are actually a combination of concepts in knitr: a filter may render output (e.g., from Markdown to HTML), or run a programming language (like language engines in knitr), or do additional tasks like knitr’s chunk hooks.

Normally Dexy separates computer code from templates, which can be either good or bad. The good aspect is that the source scripts can be reused, and the bad thing is we have to jump back and forth between the report environment and the source code. By default knitr directly embeds code chunks in a report, but we can also externalize code chunks as introduced in Chapter 9.

16.3.2  PythonTeX

PythonTeX (https://github.com/gpoore/pythontex) is a LaTeX package that executes Python code within LaTeX. According to its documentation:

PythonTeX provides fast, user-friendly access to Python from within LaTeX. It allows Python code entered within a LaTeX document to be executed, and the results to be included within the original document. It also provides syntax highlighting for code within LaTeX documents via the Pygments package.

We can insert inline Python code using the \pyb{} command, or emulate a Python session in LaTeX using the pyconsole environment, e.g.,

[Code listing: a LaTeX document using PythonTeX]

When we compile this document, the Python code will be evaluated and the results will be inserted into the output.

Due to its Python origin, it also has integration with other Python packages such as SymPy (symbolic manipulation) and matplotlib (plots).

16.3.3  IPython

IPython (http://ipython.org) is an interactive shell for Python that features a Web-based notebook with support for code, text, mathematical expressions, inline plots and other rich media, high performance tools for parallel computing, and so on.

Figure 16.1 is a screenshot of IPython in a GNOME terminal under Ubuntu. We can see that it has basic functionalities of a shell such as the auto-completion of commands: we type x.spl<TAB> in the shell and will see the auto-completion below.

The most notable feature related to report generation is its Web-based notebook: we can work in the Web browser with Python commands, view the results on the fly (including both numerical and graphical results), and the notebook can be continuously updated as we input more content into the notebook. It is very much like writing code chunks in knitr.

An IPython notebook can be saved as a JSON file with the extension *.ipynb, which can be shared with others. The notebook may or may not contain output; a notebook without the output is similar to the source document for knitr (e.g., Rnw and Rmd documents).

Inspired by IPython, knitr has a similar Web notebook (but with fewer features), which we mentioned in Section 3.2.2.

FIGURE 16.1: A screenshot of IPython: input is marked as In[n], and output is marked as Out[n].

16.4    More Tools

In addition to R and Python packages, there are tools built on other languages and programs. It is impossible to enumerate all the tools for dynamic documents in this chapter. Schulte et al. (2012) have provided a list of existing tools for literate programming and reproducible research, such as Javadoc, cweb, noweb, Sweave, SASweave, and so on.

16.4.1  Org-mode

Org-mode is a plain text markup language, with an implementation in the Emacs text editor (Schulte et al., 2012). It supports both literate programming and reproducible research (in the sense of dynamic documents). It more or less follows the syntax of early implementations of literate programming such as WEB and noweb, i.e., it has the concept of code chunks and text chunks (the text chunks are sometimes called “prose”). A code chunk in Org-mode looks like this:

[Code listing: a code chunk in Org-mode]

By comparison, the same chunk is written like this in knitr:

[Code listing: the same chunk in knitr]

The metadata is stored in the chunk headers. Org-mode supports any input language, with either LaTeX or HTML as the output format.

Schulte et al. (2012) mentioned the capability of literate programming of existing tools (e.g., Sweave does not have it), which we did not emphasize in this book because it does not sound interesting to report writers. As a matter of fact, knitr also has this capability of reorganizing code chunks (see Chapter 9). Below is a simple example of defining chunk B later but embedding it in an earlier chunk A:

[Code listing: chunk A embedding chunk B, which is defined later, and the resulting output]
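The reorganization of chunks can be tried from an R session with the sketch below, which assumes that knitr is installed. For brevity it uses R Markdown syntax instead of Rnw, and the chunk contents are made up for illustration.

```r
library(knitr)

fence <- strrep("`", 3)   # the Markdown code fence, built here for readability

src <- c(
  paste0(fence, "{r A}"),
  "<<B>>",                # reference to chunk B, which is defined further down
  "x + 1",
  fence,
  "",
  paste0(fence, "{r B, eval=FALSE}"),
  "x <- 41",
  fence
)

# With the text argument, knit() returns the output as a character string
out <- knit(text = src, quiet = TRUE)
cat(out)
```

When chunk A is evaluated, the reference is replaced by the code of chunk B, so x is defined before x + 1 runs even though B appears later in the document.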

Powerful as it is, the Emacs nature of Org-mode may be an obstacle to beginners.

16.4.2  SASweave

SASweave (http://homepage.cs.uiowa.edu/~rlenth/SASweave) is an implementation of literate programming with SAS and R, written in gawk. The basic idea is the same as in Sweave and knitr. See Lenth and Højsgaard (2007) for more information. Compared to SASweave, the knitr package has more comprehensive support for R but less support for SAS.

16.4.3  Office

We do not have to choose the plain text format for dynamic documents, although almost everything we have introduced in this book is based on plain text. There are tools based on OpenOffice (or OpenDocument Text) or Microsoft Office products (we call them Office documents for short), and they may seem appealing at first glance. At its core, an Office document is usually an XML file (which may be compressed), so it is possible to embed code chunks in it. We can parse code chunks, run them, and insert the results back.

The major problem we see is that the XML format is too complicated and there are too many standards, so it is not trivial to make sure the modified document is still a valid Office document. As one example, the StatWeave package (http://homepage.stat.uiowa.edu/~rlenth/StatWeave/) no longer works with OpenOffice (3.2 and higher) because “OpenOffice flags the modified document as corrupted.”

By comparison, plain text files are much easier to deal with; there are no complicated standards such as ECMA-376 to take care of. If we want Office documents at all, there are at least possibilities of conversion from Markdown. Recall what we quoted in Chapter 1:

The source code is real.
