In this chapter we show some tricks that can be useful for writing and compiling reports more easily and quickly, and also solutions to frequently asked questions.
There are a number of built-in chunk options in knitr, and we usually assign values to them in chunk headers, but it is still possible to customize these fixed options, e.g., rename the options.
We may find that some options are used very frequently but their names are too long to type. In this case we can set up aliases for chunk options using the function set_alias() at the beginning of a document, e.g.,
Then we will be able to use w
and h
for the figure width and height, respectively, e.g.,
The chunk above is equivalent to:
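As a minimal sketch (assuming knitr is installed), the alias setup could look like the following. The two chunk headers appear as comments because they belong to the Rnw source rather than to R code, and the chunk label `myplot` is only illustrative.

```r
library(knitr)

# Map the short names w and h to the full option names
set_alias(w = "fig.width", h = "fig.height")

# A chunk header that uses the aliases (Rnw syntax):
#   <<myplot, w=5, h=3>>=
# is then treated like the full form:
#   <<myplot, fig.width=5, fig.height=3>>=

# The aliases are stored as a package option
aliases <- opts_knit$get("aliases")
```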
Besides option names, we can also bundle frequently used option values together as option templates. The object opts_template
in knitr can be used to build such templates. A template is a named collection of option sets. For example, suppose there are a large number of plots for which we want to set the graphical device size to be 7 × 5 inches, and for other plots we want the size to be 3.5 × 3 inches. We can certainly type fig.width = 7, fig.height = 5
for the first group of plots, and fig.width = 3.5, fig.height = 3
for the second group, but this is apparently tedious (even with option aliases). In this case we can just put the two sets of options in templates:
After the templates have been set up, we can simply use the chunk option opts.label
in future chunk headers to refer to them. For instance, suppose we want the options for large plots in the chunk below:
This is equivalent to:
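The template setup described above might be sketched as follows. The template names `fig.large` and `fig.small` and the chunk label are assumptions chosen for illustration; the chunk headers are shown as comments.

```r
library(knitr)

# Two named option sets: one for large plots, one for small plots
opts_template$set(
  fig.large = list(fig.width = 7, fig.height = 5),
  fig.small = list(fig.width = 3.5, fig.height = 3)
)

# A chunk header referencing a template (Rnw syntax):
#   <<large-plot, opts.label='fig.large'>>=
# which stands for:
#   <<large-plot, fig.width=7, fig.height=5>>=

large <- opts_template$get("fig.large")
```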
Since chunk options can take arbitrary R expressions, we can program chunk options besides setting fixed values like numbers or logical values. We show below an example of drawing a table with the gridExtra
package. First we use the tableGrob() function to create a table Grob (graphical object):
Next, we use grid.draw() in the grid package to draw the object to a plot. Prior to that, we need to determine an appropriate size for the graphical device; otherwise we might get extra white margins in the plot. In fact, the convertWidth() and convertHeight() functions in the grid package can convert the pre-calculated width and height of the Grob to inches. Therefore, we pass two function calls to the chunk options fig.width
and fig.height
instead of using fixed numbers as we usually do. Figure 12.1 is a table of the first four lines of the iris
data drawn by grid.draw().
The programmable chunk options enable us to program our reports in many aspects. As one potential application, we may build a linear regression report including common diagnostic procedures, with each procedure in a child document (Section 9.3). Then we can decide whether to include certain procedures based on certain conditions, e.g., if we have detected outliers in the regression model, we include an outlier module to deal with outliers. The chunk below shows a sketch of this idea:
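A rough sketch of this idea is given below. The file name `outlier.Rnw`, the variable `has_outliers`, and the rule of flagging standardized residuals larger than 2 in absolute value are all assumptions made for illustration. The key point is that `if` without `else` yields `NULL` when the condition is false, so no child document is included.

```r
# Fit a model and decide whether an outlier module is needed
fit <- lm(mpg ~ wt, data = mtcars)
has_outliers <- any(abs(rstandard(fit)) > 2)

# The value that the chunk option would receive:
# a file name when outliers exist, NULL otherwise
child_doc <- if (has_outliers) "outlier.Rnw"

# In the Rnw source, the same expression goes into the chunk header:
#   <<outlier, child=if (has_outliers) 'outlier.Rnw'>>=
#   @
```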
Sometimes we do not want to show the code chunks in the body of the report, but we do not want to completely hide the code, either. In this case we can move all code chunks to the appendix, and the chunk option ref.label
can be useful here (Section 9.1.2).
If there are only a small number of code chunks in the document, we can manually type their labels, e.g.,
Here we hide the code in the previous chunks by echo = FALSE
, and gather them into the last chunk by ref.label
. Note the last chunk used the chunk option eval = FALSE
so that the code is not evaluated again.
If there are a lot of code chunks in a document, we can use the function all_labels() in knitr to obtain all chunk labels in a document, and pass them to ref.label
, e.g.,
We can set echo = FALSE
globally by opts_chunk$set()
, and use echo = TRUE for the last chunk to show the code there. Of course we can also select chunk labels to include there, e.g., remove the first chunk by all_labels()
[-1].
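The appendix approach could be sketched as below, assuming an Rnw document; the chunk label `appendix` is hypothetical and the chunk headers are shown as comments.

```r
library(knitr)

# In the setup chunk: hide the code of all chunks by default
opts_chunk$set(echo = FALSE)

# In the appendix, one chunk gathers the code of all chunks
# without evaluating it again:
#   <<appendix, ref.label=all_labels(), echo=TRUE, eval=FALSE>>=
#   @
# or, leaving out the first chunk:
#   <<appendix, ref.label=all_labels()[-1], echo=TRUE, eval=FALSE>>=
#   @

# Outside of a knit() call there are no chunks, so this is empty
labs <- all_labels()
```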
The chunk option R.options
can take a list of R options to be passed to options() for a code chunk. These options will be applied to the code chunk, and restored after the chunk, so it can be useful if you want to temporarily change R options for a particular code chunk.
For example, we use local options width = 30
(the approximate width for printing) and digits = 2
(the number of digits for printing) for the following code chunk:
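The chunk header for this example would be along the lines of `<<R.options=list(width = 30, digits = 2)>>=`. The sketch below imitates in plain R what happens around such a chunk, namely setting the options, running the code, and restoring the old values; it is an illustration of the behavior, not knitr's actual implementation.

```r
# Set the local options and remember the previous values
old <- options(width = 30, digits = 2)

# Code of the chunk: printing now uses 2 significant digits
out <- format(pi)

# Restore the previous options after the chunk
options(old)
```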
Usually we just type the code in a chunk, or include code from other chunks by references (Chapter 9). There is yet another way to assign code to a chunk, using the chunk option named code
. This makes it possible to construct a code chunk dynamically. For example, you can read the code from an external script:
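A small self-contained sketch of this option follows. It writes a temporary script, then knits an R Markdown chunk whose body is empty and whose code comes from that script. The script content and the chunk label `from-script` are made up for the example.

```r
library(knitr)

# A temporary external script (hypothetical content)
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1 + 1", "x"), script)
path <- normalizePath(script, winslash = "/")

# An empty chunk that takes its code from the script via the code option
src <- c(
  sprintf("```{r from-script, code=readLines('%s')}", path),
  "```"
)
out <- knit(text = src, quiet = TRUE)
```

In an Rnw document the equivalent header would be `<<from-script, code=readLines('script.R')>>=`.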
Although we did not specifically mention it before, there is an object named opts_knit
in knitr that controls some package-level options, and its usage is the same as chunk options (opts_chunk
).
By default we see a progress bar when we call knitr, and we can suppress it by setting opts_knit$set(progress = FALSE)
. The progress bar shows the progress of knit() so we know which chunk is currently being compiled if it takes a relatively long time. To see more information about chunks such as the source code, we can turn on the verbose mode by opts_knit$set(verbose = TRUE)
.
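Both package options mentioned above can be set in one call, for example:

```r
library(knitr)

# Suppress the progress bar and print more information about chunks
opts_knit$set(progress = FALSE, verbose = TRUE)
```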
The package option root.dir
can be used to set the root working directory when evaluating code chunks. The default working directory is the directory of the input document, but we can change it with this option, e.g., after we set
Then we can read a data file under that directory without using the full path. In general, however, we recommend putting datasets and source documents in the same directory, and using this directory as the working directory.
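As a sketch, the option can be set as follows. The directory is a placeholder (the session's temporary directory) and the file name in the comment is hypothetical.

```r
library(knitr)

# Placeholder directory; in practice this is where the data files live
data_dir <- tempdir()
opts_knit$set(root.dir = data_dir)

# In later chunks, a relative path such as read.csv("mydata.csv")
# would be resolved against data_dir
```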
For chunks that are not labeled, automatic labels of the form unnamed-chunk-i
will be used. This can be customized via the package option unnamed.chunk.label
, e.g.,
Then the automatic chunk labels will be fig-1, fig-2
, and so on.
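The setting that leads to labels such as fig-1 and fig-2 would be:

```r
library(knitr)

# Use "fig" instead of "unnamed-chunk" as the prefix of automatic labels
opts_knit$set(unnamed.chunk.label = "fig")
```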
In this section we show some solutions to tweaking the typesetting of a report.
A common problem of using knitr in LaTeX is that the output width may exceed the page margin. There are three types of widths: the width of the source code, the text output, and the graphics output. In Section 7.4 we mentioned \maxwidth
, which guarantees the graphics output will not be wider than the page width.
The width of source code and text output is controlled by the global option width
in options() (Section 6.2.2). The default value for this option is 75, which may be too large for LaTeX documents unless we have reset the page margins (e.g., using the geometry package).
When we see the source code or the text output is too wide, we can use a smaller width
option, e.g.,
However, this may not work all the time: for the source code, R may not be able to find an appropriate place to break the source lines; for text output, the original lines may not contain line breaks (because they are in verbatim environments, LaTeX will not break the lines automatically). For the example below, the text lines will not be wrapped no matter how small the width
option is:
This is an extreme example. Normally our source code can be formatted into several lines. If we have a character string that is too long in the source code, we can consider breaking it into smaller pieces manually and pasting them together with paste(), e.g.,
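The two remedies discussed above, a smaller width option and a manually split string, might look like this; the value 60 and the text of the string are arbitrary.

```r
# A narrower width for source code and text output
options(width = 60)

# Break a long string into pieces and paste them together
msg <- paste(
  "This is a long character string that would otherwise",
  "run past the page margin if kept on one source line."
)
```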
An alternative approach is to use the listings style (recall Figure 5.2 and the function render_listings()). We can set the breaklines
option to true for the listings package in the preamble:
\lstset{breaklines=true}
See Figure 12.2 for an example of this option.
For LaTeX output, there are three colors defined, corresponding to messages, warnings, and errors, respectively:
\definecolor{messagecolor}{rgb}{0, 0, 0}
\definecolor{warningcolor}{rgb}{1, 0, 1}
\definecolor{errorcolor}{rgb}{1, 0, 0}
By default messages are black, warnings are magenta, and errors are red. We can redefine them using the command \definecolor{}
in the preamble.
As we introduced in Section 6.2.3, the default LaTeX style of knitr is based on the framed package, and that is why we see shaded boxes underneath all code chunks. If we feel the default padding of the box is too tight, we can reset the length \fboxsep
by \setlength
, e.g.,
\setlength\fboxsep{5mm}
Now we see the gray box is larger, with a padding space of 5 mm. For HTML output, it is much easier to design the style, e.g., we can define the class chunk in CSS as this to make the padding 5 mm:
div.chunk {
padding: 5mm;
}
Beamer (Tantau et al., 2012) is a popular document class to create slides with LaTeX. Using knitr in beamer slides is not very different from other LaTeX documents; the only thing to keep in mind is that we need to specify the fragile
option on beamer frames when we have verbatim output. See Figure 12.3 for the Rnw source of a simple beamer example, with one page of the output in Figure 12.4.
Due to the limited space in beamer slides, it may be desirable to use smaller font sizes for the code. In this case we can set a global chunk option size
, e.g.,
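For instance, a global font size could be set as follows; `footnotesize` is just one possible choice of a LaTeX font size name.

```r
library(knitr)

# Use a smaller font for all code chunks in the slides
opts_chunk$set(size = "footnotesize")
```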
Next we show an example of programming the content of LaTeX output, which makes it possible to use the beamer command \only{}
to show plots one by one in the same place on the screen (for more information, see the beamer manual). The basic idea is to replace the graphics command \includegraphics{}
by \only<n>{\includegraphics{}}
, with n being the n-th plot in the current chunk. Below is a modified plot hook that does this job:
One key here is the option fig.cur
, which is an internal chunk option (not specified by users) providing the current figure number. The substitution of \includegraphics{}
was done through regular expressions. After we have modified the plot hook, the plot commands in the LaTeX output will be changed accordingly.
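One way to write such a plot hook is sketched below. The helper `wrap_only()` is a hypothetical name introduced here so that the regular expression can be tried on its own; in an Rnw document the hook should be set inside a chunk, so that the existing plot hook is the LaTeX one.

```r
library(knitr)

# Hypothetical helper: wrap the first \includegraphics[...]{...}
# command in \only<n>{...}
wrap_only <- function(txt, n) {
  sub("(\\\\includegraphics[^}]+[}])",
      sprintf("\\\\only<%d>{\\1}", as.integer(n)), txt)
}

# Keep the existing plot hook and post-process its output
hook_plot_old <- knit_hooks$get("plot")
knit_hooks$set(plot = function(x, options) {
  n <- options$fig.cur
  if (is.null(n)) n <- 1L  # fall back to the first overlay
  wrap_only(hook_plot_old(x, options), n)
})

# Trying the helper on a typical graphics command
demo_out <- wrap_only("\\includegraphics[width=\\maxwidth]{figure/a-1} ", 2)
```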
For those who have read the book “Modern Applied Statistics with S” (MASS) by Venables and Ripley (2002), you may have noticed that the authors omitted parts of the output in the book in several places, because the output will otherwise be too long. For example, the data frame painters
on page 17 has 54 rows, but only the first 5 rows were shown on that page, and the rest of the rows were omitted (the omission was denoted by ….). We can automate this job by redefining the output hook in knitr (Section 5.3), e.g.,
Then we can achieve a similar effect of the example in the MASS book:
The basic idea of the hook defined above is, if the number of lines of the output is greater than 5, we extract the first 5 lines by head(x, 5)
, and append …. to the output vector, then pass the modified output to the default output hook function hook_output(), which was obtained before we reset the output hook. We do not have to hard-code the number of lines to be 5, so we also check if the chunk option out.lines
is NULL
; if it is not, it is supposed to be a number to specify the number of lines to keep in the output. For example, we print the first 10 lines instead:
Note this hook applies to all document formats (Rnw and Rmd, etc.), because we do not have any document-specific code in the new definition; for different document formats, knit_hooks$get('output')
will be different as well, hence the new hook is portable.
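A sketch of an output hook along these lines is shown below. It only truncates when the chunk option out.lines is set, and the helper `truncate_lines()` is a hypothetical name used to keep the string manipulation separate from the hook.

```r
library(knitr)

# Hypothetical helper: keep the first n lines and append "...."
truncate_lines <- function(x, n) {
  lines <- unlist(strsplit(x, "\n"))
  if (length(lines) <= n) return(x)  # short output is left untouched
  paste(c(head(lines, n), "....\n"), collapse = "\n")
}

# Save the current output hook before replacing it
hook_output <- knit_hooks$get("output")
knit_hooks$set(output = function(x, options) {
  n <- options$out.lines
  if (!is.null(n)) x <- truncate_lines(x, n)
  hook_output(x, options)
})
```

A chunk would then request truncation with a header such as `<<out.lines=10>>=`.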
12.3.6 Escape Special Characters
As introduced in Section 5.3, the inline hook function is used to write inline results into the output. By default, it writes characters as is, and sometimes we may want to escape special characters in LaTeX or HTML, e.g., an inline R code fragment produces a percentage 30%, and we have to write %
as \%
in LaTeX, otherwise it is treated as the start of a comment.
Whether we should escape special characters depends on the context, e.g., we may generate a LaTeX equation from inline R code, in which case we must not escape special characters such as backslashes. Anyway, if we do want to escape them, we can create a new inline
hook function, e.g.,
An internal function escape_latex() was used to escape special characters, and the escaped text strings will be passed to the default inline hook. We only added one step before the default hook function, and all features of the default hook will be preserved, such as automatic scientific notation (Section 6.1).
Similarly, if we are writing an R HTML document instead, we can call the escape_html() function.
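Such an inline hook might be sketched as follows. Because escape_latex() is internal, it is accessed here with `:::`; internal functions may change between knitr versions, so this is an assumption about the installed version.

```r
library(knitr)

# Keep the default inline hook and add an escaping step before it
hook_inline <- knit_hooks$get("inline")
knit_hooks$set(inline = function(x) {
  if (is.character(x)) x <- knitr:::escape_latex(x)
  hook_inline(x)
})

# The escaping step on its own
escaped <- knitr:::escape_latex("30%")
```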
12.3.7 The Example Environment
When writing textbooks or tutorials, it can be useful if we number the R code chunks like theorems and equations. It is easy to define an “Example” environment in the LaTeX preamble, e.g., using the amsthm package:
\usepackage{amsthm}
\newtheorem{rexample}{R Example}[section]
Then we can use this new environment rexample in our document:
\begin{rexample}
<<test, eval=TRUE>>=
1 + 1
rnorm(10)
@
\end{rexample}
In fact, we can automate this job with a chunk hook function, so that we do not have to type the environment again and again. The rexample
hook below writes the environment automatically for a chunk with a non-NULL
chunk option rexample:
Basically this hook writes \begin{rexample}
before a chunk, and \end{rexample}
after it. Additionally, it writes a label for the environment so that we can reference it later, and the label is the chunk label. Now we can apply it to a chunk, e.g.,
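A chunk hook with this behavior could be written roughly as below. The `\hfill{}` after the label is an assumption about spacing, and the chunk label `a-demo` in the comment is hypothetical.

```r
library(knitr)

# Write the environment around chunks that set the rexample option
knit_hooks$set(rexample = function(before, options, envir) {
  if (before) {
    # open the environment and use the chunk label as its label
    sprintf("\\begin{rexample}\\label{%s}\\hfill{}", options$label)
  } else {
    "\\end{rexample}"
  }
})

# Applied to a chunk in the Rnw source:
#   <<a-demo, rexample=TRUE>>=
#   1 + 1
#   @

hook <- knit_hooks$get("rexample")
opening <- hook(TRUE, list(label = "a-demo"), NULL)
```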
Figure 12.5 shows a sample page that used this hook function. We can see the R code chunks are numbered after the section numbers, which is due to the [section]
option in the definition of the rexample
environment. Because the rexample
environments also come with labels, we can use
\ref{}
for cross references.
It is also possible to create a similar hook for R HTML documents, but since HTML is not primarily for typesetting purposes, it is not easy to get the automatic numbering as in LaTeX. Anyway, we can use our own counter in R, e.g.,
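A possible HTML version keeps the counter in a closure, as sketched below; the `div` markup and the class name `rexample` are assumptions, not a fixed convention.

```r
library(knitr)

# The counter i lives in the environment created by local()
knit_hooks$set(rexample = local({
  i <- 0
  function(before, options, envir) {
    if (before) {
      i <<- i + 1  # increase the counter for each new example
      sprintf('<div class="rexample">R Example %d', i)
    } else {
      "</div>"
    }
  }
}))

hook <- knit_hooks$get("rexample")
first <- hook(TRUE, list(), NULL)
second <- hook(TRUE, list(), NULL)
```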
Besides LaTeX documents, you can also typeset HTML documents. There is a function rocco() in knitr that provides a two-column layout for HTML documents. This style was borrowed from a literate programming package named Docco (https://github.com/jashkenas/docco
). The narratives and code are arranged in separate columns, so that you can keep on reading either the narratives or the code in one column. You can hide either column with a keyboard shortcut. Figure 12.6 is a screenshot of a package vignette in knitr that uses this style:
There are a few utility functions in knitr to complete miscellaneous tasks such as writing BibTeX databases for R packages, base64 encoding images for HTML output, and compiling source documents to the final output.
The function write_bib() is a wrapper to the functions citation() and toBibtex() in base R. By default it collects the packages loaded into the current R session and extracts their citation information. It also has an argument named tweak
, which determines whether to tweak the default citation information, e.g., the author name “Duncan Temple Lang” should be “Duncan {Temple Lang}” in the bibliography database. Instead of manually modifying information like this, write_bib() can automatically deal with it.
The second argument of write_bib() is file
, and we can pass a filename to it to save the bibliography items into a file. By default, it writes to the standard output.
The advantage of generating the bibliography database using this function is that we can guarantee we always cite the package versions that we really use in a document. If we hard-code the bibliography, the citations may be out-of-date after we update R packages.
If we do not want to write the file each time we compile the document, we can cache the chunk. Then a natural question is, when should we, or how can we, update the cache? Recalling Chapter 8, one solution is to put the package version(s) in a chunk option, e.g., if the main package that we use for a document is called foo, we can write a chunk like this:
Then whenever the foo package is updated, the cached chunk will be updated accordingly.
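The sketch below writes citation entries for two packages to a temporary file. The chunk header in the comment shows where a package version could enter as an extra chunk option; the option name `foo_version` is arbitrary, and foo stands for the hypothetical main package.

```r
library(knitr)

# Write BibTeX entries for the listed packages to a file
bib <- tempfile(fileext = ".bib")
write_bib(c("base", "knitr"), file = bib)

# A cached chunk that is invalidated when the version of foo changes:
#   <<bib, cache=TRUE, foo_version=packageVersion('foo')>>=
#   write_bib(c('base', 'foo'), file = 'packages.bib')
#   @

entries <- readLines(bib)
```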
It is convenient to publish a PDF report because a PDF document contains everything in one file, including plots in particular, but that is not true for HTML reports. If an HTML page contains images that are external files, we have to publish these images along with the HTML file, otherwise the Web browser will not be able to find them. There is a technology called “Data URI” in Web pages that solves this problem. In short, we can encode a file into a character (base64) string and include it in HTML, so that we do not need the original file any more when publishing the HTML page. In other words, the HTML page is self-contained just like PDF.
The function image_uri() in knitr was designed to encode images as base64 strings. Obviously it only applies to HTML output (including Markdown). We can enable this function in opts_knit
:
Then if we have plots in HTML output, the image file paths will be replaced by base64 character strings. Below is an example of encoding the R logo (a JPEG image):
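A sketch of both steps follows. The path to the R logo is an assumption about the local R installation, so the code checks that the file exists before encoding it.

```r
library(knitr)

# Encode all plots in HTML output as base64 strings
opts_knit$set(upload.fun = image_uri)

# Encode a single image; the logo path may differ between installations
logo <- file.path(R.home("doc"), "html", "logo.jpg")
uri <- if (file.exists(logo)) image_uri(logo)
```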
For the same reason, we designed another function imgur_upload() to upload images to the website Imgur.com, and this function returns the URL of the uploaded image. Then, instead of using the image file path to reference the image (which has the problem mentioned before), we use a URL that is accessible anywhere as long as we have an Internet connection. To continue the previous example, we can upload the R logo to the Imgur website by:
This returns a URL of the form http://i.imgur.com/xxxxx.jpg
. To make things even easier, we can set the package option upload.fun
like we did in the last section:
Then images will be automatically uploaded to Imgur when we knit a document. To avoid repeated uploading of the same image, we can turn on cache.
For some document formats, there are two steps in compilation. For example, Rnw documents are compiled through knitr to LaTeX documents, which need to be compiled to PDF via LaTeX. For Rmd documents, the final product is often HTML instead of Markdown, which is the direct output of knitr.
To turn the two steps into one, the functions knit2pdf() and knit2html() can be used. The former will first knit() an Rnw document to a LaTeX document, and then call texi2pdf() in base R to compile it to PDF; the latter will knit() an Rmd document to a Markdown document, and call markdownToHTML() in the markdown package to compile Markdown to HTML.
For users under Unix-like systems, there is a Bash script named knit under the directory bin of knitr’s installation path; we can find it via:
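The location of the script can be looked up from R as below; system.file() returns an empty string if the file is not present in the installed version of knitr.

```r
# Full path to the knit script shipped with knitr ("" if not found)
knit_script <- system.file("bin", "knit", package = "knitr")
```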
It is an executable script that calls R to load knitr and automatically uses knit2pdf() or knit2html() based on the filename extension; if we put this script in the PATH
variable, we can call it in command line directly. For example, I have made a symbolic link under ~/bin/
to this script, and added this to ~/.bashrc
:
Then we can run knit
like other programs in the terminal without having to start R and type all the commands there.
So far we have been using files as the input for the knit() function in knitr. As a matter of fact, there is an alternative argument to receive the source document, which is named text
.
If we provide an input file to knit(), it will be read into knitr and assigned to the text
argument eventually. The content of files is usually fixed, but for the text
argument, we can dynamically construct it using R since it is nothing but a character variable.
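As a small illustration of the text argument, the sketch below builds an R Markdown source with three sections in a loop and knits it without any input file; the section titles and chunk labels are made up.

```r
library(knitr)

# Build the source of three sections, each with one code chunk
src <- unlist(lapply(1:3, function(i) {
  c(sprintf("## Section %d", i), "",
    sprintf("```{r sec-%d}", i),
    sprintf("%d^2", i),
    "```", "")
}))

# knit() returns the output as a character string when text is used
out <- knit(text = src, quiet = TRUE)
```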
Now we show a comprehensive example, which builds a PDF document for all the geom examples in the ggplot2 package; see the source code in Figure 12.7 and a sample page of the output in Figure 12.8. It may look a little bit complicated at first glance, but the basic idea is simple:
1. in the setup
chunk, we set two global chunk options: tidy = FALSE
(optional) and cache = TRUE
(because there are a large number of example code chunks to run later);
2. in the write-examples
chunk, we use apropos() to find all function names that start with geom
_; then we find their help files and from there extract the examples code with Rd2ex() in the tools package; finally we construct Rnw chunks using the function names as section titles and chunk labels, and assign the source text to a variable ex;
3. in the last step, we knit the source passed from the text argument and knit() returns the code, which we insert into the document as a text string by \Sexpr{}
.
This source document will produce a PDF document of more than 200 pages, taking a few minutes on the first run. Note that it uses the document class tufte-handout
, which is a class you may have to install (it is not a standard class that comes by default).
We mentioned the function purl() briefly in Section 3.4. Actually it has an additional argument named documentation
, which controls the level of details of documentation chunks.
The documentation
argument takes three possible values:
0L discard all text chunks, including chunk headers, so the output is pure program code
1L discard text chunks but preserve chunk headers in the exported code file
2L keep everything in the source document but put text chunks in roxygen comments (i.e., after #'
)
The following chunk shows examples corresponding to three values of the documentation
argument. Note that the chunk headers are written after ##
----, and text chunks are after #'
. When documentation = 2
, the generated R script can be passed to the function spin() to restore the original document (Section 5.4).
For code chunks that have the chunk option purl = FALSE
, their code will be ignored. For those chunks that have eval = FALSE
, their code will be commented out.
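The three levels can be compared on a tiny R Markdown source, as in the sketch below; the source text and the chunk label `add` are invented for the example.

```r
library(knitr)

src <- c("Some narrative text.", "", "```{r add}", "1 + 1", "```")

# Extract the code with documentation = 0, 1, and 2
scripts <- lapply(0:2, function(d) {
  purl(text = src, documentation = d, quiet = TRUE)
})

# Print the three versions one after another
for (s in scripts) cat(s, sep = "\n")
```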
12.4.7 Reproducible Simulation
As we discussed in Chapter 8, it is not trivial to write a report that can be easily and completely reproducible for others. One challenge is to make random simulations reproducible. Of course we can use set.seed() to fix the random seed, but what if we have enabled cache?
The problem is, when should we update a cached chunk that involves random numbers? One sufficient condition is the change of the random seed, i.e., if the random seed has changed before a chunk, this chunk should be re-evaluated.
The object rand_seed
in knitr was designed for this purpose. This object is essentially an unevaluated expression:
Basically it returns the random seed if it exists. We can assign this object to a chunk option; because it is an unevaluated expression, each time a chunk is compiled, this object will be evaluated again (knitr will always evaluate unevaluated chunk options). Then if the random seed has changed, knitr will be able to detect the change and update the cached chunk accordingly. Below is an example:
Even if we only switched the positions of two cached chunks (with the code and options untouched), the cache will be invalidated because the evaluated results of rand_seed
will be different for these two chunks compared to the last run.
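The sketch below evaluates rand_seed after two different calls to set.seed() to show that its value follows the state of the random number generator. The chunk header in the comment uses the option name `cache.extra`, which is one conventional choice; any extra chunk option would serve the same purpose.

```r
library(knitr)

# A cached chunk that depends on the random seed:
#   <<sim, cache=TRUE, cache.extra=rand_seed>>=
#   x <- rnorm(100)
#   @

# rand_seed is an unevaluated expression; evaluate it by hand
set.seed(1)
seed1 <- eval(rand_seed)
set.seed(2)
seed2 <- eval(rand_seed)
```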
R has a standard documentation system, and one thing that can be improved is the examples in the help pages — we can actually run these examples and put the results in the pages, so that it is easier for the reader to know the results without having to copy and paste code from the documentation.
The function knit_rd() was designed for this task: it takes a package name and extracts all its HTML help pages, then compiles all the examples. This can be handy for package authors, because it generates HTML files that can be published on the Web, and they are richer than the default R documentation. For example, we recompile all the help pages of the rpart package:
We will see a few HTML files under the current working directory. If there are plots in the examples, they will be base64 encoded and embedded in the pages, so we do not need to take care of additional files — just upload all these HTML files to a website.
Rst2pdf (http://rst2pdf.ralsina.com.ar
) is a free software package to create PDF from reStructuredText
. If we write the source document in the R reST format (Section 5.2.4), the output from knitr is a *.rst
document, and we can call Rst2pdf (if installed) to convert it to PDF via the wrapper function rst2pdf() in knitr, or just call knit2pdf('foo.Rrst')
in one step.
Some R packages contain demos, which can be run by the demo() function, e.g.,
We can insert demos into a source document using the read_demo() function in knitr, which is simply a wrapper of read_chunk() as introduced in Section 9.2.2.
Figure 12.9 shows a complete example of including the flowchart
demo of the diagram
package into an Rnw document; see Figure 12.10 for a sample page of the output. We can certainly use a simple chunk of one line of code demo('flowchart', echo = TRUE)
instead, but we will lose syntax highlighting.
When we want to see the source code of an R function, we can simply type its name and R will print its source code, e.g.,
But since knitr supports syntax highlighting and code reformatting (Sections 6.2.2 and 6.2.3), we may also want to use these features on the function source. The only question is how to get the source code into knitr, and one answer could be read_chunk() again. We define a function insert_fun() below to assign the (dumped) source code of an R object to a chunk:
For an object name
, its dumped representation will be captured in a code chunk of the label name-source
(see ?dump
and ?capture.output
for details). Now we can use this function to insert the source code of any functions into the source document, e.g., the fivenum() function:
Then we only need to use the chunk label fivenum-source
to show the (highlighted and reformatted) source code:
The source code of the above chunk is:
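A function along the lines described above might be defined as in the following sketch, which relies on the lines and labels arguments of read_chunk(); the empty chunk in the comment then shows the source code.

```r
library(knitr)

# Dump the source of an object and register it as a chunk
# labeled <name>-source
insert_fun <- function(name) {
  read_chunk(
    lines = capture.output(dump(name, "")),
    labels = paste(name, "source", sep = "-")
  )
}

insert_fun("fivenum")

# In the Rnw source, an empty chunk with this label shows the code:
#   <<fivenum-source, eval=FALSE>>=
#   @
```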
The function knit_expand() was designed to pre-process a source document, which is often a template file for creating repeated text with some changing parameters. For example, we may want to build regression models for the same response variable against different independent variables, and all the models are more or less the same form; all we need to change is the variable names in the models. For example, linear regressions of mpg
against two variables in the mtcars
data:
The basic idea of knit_expand() is to insert some tags in a template, and dynamically evaluate them in the current environment. Below are a few simple examples:
As we can see above, the R expressions in {{}}
are evaluated and their values are written in the output.
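Two simple calls are sketched below: the first evaluates an R expression inside the tags, the second substitutes a value supplied as a named argument (the name `x` is arbitrary).

```r
library(knitr)

# An expression evaluated in the calling environment
a <- knit_expand(text = "The value of pi is {{pi}}.")

# A value passed as a named argument
b <- knit_expand(text = "lm(mpg ~ {{x}}, data = mtcars)", x = "wt")
```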
We can dynamically create the source document for knit() based on knit_expand() like the example in Section 12.4.5. As an example, we build the linear regression models of mpg against all combinations of two variables in the mtcars
data, with each model in one section. We write a template file as shown in Figure 12.11 and name it mtcars-template.Rnw
. Then we can build our models based on this template:
We used the function combn() to get all combinations of two variables, and passed them to knit_expand() via mapply(). The next step is straightforward: pass the pre-processed source text src
to knit(), e.g., knit(text = src, output = 'lm-mtcars.tex')
, and we will get the output with the regression results.
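The expansion step can be sketched with an inline template string in place of the template file, so that the example is self-contained. The template below is a one-line stand-in, not the content of Figure 12.11, and the tag names `x1` and `x2` are assumptions.

```r
library(knitr)

# A stand-in template with two tags
tpl <- "lm(mpg ~ {{x1}} + {{x2}}, data = mtcars)"

# All combinations of two variables other than mpg
vars <- combn(setdiff(names(mtcars), "mpg"), 2)

# Expand the template once per combination
src <- unname(mapply(
  function(a, b) knit_expand(text = tpl, x1 = a, x2 = b),
  vars[1, ], vars[2, ]
))
```

With a real template file, the call would be `knit_expand('mtcars-template.Rnw', ...)` and the result would be passed to knit() through the text argument.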
Sometimes you may not want to knit the whole document, and the function knit_exit() allows you to quit early. Once you put it in a code chunk, the rest of the document will be ignored, and the results from all previous text/code chunks will be returned immediately.
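The effect can be tried on a small R Markdown source passed as text, as in the sketch below; the content of the document is made up, and everything after the first chunk is expected to be dropped.

```r
library(knitr)

src <- c(
  "```{r}",
  "1 + 1",
  "knitr::knit_exit()",  # stop after this chunk
  "```",
  "",
  "This paragraph is never shown.",
  "",
  "```{r}",
  "stop('this chunk never runs')",
  "```"
)
out <- knit(text = src, quiet = TRUE)
```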
12.4.14 Literal knitr Source Code
You may find it a difficult task when you want to write literal knitr source code, such as the source code of an inline R expression, e.g., \Sexpr{x}
. This is a common task especially when you write knitr tutorials. You certainly cannot write the source code as-is, because knitr will evaluate it. You cannot even write \verb|\Sexpr{x}|
, since knitr does not understand the special meaning of the command \verb||
. Similarly, it may be difficult to write a literal inline expression `r x`
in R Markdown.
The function inline_expr() in knitr provides one solution to this problem. It takes a character string, and wraps it using the appropriate syntax of inline expressions.
Then you can call this function in an inline expression. For example, \Sexpr{inline_expr('1 + 1')}
in Rnw documents, or `r inline_expr('1 + 1')`
in Rmd documents.
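Called directly with an explicit syntax argument, the function returns the literal source of an inline expression, for example:

```r
library(knitr)

# Literal inline expressions for Rnw and R Markdown
rnw <- inline_expr("1 + 1", "rnw")
md <- inline_expr("1 + 1", "md")
```

Inside a document the syntax argument can be omitted, in which case it is inferred from the document being compiled.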
Another solution is to mutate certain characters in the inline expression, e.g., instead of \Sexpr{}
, you can write \textbackslash{}Sexpr{}
in LaTeX, since the latter will not be recognized as an inline expression.
There is a similar challenge for writing literal code chunks. Again, you just need to change the source code of the code chunk so that it is no longer recognizable by knitr. For example, you can add an inline expression with an empty character string before the chunk header, such as \Sexpr{''}<<>>=, or `r''````{r}
. Such lines will not be treated as valid chunk headers, because knitr’s syntax only allows white spaces before the chunk header.
Base R has a spell check function aspell() in the utils package, which can perform spell check via Aspell, Hunspell, or Ispell. To check the spelling of knitr documents, you may want to skip code chunks, because program code often contains words that are considered as misspelled.
The aspell() function can take a filter function to skip certain lines in the files. The function knit_filter() was designed to skip code chunks in a file. Here are two examples of checking an Rnw and Rmd file, respectively:
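The sketch below applies knit_filter() on its own to a temporary Rmd file to show which lines are blanked out. The aspell() call is left as a comment because it needs a spell checker program installed on the system; the file content is made up.

```r
library(knitr)

# A small Rmd file with a misspelled word and a code chunk
rmd <- tempfile(fileext = ".Rmd")
src <- c("A sentence with a speling mistake.", "",
         "```{r}", "x <- rnorm(10)", "```")
writeLines(src, rmd)

# The filter keeps the narrative and blanks out the code chunk
res <- knit_filter(rmd)

# With Aspell, Hunspell, or Ispell installed:
#   utils::aspell(rmd, filter = knit_filter)
```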
You can add words that you know are correctly spelled to a dictionary, so the spell checker does not report them the next time. R has a built-in dictionary, which contains the word “”. Once we apply this dictionary, you will see the word “” is no longer reported (but “knitr” still is):
Although there is no hard requirement on whether to run knitr in an interactive or non-interactive R session, it is recommended to use a new non-interactive R session because it is less likely to be “polluted” by existing objects in the R workspace. Based on this consideration, some editors such as RStudio open a new R session to compile reports by default.
The problem with non-interactive R sessions is that debugging may be inconvenient. If an error occurs, knitr will quit from R with a message printed on screen showing the problematic chunk, including its label and line numbers.
If the information mentioned above is not enough, we can also open an interactive R session and run knit() there. When an error occurs in this case, we can use common debugging tools such as traceback() (to see the call stacks that led to the error), or debug(), or browser().
If the source document was not encoded with the native encoding of the current system, we will have to manually specify its encoding via the encoding
argument in knit(). For example, if the source document was written in Simplified Chinese and encoded in GB2312, we need to compile it by:
Note that knitr does not try to automatically detect the encoding of the input document, but the editors usually know the encoding information about the documents. For example, both RStudio and will pass the encoding string to knitr before a document is compiled.