A    Internals

In this appendix we explain some internal structures of the knitr package, which may help other developers better understand this package, and contribute code when necessary. General users do not need to read this appendix. We show the internals in three aspects: documentation, the application of closures, and the implementation of some features.

A.1    Documentation

There are three types of documentation in knitr: the R documentation (Rd), the PDF manuals, and the website.

The R documentation is based on roxygen2 (Wickham et al., 2015), which allows one to write Rd in roxygen comments (#') with tags; these comments are then translated into real Rd files. Below is an example of a roxygen comment:

[Code listing: a roxygen comment]

It will be translated into Rd as:

[Code listing: the resulting Rd]

There is a series of tags in roxygen such as @usage, @param, @return, and @examples, which correspond to \usage{}, \arguments{\item{}}, \value{}, and \examples{}, respectively, in Rd. The advantage of writing roxygen comments over the official Rd is that we can keep the documentation and the source code in the same file; by comparison, the official approach to writing R packages is to write R sources under the R/ directory, and manual pages as *.Rd files under man/. This is not convenient because we have to jump between two files, and it is likely that we update the R source but forget to update the documentation. Roxygen comments appear right above the R functions in the source, so it is much easier to maintain both the source and documentation.

Below is a complete example of a function documented with roxygen comments:

[Code listing: a function documented with roxygen comments]
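As a minimal sketch of what such a documented function can look like, the block below uses a hypothetical function add() that is not part of knitr. The tags shown (@param, @return, @export, @examples) are standard roxygen2 tags; the wording of the documentation is only illustrative.

```r
#' Add two numbers
#'
#' Returns the sum of two numeric values (a hypothetical example function,
#' not taken from the knitr source).
#' @param x a numeric vector
#' @param y a numeric vector
#' @return The sum of \code{x} and \code{y}.
#' @export
#' @examples
#' add(1, 2)
add <- function(x, y) {
  x + y
}
```

Because the comments sit directly above the function, a change to the arguments of add() and the matching change to its @param entries can be made in the same place.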

We can use the roxygenize() function in roxygen2 to convert roxygen comments to the official Rd files. All objects in knitr are documented in this way. In addition, roxygen2 handles NAMESPACE and the Collate field in DESCRIPTION automatically, so we can focus on working on the R source files.

The source documents of the PDF manuals are under the examples directory (see inst/examples/ in the source package); e.g., the main manual is knitr-manual.Rnw. The Rnw files are exported from LyX files (Section 4.2), so it is recommended to open the LyX files to edit or compile the PDF manuals. The PDF manuals are not shipped with the source package, because (1) I do not want to put binary files under version control (especially when they are by-products of source files) and (2) they are hosted on the package website.

The package website is built on Jekyll as introduced in Section 13.4. Specifically, all pages are written in Markdown and kept in the gh-pages branch of the Git repository (the package itself is in the master branch). GitHub will rebuild the website automatically once changes are pushed there through Git. If you want to contribute to the website, just switch to the gh-pages branch and update the Markdown files.

A.2    Closures

Closures play a central role in knitr; some common objects such as opts_chunk (Section 5.1.1) and knit_engines (Chapter 11) are built on closures.

A closure is essentially a function together with the environment in which it was created, so it has access to non-local variables. Below is a simple example:

[Code listing: a function f() that returns a closure g()]
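A minimal sketch of such a closure is given here; the body of f() is an assumption chosen only for illustration.

```r
# f() creates a local object x and returns a function that refers to it
f <- function() {
  x <- 1
  function() x
}

# g is the function returned by f(); it keeps access to the x created in f()
g <- f()
g()
```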

The function g() was created by f() (note that f() returns a function); g() uses an object x that was created inside f(), and x exists only in the environment of that call to f(). No matter where g() is called, it always has access to this x.

In fact, we can even modify non-local variables through a closure. Below is a minimal example that shows how the chunk options manager opts_chunk works:

[Code listing: the function new_list()]

The function new_list() returns a list of functions (a setter and a getter). The object default is bound to these two functions. You can think of it as the default list of chunk options. Next we show how to get and set the chunk options.

[Code listing: getting and setting options through the list of closures]
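The idea can be sketched as follows. This is a simplified stand-in written for illustration, not the code of new_defaults() in knitr; in particular, the way the setter merges its arguments into default is an assumption.

```r
new_list <- function() {
  default <- list()  # the shared state, visible only to the closures below

  get <- function(name) default[[name]]

  set <- function(...) {
    dots <- list(...)
    # <<- modifies `default` in the enclosing environment; a plain <- would
    # create a local copy inside set() and leave the shared object untouched
    default[names(dots)] <<- dots
    invisible(NULL)
  }

  list(get = get, set = set)
}

opts <- new_list()
opts$set(echo = TRUE, fig.width = 7)
opts$get("fig.width")

# a second manager has its own, independent copy of `default`
other <- new_list()
other$get("fig.width")
```

Each call to new_list() creates a fresh environment, which is why opts and other do not interfere with each other even though they expose the same $get() and $set() syntax.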

In the $set() function, we used <<- to assign the arguments to the object default, and that is why we can modify this object in the parent environment (had we used the normal <-, default in the parent environment would not be modified; a local copy would be created instead).

By using closures, knitr can manage objects in their own environments with the same syntax. The internal function new_defaults() in knitr is used to create such a list of closures.

Besides the objects opts_chunk (for managing chunk options) and knit_engines (for managing language engines), there are a few other similar objects:

•  opts_knit: package options (Section 12.2)

•  opts_current: chunk options for the current chunk

•  opts_template: chunk option templates (Section 12.1.2)

•  knit_hooks: hook functions (both output hooks and chunk hooks)

•  knit_patterns: syntax patterns for the parser (Section 5.1)

A.3    Implementation

This section explains some implementation details for this package. One minor thing to mention first is that I use = instead of <- as the assignment operator, and you will see = all over the place in the source code. It is a matter of personal taste, and I do not see real disadvantages in it, but you are expected to follow = when contributing code to this package. In this book, you see <- because I typed equal signs but they were automatically replaced by formatR.

A.3.1    Parser

The document parser (Section 5.1) works like this: the child elements chunk.begin and chunk.end in the syntax pattern object are used to split the document into pieces (code chunks and text chunks), and for the code chunks, the chunk options (i.e., the text extracted from the first line) are parsed as R code, and this is why chunk options have to follow the R syntax. Here is an example explaining how knitr gets chunk options from a text fragment:

[Code listing: wrapping the chunk options text in alist()]

First we added the function alist() around the text. This function treats its arguments as if they described function arguments; therefore no "arguments" will be evaluated at this time. However, the syntax must at least be valid; one exception is the chunk label: it is automatically quoted if necessary, since it is supposed to be a character string. The internal function parse_params() is used to parse chunk options:

[Code listing: parsing chunk options with parse_params()]

The chunk options are not evaluated until just before the chunks are executed, so they can refer to objects whose values are not yet known when the document is parsed. For example, the options echo and foo above are unevaluated expressions, and we will evaluate them explicitly later:

[Code listing: evaluating the parsed chunk options]
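The two steps, parsing without evaluation and evaluating later, can be sketched with base R alone. The helper parse_opts() and the option text are hypothetical; the internal parse_params() does more than this (for example, it handles the chunk label).

```r
# Hypothetical helper: wrap the option text in alist() and parse it as R code
parse_opts <- function(text) {
  eval(parse(text = paste0("alist(", text, ")")))
}

opts <- parse_opts("echo = i > 1, fig.width = 7, foo = bar")

# echo and foo are still unevaluated (a call and a symbol); i and bar do
# not need to exist yet
str(opts)

# Later, once the values are known, each option is evaluated explicitly;
# here the values are supplied in a list for the purpose of the sketch
vals <- list(i = 2, bar = "value")
evaluated <- lapply(opts, eval, envir = vals)
str(evaluated)
```

If the option text is not valid R syntax, parse() fails at the first step, which mirrors the requirement that chunk options follow the R syntax.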

All code chunks are stored as a named list in an internal object knit_code; the names are chunk labels, and the content is the code. This object is also created as a list of closures, so it has the get() and set() methods, but it is not recommended to modify this object due to possible unexpected consequences. If needed, we can access code chunks via knitr:::knit_code$get('chunk-label').

A.3.2    Chunk Hooks

There are a number of default hooks in knit_hooks; these are the output hooks (Section 5.3):

[Code listing: names of the default hooks in knit_hooks]

Any other hooks in this object are treated as chunk hooks (Chapter 10). Before and after a code chunk is executed, all extra hooks will be called. Here is the pseudo code:

[Code listing: pseudo code for running chunk hooks before and after a chunk]

One issue to keep in mind is the order in which the hooks run: if there are two hooks A and B defined in knit_hooks, in what order are they called? This order is obtained from the chunk options: there must be two chunk options, A and B, corresponding to these two hooks, and the order of the chunk options determines the order in which the hooks run; e.g., if A is before B, then hook A is called before B. However, after a code chunk has been evaluated, the order is reversed, and the reason is to make sure the results returned by the hooks are properly nested. For example, suppose the hook A returns \begin{Aenvir} before a chunk, and \end{Aenvir} after a chunk; similarly, B returns \begin{Benvir} and \end{Benvir}. Then what we want in the output is this:

\begin{Aenvir}
\begin{Benvir}
(output of the chunk)
\end{Benvir}
\end{Aenvir}

Note \end{Benvir} comes before \end{Aenvir}. For this reason, the following two chunks return different results when hooks A and B are defined:

[Code listing: two chunks whose headers list the options A and B in opposite orders]
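The ordering rule can be imitated in a few lines of base R. This is a simplified sketch and not the code in knitr: the hooks here take only a before argument (real chunk hooks also receive the chunk options and the environment), and run_chunk() is a hypothetical stand-in for what happens around one chunk.

```r
# Two hypothetical chunk hooks that open and close a LaTeX environment
hooks <- list(
  A = function(before) if (before) "\\begin{Aenvir}" else "\\end{Aenvir}",
  B = function(before) if (before) "\\begin{Benvir}" else "\\end{Benvir}"
)

# `options` is the list of chunk options in the order the user wrote them
run_chunk <- function(options, body = "(chunk output)") {
  # hooks are run in the order of the corresponding chunk options ...
  active <- intersect(names(options), names(hooks))
  before <- vapply(active, function(h) hooks[[h]](before = TRUE), character(1))
  # ... and in the reversed order after the chunk, so environments nest
  after <- vapply(rev(active), function(h) hooks[[h]](before = FALSE), character(1))
  c(before, body, after)
}

writeLines(run_chunk(list(A = TRUE, B = TRUE)))
writeLines(run_chunk(list(B = TRUE, A = TRUE)))
```

The first call nests Benvir inside Aenvir; the second nests Aenvir inside Benvir, which is why the order of the chunk options matters.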

A.3.3    Option Aliases

It takes only a few lines to implement chunk option aliases (Section 12.1.1), since it is a simple operation of substituting certain elements in a list. Below is a short function that illustrates the idea:

[Code listing: the function apply_aliases() and an example of its use]
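A possible shape of such a function is sketched below. The body of apply_aliases() and the example list op are assumptions made for illustration; only the idea of a named character vector of aliases comes from the text.

```r
# names are the aliases; values are the real option names
aliases <- c(w = "fig.width", h = "fig.height")

apply_aliases <- function(options, aliases) {
  for (alias in names(aliases)) {
    # copy the value of an alias, if the user specified it, to the real option
    if (!is.null(options[[alias]])) {
      options[[aliases[[alias]]]] <- options[[alias]]
    }
  }
  options
}

op <- apply_aliases(list(w = 5, h = 4, echo = FALSE), aliases)
str(op)
```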

Aliases are set in a named character vector, and the names are the aliases of the elements in the vector. In the above example, apply_aliases() added elements fig.width and fig.height into the list op according to the values of w and h, respectively, which were specified by the user, but internally knitr still uses fig.width and fig.height.

A.3.4    Cache

The cache in knitr is also managed by an object consisting of closures, but it is more complicated (see the internal function new_cache()). The closures are used to save, load, and delete cache files, and we only explain one aspect of the cache here: how the side effect of printing is cached (Section 8.4).

As we mentioned in Section 5.3, the code chunks are evaluated by the evaluate package. As a matter of fact, printed results are returned as character strings, and the output of the whole chunk is also a character string (formatted by output renderers). This character string is assigned to a variable, with the variable name constructed from the MD5 hash and the chunk label. This variable is saved in the cache database along with all other variables created in the chunk. The next time the chunk is to be evaluated, knitr will check if the chunk needs to be updated; if not, all objects will be loaded directly, including the object of the chunk output, which also contains the printed results (in fact, everything of this chunk); instead of re-evaluating the chunk, this object is written into the output directly.
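The mechanism can be sketched in base R under strong simplifications. Everything in this block is an assumption made for illustration and not the code behind new_cache(): the helpers chunk_hash() and run_cached() are hypothetical, the cache is a single .RData file per chunk, the hash is computed from the chunk code only, and the variable name format is invented.

```r
cache_dir <- tempfile("cache")
dir.create(cache_dir)

# MD5 hash of the chunk code (tools::md5sum() works on files)
chunk_hash <- function(code) {
  f <- tempfile()
  on.exit(unlink(f))
  writeLines(code, f)
  unname(tools::md5sum(f))
}

run_cached <- function(label, code) {
  hash <- chunk_hash(code)
  path <- file.path(cache_dir, paste0(label, "_", hash, ".RData"))
  out_name <- paste0(".", label, "_", hash)  # variable holding the output
  env <- new.env()
  if (file.exists(path)) {
    # cache hit: load all objects, including the stored output string
    load(path, envir = env)
    return(get(out_name, envir = env))
  }
  # cache miss: evaluate the chunk and keep the printed results as a string
  out <- paste(capture.output(eval(parse(text = code), envir = env)),
               collapse = "\n")
  assign(out_name, out, envir = env)
  save(list = ls(env, all.names = TRUE), file = path, envir = env)
  out
}

code <- c("x <- 1:3", "print(sum(x))")
first <- run_cached("demo", code)   # evaluated and saved
second <- run_cached("demo", code)  # loaded from the cache file
```

Because the printed text is stored as an ordinary character object next to x, the second call can return it without printing anything again.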

A.3.5    Compatibility with Sweave

Since some chunk options in knitr differ from those in Sweave, there is a function Sweave2knitr() to correct the inappropriate options and their values. For example, results = tex is changed to results = 'asis' automatically (because 'tex' is not an appropriate value to reflect what the results option really does).

The implementation is mainly based on regular expressions, and here is a simple example:

[Code listing: a regular expression that fixes a Sweave chunk option]
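To give a flavor of this approach, here is a small sketch with a hypothetical helper fix_sweave_opts(). The two rules it applies (quoting a bare results = hide, and turning lowercase true/false into TRUE/FALSE) are chosen for illustration, since chunk options in knitr must be valid R code; the regular expressions actually used by Sweave2knitr() may differ.

```r
# Hypothetical helper: rewrite a Sweave chunk header into valid R syntax
fix_sweave_opts <- function(x) {
  # a bare value such as results=hide must become a character string
  x <- gsub("(results\\s*=\\s*)hide\\b", "\\1'hide'", x, perl = TRUE)
  # lowercase logical values are not valid R
  x <- gsub("(=\\s*)true\\b", "\\1TRUE", x, perl = TRUE)
  x <- gsub("(=\\s*)false\\b", "\\1FALSE", x, perl = TRUE)
  x
}

fix_sweave_opts("<<foo, echo=true, results=hide>>=")
```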

Sweave2knitr() takes care of a large number of cases of inappropriate chunk options as well as \SweaveOpts{} and \SweaveInput{}. See Section 16.1 for examples.

A.3.6    Concordance

The concept of concordance is specific to Rnw/LaTeX. The problem to solve is the mapping of line numbers between the LaTeX output and the Rnw source. When an error occurs in LaTeX, we know the line number of the problematic line (by parsing the error log), but we do not know the corresponding line number in the Rnw source document, because the line numbers of the two documents may not match. One chunk of 5 lines in the Rnw document may produce 10 or 3 lines of LaTeX code in the output.

Sweave has a better implementation of concordance than knitr: the mapping is more precise in Sweave. In knitr, it is only an approximation achieved in this way: when parsing the source document, the numbers of lines in the code chunks and text chunks are recorded; after these chunks have been evaluated, the numbers of lines in the corresponding output chunks are calculated again. Suppose one source chunk has 5 lines, and if

•  the output has 5 lines too, the i-th line in the source is mapped to the i-th line in the output

•  the output has 3 lines, the first 3 lines of the source are mapped to the 3 lines in the output

•  the output has 10 lines, the 5 lines of the source are mapped to the first 5 lines in the output

Obviously this may not be a good approximation, but it should be helpful enough for error navigation. At least the line number of the error in LaTeX can point to a rough area of the problematic source.
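The three cases listed above reduce to one rule: only the first min(source lines, output lines) lines are paired one to one. A hypothetical helper map_lines(), written only to illustrate this rule and not taken from knitr, makes it explicit:

```r
# Hypothetical helper: pair the lines of one source chunk with output lines
map_lines <- function(n_source, n_output) {
  n <- min(n_source, n_output)
  data.frame(source = seq_len(n), output = seq_len(n))
}

map_lines(5, 5)   # every source line is mapped
map_lines(5, 3)   # only the first 3 source lines are mapped
map_lines(5, 10)  # the 5 source lines map to the first 5 output lines
```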

The other use of concordance is the navigation between PDF and Rnw files. SyncTeX supports this kind of navigation: you can click one line in the PDF document to jump back to the source file, or click one line in the source to jump to the PDF. Without the concordance information, we cannot navigate between Rnw and PDF (only LaTeX↔PDF is possible).

For now, only RStudio uses the concordance information produced by knitr. To enable concordance (it is disabled by default), you can set the package option (RStudio does this automatically):

[Code listing: setting the concordance package option]
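Package options are set through opts_knit, so enabling concordance presumably amounts to a call like the one below. The requireNamespace() guard is added only so that the sketch does nothing when knitr is not installed.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # turn on the concordance package option before calling knit()
  knitr::opts_knit$set(concordance = TRUE)
}
```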

When concordance is enabled, a file input-concordance.tex will be generated if the Rnw file is named as input.Rnw. This file contains compressed mapping information.

A.4    Syntax

Users may wonder why knitr uses different input syntax for different document formats (Section 5.1), e.g., Rnw uses <<>>=, and Rmd uses ```{r}. In fact, the syntax is not tied to document formats; we can certainly use the Rnw syntax for Rmd documents.

# This is a markdown document

Here is a **code chunk**:

<<test>>=
1+1
rnorm(5)
@

And an inline value \Sexpr{pi}.

For the example document above (suppose it is named test.Rmd), we can compile it by:

[Code listing: compiling test.Rmd after calling pat_rnw() and render_markdown()]

The function pat_rnw() sets the syntax patterns to those of Rnw, and the function render_markdown() sets the output renderers to the Markdown hooks.
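A self-contained sketch of this workflow follows. It writes a shortened version of the example document to a temporary directory (the file locations and the shortened content are assumptions), and the calls to restore() at the end are an addition meant to put the patterns and hooks back to their defaults afterwards.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  library(knitr)

  # a small Markdown document that uses the Rnw chunk syntax
  input <- file.path(tempdir(), "test.Rmd")
  writeLines(c(
    "# This is a markdown document",
    "",
    "<<test>>=",
    "1+1",
    "@",
    "",
    "And an inline value \\Sexpr{pi}."
  ), input)

  pat_rnw()          # use the Rnw syntax patterns
  render_markdown()  # use the Markdown output hooks
  output <- knit(input, output = file.path(tempdir(), "test.md"), quiet = TRUE)

  # go back to the default patterns and hooks
  knit_patterns$restore()
  knit_hooks$restore()
}
```

The point of the design is that the parser and the renderers are configured separately, so any syntax can in principle be combined with any output format.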

But why not use the Rnw syntax for all documents? The decision was made because I wanted more natural syntax according to the authoring format, and <<>>= is not valid markup in any document format; e.g., it is neither a LaTeX command nor an HTML tag. In fact, Sweave has another set of syntax that is LaTeX-like, e.g.,

\begin{Scode}{fig = TRUE, echo = FALSE}
library("graphics")
boxplot(Ozone ~ Month, data = airquality)
\end{Scode}

I would prefer [] to {} for chunk options, which would be a more natural choice in LaTeX. Anyway, <<>>= remained in knitr due to its popularity.

With the exception of Rnw (for historical reasons), the syntax for each format keeps a knitr source document a valid document of that format even before the R code is executed. For example, R code in R HTML documents is put in HTML comments (<!-- -->).
