Hour 23. Dynamic Reporting


What You’ll Learn in This Hour:

• What dynamic reporting is

• How to create a report in R

• Including R code in reports

• The basics of Markdown and LaTeX


Up to this point you have seen the fundamentals of the R language as well as aspects of R that allow us to ensure that we write high-quality, well-documented, and easily shareable code. In this hour, we are going to take a look at one of the ways you can extend your use of R, specifically for simplifying the generation of reports that rely heavily on R-generated output.

What Is Dynamic Reporting?

We all produce reports for a variety of reasons on a regular basis. If you have used R to manipulate data, perform analysis, or produce graphics, you are likely at some point to have copied results or inserted a graphic into a report. This usually means that you have all of your analysis saved in one place and your final report in another, and you need to ensure that you keep both up to date. This can be particularly challenging if your data changes on short notice and you need to quickly regenerate your report, or if you need to produce the same report on a regular basis.

Dynamic reporting, also commonly referred to as automated reporting or reproducible reporting, is a means by which we can generate a report entirely in R. The content of the report and the code to perform any manipulation or analysis are stored together. There are a number of advantages to writing reports in this way, including the following:

• No need to copy and paste results into a separate report

• Easy to track which code was used for the analysis in a report

• Simple to rerun the report if the data changes

• Easy to run reports that need to be produced on a regular basis

Traditionally, this was done in R using Sweave, which allows us to embed R code in LaTeX documents. LaTeX is a markup language commonly used in scientific reporting. It was designed for writing technical documents and requires a TeX installation. Although it is a very powerful tool, it has quite a steep learning curve. More recently, the knitr package was introduced to R. It still allows users to produce documents using LaTeX, but it also allows us to use Markdown, another markup language. Markdown is much simpler to get started with because its syntax is deliberately restricted. Also, rather than producing only static PDF documents, it allows us to generate HTML or Microsoft Word files as well as PDF. Producing HTML makes it very simple to embed any HTML content we want into reports, and as you will see in the final hour of this book, this means we can generate interactive documents.

An Introduction to knitr

As just mentioned, the package knitr has been designed to simplify the way in which we generate documents in R. You already saw the package knitr in Hour 20, “Advanced Package Building,” when we generated a user guide for a package.

Although we commonly think of reports as long documents that contain an analysis or summary of results that we can produce in Microsoft Word or similar software, we can also think about reports as being presentations that we typically produce using Microsoft PowerPoint or other similar software. We can use knitr to produce both types of documents, in either PDF or HTML format, primarily depending on whether we choose to use LaTeX or Markdown to write our documents (although Markdown is more flexible in the file type that can be produced).

Simple Reports with RMarkdown

You have already seen the basics of RMarkdown in Hour 20. Markdown itself is a simple, plain-text markup language with a number of very similar variants. RMarkdown is the variant that allows us to include chunks of R code inside a document that is rendered to HTML. Note that RStudio also makes it very simple to render an RMarkdown document as a PDF, which requires a TeX installation, or as a Microsoft Word document. In this hour, we will work only with HTML documents for simplicity.

A Basic RMarkdown Document

To create an RMarkdown document, we need to create a file with the extension .Rmd. Using RStudio, we can create a template RMarkdown document that includes sample RMarkdown content. We can create this file by selecting R Markdown from the “File > New File” menu. This presents an options window that allows us to select the type of document that we want to generate. An example of this options window is shown in Figure 23.1. As you can see, you can select the type of document you want to create as well as the output you want to generate. In this case, we will simply use the default document and create HTML output. You will also notice that this screen allows you to insert the title of the document and the author name. Adding these components on this screen will automatically insert them correctly into the document header. After you click OK, a template document will be opened.


FIGURE 23.1 RMarkdown file creation options in RStudio

All RMarkdown documents begin with a header that defines certain components, such as the title, author, and date, as well as the output format and any options for the output format such as styling. An example of the header is shown in Listing 23.1, lines 1 to 5.

LISTING 23.1 RMarkdown Example


 1: ---
 2: title: "Automated Reporting"
 3: author: "Aimee Gott"
 4: output: html_document
 5: ---
 6:
 7: The following report contains an analysis of the data from 2015.
 8:
 9: ## Analysis
10: A simple linear model was fitted to the data to determine the main factors that
11: contribute to a change in the dependent variable. We can see below some simple
12: summaries of the data.
13:


After this header we can simply start writing our document. This could be plain text, but we can also format the text using the Markdown formatting options you saw in Table 20.1 in Hour 20. An example of how a Markdown document might look can be seen in Listing 23.1.


Tip: Creating Presentations

As you will have noticed from the options in Figure 23.1, you can also create a presentation using Markdown. Selecting the HTML presentation options will control all of the setup for you. The main difference to note is that new slides are started with a new Level 1 or Level 2 heading; otherwise, all markdown formatting and code chunks are the same.


Building an HTML File

Because we are writing our document in a markup language, we will need to build the RMarkdown file to generate the HTML. The easiest way to do this is using the interface in RStudio. You will notice that after opening an RMarkdown file you have the additional option at the top of your file viewer labelled “Knit HTML.” Before generating the HTML, you will need to save the RMarkdown file with the extension .Rmd. Selecting the “Knit HTML” option will generate the corresponding HTML file and open a preview for you, as well as save the HTML file in the same location as the RMarkdown file. This HTML file can be opened by any web browser and can be shared in the same way as any other static file.
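If you prefer to work outside RStudio, or want to rebuild a report from a script (for example, one that runs on a schedule), the same build step can be run from the R console. The following is a minimal sketch, assuming the rmarkdown package is installed and that pandoc, which rmarkdown relies on to convert the knitted Markdown to HTML, is available. The file name report.Rmd is hypothetical, and the file is written to a temporary directory so that the example is self-contained.

```r
# Minimal sketch: build an RMarkdown file from code rather than the Knit HTML button.
# Assumes the rmarkdown package is installed; producing HTML also needs pandoc.
library(rmarkdown)

# Write a small, hypothetical report to a temporary directory
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "---",
  "title: \"Automated Reporting\"",
  "output: html_document",
  "---",
  "",
  "The mean of the numbers 1 to 10 is `r mean(1:10)`."
), rmd_file)

# render() knits the file, converts it, and returns the path of the output file;
# by default the HTML is saved in the same directory as the .Rmd file
if (pandoc_available()) {
  html_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
} else {
  html_file <- NA_character_
  message("pandoc was not found, so the HTML file was not built")
}
html_file
```

Because the report is rebuilt entirely from the .Rmd file, rerunning this script after the data changes regenerates the HTML without any copying and pasting.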

Including R Code and Output

We include sections of R code in documents inside code “chunks.” In RMarkdown, a chunk starts and ends with a line of three back ticks. After the opening back ticks we add curly brackets containing the letter r, to indicate that the code is R code, followed by any additional options we wish to set. Three examples of code chunks are shown in Listing 23.2.

LISTING 23.2 RMarkdown Code Chunks


 1: ```{r, collapse = TRUE}
 2: library(mangoTraining)
 3: summary(pkData$Conc)
 4: ```
 5:
 6: ```{r, echo = FALSE}
 7: library(ggplot2)
 8: ggplot(pkData, aes(x = Time, y = Conc)) + geom_point()
 9: ```
10:
11: ```{r, echo = FALSE}
12: library(knitr)
13: kable(head(pkData))
14: ```


As you can see in these examples, we can include any executable R code inside these chunks, whether it generates console output or graphics. The final code chunk, in lines 11 to 14, even produces a table: the knitr function kable converts a data frame or matrix into Markdown table code, which becomes an HTML table in the rendered document.
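To see the Markdown that kable produces, you can call it directly at the console. This sketch uses the built-in airquality data in place of the book's pkData, and sets format = "pipe" (pipe-style Markdown tables) explicitly so that the result does not depend on where the code is run.

```r
# Sketch: inspect the Markdown table code that kable() generates.
# Uses the built-in airquality data in place of pkData.
library(knitr)

tab <- kable(head(airquality, 3), format = "pipe")

# One element per line: a header row, an alignment row, then one row per record
print(tab)
```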

You will also notice that we have set some options inside the curly brackets of these chunks, namely collapse and echo. The first of these, collapse, places the code and its output in the same box in the rendered document, which is useful when you have several lines of code and output that you want to group together. This suits vignettes, but in general you would not want R code to appear in a formal report. That is where the echo option is particularly useful: it controls whether the code is shown in the document alongside its output. It has been set to FALSE in the last two code chunks, on lines 6 and 11, so when the document is created only the output appears (in these cases, a graphic and a table).
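You can confirm the effect of echo = FALSE without opening a browser by knitting a tiny document to plain Markdown with knitr's knit function. In this sketch, which writes a hypothetical echo-demo.Rmd to a temporary directory and uses airquality in place of pkData, the summary output should reach the .md file but the code that produced it should not. The chunk fence is built with strrep only to avoid typing literal back ticks inside the example.

```r
# Sketch: knit a tiny .Rmd to Markdown and see what echo = FALSE keeps.
# Assumes knitr is installed; airquality stands in for pkData.
library(knitr)

fence <- strrep("`", 3)  # the three back ticks that open and close a chunk
rmd <- file.path(tempdir(), "echo-demo.Rmd")
writeLines(c(
  paste0(fence, "{r, echo = FALSE}"),
  "summary(airquality$Ozone)",
  fence
), rmd)

# knit() runs the chunk and writes Markdown, returning the path of the .md file
md <- knit(rmd, output = file.path(tempdir(), "echo-demo.md"), quiet = TRUE)
md_text <- readLines(md)

# Is the code in the output? Is the summary output?
any(grepl("summary(airquality$Ozone)", md_text, fixed = TRUE))
any(grepl("Median", md_text, fixed = TRUE))
```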


Tip: Setting Up Your Document

You will notice that in the sample code chunks here, each chunk loads the R package it uses. It is better practice to put all of these package loads in a single code chunk at the start of the document, as you would at the top of any other R script. We recommend that you also include in this chunk any sourcing of additional R scripts and any reading of data. As you will see in Table 23.1, there are options (such as include = FALSE) that you can set to ensure that this chunk is run but that none of its code or output is included in the report.


TABLE 23.1 knitr Options for Code Chunks


There are many more options you can set to control the behavior and output of code chunks, whether this is how or if the code is run or the look of graphics output. Some of the most commonly used options can be seen in Table 23.1.
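The include = FALSE option mentioned in the tip on setting up your document is a natural fit for a setup chunk: the chunk is evaluated, but neither its code nor its output appears in the report, and any objects it creates remain available to later chunks. The sketch below knits a hypothetical setup-demo.Rmd to Markdown in a temporary directory; it loads and cleans the airquality data silently and then reports the number of complete rows from a later chunk that shows only its output.

```r
# Sketch: a silent setup chunk followed by a chunk that uses its results.
# Assumes knitr is installed; file names are hypothetical.
library(knitr)

fence <- strrep("`", 3)
rmd <- file.path(tempdir(), "setup-demo.Rmd")
writeLines(c(
  # include = FALSE: the chunk runs, but nothing from it is written to the report
  paste0(fence, "{r setup, include = FALSE}"),
  "library(stats)",
  "aq <- na.omit(airquality)",
  fence,
  "",
  # echo = FALSE: only the result of this chunk appears in the report
  paste0(fence, "{r, echo = FALSE}"),
  "nrow(aq)",
  fence
), rmd)

md <- knit(rmd, output = file.path(tempdir(), "setup-demo.md"), quiet = TRUE)
md_text <- readLines(md)
md_text
```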


Tip: Additional Code Chunk Options

We can set many more options for a code chunk. The easiest way to see all these options is to take a look at the knitr webpage at http://yihui.name/knitr/. This site is maintained by the package author, Yihui Xie, and includes a complete listing of all the options that can be set. To see these options, navigate to the Options page.


The final thing to mention in relation to including R code is how to include code inline, that is, in the body of the text. This is again done with back ticks, but this time with just one at each end of the code. To indicate that this is R code that should be executed, the opening back tick is followed by the letter r; otherwise, we can include any single line of code, which will be run when the document is built. For example, we may have the following line in our RMarkdown document:

The median concentration for dose group 25 was `r median(pkData$Conc[pkData$Dose==25])`

In this instance, the median value would be inserted for us on creation of the document. This makes it very simple to reference values in the text and not have to worry about having to update the text if the data changes. An example of how the HTML for the content shown in this hour may look can be seen in Figure 23.2.
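A self-contained version of the inline example can be knitted in the same way. This sketch swaps pkData for the built-in airquality data and computes the median ozone reading for May (Month == 5). Because airquality contains missing values, na.rm = TRUE is needed; without it, the inline expression would insert NA into the sentence. The file name inline-demo.Rmd is hypothetical.

```r
# Sketch: inline R code in an .Rmd file, knitted to Markdown.
# Assumes knitr is installed; airquality stands in for pkData.
library(knitr)

rmd <- file.path(tempdir(), "inline-demo.Rmd")
writeLines(paste0(
  "The median ozone reading for May was ",
  "`r median(airquality$Ozone[airquality$Month == 5], na.rm = TRUE)`."
), rmd)

md <- knit(rmd, output = file.path(tempdir(), "inline-demo.md"), quiet = TRUE)
md_text <- readLines(md)

# The inline expression has been replaced by its computed value
md_text
```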


FIGURE 23.2 Extract of a rendered HTML file generated from RMarkdown

Reporting with LaTeX

When it comes to creating documents in LaTeX, you will need to ensure that you first have a TeX installation. This is separate software that is not supplied with R, and the exact requirements will depend on your operating system. Windows users can install MiKTeX, OS X users MacTeX, and Linux users TeX Live. For the remaining sections, we assume that you have installed the appropriate software for your operating system.

As previously mentioned, LaTeX is a markup language that is widely used in scientific reporting. One of its primary advantages is that it’s very simple to incorporate scientific notation into documents. A full introduction to LaTeX is beyond the scope of this book, but we will introduce some of the basics here. More specifically, we will focus on how to generate LaTeX documents from R and how to include R code and output, which will be new to those already familiar with LaTeX.

A Basic LaTeX Document

When we generate documents using LaTeX in R, we create .Rnw files. These are Sweave files, but they can be converted to PDF using knitr, giving us all the options available in the knitr package. We can create a Sweave file from the RStudio New File menu by selecting R Sweave. This opens a document that contains some initial LaTeX tags to get started with. The whole document begins with the tag \documentclass, which identifies the type of document we will produce. Next in the template come \begin{document} and \end{document}; all the content of our document goes between these two tags.

To add content to our document, we must again use specific format options. Table 23.2 shows the main LaTeX tags required for the components equivalent to those we introduced in Markdown in Hour 20.


TABLE 23.2 Basic LaTeX Notation

As an example of how a LaTeX document might look, Listing 23.3 shows the LaTeX equivalent of Listing 23.1.

LISTING 23.3 A Basic LaTeX Document


 1: \documentclass{article}
 2:
 3: \title{Automated Reporting with LaTeX}
 4: \author{Aimee Gott}
 5: \date{}
 6:
 7: \begin{document}
 8:
 9: \maketitle
10:
11: The following report contains an analysis of the data from 2015.
12:
13: \section{Analysis}
14: A simple linear model was fitted to the data to determine the main factors that
15: contribute to a change in the dependent variable. We can see below some simple
16: summaries of the data.
17: \end{document}


You will notice that, just like the Markdown document, this document has a header (the preamble) that gives the document type, the title, and the author. Note also that for the title information to appear in your document, you need to include the \maketitle tag, shown on line 9.


Tip: Creating the PDF

Just as for Markdown documents, much of this functionality has been built into RStudio, including compiling the PDF. Rather than a knit option, however, you will see the option “Compile PDF,” which requires the TeX installation mentioned earlier. To ensure that you are using knitr, and therefore have all the knitr options available, check the Sweave global options: from the Tools menu select “Global Options,” and then select the “Sweave” tab. In this menu you will see the option “Weave Rnw files using”; ensure that this is set to knitr. If you created the file before changing this option, you will need to remove the concordance line (\SweaveOpts{concordance=TRUE}) that RStudio will have inserted.
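The Compile PDF step can also be reproduced from the console. The sketch below assumes knitr is installed and writes a hypothetical report.Rnw, based on Listing 23.3, to a temporary directory. It first uses knit to turn the .Rnw file into a plain .tex file, which works without TeX, and only then calls knitr's knit2pdf to produce the PDF, and only if a pdflatex executable can be found on the system path. Whether the compilation itself succeeds still depends on your TeX installation.

```r
# Sketch: the console equivalent of "Compile PDF" for an .Rnw file.
# Assumes knitr is installed; the PDF step also needs a TeX installation.
library(knitr)

# A hypothetical report.Rnw based on Listing 23.3, written to a temporary directory
rnw <- file.path(tempdir(), "report.Rnw")
tex_path <- file.path(tempdir(), "report.tex")
writeLines(c(
  "\\documentclass{article}",
  "\\title{Automated Reporting with LaTeX}",
  "\\author{Aimee Gott}",
  "\\date{}",
  "\\begin{document}",
  "\\maketitle",
  "The following report contains an analysis of the data from 2015.",
  "\\end{document}"
), rnw)

# Step 1: knit() runs any R code and writes a plain .tex file (no TeX needed)
tex <- knit(rnw, output = tex_path, quiet = TRUE)

# Step 2: knit2pdf() knits and then compiles the result, so only try it
# when a pdflatex executable is available
pdf_file <- NA_character_
if (nzchar(Sys.which("pdflatex"))) {
  pdf_file <- knit2pdf(rnw, output = tex_path, quiet = TRUE)
}
```

Separating the two steps makes it easy to tell whether a problem lies in the R code (step 1) or in the LaTeX compilation (step 2).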


Including Code in a LaTeX Document

Just as with Markdown, we can include R code in our documents by incorporating code chunks. When we are using knitr, we have all the same chunk options, but in terms of the code the only difference is the way in which a code chunk is identified. Listing 23.4 gives the same code chunks as we included for Markdown in Listing 23.2.

LISTING 23.4 Sweave Code Chunks


<<collapse = TRUE>>=
library(mangoTraining)
summary(pkData$Conc)
@

<<echo = FALSE>>=
library(ggplot2)
ggplot(pkData, aes(x = Time, y = Conc)) + geom_point()
@

<<echo = FALSE>>=
library(knitr)
kable(head(pkData))
@


As you can see, code chunks in Sweave documents start with <<>>=, with any options set between the angle brackets, as in <<echo = FALSE>>=. We can use all the same knitr code chunk options listed in Table 23.1. The code chunks end with the @ symbol. We can include in the code chunks any executable R code that generates any form of output, including graphics, and using the kable function again we can generate a table, this time in LaTeX format.
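As with the Markdown version, you can inspect the LaTeX that kable produces directly at the console. This sketch again uses airquality in place of pkData and sets format = "latex" explicitly; inside a knitted .Rnw file, kable normally detects the LaTeX output format for you.

```r
# Sketch: inspect the LaTeX table code that kable() generates.
# Uses the built-in airquality data in place of pkData.
library(knitr)

tab_tex <- kable(head(airquality, 3), format = "latex")

# A tabular environment, ready to be placed in a LaTeX document
cat(tab_tex)
```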

As with Markdown, we can also include inline code. The Sweave equivalent is \Sexpr{}. As an example, we might have the following line in our document:

The median concentration for dose group 25 was
\Sexpr{median(pkData$Conc[pkData$Dose==25])}

Anything inside the \Sexpr{} will be executed as a single line of code, and the output will be inserted into the text when the PDF is compiled. An example of the PDF that would be generated from the examples in this hour can be seen in Figure 23.3.
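The same inline example can be checked without compiling a PDF by knitting the .Rnw file only as far as LaTeX. This sketch assumes knitr is installed, writes a hypothetical sexpr-demo.Rnw to a temporary directory, and, as in the Markdown version, uses airquality in place of pkData with na.rm = TRUE to handle its missing values.

```r
# Sketch: \Sexpr{} inline code in an .Rnw file, knitted to LaTeX.
# Assumes knitr is installed; airquality stands in for pkData.
library(knitr)

rnw <- file.path(tempdir(), "sexpr-demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The median ozone reading for May was",
  "\\Sexpr{median(airquality$Ozone[airquality$Month == 5], na.rm = TRUE)}.",
  "\\end{document}"
), rnw)

# No TeX is needed to produce the .tex file, only to compile it to PDF
tex <- knit(rnw, output = file.path(tempdir(), "sexpr-demo.tex"), quiet = TRUE)
tex_text <- readLines(tex)

# The \Sexpr{} call has been replaced by its computed value
tex_text
```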


FIGURE 23.3 Extract of the output PDF file created from the Sweave content shown

Summary

You have now seen the basics of how to generate a static report in R. There are many more things you can do to these reports, such as including styles to ensure that the reports look well presented and, where necessary, follow a required company or institution template. However, here we have introduced the basics of what can be done. In the final hour, we are going to see how to extend some of these ideas to generate interactive web applications and interactive reports.

Q&A

Q. I am just starting out creating reports in R. Which should I learn, Markdown or LaTeX?

A. If you have never used LaTeX before, I would recommend starting with Markdown. Its limited syntax makes it much easier to get started with, yet it still offers the flexibility to create documents in a number of formats. However, if you need to include a large number of mathematical formulas or a more sophisticated layout in your documents, you may find it more beneficial to learn LaTeX. You can include formulas in a Markdown document, but this relies on an additional component, MathJax, that allows you to write LaTeX math notation inside a Markdown document.

Q. Can I customize the style of my documents?

A. How you style a document depends on the type of document you are creating, but it is straightforward to do. If you are creating an HTML file, you will need an existing or new CSS file that defines the styles for the HTML components; you can then reference this file in the header of your Markdown document. If you are using LaTeX, you will need a LaTeX style file to apply to your documents. Creating one can be challenging at first, but if a suitable style already exists, you will typically only need to change the type of document given in the \documentclass tag.

Workshop

The workshop contains quiz questions and exercises to help you solidify your understanding of the material covered. Try to answer all questions before looking at the “Answers” section that follows.

Quiz

1. What are the two markup languages you have seen for creating documents from R?

2. How do you refer to blocks of R code in a document?

3. Do you have to include R code in your final document?

4. What file extension do you give to Markdown files and Sweave files, respectively?

Answers

1. The two markup languages are Markdown (or more specifically, RMarkdown) and LaTeX.

2. Blocks of R code are referred to as “code chunks.”

3. No, you can set the option echo to be FALSE, and this will prevent the code from appearing in the final document.

4. You give the extension .Rmd to RMarkdown files and .Rnw to Sweave files.

Activities

1. Create a simple RMarkdown document that has the following attributes:

• Has a title, your name, and today’s date

• Has three sections (introduction, analysis, and conclusion), each containing a paragraph of simple text

• Includes a code chunk that generates a plot of Ozone against Wind from the airquality data

• Fits a simple linear model of Ozone against Wind, returning the coefficients of the model in a table

• Ensures that none of the R code or any warnings or messages are displayed in the final document

2. Generate the HTML file for the RMarkdown document you have just created.

3. Try creating this same document using LaTeX.
