In this article, Mark P.J. van der Loo and Edwin de Jonge, the authors of the book Learning RStudio for R Statistical Computing, discuss the prerequisites for producing a report and explain how to produce notebook reports that automatically include the results of an analysis.
A very important feature of reproducible science is generating reports. The main idea of automatic report generation is that the results of analyses are not manually copied to the report. Instead, both the R code and the report's text are combined in one or more plain text files. The report is generated by a tool that executes the chunks of code, captures the results (including figures), and generates the report by weaving the report's text and results together. To achieve this, you need to learn a few special commands, called markup specifiers, that tell the report generator which part of your text is R code, and which parts you want in special typesetting such as boldface or italic. There are several markup languages to do this, but the following is a minimal example using the Markdown language:
A simple example with Markdown
The left panel shows the plain text file in RStudio's editor and the right panel shows the web page that is generated by clicking on the Knit HTML button. The markup specifiers used here are the double asterisks for boldface, single underscores for slanted font, and the backticks for code. By adding an r to the first backtick, the report generator executes the code following it.
To reproduce this example, go to File | New | R Markdown, copy the text as shown in the preceding screenshot, and save as one.Rmd. Next, click on Knit HTML.
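A rough equivalent can be tried from the R console. The sketch below writes a small R Markdown file and processes it with knitr::knit(). The file contents are an illustrative assumption that uses the markup specifiers described above, not a copy of the example in the screenshot, and the knitr package is assumed to be installed.

```r
# Hypothetical file contents: an assumption for illustration only.
rmd <- c(
  "A simple example",
  "================",
  "",
  "This is **boldface**, this is _slanted_, and this is `code`.",
  "",
  "One plus one equals `r 1 + 1`."
)

# Write the text to a temporary one.Rmd file
rmd_file <- file.path(tempdir(), "one.Rmd")
md_file <- file.path(tempdir(), "one.md")
writeLines(rmd, rmd_file)

# knit() executes the R code and writes a plain Markdown file;
# RStudio's Knit HTML button additionally converts that file to HTML.
knitr::knit(rmd_file, output = md_file, quiet = TRUE)

# The inline expression has been replaced by its result
md_lines <- readLines(md_file)
print(md_lines)
```

Only the backtick expression that starts with an r is executed; the other backtick pair is left alone and rendered as code.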
The Markdown language is one of many markup languages in existence and RStudio supports several of them. RStudio has excellent support for interweaving code with Markdown, HTML, LaTeX, or even plain comments.
Notebooks are useful to quickly share annotated lines of code or results. There are a few ways to control the layout of a notebook. The Markdown language is easy to learn and has a fair number of layout options. It also allows you to include equations in the LaTeX format. The HTML option is really only useful if you aim to create a web page. You should know, or be willing to learn, HTML to use it. The result of these three methods is always a web page (that is, an HTML file), although this can be exported to PDF.
If you need ultimate control over your document's layout, and if you need features like automated bibliographies and equation numbering, LaTeX is the way to go. With this last option, it is possible to create papers for scientific journals straight from your analysis.
Depending on the chosen system, a text file with a different extension is used as the source file. The following table gives an overview:
Markup system    Input file type    Report file type
Notebook         .R                 .html (via .md)
Markdown         .Rmd               .html (via .md)
HTML             .Rhtml             .html
LaTeX            .Rnw               .pdf (via .tex)
Finally, we note that the interweaving of code and text (often referred to as literate programming) may serve two purposes. The first, described in this article, is to generate a data analysis report by executing code to produce the result. The second is to document the code itself, for example, by describing the purpose of a function and all its arguments.
Prerequisites for report generation
For notebooks, R Markdown, and Rhtml, RStudio relies on Yihui Xie's knitr package for executing code chunks and merging the results. The knitr package can be installed via RStudio's Packages tab or with the command install.packages("knitr").
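When a script has to run on several machines, the installation can be made conditional. The following is a minimal sketch; the CRAN mirror URL is an assumption and can be replaced by any mirror you prefer.

```r
# Install knitr only when it is missing; the mirror URL is an assumption.
if (!requireNamespace("knitr", quietly = TRUE)) {
  install.packages("knitr", repos = "https://cloud.r-project.org")
}

# Report the installed version to confirm that the package can be found
print(packageVersion("knitr"))
```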
For LaTeX/Sweave files, the default is to use R's native Sweave driver. The knitr package is easier to use and has more options for fine-tuning, so in the rest of this article we assume that knitr is always used. To make sure that knitr is also used for Sweave files, go to Tools | Options | Sweave and choose knitr under Weave Rnw files using. If you're working in an RStudio project, you can set this as a project option as well by navigating to Project | Project Options | Sweave. When you work with LaTeX/Sweave, you need to have a working LaTeX distribution installed. Popular distributions are TeX Live for Linux, MiKTeX for Windows, and MacTeX for Mac OS X.
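To see what knitr does with a Sweave-style file, the following sketch writes a small .Rnw file and weaves it into a .tex file. The document text and file names are assumptions made for illustration. This step runs without LaTeX; only the final conversion from .tex to .pdf needs a LaTeX distribution.

```r
# Hypothetical minimal LaTeX/Sweave source; the contents are an assumption.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of the numbers 1 to 10 is \\Sexpr{mean(1:10)}.",
  "<<summary-chunk>>=",
  "summary(1:10)",
  "@",
  "\\end{document}"
)

rnw_file <- file.path(tempdir(), "example.Rnw")
tex_file <- file.path(tempdir(), "example.tex")
writeLines(rnw, rnw_file)

# knit() executes the chunk and the \Sexpr{} expression and writes LaTeX.
# knitr::knit2pdf() would also compile the result, but that requires a
# working LaTeX installation.
knitr::knit(rnw_file, output = tex_file, quiet = TRUE)

tex_lines <- readLines(tex_file)
print(grep("The mean", tex_lines, value = TRUE))
```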
The easiest way to generate a quick, sharable report straight from your R script is by creating a notebook via File | Notebook, or by clicking on the Notebook button all the way on the top right of the R script tab (right next to the Source button).
RStudio offers three ways to generate a notebook from an R script. The simplest are Default and knitr::stitch, which differ only a little in layout. The knitr::spin mode allows you to use the Markdown markup language to specify text layout. The markup options are presented after navigating to File | Notebook or after clicking on the Notebook button. Under the hood, the Default and knitr::stitch options use knitr to generate a Markdown file, which is then directly converted to a web page (HTML file). The knitr::spin mode recognizes Markdown commands in your comments and converts your .R file to an .Rmd (R Markdown) file before further processing.
In Default mode, R code and printed results are rendered to code blocks in a fixed-width font with a different background color. Figures are included in the output and the document is prepended with a title, an optional author name, and the date. The only option to include text in your output is to add it as an R comment (behind the # sign) and it will be rendered as such.
In knitr::stitch mode, instead of prepending the report with an author name and date, the report is appended with a call to Sys.time() and R's sessionInfo(). The latter is useful since it shows the context in which the code was executed, including R's version, locale settings, and loaded packages. The result of the knitr::stitch mode depends on a template file called knitr-template.Rnw, included with the knitr package. It is stored in a directory that you can find by typing system.file('misc', package='knitr').
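The template can be inspected from the R console. The sketch below looks up the directory, lists the template files that ship with the installed knitr version, and prints the first lines of the .Rnw template. It assumes only that knitr is installed.

```r
# Locate the directory that holds the templates shipped with knitr
template_dir <- system.file("misc", package = "knitr")
print(list.files(template_dir, pattern = "^knitr-template"))

# Show the first lines of the template named in the text
rnw_template <- file.path(template_dir, "knitr-template.Rnw")
if (file.exists(rnw_template)) {
  writeLines(head(readLines(rnw_template), 10))
}

# Outside RStudio, knitr::stitch("myscript.R") applies this template to a
# script ("myscript.R" is a placeholder name). With the .Rnw template the
# result is compiled to PDF, so a LaTeX installation is needed.
```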
The knitr::spin mode allows you to escape from the simple notebook and add text outside of code blocks, using special markup specifiers. In particular, all comment lines that start with #' (hash and single quote) are interpreted as Markdown text. For example, the following code block:
# This is printed as comment in a code block
1 + 1
#' This will be rendered as main text
#' Markdown **specifiers** are also _recognized_
will be rendered in the knitr::spin mode as shown in the following screenshot:
Reading a notebook in the knitr::spin mode allows for escaping to Markdown
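The conversion that the knitr::spin mode performs can also be inspected outside RStudio. The following sketch writes the example script to a temporary file (the file name notebook.R is an assumption) and calls knitr::spin() with knit = FALSE, so that only the intermediate R Markdown file is produced.

```r
# The example script from the text, stored as a character vector
script <- c(
  "# This is printed as comment in a code block",
  "1 + 1",
  "#' This will be rendered as main text",
  "#' Markdown **specifiers** are also _recognized_"
)

script_file <- file.path(tempdir(), "notebook.R")
writeLines(script, script_file)

# knit = FALSE stops after the .R to .Rmd conversion; the .Rmd file is
# written next to the script with the same base name.
knitr::spin(script_file, knit = FALSE)

rmd_lines <- readLines(file.path(tempdir(), "notebook.Rmd"))
writeLines(rmd_lines)
```

In the resulting file, the ordinary comment stays inside the code chunk, while the #' lines appear as plain Markdown text with the #' prefix stripped.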
The knitr package has several general layout options for included code (that will be discussed in the next section). When generating a notebook in the knitr::spin mode, these options can be set by preceding them with a #+ (hash and plus signs). For example, the following code:
#' The code below is _not_ evaluated
#+ eval=FALSE
1 + 1
results in the following report:
Setting knitr options for a notebook in knitr::spin mode
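To see how a #+ line is translated, the script can again be converted with knitr::spin() and knit = FALSE. This is a sketch under the same assumptions as before: knitr is installed and the file name options.R is chosen only for illustration.

```r
# The example script from the text
script <- c(
  "#' The code below is _not_ evaluated",
  "#+ eval=FALSE",
  "1 + 1"
)

script_file <- file.path(tempdir(), "options.R")
writeLines(script, script_file)

# Convert to R Markdown without executing any code
knitr::spin(script_file, knit = FALSE)

rmd_lines <- readLines(file.path(tempdir(), "options.Rmd"))
writeLines(rmd_lines)
```

The options after #+ end up in the header of the code chunk that follows, which is where knitr reads chunk options in an R Markdown file.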
Although it is convenient to be able to use Markdown commands in the knitr::spin mode, once you need such options it is often better to switch to R Markdown completely, as discussed in the next section.
Note that the source of a notebook is a valid R script and can be executed as such. This is in contrast with the other report generation options, whose sources are text files that need knitr or Sweave to be processed.
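This can be checked with base R alone, because the #' and #+ lines are ordinary comments to the R interpreter. In the sketch below, the script contents and the variable name x are assumptions made for illustration.

```r
# A notebook source with Markdown text and a knitr option line
script <- c(
  "#' Some _Markdown_ text for the report",
  "#+ echo=TRUE",
  "x <- 1 + 1"
)

script_file <- tempfile(fileext = ".R")
writeLines(script, script_file)

# Run the file as a plain R script in a separate environment
env <- new.env()
source(script_file, local = env)

# The assignment was executed; the special comment lines were ignored
print(env$x)
```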
Publishing a notebook
Notebooks are ideal to share examples or quick results from fairly simple data analyses. Since early 2012, the creators of RStudio offer a website, called RPubs.com, where you can upload your notebooks by clicking on the Publish button in the notebook preview window that automatically opens after a notebook has been generated. Do note that this means that results will be available for the world to see, so be careful when using personal or otherwise private data.
In this article we discussed prerequisites for producing a report. We also learnt how to produce reports via Notebook that automatically include the results of an analysis.
About the Authors:
Edwin de Jonge has worked for more than 15 years at the Dutch official statistics office (Statistics Netherlands). With a background in theoretical and computational solid state physics (MSc), he started in the statistical computing department. Currently he works in the statistical methodology department. His research interests include data visualization, data analysis, and statistical computing. He trained over 150 people in a workshop entitled “Graphical Analysis with R”. Edwin has coauthored several R packages that are available via CRAN: tabplot, tabplotd3, ffbase, whisker, editrules, and deducorrect.
Mark van der Loo obtained his PhD at the Institute for Theoretical Chemistry at the University of Nijmegen (The Netherlands). Since 2007 he has worked at the statistical methodology department of the Dutch official statistics office (Statistics Netherlands). His research interests include automated data cleaning methods and statistical computing. At Statistics Netherlands he is responsible for the local R center of expertise, which supports and educates users on statistical computing with R. Mark has been teaching R for several years and coauthored a number of R packages that are available via CRAN: editrules, deducorrect, rspa, and extremevalues. A list of publications can be found via http://www.markvanderloo.eu.