Reporting in Knitr and R Markdown

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

Outline

  • The following topics will be covered in this lecture:
    • Data analysis reports in RMarkdown
    • Markdown basics
    • Document compilation
    • Further resources

Data analysis reports

  • Data analysts tend to write a lot of reports, describing their analyses and results, for their collaborators or to document their work for future reference.
  • Many new users begin by first writing a single R script containing all of the work.
  • Then simply share the analysis by emailing the script and various graphs as attachments.
  • But this can be cumbersome, requiring a lengthy discussion to explain which attachment was which result.
  • Writing formal reports with Word or LaTeX can simplify this by incorporating both the analysis report and output graphs into a single document.
  • But tweaking formatting to make figures look correct and fix obnoxious page breaks can be tedious and lead to a lengthly “whack a mole” game of fixing new mistakes resulting from a single formatting change.

Reproducible analysis

  • Creating a web page (as an html file) by using R Markdown makes things easier.
  • The report can be one long stream, so tall figures that wouldn't ordinary fit on one page can be kept full size and easier to read, since the reader can simply keep scrolling.
  • Formatting is simple and easy to modify, allowing you to spend more time on your analyses instead of writing reports.
  • Ideally, such analysis reports are reproducible documents:
    • If an error is discovered, or if some additional subjects are added to the data, you can just re-compile the report and get the new or corrected results
  • The key R package is knitr which we have been using in activities to create a mixture of text and chunks of code.
  • When the document is processed by knitr, chunks of code will be executed, and graphs or other results inserted into the final document.
  • knitr allows you to mix basically any sort of text with code from different programming languages, but we recommend that you use R Markdown, which mixes Markdown with R.
  • Markdown is a light-weight mark-up language for creating web pages.

Basic components of R Markdown

  • The initial chunk of text (header) contains instructions for R to specify what kind of document will be created, and the options chosen.
  • You can use the header to give your document a title, author, date, and tell it that you're going to want to produce html output (in other words, a web page).
---
title: "Initial R Markdown document"
author: "Jane Doe"
date: "April 23, 2015"
output: html_document
---
  • You can delete any of those fields if you don't want them included.
  • The double-quotes aren't strictly necessary in this case.
  • They're mostly needed if you want to include a colon in the title.
  • RStudio creates the document with some example text to get you started.

Markdown

  • Markdown is a system for writing web pages by marking up the text much as you would in an email rather than writing html code.
  • The marked-up text gets converted to html, replacing the marks with the proper html code.
  • If you know any HTML already, standard Markdown flavors usually accept HTML commands interspersed with Markdown for more detailed control of the output document.
  • You make things bold using two asterisks, like this: **bold**, and you make things italics by using underscores, like this _italics_.
    • The HTML equivalent wouldd look like <b>bold</b> or <emph>italics</emph>, where the Markdown equivalent usually simplifies the syntax.

Markdown continued

  • You can make a bulleted list by writing a list with hyphens or asterisks, like this:
* bold with double-asterisks
* italics with underscores
* code-type font with backticks
  • or like this:
- bold with double-asterisks
- italics with underscores
- code-type font with backticks
  • Each will appear as:

    • bold with double-asterisks
    • italics with underscores
    • code-type font with backticks
  • within their respective indentation levels.

Markdown continued

  • You can make a numbered list by just using numbers. You can even use the same number over and over if you want:

  • 1. bold with double-asterisks
    1. italics with underscores
    1. code-type font with backticks
    
  • You can make section headers of different sizes by initiating a line with some number of # symbols:

    # Title
    ## Main section
    ### Sub-section
    #### Sub-sub section
    
  • You compile the R Markdown document to an html webpage by clicking the “Knit” button in the upper-left.

Markdown continued

  • You can make a hyperlink like this: [text to show](http://the-web-page.com).

  • You can include an image file like this: ![caption](http://url/for/file)

  • If you know how to write equations in LaTeX, you can use $ $ and $$ $$ to insert math equations, inline like $E = mc^2$ and in its own chunk by

$$y = \mu + \sum_{i=1}^p \beta_i x_i + \epsilon$$
  • You can review Markdown syntax by navigating to the “Markdown Quick Reference” under the “Help” field in the toolbar at the top of RStudio.

R code chunks

  • The real power of Markdown comes from mixing markdown with chunks of code, and a special flavor of Markdown is designed for working well with R, R Markdown.
  • When processed, the R code will be executed; if they produce figures, the figures will be inserted in the final document.

  • The main code chunks look like this:

```{r load_data}
gapminder <- read.csv("gapminder.csv")
```
  • That is, you place a chunk of R code between ```{r chunk_name} and ```.
  • In a complex report, one should give each chunk a unique name, as they will help you to fix errors and, if any graphs are produced, the file names are based on the name of the code chunk that produced them.

How things get compiled

  • When you press the “Knit” button, the R Markdown document is processed by knitr and a plain Markdown document is produced (as well as, potentially, a set of figure files): the R code is executed and replaced by both the input and the output;

    • if figures are produced, links to those figures are included.
  • The Markdown and figure documents are then processed by the tool pandoc, which converts the Markdown file into an html file, with the figures embedded.

plot of chunk rmd_to_html_fig

Chunk options

  • There are a variety of options to affect how the code chunks are treated. Here are some examples:

    • Use echo=FALSE to avoid having the code itself shown.
    • Use results="hide" to avoid having any results printed.
    • Use eval=FALSE to have the code shown but not evaluated.
    • Use warning=FALSE and message=FALSE to hide any warnings or messages produced.
    • Use fig.height and fig.width to control the size of the figures produced (in inches).
  • So you might write:

```{r load_libraries, echo=FALSE, message=FALSE}
library("dplyr")
library("ggplot2")
```
  • Often there will be particular options that you'll want to use repeatedly; for this, you can set global chunk options, like so:
```{r global_options, echo=FALSE}
knitr::opts_chunk$set(fig.path="Figs/", message=FALSE, warning=FALSE,
                      echo=FALSE, results="hide", fig.width=11)
```
  • The fig.path option defines where the figures will be saved.
  • The / here is really important; without it, the figures would be saved in the standard place but just with names that begin with Figs.
  • If you have multiple R Markdown files in a common directory, you might want to use fig.path to define separate prefixes for the figure file names, like fig.path="Figs/cleaning-" and fig.path="Figs/analysis-".

Inline R code

  • You can make every number in your report reproducible. Use `r and ` for an in-line code chunk, like so:
    • `r round(some_value, 2)`.
  • The code will be executed and replaced with the value of the result.
  • However, don't let these in-line chunks get split across lines.

Other output options

  • You can also convert R Markdown to a PDF or a Word document.
  • Click the little triangle next to the “Knit” button to get a drop-down menu.
  • Or you could put pdf_document or word_document in the initial header of the file.
  • Creating .pdf documents may require installation of some extra software.
  • If required, this is detailed in an error message.

  • TeX installers for Windows.

  • TeX installers for macOS.