An introduction to R and RStudio

Outline

  • The following topics will be covered in this lecture:

    • What are R and RStudio?
    • How do we write code?
    • How to install packages
    • How to get help in R

Introduction

  • This course will lean heavily on programming;

    • while it is possible to perform statistical analyses by hand for some very simple “toy” problems, realistic problem solving must be done on a computer.
  • This course does not assume that you are already familiar with programming;

    • this course will also not require a deep knowledge of programming or computer science.
    • However, everyone is responsible for learning enough R to become proficient with standard modeling and plotting functions.
  • Students are encouraged to use the lessons in Software Carpentry as a free reference for scientific programming in R.

What is R?

  • There are a number of common programming/scripting languages for statistical modeling, e.g.:
    • SAS
    • SPSS
    • STATA
    • Python
    • R
  • We will use R for the following reasons:
    1. it is free and open source software with extensive documentation and tutorials available;
    2. it has well-established libraries for statistical modeling that cover a wide range of functionality;
    3. the entry barrier to using the R language in terms of computer science training is very low; and
    4. there are free interactive, introductory lessons from DataCamp which will be used for homework assignments for additional practice outside of class.

Required software sources

RStudio

View of the RStudio development environment.
  • “RStudio” is a commonly used and supported integrated development environment for R.
  • RStudio is highly recommended for all beginning programmers and will be required software for this class;
    • RStudio is not the same thing as R; it is a set of graphical tools for writing and developing code quickly.
  • The figure on the left shows the RStudio environment as a collection of different windows.
  • In the left-most window is the console, where an interactive session of R is taking place.
  • R can be used as an “interactive” language, in which an interpreter accepts commands and returns a response in real time.
  • R can also be used as a “scripting” language, in which a script (a set of instructions) is given to R to execute, and the output is determined by the script.

RStudio for scripting

Image of the RStudio script editor
  • In the view to the left, RStudio has an additional window for a script editor.
  • In this text file, with file extension “.R”, we can write a series of commands for the R interpreter to follow to produce a result.
  • Notice, in the right-hand-side windows there are tabs for “Environment” and “History”.
  • During an R session, working interactively or in a script, R will accumulate a history of commands and an environment of different variables which are active in the memory.
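As a minimal sketch of how the environment accumulates variables, the lines below can be typed one at a time in the console. The variable names `heights_cm` and `mean_height` are hypothetical and chosen only for illustration.

```r
# Assign two variables; both are now stored in the current environment
# and would appear in the "Environment" tab in RStudio.
heights_cm <- c(150, 162, 171)
mean_height <- mean(heights_cm)

# ls() returns the names of the variables currently in the environment.
print(ls())

# rm() removes a variable from the environment when it is no longer needed.
rm(heights_cm)
```

After the last line, `mean_height` is still available in memory, while `heights_cm` is gone.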

Workflow in RStudio

Image of the RStudio script editor
  • Typically, one will use RStudio in one of two ways:
    1. work can be done interactively in the console to perform exploratory analysis;
      • in this case, the command history will keep a record of the actions performed.
    2. work can be done in a script, while running code sections in the RStudio interface.
      • A line of code can be run in the editor by using the “run” button in the editor window.
  • This is helpful because you can keep a history of your analysis and work on it incrementally by running a line or a few lines of code at a time.
  • Typically, the second way is preferable, especially when integrating R code into a “Notebook”.
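The script-based workflow might look like the following sketch, where each step can be run on its own with the “run” button. It assumes only the `cars` data set and the `lm()` function that ship with R; the file name and the choice of model are hypothetical and serve only as an illustration.

```r
# analysis.R (hypothetical file name): a short script to run a few lines at a time.

# Step 1: inspect a data set that ships with R.
data(cars)
print(summary(cars))

# Step 2: fit a simple linear model of stopping distance on speed.
fit <- lm(dist ~ speed, data = cars)

# Step 3: look at the estimated intercept and slope.
print(coef(fit))
```

Because the script is saved to a file, the whole analysis can be re-run later, and new steps can be added incrementally below the existing ones.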

R Notebooks

Image of the RStudio notebook editor
  • In this class, the preferred way of working on projects and homework will be in the form of an R Notebook.
  • One R Notebook template is shown on the left-hand side of the image, in which we see a mixture of Markdown/HTML and R code in “chunks”.
  • On the right-hand-side of the editor, we see a live preview of the notebook, rendered as a document.
  • R Notebooks combine the advantages of script-style programming and interactive programming:
    1. we can work interactively with data and scripts, with the variables stored in the environment updating with our commands;
    2. at the same time, we have a place to document all of our command history, with extensive commenting on our analysis and process of investigation.

R Notebooks – continued

Image of the RStudio notebook editor
  • Writing up our analysis alongside our code produces a kind of “lab notebook”, the idea that inspired R Notebooks.
  • This lab notebook can be exported as a PDF, HTML page, and other formats.
  • Exporting an R Notebook into HTML or PDF is the required way to turn in homework.
  • To write an R Notebook, you may need some familiarity with basic Markdown.
    • Markdown is a lightweight markup language, simpler to write than HTML, for formatting documents quickly based on standard templates.
    • You can find a collection of Markdown formatting commands in the linked cheat-sheet.
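To make the structure of a notebook concrete, the sketch below writes a very small notebook file from R. The file name, the title, and the helper variables `fence`, `notebook_lines` and `notebook_path` are hypothetical; normally RStudio creates this template for you through its menus, so this is only an illustration of what the file contains.

```r
# Three backticks open and close a code chunk in an R Notebook.
fence <- strrep("`", 3)

notebook_lines <- c(
  "---",                          # header: document settings
  "title: \"Example notebook\"",
  "output: html_notebook",
  "---",
  "",
  "# Exploring the data",         # a Markdown heading
  "",
  "Plain text written in *Markdown* documents the analysis.",
  "",
  paste0(fence, "{r}"),           # start of an R code chunk
  "summary(cars)",                # R code that runs inside the chunk
  fence                           # end of the chunk
)

# Write the notebook to a temporary folder so no existing file is touched.
notebook_path <- file.path(tempdir(), "example_notebook.Rmd")
writeLines(notebook_lines, notebook_path)
```

Opening the resulting “.Rmd” file in RStudio shows the same mixture of Markdown text and R chunks described above.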

Installing packages

Image of the CRAN main webpage.
  • The strength of R as a language comes from the variety of packages/libraries that are available for use.
  • These libraries are mostly written by statistical scientists for free and public use in academic settings;
    • Note: some libraries have restrictions of use for commercial purposes.
  • These libraries, along with the current and development versions of the R language, are hosted by the CRAN project (the Comprehensive R Archive Network).
  • Note that, because this is a community repository, not all software is built to the same quality or follows standard conventions.
  • However, we will mostly use what have become “standard” libraries, which are well maintained and widely accepted and supported by the community.

Installing packages – continued

  • We will often use the “faraway” package, which contains many example data sets to study. To install it, we can simply type:
install.packages('faraway')
  • The “install.packages()” function will initiate an installation of the library with the package manager.

    • This will connect your installation of R directly with CRAN, and handle all dependencies, so you don't need to do anything else.
  • When a library has already been installed and we want to load it into our environment, we can simply call:

require(faraway)
  • For the remaining introduction to coding in R, we will also want the following packages:
install.packages("ggplot2")
install.packages("plyr")
install.packages("gapminder")
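Since installing a package only needs to happen once, it can be convenient to check first which packages are missing. The sketch below is one possible way to do this; the helper `missing_packages()` and the variable names are hypothetical and not part of R or of the course material.

```r
# Return the names in `pkgs` that are not yet installed (hypothetical helper).
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if a package is available, FALSE otherwise.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

course_pkgs <- c("faraway", "ggplot2", "plyr", "gapminder")
to_install <- missing_packages(course_pkgs)

# Install only during an interactive session, so that running this script
# in batch mode never starts a download on its own.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

When loading a package, note that `require()` returns `FALSE` with a warning if the package is missing, whereas `library()` stops with an error; the latter makes a missing package easier to notice in a script.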

Getting help in RStudio

  • Whenever you are uncertain about the use of a function or a topic in general, you can use the “?” command in R to obtain a help file.
?install.packages
  • If you're not sure which package a function is in, or exactly how it is spelled, you can do a fuzzy search:
??install.packages
  • This will pull up related documentation and help pages in a search format.
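The “?” and “??” shortcuts correspond to the functions `help()` and `help.search()`, which can be easier to remember. The following sketch shows them alongside a quick way to list a function's arguments; the variable names `page` and `arg_names` are hypothetical.

```r
# Same lookup as ?install.packages: find the help page by its exact name.
# Printing `page` in an interactive session opens the help file.
page <- help("install.packages")

# Same search as ??install.packages: search all help pages by keyword.
# This only runs interactively, since building the search index can take a moment.
if (interactive()) {
  print(help.search("install.packages"))
}

# List the argument names of a function without opening its help page.
arg_names <- names(formals(install.packages))
print(head(arg_names))
```

Checking the argument names this way is a quick reminder of how to call a function once you already know roughly what it does.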