R as a calculator and data types

Outline

  • The following topics will be covered in this lecture:

    • How to use R as a calculator
    • Variables and data types
    • Vectors and vectorization

R as a calculator

  • R is an interpreted language: it reads a set of human-readable instructions and translates them into operations that the computer can execute.

  • R can be used simply as a powerful calculator, for example:

    • if we enter a mathematical expression into an R console, R evaluates it and prints the result,
1 + 1
[1] 2

R as a calculator – continued

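  • R uses standard mathematical notation for its operations, and follows the standard mathematical order of precedence. Multiplication and division share one precedence level, as do addition and subtraction, and operators on the same level are evaluated from left to right. The examples below build up one expression step by step: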

  • Parentheses

(1 + 1)
[1] 2
  • Exponents
(1 + 1)^2
[1] 4
  • Division
(1 + 1)^2 / 4
[1] 1

R as a calculator – continued

  • Multiplication
(1 + 1)^2 / 4 * 3
[1] 3
  • Addition
(1 + 1)^2 / 4 * 3 + 1
[1] 4
  • Subtraction
(1 + 1)^2 / 4 * 3 + 1 - 2
[1] 2
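
  • As a quick check of the precedence rules above, the following sketch compares a few expressions with and without parentheses. The expressions and variable names are illustrative choices, not part of the original lecture. Note in particular that exponentiation is applied before the unary minus.

```r
# Multiplication is evaluated before addition.
without_parens <- 2 + 3 * 4      # 2 + 12
with_parens <- (2 + 3) * 4       # 5 * 4

# Division and multiplication share a precedence level and are
# evaluated from left to right.
left_to_right <- 8 / 4 * 2       # (8 / 4) * 2

# Exponentiation is applied before the unary minus.
neg_square <- -2^2               # -(2^2)
square_of_neg <- (-2)^2

print(c(without_parens, with_parens, left_to_right, neg_square, square_of_neg))
```

  • When in doubt about how an expression will be parsed, adding explicit parentheses costs nothing and makes the intent clear.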

R as a calculator – continued

  • R also has many standard built-in mathematical functions and variables, e.g.,
log(1)
[1] 0
cos(pi)
[1] -1
sin(pi)
[1] 1.224647e-16
  • The notation “ae-16” refers to the mathematical expression \( a \times 10^{-16} \), where \( a \) is the leading coefficient.

  • Notice that R does not return exactly zero for \( \sin(\pi) \), as it is mathematically; instead it returns an extremely small number.

  • This has to do with the way in which numbers are represented with finite precision on a computer; this will be discussed further shortly.

typeof(sin(pi))
[1] "double"
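
  • The scientific notation above can also be typed directly as input, and R reports the precision of its "double" type through the built-in list ".Machine". The sketch below is only an illustration; the variable names are assumptions introduced here.

```r
# Scientific notation can be typed directly as a numeric literal.
one_thousand <- 1e3
print(one_thousand == 1000)

# .Machine$double.eps is the smallest positive double x
# for which 1 + x is different from 1.
eps <- .Machine$double.eps
print(eps)

# The rounding error seen in sin(pi) is smaller than this value.
below_eps <- abs(sin(pi)) < eps
print(below_eps)
```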

Comparing things

  • Not all values in a computing language are numeric, and not all numeric values are represented in the same way.

  • Consider the comparison operator “==” for evaluating if two inputs are the same,

sin(pi) == 0
[1] FALSE
0 == 0
[1] TRUE
  • We can also compare if two inputs are not the same,
1 != 2
[1] TRUE
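
  • Because "sin(pi) == 0" is FALSE, numeric results are usually compared up to a tolerance rather than exactly. The sketch below shows two common ways to do this; the tolerance "1e-12" is an arbitrary value chosen for illustration.

```r
# Exact comparison fails because of rounding error.
exact <- sin(pi) == 0

# all.equal() compares within a small tolerance; wrap it in isTRUE()
# because it returns a character description when the values differ.
approx <- isTRUE(all.equal(sin(pi), 0))

# A manual alternative with an explicit (arbitrarily chosen) tolerance.
tol <- 1e-12
manual <- abs(sin(pi) - 0) < tol

print(c(exact, approx, manual))
```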

Comparing things – continued

  • Notice that the outputs of the earlier comparisons are either “TRUE” or “FALSE” – these are examples of logical values, which are the output of logical expressions.
typeof(TRUE)
[1] "logical"
  • We can also compare the relative size of different values
1 > 2
[1] FALSE
2 >= 2
[1] TRUE
-1 <= 0
[1] TRUE

Variables and assignment

  • Values, such as the output of an expression, can be assigned to a variable name,
my_variable <- 2 + 2
  • In the above expression, the operator “<-” tells R to associate the output of the expression \( 2 + 2 \) with the name “my_variable”.
my_variable
[1] 4
  • We can show the current variables in the environment using the command “ls()”
ls()
[1] "my_variable"

Variables and assignment – continued

  • We can re-assign a value to “my_variable” which will be stored in the environment and memory,
my_variable <- my_variable + my_variable
my_variable
[1] 8
  • Notice that the right hand side of the assignment operator “<-” is always evaluated first, and only then is the result assigned to the name on the left.

    • This is why, as above, a variable can be redefined in terms of its own current value.
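
  • The order of evaluation can be seen by updating a variable several times in a row. This is a minimal sketch; the name "running_total" is a hypothetical example and not part of the lecture.

```r
# Start from an initial value.
running_total <- 10

# The right-hand side is evaluated with the old value (10),
# and only then is the result stored under the same name.
running_total <- running_total + 5

# The same happens again, now starting from 15.
running_total <- running_total * 2

print(running_total)
```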

Variables and assignment – continued

  • Key to writing “good” code is to use good variable naming (and commenting).

    • Generally, it is preferable to name variables with something descriptive, e.g.,
mean_sea_surface_temp <- 10
  • For longer names as above, we can use e.g.,

    • underscores (as above);
    • periods; or
mean.sea.surface.temp <- 10
    • capital letters.
meanSeaSurfaceTemp <- 10
  • All the above are commonly used conventions and all are acceptable — the key is to be clear and consistent in your code.

Variables and assignment – continued

  • Q: which of the following do you think are acceptable names for R variables?
min_height
max.height
_age
.mass
MaxLength
min-length
2widths
celsius2kelvin
  • A: the only ones that are not acceptable are
_age
min-length
2widths
  • This is because R will not accept a leading underscore, a leading digit, or a dash in the name (a dash is read as the subtraction operator).

    • Note, however, that a leading period as in “.mass” creates a “hidden” variable, which is not listed by “ls()” by default and which you typically will not want.
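
  • One way to check the answer above programmatically is with the base R function "make.names()", which rewrites a string into a syntactically valid name. The sketch below assumes that a candidate returned unchanged was already valid; the variable names are illustrative.

```r
candidates <- c("min_height", "max.height", "_age", ".mass",
                "MaxLength", "min-length", "2widths", "celsius2kelvin")

# make.names() repairs invalid names, so a candidate that comes
# back unchanged was already a valid name.
is_valid <- make.names(candidates) == candidates
print(candidates[!is_valid])

# A name with a leading period is valid but hidden from ls() by default.
.mass <- 70
hidden_listed <- ".mass" %in% ls()
shown_with_all <- ".mass" %in% ls(all.names = TRUE)
print(c(hidden_listed, shown_with_all))
```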

Vectorization

  • R is a vectorized language, meaning that variables and functions can have vectors as values.

  • A vector in R is an ordered collection of values that all have the same data type.

    • The type of data will become increasingly important as we start using vectors.
  • A simple way to construct a vector is with the constructor function “c()”

c(1, 3, 6)
[1] 1 3 6

Vectorization – continued

  • The function takes an arbitrary number of elements as above, and creates a vector.
my_variable <- c(TRUE, pi)
my_variable
[1] 1.000000 3.141593
  • Notice that the output of the above expression looks different from the input — this is because R forces vectors to have data of a single type:
typeof(my_variable)
[1] "double"
  • Here, the value “TRUE” has been forced into its numeric counterpart “1”.
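
  • When types are mixed inside "c()", R converts every element to the most general type present. The sketch below illustrates this for the common cases; the variable names are assumptions introduced for the example.

```r
# logical and integer together become integer
type_a <- typeof(c(TRUE, 2L))

# integer and double together become double
type_b <- typeof(c(2L, 3.5))

# double and character together become character
mixed <- c(3.5, "a")
type_c <- typeof(mixed)

print(c(type_a, type_b, type_c))
print(mixed)
```

  • Note that in the last case the number is stored as the text "3.5", so arithmetic on it no longer works without converting it back.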

Vectorization – continued

  • In the last example, we saw that a logical value “TRUE” was forced into a numeric value by the constructor function.

  • This variable “coercion” occurs in various situations, and we need to be careful with the results.

  • Q: what do you expect the result of the following to be?

1 == TRUE
  • A:
1 == TRUE
[1] TRUE
typeof(1)
[1] "double"
typeof(TRUE)
[1] "logical"
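
  • The comparison above is TRUE because "==" coerces the logical value to a number before comparing, even though the two types differ. The sketch below contrasts this with "identical()", which does not coerce, and shows one practical use of the coercion; the variable names are illustrative.

```r
# "==" coerces TRUE to 1 before comparing.
same_value <- 1 == TRUE

# identical() also checks the type, so a double and a logical differ.
same_object <- identical(1, TRUE)

# Coercion makes it easy to count how many entries are TRUE.
n_true <- sum(c(TRUE, FALSE, TRUE))

print(c(same_value, same_object))
print(n_true)
```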

Vectorization – continued

  • Vectors store their data in a fixed order, and each entry can be accessed by its index, which starts at 1 in R:
my_variable[1]
[1] 1
my_variable[2]
[1] 3.141593
  • Many mathematical functions accept a vector as their argument, in which case they are applied element-wise to the vector entries:
sin(my_variable)
[1] 8.414710e-01 1.224647e-16
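
  • Indexing and element-wise arithmetic can be combined freely. The following is a small sketch with made-up data; "heights" and the other names are hypothetical.

```r
heights <- c(1.5, 2.0, 2.5)

# Indexing starts at 1; length() gives the index of the last entry.
first <- heights[1]
last <- heights[length(heights)]

# Arithmetic with a single number is applied to every entry.
doubled <- heights * 2

# Arithmetic between two vectors of equal length pairs up the entries.
shifted <- heights + c(10, 20, 30)

print(c(first, last))
print(doubled)
print(shifted)
```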

Vectorization – continued

  • The colon operator “:” allows us to construct a vector automatically from a range of values, referred to here as a “slice”
my_variable <- 1:5
my_variable
[1] 1 2 3 4 5
  • In general, a slice written as a:b returns the vector of values from a to b in steps of one, counting down if a is larger than b:
10:5
[1] 10  9  8  7  6  5
4:10
[1]  4  5  6  7  8  9 10
  • This is often quite useful for extracting a subset of data from a large vector or matrix.
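
  • The colon operator always moves in steps of one. For other step sizes, base R provides "seq()", and "seq_len()" builds the sequence from 1 up to a given length. These functions are not covered in the slides above and are shown here only as a related sketch.

```r
# The colon operator with whole-number endpoints gives integers.
slice_type <- typeof(1:5)

# Counting down works when the first value is larger.
countdown <- 3:1

# seq() allows a step size other than one.
quarters <- seq(0, 1, by = 0.25)

# seq_len(n) is equivalent to 1:n for positive n.
first_five <- seq_len(5)

print(slice_type)
print(countdown)
print(quarters)
print(first_five)
```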

Vectorization – continued

  • We can also apply a mathematical operation between a scalar and a vector, in which case it is carried out element-wise over the entries of the vector
2^my_variable
[1]  2  4  8 16 32
  • Or use a vector as the index of a vector
my_variable[2:3]
[1] 2 3
  • The same goes for logical and comparison operators.

  • Q: what do you expect to be the output of the following line?

1:10 > 5
  • A:
1:10 > 5
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
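
  • A logical vector like the one above can be summarized with a few base R functions. The sketch below is illustrative; the variable names are assumptions.

```r
values <- 1:10
above_five <- values > 5

# TRUE counts as 1, so sum() gives the number of entries above 5.
n_above <- sum(above_five)

# which() returns the positions where the condition holds.
positions <- which(above_five)

# any() and all() reduce a logical vector to a single value.
any_above_ten <- any(values > 10)
all_positive <- all(values >= 1)

print(n_above)
print(positions)
print(c(any_above_ten, all_positive))
```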

Vectorization – continued

  • Note that logical vectors are also useful for extracting subsets of data.

    • In particular, we may wish to write a condition, evaluate it on the data, and find all data points that satisfy it.
my_variable <- 1:10
my_index <- my_variable > 5
my_variable[my_index]
[1]  6  7  8  9 10
  • We might also have non-numeric vectors, such as
my_variable <- c('red', 'blue', 'green')
my_variable
[1] "red"   "blue"  "green"
  • For such a vector, a logical statement can also be quite useful,
my_index <- my_variable == 'red'
my_variable[my_index]
[1] "red"
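
  • Logical subsetting extends naturally to more than one condition. The sketch below uses made-up data together with the element-wise "and" operator "&" and the membership operator "%in%", neither of which appears in the slides above; all names are hypothetical.

```r
temps <- c(12, 25, 18, 30)

# "&" combines two conditions entry by entry.
mild <- temps[temps > 15 & temps < 28]

colours <- c("red", "blue", "green", "red")

# "%in%" tests each entry for membership in a set of values.
wanted <- colours %in% c("red", "green")
selected <- colours[wanted]

print(mild)
print(selected)
```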