Control flow in R

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

Outline

  • The following topics will be covered in this lecture:
    • Control flow
      • if()
      • for()
      • while()

Control flow

  • Often when we're coding we want to control the flow of our actions – this often occurs when we write a script with detailed instructions.

  • This can be done by setting actions to occur only if a condition or a set of conditions are met.

    • Alternatively, we can also set an action to occur a particular number of times.
  • There are several ways you can control flow in R.

    • How to use control flow in R will be the first topic we will outline in the following.

Basic control flow

  • For conditional statements, the most commonly used approaches are the constructs if and else:
# if
if (condition is true) {
  perform action
}

# if ... else
if (condition is true) {
  perform action
} else {  # that is, if the condition is false,
  perform alternative action
}
  • This kind of binary logic is at the heart of classical (non-quantum) computing and used effectively can be used to create rich sets of commands.

  • if and else can be chained together to handle a wide variety of cases, and even to handle when we encounter errors.

An example

  • Say, for example, that we want R to print a message if a variable x has a particular value:
x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
}

x
[1] 8
  • The print statement does not appear in the console because x is not greater than 10.

An example continued

  • To print a different message for numbers less than 10, we can add an else statement.
x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
} else {
  print("x is less than 10")
}
[1] "x is less than 10"
  • You can also test multiple conditions by using else if.
x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
} else if (x > 5) {
  print("x is greater than 5, but less than 10")
} else {
  print("x is less than 5")
}
[1] "x is greater than 5, but less than 10"
  • If we needed to sort some data into sub-categories based on a TRUE or FALSE statements, if, else if and else will equip us to handle complex cases.

An example continued

  • Remember: when R evaluates the condition inside if() statements, it is looking for a logical element, i.e., TRUE or FALSE.
x  <-  4 == 3
if (x) {
  "4 equals 3"
} else {
  "4 does not equal 3"          
}
[1] "4 does not equal 3"
  • As we can see, the not equal message was printed because the vector x is FALSE
x <- 4 == 3
x
[1] FALSE

Advanced details on conditions

  • Note: the if() function only accepts singular (of length 1) inputs, and therefore returns an error when you use it with a standard vector.

    • The if() function will still run, but will only evaluate the condition in the first element of the vector.
  • To use the if() function, you need to make sure your input is singular (of length 1).

Advanced details on conditions

  • The in ifelse() function in R accepts both if() and else() statements simultaneously as structured in the previous example.

  • This function accepts both singular and vector inputs and is structured as follows:

# ifelse function 
ifelse(condition is true, perform action, perform alternative action) 
  • The first argument is the condition or a set of conditions to be met;

  • the second argument is the statement that is evaluated when the condition is TRUE;

  • and the third statement is the statement that is evaluated when the condition is FALSE.

An example

  • Consider the following example of the ifelse function;

  • Q: can you hypothesize what will be the output of this statement?

y <- -3
ifelse(y < 0, "y is a negative number", "y is either positive or zero")
[1] "y is a negative number"

Repeating operations

  • In many instances in data analysis, we will need to repeatedly perform some operation.

  • This could be as simple as repeatedly opening a long list of files, taking out a line of data that is needed from each, and compiling all the data into a dataframe.

  • More complex analysis also often requires complex instructions to be delivered to software in R.

  • If you want to iterate over a set of values, when the order of iteration is important, and perform the same operation on each, a for() loop will do the job.

  • However, for performance for() loops should be avoided unless the order of iteration is important:

    • i.e. the calculation at each iteration depends on the results of previous iterations.
  • If the order of iteration is not important, then vectorized alternatives, such as the purr package, should be used whenever possible.

An example of for loops

  • The basic structure of a for() loop is:
for (iterator in set of values) {
  do a thing
}
  • For example:
for (i in 1:10) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
  • Notice that the 1:10 bit creates a vector on the fly; you can iterate over any other vector as well.

Nested for loops

  • We can use a for() loop nested within another for() loop to iterate over two things at once.
for (i in 1:5) {
  for (j in c('a', 'b', 'c', 'd', 'e')) {
    print(paste(i,j))
  }
}
[1] "1 a"
[1] "1 b"
[1] "1 c"
[1] "1 d"
[1] "1 e"
[1] "2 a"
[1] "2 b"
[1] "2 c"
[1] "2 d"
[1] "2 e"
[1] "3 a"
[1] "3 b"
[1] "3 c"
[1] "3 d"
[1] "3 e"
[1] "4 a"
[1] "4 b"
[1] "4 c"
[1] "4 d"
[1] "4 e"
[1] "5 a"
[1] "5 b"
[1] "5 c"
[1] "5 d"
[1] "5 e"

Nested for loops

  • We notice in the output that when the first index (i) is set to 1, the second index (j) iterates through its full set of indices.

  • Once the indices of j have been iterated through, then i is incremented. This process continues until the last index has been used for each for() loop.

  • Rather than printing the results, we could write the loop output to a new object.

output_vector <- c()
for (i in 1:5) {
  for (j in c('a', 'b', 'c', 'd', 'e')) {
    temp_output <- paste(i, j)
    output_vector <- c(output_vector, temp_output)
  }
}
  • Our output_vector from the last slide thus prints as
output_vector
 [1] "1 a" "1 b" "1 c" "1 d" "1 e" "2 a" "2 b" "2 c" "2 d" "2 e" "3 a" "3 b"
[13] "3 c" "3 d" "3 e" "4 a" "4 b" "4 c" "4 d" "4 e" "5 a" "5 b" "5 c" "5 d"
[25] "5 e"

Nested for loops

  • The last approach can be useful, but 'growing your results' (building the result object incrementally) is computationally inefficient,

    • we should avoid this style of programming when iterating through a lot of values.
  • Computers are very bad at handling this efficiently, so your calculations can very quickly slow to a crawl.

  • It's much better to define an empty results object before hand of appropriate dimensions, rather than initializing an empty object without dimensions.

  • If you know the end result will be stored in a matrix like above, create an empty matrix with 5 row and 5 columns, then at each iteration store the results in the appropriate location.

Pre-allocation of memory

  • A better way is to define your (empty) output object before filling in the values.

  • For this example, it looks more involved, but is still more efficient.

output_matrix <- matrix(nrow=5, ncol=5)
j_vector <- c('a', 'b', 'c', 'd', 'e')
for (i in 1:5) {
  for (j in 1:5) {
    temp_j_value <- j_vector[j]
    temp_output <- paste(i, temp_j_value)
    output_matrix[i, j] <- temp_output
  }
}
output_vector2 <- as.vector(output_matrix)
  • This new output prints as
output_vector2
 [1] "1 a" "2 a" "3 a" "4 a" "5 a" "1 b" "2 b" "3 b" "4 b" "5 b" "1 c" "2 c"
[13] "3 c" "4 c" "5 c" "1 d" "2 d" "3 d" "4 d" "5 d" "1 e" "2 e" "3 e" "4 e"
[25] "5 e"

While loops

  • Sometimes you will find yourself needing to repeat an operation as long as a certain condition is met, or until some condition is met, but may not be met until after an unknown number of iterations.

  • You can do this with a while() loop.

while(this condition is true){
  do a thing
}
  • R will interpret a condition being met as “TRUE”.

  • A loop can be written to perform an action for a “FALSE” condition, until it is becomes “TRUE” by using the logical inverse !.

while(! this condition is False){
  do a thing
}

While loops example

  • As an example, here's a while loop that generates random numbers from a uniform distribution (the runif() function) between 0 and 1 until it gets one that's less than 0.1.

  • We will set a random seed set for reproducibility of the analysis.

set.seed(10)
z <- 1
while(z > 0.1){
  z <- runif(1)
  cat(z, "\n")
}
0.5074782 
0.3067685 
0.4269077 
0.6931021 
0.08513597 
  • Even though this is a while loop that terminates on a random condition (z>1), the seed means that each time we re-run the code we will get the same (pseudo)-random result.

While loops example

  • Now that we have learned a bit about while() loops, lets consider the following question

  • Q: what do you think will be the output of the below chunk of code?

z <- 1
while( z>= 1){
  z <- z+1
  cat(z, "\n")
}
  • A: the loop will never terminate based on its logical condition, as z\( \geq 1 \) for every iteration.

    • at best, we might hope that this will throw an error to the programmer when z becomes too large to store in memory.
  • This shows how while() loops will not always be appropriate.

  • You have to be particularly careful that you don't end up stuck in an infinite loop because your condition is always met and hence the while statement never terminates.

  • In this case we have an obvious error with this loop, but these conditions can be much more subtle and therefore we need to think about how they are satisfied.