Control flow in R


  • The following topics will be covered in this lecture:
    • Control flow
      • if()
      • for()
      • while()

Control flow

  • When coding, we often want to control the flow of our actions, especially when we write a script with detailed instructions.

  • This can be done by setting actions to occur only if a condition or a set of conditions is met.

    • Alternatively, we can also set an action to occur a particular number of times.
  • There are several ways you can control flow in R.

    • We begin by outlining how to use control flow in R.

Basic control flow

  • For conditional statements, the most commonly used approaches are the constructs if and else:
# if
if (condition is true) {
  perform action
}

# if ... else
if (condition is true) {
  perform action
} else {  # that is, if the condition is false,
  perform alternative action
}
  • This kind of binary logic is at the heart of classical (non-quantum) computing and, used effectively, can create rich sets of commands.

  • if and else can be chained together to handle a wide variety of cases, including cases where we encounter errors.

An example

  • Say, for example, that we want R to print a message if a variable x has a particular value:
x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
}

x
[1] 8
  • The print statement does not appear in the console because x is not greater than or equal to 10.

An example continued

  • To print a different message for numbers less than 10, we can add an else statement.
x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
} else {
  print("x is less than 10")
}
[1] "x is less than 10"
  • You can also test multiple conditions by using else if.
x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
} else if (x > 5) {
  print("x is greater than 5, but less than 10")
} else {
  print("x is less than or equal to 5")
}
[1] "x is greater than 5, but less than 10"
  • If we need to sort some data into sub-categories based on TRUE or FALSE statements, if, else if and else equip us to handle complex cases.
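  • As a minimal sketch of sorting data into sub-categories, the chain above can be wrapped in a function and applied to each value in turn. The function name categorize, the labels, and the vector measurements are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: assigns one of three labels to a single number.
categorize <- function(x) {
  if (x >= 10) {
    "large"
  } else if (x > 5) {
    "medium"
  } else {
    "small"
  }
}

# Hypothetical data; each element is passed to categorize() one at a time.
measurements <- c(3, 8, 12)
categories <- vapply(measurements, categorize, character(1))
print(categories)
```

  • vapply() calls the function once per element, so each if() condition has length 1.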

An example continued

  • Remember: when R evaluates the condition inside if() statements, it is looking for a logical element, i.e., TRUE or FALSE.
x <- 4 == 3
if (x) {
  "4 equals 3"
} else {
  "4 does not equal 3"
}
[1] "4 does not equal 3"
  • As we can see, the "does not equal" message was printed because the vector x is FALSE.

Advanced details on conditions

  • Note: the if() function only accepts a single (length 1) logical value as its condition.

    • Since R 4.2.0, passing a condition of length greater than 1 is an error; earlier versions issued a warning and evaluated only the first element of the vector.
  • To use the if() function, you need to make sure your condition is singular (of length 1).
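  • One way to obtain a length 1 condition from a vector is to collapse it with any() or all(). The sketch below assumes a hypothetical vector called values; the message variables are also illustrative names.

```r
# Hypothetical example vector; the name `values` is not from the lecture.
values <- c(12, 7, 15)

# any() is TRUE if at least one element satisfies the comparison.
if (any(values >= 10)) {
  any_message <- "at least one value is greater than or equal to 10"
} else {
  any_message <- "no value is greater than or equal to 10"
}

# all() is TRUE only if every element satisfies the comparison.
if (all(values >= 10)) {
  all_message <- "every value is greater than or equal to 10"
} else {
  all_message <- "not every value is greater than or equal to 10"
}

print(any_message)
print(all_message)
```

  • Both any() and all() return a single TRUE or FALSE, which is what if() requires.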

Advanced details on conditions

  • The ifelse() function in R combines the logic of if and else in a single function call, mirroring the structure of the previous example.

  • This function accepts both singular and vector inputs and is structured as follows:

# ifelse function 
ifelse(condition is true, perform action, perform alternative action) 
  • The first argument is the condition or a set of conditions to be met;

  • the second argument is the statement that is evaluated when the condition is TRUE;

  • and the third argument is the statement that is evaluated when the condition is FALSE.

An example

  • Consider the following example of the ifelse() function.

  • Q: can you hypothesize what will be the output of this statement?

y <- -3
ifelse(y < 0, "y is a negative number", "y is either positive or zero")
[1] "y is a negative number"
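  • Because ifelse() also accepts vector inputs, the same call can label several values at once. This is a small sketch with a hypothetical vector y_values; the labels are illustrative.

```r
# Hypothetical input vector with a negative, a zero and a positive value.
y_values <- c(-3, 0, 4)

# The condition is tested element by element, and the result has the
# same length as the condition.
labels <- ifelse(y_values < 0, "negative", "positive or zero")
print(labels)
```

  • The result has one label per element: "negative" for -3, and "positive or zero" for 0 and 4.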

Repeating operations

  • In many instances in data analysis, we will need to repeatedly perform some operation.

  • This could be as simple as repeatedly opening a long list of files, taking out a line of data that is needed from each, and compiling all the data into a dataframe.

  • More complex analysis also often requires complex instructions to be delivered to software in R.

  • If you want to iterate over a set of values, when the order of iteration is important, and perform the same operation on each, a for() loop will do the job.

  • However, for performance reasons, for() loops should be avoided unless the order of iteration is important:

    • i.e. the calculation at each iteration depends on the results of previous iterations.
  • If the order of iteration is not important, then vectorized operations or functional alternatives, such as those in the purrr package, should be used whenever possible.
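  • The distinction can be sketched with two small cases. Squaring each value does not depend on earlier iterations, so a vectorized expression is enough; a running total does depend on the previous result, so a for() loop is a natural fit. All variable names here are hypothetical.

```r
values <- 1:5

# Order does not matter: a loop works, but is not needed.
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# Vectorized equivalent: the operation is applied to every element at once.
squares_vectorized <- values^2

# Order matters: each iteration uses the result of the previous one.
running_total <- numeric(length(values))
running_total[1] <- values[1]
for (i in 2:length(values)) {
  running_total[i] <- running_total[i - 1] + values[i]
}

print(squares_vectorized)
print(running_total)
```

  • Both versions of the squares give 1, 4, 9, 16, 25, and the running total is 1, 3, 6, 10, 15.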

An example of for loops

  • The basic structure of a for() loop is:
for (iterator in set of values) {
  do a thing
}
  • For example:
for (i in 1:10) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
  • Notice that the 1:10 bit creates a vector on the fly; you can iterate over any other vector as well.
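  • As a sketch of iterating over another kind of vector, the loop below walks through a hypothetical character vector called fruits, first by element and then by position using seq_along().

```r
# Hypothetical character vector used only for illustration.
fruits <- c("apple", "banana", "cherry")

# Iterate directly over the elements.
for (fruit in fruits) {
  print(fruit)
}

# Iterate over the positions when the index itself is needed.
labelled <- character(length(fruits))
for (k in seq_along(fruits)) {
  labelled[k] <- paste(k, fruits[k])
}
print(labelled)
```

  • seq_along(fruits) produces 1, 2, 3 here, and it produces an empty sequence for an empty vector, so the loop body is skipped in that case.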

Nested for loops

  • We can use a for() loop nested within another for() loop to iterate over two things at once.
for (i in 1:5) {
  for (j in c('a', 'b', 'c', 'd', 'e')) {
    print(paste(i, j))
  }
}
[1] "1 a"
[1] "1 b"
[1] "1 c"
[1] "1 d"
[1] "1 e"
[1] "2 a"
[1] "2 b"
[1] "2 c"
[1] "2 d"
[1] "2 e"
[1] "3 a"
[1] "3 b"
[1] "3 c"
[1] "3 d"
[1] "3 e"
[1] "4 a"
[1] "4 b"
[1] "4 c"
[1] "4 d"
[1] "4 e"
[1] "5 a"
[1] "5 b"
[1] "5 c"
[1] "5 d"
[1] "5 e"

Nested for loops

  • We notice in the output that when the first index (i) is set to 1, the second index (j) iterates through its full set of indices.

  • Once the indices of j have been iterated through, then i is incremented. This process continues until the last index has been used for each for() loop.

  • Rather than printing the results, we could write the loop output to a new object.

output_vector <- c()
for (i in 1:5) {
  for (j in c('a', 'b', 'c', 'd', 'e')) {
    temp_output <- paste(i, j)
    output_vector <- c(output_vector, temp_output)
  }
}
  • Our output_vector from the last slide thus prints as
 [1] "1 a" "1 b" "1 c" "1 d" "1 e" "2 a" "2 b" "2 c" "2 d" "2 e" "3 a" "3 b"
[13] "3 c" "3 d" "3 e" "4 a" "4 b" "4 c" "4 d" "4 e" "5 a" "5 b" "5 c" "5 d"
[25] "5 e"

Nested for loops

  • The last approach can be useful, but 'growing your results' (building the result object incrementally) is computationally inefficient.

    • We should avoid this style of programming when iterating through a lot of values.
  • Each time the object grows, R has to copy it into a new, larger object, so your calculations can very quickly slow to a crawl.

  • It's much better to define an empty results object of appropriate dimensions beforehand, rather than initializing an empty object without dimensions.

  • If you know the end result will be stored in a matrix, create an empty matrix with 5 rows and 5 columns, then at each iteration store the results in the appropriate location.

Pre-allocation of memory

  • A better way is to define your (empty) output object before filling in the values.

  • For this example, it looks more involved, but is still more efficient.

output_matrix <- matrix(nrow=5, ncol=5)
j_vector <- c('a', 'b', 'c', 'd', 'e')
for (i in 1:5) {
  for (j in 1:5) {
    temp_j_value <- j_vector[j]
    temp_output <- paste(i, temp_j_value)
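    output_matrix[i, j] <- temp_output
  }
}
output_vector2 <- as.vector(output_matrix)
  • This new output prints as follows (as.vector() reads the matrix column by column, so the order differs from output_vector):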
 [1] "1 a" "2 a" "3 a" "4 a" "5 a" "1 b" "2 b" "3 b" "4 b" "5 b" "1 c" "2 c"
[13] "3 c" "4 c" "5 c" "1 d" "2 d" "3 d" "4 d" "5 d" "1 e" "2 e" "3 e" "4 e"
[25] "5 e"
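  • Since the order of iteration does not matter in this example, the same matrix can also be built without an explicit loop. The sketch below uses outer() from base R with paste as the combining function; the names output_matrix2, output_vector3 and output_rowwise are hypothetical.

```r
j_vector <- c('a', 'b', 'c', 'd', 'e')

# outer() applies paste() to every combination of row value and column value,
# returning a 5 x 5 character matrix.
output_matrix2 <- outer(1:5, j_vector, paste)

# Column-by-column order, as with as.vector(output_matrix) above.
output_vector3 <- as.vector(output_matrix2)

# Row-by-row order: transpose first, then flatten.
output_rowwise <- as.vector(t(output_matrix2))

print(output_vector3)
print(output_rowwise)
```

  • Transposing before flattening recovers the row-by-row order produced by the earlier nested loop.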

While loops

  • Sometimes you will need to repeat an operation for as long as a certain condition holds, without knowing in advance how many iterations that will take.

  • You can do this with a while() loop.

while(this condition is true){
  do a thing
}
  • R will interpret a condition being met as “TRUE”.

  • A loop can also be written to perform an action while a condition is “FALSE”, until it becomes “TRUE”, by using the logical negation operator !.

while(!(this condition is true)){
  do a thing
}

While loops example

  • As an example, here's a while loop that generates random numbers from a uniform distribution (the runif() function) between 0 and 1 until it gets one that's less than 0.1.

  • We set a random seed with set.seed() for reproducibility of the analysis.

set.seed(0)  # the seed value is arbitrary, chosen here only for illustration
z <- 1
while(z > 0.1){
  z <- runif(1)
  cat(z, "\n")
}
  • Even though this while loop terminates on a random condition (it stops once z is no longer greater than 0.1), the seed means that each time we re-run the code we will get the same (pseudo-)random result.
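  • The effect of the seed can be checked directly. In this sketch, the hypothetical function draw_until_small() wraps the loop and returns every number it drew; the seed value 42 is arbitrary.

```r
# Hypothetical wrapper: resets the seed, then draws until a value <= 0.1 appears.
draw_until_small <- function(seed) {
  set.seed(seed)
  draws <- c()  # the number of draws is unknown in advance, so the vector is grown
  z <- 1
  while (z > 0.1) {
    z <- runif(1)
    draws <- c(draws, z)
  }
  draws
}

first_run <- draw_until_small(42)
second_run <- draw_until_small(42)

# The two runs use the same seed, so they produce the same sequence.
print(identical(first_run, second_run))
```

  • Changing the seed will generally change both the values and the number of iterations.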

While loops example

  • Now that we have learned a bit about while() loops, let's consider the following question.

  • Q: what do you think will be the output of the below chunk of code?

z <- 1
while(z >= 1){
  z <- z + 1
  cat(z, "\n")
}
  • A: the loop will never terminate based on its logical condition, as \( z \geq 1 \) for every iteration.

    • We should not expect R to rescue us with an error either: z is stored as a double-precision number, so it never becomes too large to store, and the loop keeps running until we interrupt it.
  • This shows how while() loops will not always be appropriate.

  • You have to be particularly careful that you don't end up stuck in an infinite loop because your condition is always met and hence the while statement never terminates.

  • In this case we have an obvious error with this loop, but these conditions can be much more subtle and therefore we need to think about how they are satisfied.
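  • One common safeguard, shown here as a sketch rather than a general recipe, is to add an iteration cap to the loop condition. The names iteration and max_iterations, and the cap of 10, are hypothetical.

```r
z <- 1
iteration <- 0
max_iterations <- 10  # hypothetical cap, chosen only for illustration

# The loop stops when the original condition fails OR the cap is reached.
while (z >= 1 && iteration < max_iterations) {
  z <- z + 1
  iteration <- iteration + 1
}

if (iteration == max_iterations) {
  message("Stopped after reaching the iteration cap; z is ", z)
}
```

  • Here the original condition never fails, so the loop ends after 10 iterations with z equal to 11.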