This lecture covers subsetting factors, matrices, lists, and data frames, followed by vectorization.
Now that we've explored the different ways to subset vectors, how do we subset the other data structures?
Factor subsetting works the same way as vector subsetting.
f <- factor(c("a", "a", "b", "c", "c", "d"))
f[f == "a"]
[1] a a
Levels: a b c d
f[f %in% c("b", "c")]
[1] b c c
Levels: a b c d
f[1:3]
[1] a a b
Levels: a b c d
f[-3]
[1] a a c c d
Levels: a b c d
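Every result above still reports all four levels, even when some of them no longer occur in the subset. As a minimal sketch, assuming you want to discard the unused levels afterwards, base R's droplevels() can be applied to the subset (the names sub and sub_clean are only illustrative):

```r
f <- factor(c("a", "a", "b", "c", "c", "d"))

# Subsetting keeps the full set of levels, even those no longer present
sub <- f[f %in% c("b", "c")]
levels(sub)          # all four original levels

# droplevels() removes levels that do not occur in the subset
sub_clean <- droplevels(sub)
levels(sub_clean)    # only "b" and "c"
```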
Matrices are also subset using the [ function. In this case it takes two arguments: the first applies to the rows, the second to the columns:
set.seed(1)
m <- matrix(rnorm(6*4), ncol=4, nrow=6)
m[3:4, c(3,1)]
[,1] [,2]
[1,] 1.12493092 -0.8356286
[2,] -0.04493361 1.5952808
m[, c(3,4)]
[,1] [,2]
[1,] -0.62124058 0.82122120
[2,] -2.21469989 0.59390132
[3,] 1.12493092 0.91897737
[4,] -0.04493361 0.78213630
[5,] -0.01619026 0.07456498
[6,] 0.94383621 -1.98935170
m[3,]
[1] -0.8356286 0.5757814 1.1249309 0.9189774
To keep the output as a matrix, specify a third argument, drop = FALSE:
m[3, , drop=FALSE]
[,1] [,2] [,3] [,4]
[1,] -0.8356286 0.5757814 1.124931 0.9189774
m[, c(3,6)]
Error in m[, c(3, 6)]: subscript out of bounds
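When dealing with multi-dimensional arrays, each argument to [ corresponds to a dimension. For example, in a 3D array, the first three arguments correspond to the rows, columns, and depth dimensions.
Because a matrix is stored as a vector, it can also be subset with a single argument:
m[5]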
[1] 0.3295078
This usually isn't useful, and it is often confusing to read. However, it is useful to note that matrices are laid out in column-major format by default.
That is, the elements of the underlying vector are arranged column-wise:
matrix(1:6, nrow=2, ncol=3)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
To populate the matrix by row instead, use byrow=TRUE:
matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
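The column-major layout also explains what a single-argument index refers to. The sketch below assumes the same 6 x 4 matrix m built with set.seed(1); the names i, j, and linear_index are illustrative only. It checks that element (i, j) sits at position i + (j - 1) * nrow(m) of the underlying vector:

```r
set.seed(1)
m <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)

# In column-major order, element (i, j) sits at position i + (j - 1) * nrow(m)
i <- 3
j <- 2
linear_index <- i + (j - 1) * nrow(m)

m[linear_index] == m[i, j]   # TRUE: both refer to the same element

# as.vector() exposes the underlying column-by-column layout
as.vector(matrix(1:6, nrow = 2, ncol = 3))   # 1 2 3 4 5 6
```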
Now we'll introduce some new subsetting operators. There are three functions used to subset lists. We've already seen these when learning about atomic vectors and matrices: [, [[, and $.
Using [ will always return a list. If you want to subset a list, but not extract an element, then you will likely use [.
xlist <- list(a = "Software Carpentry", b = 1:10, data = head(iris))
xlist[1]
$a
[1] "Software Carpentry"
This returns a list with one element.
To extract individual elements of a list, you need to use the double-square bracket function: [[.
xlist[[1]]
[1] "Software Carpentry"
xlist[[1:2]]
Error in xlist[[1:2]]: subscript out of bounds
xlist[[-1]]
Error in xlist[[-1]]: invalid negative subscript in get1index <real>
xlist[["a"]]
[1] "Software Carpentry"
The $ function is a shorthand for extracting elements by name:
xlist$data
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
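The difference between the three list operators is easiest to see by checking what each one returns. This is a small sketch using the same xlist as above; nothing beyond base R is assumed:

```r
xlist <- list(a = "Software Carpentry", b = 1:10, data = head(iris))

class(xlist[1])      # "list": a smaller list holding element a
class(xlist[[1]])    # "character": the element itself

# [ can select several elements at once; [[ extracts a single element
names(xlist[c("a", "b")])

# $ and [[ with a name return the same object
identical(xlist$b, xlist[["b"]])
```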
Remember that data frames are lists under the hood, so similar rules apply.
However, they are also two-dimensional objects:
[ with one argument acts the same way as for lists, where each list element corresponds to a column.
require(gapminder)
head(gapminder[3])
# A tibble: 6 x 1
year
<int>
1 1952
2 1957
3 1962
4 1967
5 1972
6 1977
[[ will act to extract a single column:
head(gapminder[["lifeExp"]])
[1] 28.801 30.332 31.997 34.020 36.088 38.438
The $ symbol provides a convenient shorthand to extract columns by name:
head(gapminder$year)
[1] 1952 1957 1962 1967 1972 1977
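To compare the three column-access forms side by side without depending on the gapminder package, the sketch below uses a hypothetical three-row data frame df (the values are copied from the output above purely for illustration):

```r
# Hypothetical toy data frame, used only for illustration
df <- data.frame(year = c(1952L, 1957L, 1962L),
                 lifeExp = c(28.801, 30.332, 31.997))

class(df["year"])      # "data.frame": a one-column data frame
class(df[["year"]])    # "integer": the column itself

# $ is shorthand for [[ with a column name
identical(df$year, df[["year"]])
```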
With two arguments, [ behaves the same way as for matrices:
gapminder[1:3,]
# A tibble: 3 x 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
gapminder[3,]
# A tibble: 1 x 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1962 32.0 10267083 853.
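Selecting a single column from a plain data frame with two arguments returns a vector (this can be changed with the third argument, drop = FALSE).
Most of R's functions are vectorized, meaning that the function will operate on all elements of a vector without needing to loop through and act on each element one at a time.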
This makes writing code more concise, easy to read, and less error prone.
x <- 1:4
x * 2
[1] 2 4 6 8
y <- 6:9
x + y
[1] 7 9 11 13
Each element of x was added to its corresponding element of y:
x:  1  2  3  4
    +  +  +  +
y:  6  7  8  9
---------------
    7  9 11 13
x > 2
[1] FALSE FALSE TRUE TRUE
a <- x > 3 # or, for clarity, a <- (x > 3)
a
[1] FALSE FALSE FALSE TRUE
R, while having the benefit of being an easy-to-learn language with powerful software, is not especially fast.
For many users this doesn't pose an obstacle, however, because vectorized operations run in compiled code and can make many computations in R competitive.
Though we have not discussed for loops yet, we will mention now that in general you should try to write your operations vector-wise instead of with for loops in R.
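As a brief preview of what this advice means in practice, the sketch below computes the same element-wise sum twice: once with an explicit for loop and once vector-wise. The names loop_sum and vec_sum are illustrative only.

```r
x <- 1:4
y <- 6:9

# Loop version: add the elements one at a time
loop_sum <- numeric(length(x))
for (i in seq_along(x)) {
  loop_sum[i] <- x[i] + y[i]
}

# Vectorized version: one expression, no explicit loop
vec_sum <- x + y

all(loop_sum == vec_sum)   # TRUE: both approaches give 7 9 11 13
```

The vectorized form is shorter and leaves no index bookkeeping to get wrong.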
Note that the * operator gives you element-wise multiplication! To do matrix multiplication, use the %*% operator:
m %*% matrix(1, nrow=4, ncol=1)
[,1]
[1,] 0.06095586
[2,] -0.69883054
[3,] 1.78406103
[4,] 2.02709511
[5,] 1.89966366
[6,] -1.47614063
matrix(1:4, nrow=1) %*% matrix(1:4, ncol=1)
[,1]
[1,] 30
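To make the contrast between the two operators concrete, here is a small sketch on a hypothetical 2 x 2 matrix a. It also shows why multiplying by a column of ones, as done with m above, yields the row sums:

```r
a <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)

a * a      # element-wise: each entry is squared
a %*% a    # matrix product: rows of a times columns of a

# Multiplying by a column of ones sums each row, matching rowSums()
all.equal(as.vector(a %*% matrix(1, nrow = 2, ncol = 1)), rowSums(a))
```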