The following topics will be covered in this lecture:

- More on subsetting data
- Dataframes
- Vectorization

Now that we've explored the different ways to subset vectors, how do we subset the other data structures?

Factor subsetting works the same way as vector subsetting.

```
f <- factor(c("a", "a", "b", "c", "c", "d"))
f[f == "a"]
```

```
[1] a a
Levels: a b c d
```

```
f[f %in% c("b", "c")]
```

```
[1] b c c
Levels: a b c d
```

```
f[1:3]
```

```
[1] a a b
Levels: a b c d
```

- Removing elements with a negative index will not remove a level, even if no elements of that category remain in the factor:

```
f[-3]
```

```
[1] a a c c d
Levels: a b c d
```
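If the unused level should go away, one option is base R's `droplevels()`. The sketch below reuses the factor `f` from above; the names `g` and `g_clean` are only illustrative.

```r
# Same factor as in the example above
f <- factor(c("a", "a", "b", "c", "c", "d"))

# Removing the third element leaves no "b" values, but the level is kept
g <- f[-3]
levels(g)

# droplevels() returns a copy containing only the levels still in use
g_clean <- droplevels(g)
levels(g_clean)

# table() shows the difference: in g the unused level is counted as zero
table(g)
table(g_clean)
```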

- Matrices are also subsetted using the `[` function. In this case it takes two arguments: the first applies to the rows, the second to the columns:

```
set.seed(1)
m <- matrix(rnorm(6*4), ncol=4, nrow=6)
m[3:4, c(3,1)]
```

```
            [,1]       [,2]
[1,]  1.12493092 -0.8356286
[2,] -0.04493361  1.5952808
```

- You can leave the first or second arguments blank to retrieve all the rows or columns respectively:

```
m[, c(3,4)]
```

```
            [,1]        [,2]
[1,] -0.62124058  0.82122120
[2,] -2.21469989  0.59390132
[3,]  1.12493092  0.91897737
[4,] -0.04493361  0.78213630
[5,] -0.01619026  0.07456498
[6,]  0.94383621 -1.98935170
```

- If we only access one row or column, R will automatically convert the result to a vector:

```
m[3,]
```

```
[1] -0.8356286 0.5757814 1.1249309 0.9189774
```

- If you want to keep the output as a matrix, you need to specify a *third* argument, `drop = FALSE`:

```
m[3, , drop=FALSE]
```

```
           [,1]      [,2]     [,3]      [,4]
[1,] -0.8356286 0.5757814 1.124931 0.9189774
```

- Unlike vectors, if we try to access a row or column outside of the matrix, R will throw an error:

```
m[, c(3,6)]
```

```
Error in m[, c(3, 6)]: subscript out of bounds
```

- When dealing with multi-dimensional arrays, each argument to `[` corresponds to a dimension. For example, for a 3D array, the first three arguments correspond to the rows, columns, and depth dimension.
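As a small sketch of the 3D case, the array below is filled with the integers 1 to 24. The dimensions 2 x 3 x 4 and the names `arr` and `slice` are arbitrary choices for illustration.

```r
# A 2 x 3 x 4 array filled with 1..24 (the dimensions are arbitrary)
arr <- array(1:24, dim = c(2, 3, 4))

# One index per dimension: row 1, column 2, third "slice"
arr[1, 2, 3]

# Leaving an argument blank keeps that whole dimension
slice <- arr[, , 1]   # a 2 x 3 matrix
dim(slice)

# drop = FALSE keeps all three dimensions, as it does for matrices
dim(arr[, , 1, drop = FALSE])
```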

- Because matrices are vectors, we can also subset using only one argument:

```
m[5]
```

```
[1] 0.3295078
```

This usually isn't useful, and is often confusing to read. However, it is useful to note that matrices are laid out in *column-major format* by default. That is, the elements of the vector are arranged column-wise:

```
matrix(1:6, nrow=2, ncol=3)
```

```
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
```

- If you wish to populate the matrix by row, use `byrow=TRUE`:

```
matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
```

```
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
```
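The column-major layout also explains what single-argument subsetting returns. As a sketch, rebuilding the same 6 x 4 matrix `m` used earlier, a linear index can be mapped back to a row and column position with base R's `arrayInd()`:

```r
set.seed(1)
m <- matrix(rnorm(6*4), ncol=4, nrow=6)

# Element 5 of the underlying vector is row 5 of column 1
m[5] == m[5, 1]

# Element 8 falls in the second column: 8 = 6 (all of column 1) + 2
m[8] == m[2, 2]

# arrayInd() converts a linear index into row/column positions
arrayInd(8, dim(m))
```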

- Matrices can also be subsetted using their rownames and column names instead of their row and column indices.
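A minimal sketch of name-based subsetting follows. The matrix `scores` and its row and column names are made up for illustration.

```r
# A small matrix with hypothetical row and column names
scores <- matrix(1:6, nrow = 2, ncol = 3,
                 dimnames = list(c("r1", "r2"), c("a", "b", "c")))

scores["r2", "c"]              # a single element
scores[, c("a", "c")]          # two columns selected by name
scores["r1", , drop = FALSE]   # one row, kept as a matrix
```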

Now we'll introduce some new subsetting operators. There are three functions used to subset lists. We've already seen these when learning about atomic vectors and matrices: `[`, `[[`, and `$`.

Using `[` will always return a list. If you want to *subset* a list, but not *extract* an element, then you will likely use `[`.

```
xlist <- list(a = "Software Carpentry", b = 1:10, data = head(iris))
xlist[1]
```

```
$a
[1] "Software Carpentry"
```

This returns a *list with one element*. To extract individual elements of a list, you need to use the double-square bracket function: `[[`.

```
xlist[[1]]
```

```
[1] "Software Carpentry"
```

- You can't extract more than one element at once:

```
xlist[[1:2]]
```

```
Error in xlist[[1:2]]: subscript out of bounds
```

- Nor use it to skip elements:

```
xlist[[-1]]
```

```
Error in xlist[[-1]]: invalid negative subscript in get1index <real>
```

- But you can use names to both subset and extract elements:

```
xlist[["a"]]
```

```
[1] "Software Carpentry"
```

- The `$` function is a shorthand way for extracting elements by name:

```
xlist$data
```

```
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
```
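The difference between the three operators is easiest to see by checking what each one returns. The sketch below reuses `xlist` from above; `sub` is just an illustrative name.

```r
xlist <- list(a = "Software Carpentry", b = 1:10, data = head(iris))

# `[` keeps the list wrapper, even for a single element
sub <- xlist["b"]
class(sub)            # "list"

# `[[` and `$` return the element itself
class(xlist[["b"]])   # "integer"
identical(xlist[["b"]], xlist$b)

# `[` can select several elements at once, or skip some
names(xlist[c("a", "data")])
names(xlist[-1])
```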

Remember that data frames are lists under the hood, so similar rules apply. However, they are also two-dimensional objects.

- `[` with one argument will act the same way as for lists, where each list element corresponds to a column.
- The resulting object will be a data frame:

```
library(gapminder)
head(gapminder[3])
```

```
# A tibble: 6 x 1
   year
  <int>
1  1952
2  1957
3  1962
4  1967
5  1972
6  1977
```

- Similarly, `[[` will act to extract *a single column*:

```
head(gapminder[["lifeExp"]])
```

```
[1] 28.801 30.332 31.997 34.020 36.088 38.438
```

- The `$` symbol provides a convenient shorthand to extract columns by name:

```
head(gapminder$year)
```

```
[1] 1952 1957 1962 1967 1972 1977
```

- With two arguments, `[` behaves the same way as for matrices:

```
gapminder[1:3,]
```

```
# A tibble: 3 x 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
```

- If we subset a single row, the result will be a data frame (because the elements are mixed types):

```
gapminder[3,]
```

```
# A tibble: 1 x 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1962    32.0 10267083      853.
```

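- But for a single column of a plain `data.frame`, the result will be a vector (this can be changed with the third argument, `drop = FALSE`). A tibble such as `gapminder` does not simplify here: it returns a one-column tibble.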
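A short sketch of the single-column case, using the built-in `iris` data (a plain `data.frame`) rather than `gapminder`, so it runs without extra packages. The names `df`, `col_vec`, and `col_df` are illustrative.

```r
df <- head(iris)   # a plain data.frame from base R

# Two arguments, one column: simplified to a vector by default
col_vec <- df[, "Sepal.Length"]
is.data.frame(col_vec)   # FALSE

# drop = FALSE keeps the data frame structure
col_df <- df[, "Sepal.Length", drop = FALSE]
is.data.frame(col_df)    # TRUE

# One argument (list-style) always returns a data frame
is.data.frame(df["Sepal.Length"])
```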

Most of R's functions are vectorized, meaning that the function will operate on all elements of a vector without needing to loop through and act on each element one at a time.

This makes code more concise, easier to read, and less error-prone.

```
x <- 1:4
x * 2
```

```
[1] 2 4 6 8
```

- The multiplication happened to each element of the vector.

- We can also add two vectors together:

```
y <- 6:9
x + y
```

```
[1] 7 9 11 13
```

- Each element of `x` was added to its corresponding element of `y`:

```
x:  1  2  3  4
    +  +  +  +
y:  6  7  8  9
---------------
    7  9 11 13
```

- Comparison operators, logical operators, and many functions are also vectorized:

```
x > 2
```

```
[1] FALSE FALSE TRUE TRUE
```

```
a <- x > 3 # or, for clarity, a <- (x > 3)
a
```

```
[1] FALSE FALSE FALSE TRUE
```
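A few more vectorized operations, sketched on the same `x`. All functions used here are from base R.

```r
x <- 1:4

# Mathematical functions are applied element by element
sqrt(x)

# Logical operators combine comparisons element-wise
x > 1 & x < 4

# A logical vector can be used directly to subset
x[x > 1 & x < 4]

# any() and all() summarise a logical vector into a single value
any(x > 3)
all(x > 0)
```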

R, while having the benefit of being an easy-to-learn language with powerful software, is not especially fast.

For many users this is not an obstacle, however, because vectorization can make many computations in R competitive.

- When a mathematical operation or function is run as a vectorized operation, R calls underlying compiled code (typically C) that has been optimized for performance.
- This is the same performance-gain technique that is used in, e.g., MATLAB and Python (via NumPy).

Though we have not discussed `for` loops yet, we will mention now that, in general, you should try to write your operations vector-wise instead of with `for` loops in R.
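As a rough preview, the sketch below computes the same squares twice: once with an explicit loop and once vector-wise. The variable names are illustrative, and the loop syntax is shown only for comparison.

```r
x <- 1:10

# Loop version: fill a pre-allocated result one element at a time
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: one expression, no explicit loop
squares_vec <- x^2

# Both approaches give the same values
all.equal(squares_loop, squares_vec)
```

The vectorized form is shorter and leaves no index bookkeeping to get wrong.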

**Very important:** the operator `*` gives you element-wise multiplication!

- To do matrix multiplication, we need to use the `%*%` operator:

```
m %*% matrix(1, nrow=4, ncol=1)
```

```
            [,1]
[1,]  0.06095586
[2,] -0.69883054
[3,]  1.78406103
[4,]  2.02709511
[5,]  1.89966366
[6,] -1.47614063
```

```
matrix(1:4, nrow=1) %*% matrix(1:4, ncol=1)
```

```
     [,1]
[1,]   30
```
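To make the contrast concrete, the sketch below applies both operators to the same small matrix. `A` and `ones` are illustrative names. The last line checks that multiplying by a column of ones gives the row sums, which is what the first `%*%` example above computes for `m`.

```r
A <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)

A * A      # element-wise: each entry is squared
A %*% A    # matrix product: rows of A times columns of A

# Multiplying by a column of ones sums each row, as rowSums() does
ones <- matrix(1, nrow = 2, ncol = 1)
all.equal(as.vector(A %*% ones), rowSums(A))
```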

- For more on matrix algebra, see the Quick-R reference guide