# Activity 08/27/2021 ## STAT 445 / 645 -- Section 1001
Instructor: Colin Grudzien
## Instructions: We will work through the following series of activities as a group and hold small group work and discussions in Zoom Breakout Rooms. Follow the instructions in each sub-section when the instructor assigns you to a breakout room. ## Activities: ### Activity 1: a review of data types With your group, discuss each of the following questions: #### Question 1: What do you expect the result of the following to be? ```{r, eval=FALSE} 1 == TRUE ``` Can you explain in terms of types and type coercion why we get this answer? #### Question 2: Look at the help for the `c` function. What kind of vector do you expect you will create if you evaluate each of the following lines of code? ```{r, eval=FALSE} c(1, 2, 3) c('d', 'e', 'f') c(1, 2, 'f') ``` #### Question 3: Before we start the exercise, we should have some additional background. There are 5 main types in R: double; ```{r} typeof(pi) ``` integer; ```{r} typeof(1L) ``` complex; ```{r} typeof(1+1i) ``` and non-numeric types such as logical ```{r} typeof(TRUE) ``` and character ```{r} typeof('banana') ``` Understanding data types and the way they are coerced is important because R strictly enforces that data in a vector is of a single type. By keeping everything in a column the same, we can make simple assumptions about our data;if you can interpret one entry in the column as a number, then you can interpret all of them as numbers, so we don’t have to check every time. This consistency is one aspect of what people mean when they talk about clean data; in the long run, strict consistency goes a long way to making our lives easier in R.
• The coercion rules go:
1. logical -> integer
2. integer -> numeric
3. numeric -> complex
4. complex -> character,
• where -> can be read as are transformed into.
• You can try to force coercion against this flow using the as. functions, e.g.,
```{r} as.numeric(c('1', '2', '3')) ``` Consider now how we have seen that vector may coerce a logical value to numeric one: ```{r, eval=FALSE} c(TRUE, pi) ``` What do you expect the output of the following expression to be? ```{r, eval=FALSE} c(TRUE, 'TRUE') ``` Can you explain this in terms of the standard coercion rules? How can you coerce 'TRUE' back into a logical data type? #### Question 4: Is the following expression guaranteed to be "TRUE"? ```{r, eval=FALSE} 0 == sin(pi) ``` Discuss with your peers why this is the case. What is a better option for comparing double precision approximations in R? #### Question 5: Consider the following expressions --- what is the final output? What does this have to do with the slice operator ```:```? ```{r, eval=FALSE} x <- 9:25 y <- 3:4 x[y] ``` #### Question 6: Consider the following expressions --- what is the final output? Can you think of some instances where a logical index vector would be a valuable way to extract data? Give an example of another comparison. ```{r, eval=FALSE} x <- 9:25 y <- x > 16 x[y] ``` #### Question 7: We have seen that we can apply mathematical operations to numerical vectors. What do you expect the output to be of the following statement? Can you explain the output? ```{r, eval=FALSE} c('red', 'blue', 'green') + 2 ``` ### Debrief: discuss the questions We will discuss the questions from activity 1 as a class. ### Activity 2: A first look at dataframes Vectors are one type of data structure that we encounter frequently; however, there are other related (and important) data structures that we will need to manipulate. R has an extremely developed tool for handling tabular data, such as comma separated values -- this tool is a "dataframe". We will explore the functionality of dataframes with some "cat" data in some simple examples: ```{r eval=FALSE} cats <- data.frame(coat = c("calico", "black", "tabby"), weight = c(2.1, 5.0, 3.2), likes_string = c(1, 0, 1)) ``` Notice that the arguments of the function "data.frame()" are three expressions associating a name with a vector. Printing the variable "cats", we see what tabular data looks like in a dataframe: ```{r eval=FALSE} cats ``` The assignment of the vectors to names in the arguments assigned the column names. Each column consists of **a vector of uniform data type**. If we want to extract a named column from a dataframe, this can be done with the "\$" sign and the column name: ```{r eval=FALSE} cats\$weight ``` Each row, on the other hand, consists of multiple measurements (of different data types) corresponding to one specific case of the data set. We might suppose that the scale used to measure the cats' weights was uniformly off by two kgs in all measurements. #### Question 1: How can we can re-assign values into the column of weights? Think about how to make a recursive assignemnt. ```{r eval=FALSE} cats\$weight <- cats\$weight + 2 ``` * We can verify that the assignment went into the column for weight in "cats", ```{r eval=FALSE} cats ``` #### Question 2: A data structure related to vectors are **lists**. Lists function as containers for heterogeneous data, allowing different types. Consider the following lines of code, discuss with your peers the differences in the output from the ```list()``` function and the constructor function ```c()```. What do you think are some of the practical differences between lists and vectors? Can you identify the types of elements of the list? ```{r eval=TRUE} list_example <- list(1, "a", TRUE, 1+4i) list_example vector_example <- c(1, "a", TRUE, 1+4i) vector_example ``` #### Question 3: In the last example no type coercion has taken place; ```{r eval=FALSE} typeof(list_example[]) typeof(list_example[]) ``` all of the original types have been respected, but because the data is allowed to be inhomogeneous we can't use vector operations on a list. ```{r eval=FALSE, error=TRUE} list_example + 2 ``` Here we see an error message because the "+" operator only knows how to operate on numeric arguments, or ones that can be coerced into one. Recall our dataframe cats, ```{r eval=FALSE} cats ``` We know that dataframes contain homogeneous data in each column, but each row may be inhomogeneous. **Q:** What kind of data structure do you think a dataframe is? Can it be a vector? Why or why not? How can you test what type of data strcuture our ```cats``` dataframe is? ### Debrief: discuss the questions We will discuss the questions from activity 2 as a class.