Instructor: Colin Grudzien

## Instructions:

We will work through the following series of activities as a group, with small-group work and discussion in Zoom breakout rooms. Follow the instructions in each sub-section when the instructor assigns you to a breakout room.

## Activities:

```{r}
library(faraway)  # provides the `fat` data used below
```

### Activity 1:

#### Question 1:

We will construct the following model on the `fat` data:

```{r}
lmod <- lm(brozek ~ age + weight + height + neck + chest + abdom + hip +
             thigh + knee + ankle + biceps + forearm + wrist, data = fat)
summary(lmod)
```

Use the `plot` function on the model, together with hypothesis tests, to examine the assumption of Gaussian error distributions and to check for non-constant variance and nonlinearity in the structure. Using the plot function, what do you also notice about the leverage and influence of individual observations? Does it appear that there is an erroneous, high-leverage piece of data? Why?

#### Question 2:

Create a copy of the `fat` data with the obviously erroneous data removed. Refit the model and repeat the diagnostics. What do you notice about the value flagged with the largest Cook's distance in the influence plot?

#### Question 3:

We may consider this very extreme case to fall outside the scope of our model, since we are interested in more typical cases in the population when estimating percent body fat. Remove the very extreme case above and examine once again the cases that seem unrealistic. Recall the meaning of the response variable. Is there anything concerning about the range of its values? Remove the problematic case and repeat the diagnostics.

#### Question 4:

Now systematically reduce the model to one in which all variables are significant. Repeat the diagnostics.

#### Question 5:

Use the `dfbeta` function to plot the change in each parameter estimate when a single case is excluded. Do the parameter estimates show strong sensitivity to any single observation?
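As a starting point for Question 5, here is a minimal sketch of how `dfbeta` output can be inspected. It assumes the full model from Question 1 as a stand-in for your reduced model on the cleaned data. The object name `db` and the choice of the `abdom` coefficient are illustrative only.

```r
library(faraway)  # provides the `fat` data

# Assumption: the full Question 1 model stands in for the reduced model.
lmod <- lm(brozek ~ age + weight + height + neck + chest + abdom + hip +
             thigh + knee + ankle + biceps + forearm + wrist, data = fat)

# dfbeta() returns a matrix with one row per case and one column per
# coefficient: the change in each estimate when that case is dropped.
db <- dfbeta(lmod)

# Plot the leave-one-out change for one coefficient (illustrative choice).
plot(db[, "abdom"], xlab = "Case index", ylab = "Change in abdom coefficient")
abline(h = 0)

# Index of the case producing the largest absolute change, per coefficient.
most_influential <- apply(abs(db), 2, which.max)
most_influential
```

Swap in your own fitted model and loop over the columns of `db` to view every coefficient. Comparing each change with the corresponding standard error from `summary(lmod)` is one way to judge whether it is large.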
#### Question 6:

Compute the studentized residuals and perform the hypothesis test for outliers using the t-test with each of:

1. the usual $\alpha=5\%$ significance level for each test; and
2. the Bonferroni-corrected individual significance level for each test.

What do you notice about the difference between the results?
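The two tests in Question 6 can be sketched as follows. This is one possible setup, again assuming the full Question 1 model rather than your reduced model. The names `stud`, `crit_naive`, `crit_bonf`, `flag_naive` and `flag_bonf` are illustrative.

```r
library(faraway)  # provides the `fat` data

# Assumption: the full Question 1 model stands in for the reduced model.
lmod <- lm(brozek ~ age + weight + height + neck + chest + abdom + hip +
             thigh + knee + ankle + biceps + forearm + wrist, data = fat)

# Externally studentized residuals; under the model assumptions each one
# follows a t distribution with n - p - 1 degrees of freedom.
stud <- rstudent(lmod)
n <- length(stud)
p <- length(coef(lmod))
df <- n - p - 1

alpha <- 0.05

# Two-sided critical value when each case is tested at level alpha.
crit_naive <- qt(1 - alpha / 2, df)

# Bonferroni correction: each of the n tests is run at level alpha / n.
crit_bonf <- qt(1 - alpha / (2 * n), df)

# Cases flagged as outliers under each rule.
flag_naive <- which(abs(stud) > crit_naive)
flag_bonf  <- which(abs(stud) > crit_bonf)

length(flag_naive)
length(flag_bonf)
```

Because the Bonferroni threshold is always the larger of the two, every case it flags is also flagged by the uncorrected test, which helps when comparing the two sets.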