We will now begin our discussion on diagnosing issues with our error covariance assumptions.
If we wish to check the assumptions on the error, or the variation in the signal, \( \boldsymbol{\epsilon} \), we must bear in mind that \( \boldsymbol{\epsilon} \) itself is not observable.
Q: What proxy could we consider for the error?
Recall the definition of \( \hat{\mathbf{Y}} \):
\[ \begin{align} \hat{\mathbf{Y}} &\triangleq \mathbf{X}\left(\mathbf{X}^\mathrm{T} \mathbf{X}\right)^{-1} \mathbf{X}^\mathrm{T} \mathbf{Y} \\ &= \mathbf{H} \mathbf{Y} \end{align} \]
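As a quick numerical illustration (a minimal sketch with simulated data; the variable names here are illustrative, not from the lecture), the hat matrix reproduces the fitted values returned by lm:
set.seed(1)
n <- 20
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
X <- cbind(1, x)                          # design matrix with intercept
H <- X %*% solve(t(X) %*% X) %*% t(X)     # H = X (X^T X)^{-1} X^T
max(abs(H %*% y - fitted(lm(y ~ x))))     # ~ 0: Y_hat = H Y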
Therefore, if we compute the residuals, substituting the model equation \( \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \),
\[ \begin{align} \hat{\boldsymbol{\epsilon}} & = \mathbf{Y} - \hat{\mathbf{Y}} \\ & =\left(\mathbf{I} - \mathbf{H}\right)\mathbf{Y} \\ & =\left(\mathbf{I} - \mathbf{H}\right)\left(\mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}\right) \\ & =\left(\mathbf{I} - \mathbf{H}\right)\mathbf{X} \boldsymbol{\beta} + \left(\mathbf{I} - \mathbf{H}\right)\boldsymbol{\epsilon} \end{align} \]
From the last slide, we have
\[ \begin{align} \hat{\boldsymbol{\epsilon}} & =\left(\mathbf{I} - \mathbf{H}\right)\mathbf{X} \boldsymbol{\beta} + \left(\mathbf{I} - \mathbf{H}\right)\boldsymbol{\epsilon} \end{align} \]
Q: recalling the definition of \( \mathbf{H} = \mathbf{X}\left(\mathbf{X}^\mathrm{T} \mathbf{X}\right)^{-1} \mathbf{X}^\mathrm{T} \), what does \( \mathbf{H}\mathbf{X} \) equal?
A: \( \mathbf{H} \) is the projection operator onto the span of the design matrix, and thus \( \mathbf{H}\mathbf{X} = \mathbf{X} \) by construction.
Q: given the above, what does \( \left(\mathbf{I} - \mathbf{H}\right)\mathbf{X} \) equal?
A: the above must equal zero, as \( \mathbf{I}\mathbf{X} = \mathbf{H}\mathbf{X} = \mathbf{X} \), so \( \left(\mathbf{I} - \mathbf{H}\right)\mathbf{X} = \mathbf{X} - \mathbf{X} = \mathbf{0} \).
From the previous two exercises, we can deduce,
\[ \begin{align} \hat{\boldsymbol{\epsilon}} = \left(\mathbf{I} - \mathbf{H}\right)\boldsymbol{\epsilon} \end{align} \]
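These identities are easy to verify numerically. Below is a minimal sketch, assuming a small simulated design (the names n, X, beta are illustrative):
set.seed(1)
n <- 25
X <- cbind(1, rnorm(n), rnorm(n))            # design matrix with intercept
H <- X %*% solve(t(X) %*% X) %*% t(X)        # hat matrix
max(abs((diag(n) - H) %*% X))                # ~ 0: (I - H) X = 0
beta <- c(1, 2, -1)
eps  <- rnorm(n)
y    <- X %*% beta + eps                     # simulate the linear model
eps_hat <- (diag(n) - H) %*% y               # residuals computed as (I - H) Y
max(abs(eps_hat - (diag(n) - H) %*% eps))    # ~ 0: residuals = (I - H) eps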
Under the assumption that \( \boldsymbol{\epsilon}\sim N(0, \mathbf{I} \sigma^2) \), the residuals have mean zero,
\[ \begin{align} \mathbb{E}[\hat{\boldsymbol{\epsilon}}] &= \mathbb{E}\left[\left(\mathbf{I} - \mathbf{H}\right)\boldsymbol{\epsilon}\right] \\ &=\left(\mathbf{I} - \mathbf{H}\right)\mathbb{E}\left[\boldsymbol{\epsilon}\right]\\ &= 0 \end{align} \]
\[ \begin{align} \mathbb{E}\left[\left(\hat{\boldsymbol{\epsilon}}\right) \left(\hat{\boldsymbol{\epsilon}}\right)^\mathrm{T}\right] &= \mathbb{E}\left[\left(\left(\mathbf{I} - \mathbf{H}\right)\boldsymbol{\epsilon}\right)\left(\left(\mathbf{I} - \mathbf{H}\right)\boldsymbol{\epsilon}\right)^\mathrm{T}\right] \\ &=\mathbb{E}\left[\left(\mathbf{I} - \mathbf{H}\right)\boldsymbol{\epsilon}\boldsymbol{\epsilon}^\mathrm{T}\left(\mathbf{I} - \mathbf{H}\right)^\mathrm{T}\right]\\ &=\left(\mathbf{I} - \mathbf{H}\right)\mathbb{E}\left[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^\mathrm{T}\right]\left(\mathbf{I} - \mathbf{H}\right)^\mathrm{T}\\ & =\left(\mathbf{I} - \mathbf{H}\right)\left[\sigma^2 \mathbf{I}\right]\left(\mathbf{I} - \mathbf{H}\right)\\ &=\sigma^2\left(\mathbf{I} - \mathbf{H}\right) \end{align} \]
Note that the last two equalities use the fact that \( \mathbf{I} - \mathbf{H} \) is symmetric and idempotent, i.e., \( \left(\mathbf{I} - \mathbf{H}\right)^\mathrm{T} = \mathbf{I} - \mathbf{H} \) and \( \left(\mathbf{I} - \mathbf{H}\right)^2 = \mathbf{I} - \mathbf{H} \).
Q: Why is the last slide relevant? Particularly, why should we be concerned with the covariance of the residuals?
Note, even though the errors are assumed to be independent and of equal variance \( \sigma^2 \), the same doesn't hold for the residuals: the \( i \)-th residual has variance \( \sigma^2\left(1 - H_{ii}\right) \), which varies with \( i \), and distinct residuals are generally correlated.
Nonetheless, we use the residuals to understand how the true errors, which are unobservable, are behaving.
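The covariance result above is also easy to check by simulation. Below is a small Monte Carlo sketch (simulated setup, illustrative names) comparing the empirical covariance of the residuals with \( \sigma^2\left(\mathbf{I} - \mathbf{H}\right) \):
set.seed(1)
n <- 8
sigma <- 1.5
X <- cbind(1, rnorm(n))                          # small illustrative design
H <- X %*% solve(t(X) %*% X) %*% t(X)            # hat matrix
reps <- 50000
res <- replicate(reps, as.vector((diag(n) - H) %*% rnorm(n, sd = sigma)))
rowMeans(res)                                    # each near 0: E[eps_hat] = 0
max(abs(cov(t(res)) - sigma^2 * (diag(n) - H)))  # small; shrinks as reps grows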
Courtesy of: Faraway, J. Linear Models with R. 2nd Edition
library("faraway")
lmod_savings <- lm(sr ~ pop15+pop75+dpi+ddpi,savings)
sumary(lmod_savings)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.56608654 7.35451611 3.8842 0.0003338
pop15 -0.46119315 0.14464222 -3.1885 0.0026030
pop75 -1.69149768 1.08359893 -1.5610 0.1255298
dpi -0.00033690 0.00093111 -0.3618 0.7191732
ddpi 0.40969493 0.19619713 2.0882 0.0424711
n = 50, p = 5, Residual SE = 3.80267, R-Squared = 0.34
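To connect the covariance result above to this fitted model, one might inspect the leverages \( H_{ii} \); a short sketch, assuming the lmod_savings object from above:
h <- hatvalues(lmod_savings)              # leverages h_ii = diag(H)
sigma2_hat <- summary(lmod_savings)$sigma^2
range(sigma2_hat * (1 - h))               # Var(eps_hat_i) = sigma^2 (1 - h_ii) is not constant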
We saw in the last lecture that the residuals do not diverge strongly from the Gaussian assumption, but some of the predictors themselves appear to have skewed or multimodal distributions.
We will be interested now in examining our error covariance assumptions.
# residuals versus fitted values; look for structure or non-constant spread
par(mai = c(1.5, 1.5, .5, .5), mgp = c(3, 0, 0))
plot(fitted(lmod_savings), residuals(lmod_savings), xlab = "Fitted", ylab = "Residuals", cex = 3, cex.lab = 3, cex.axis = 3)
abline(h = 0)
# sqrt of absolute residuals versus fitted values; accentuates changes in spread
par(mai = c(1.5, 1.5, .5, .5), mgp = c(3, 0, 0))
plot(fitted(lmod_savings), sqrt(abs(residuals(lmod_savings))), xlab = "Fitted", ylab = expression(sqrt(abs(hat(epsilon)))), cex = 3, cex.lab = 3, cex.axis = 1.5)
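Since \( \mathrm{Var}(\hat{\epsilon}_i) = \sigma^2\left(1 - H_{ii}\right) \) is not constant, a common refinement (not part of the lecture code above) is to plot standardized residuals, which rescale each residual by \( \sqrt{1 - H_{ii}} \); a minimal sketch:
# standardized residuals divide out the sqrt(1 - h_ii) factor, so they have
# (approximately) constant variance when the model assumptions hold
plot(fitted(lmod_savings), rstandard(lmod_savings), xlab = "Fitted", ylab = "Standardized residuals")
abline(h = 0)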