As a general method, we can always use the F-statistic for nested models.
Specifically, this applies whenever one model, \( \boldsymbol{\omega} \), is given by a subspace of another, \( \boldsymbol{\Omega} \).
Concretely, the null hypothesis must be \( H_0 : \boldsymbol{\beta}_i = \boldsymbol{0} \) for each \( i=q,\cdots, p-1 \).
The alternative hypothesis is that the larger model holds,
\[ H_1: \boldsymbol{\beta}_i \neq 0 \text{ for some } i = q, \cdots, p-1. \]
We compute the F statistic as, \[ \begin{align} F &\triangleq \frac{ \left( RSS_\boldsymbol{\omega} - RSS_\boldsymbol{\Omega}\right)/ (p-q)}{RSS_\boldsymbol{\Omega}/(n-p)} . \end{align} \]
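A minimal sketch of this computation in R (the helper nested_f_test and the arguments small and large are hypothetical names for two nested lm fits; in practice anova() performs the same comparison):
nested_f_test <- function(small, large) {
  rss_w <- sum(residuals(small)^2)   # RSS_omega, the smaller model
  rss_O <- sum(residuals(large)^2)   # RSS_Omega, the larger model
  q <- length(coef(small))           # parameters in the smaller model
  p <- length(coef(large))           # parameters in the larger model
  n <- nobs(large)                   # number of observations
  f <- ((rss_w - rss_O) / (p - q)) / (rss_O / (n - p))
  c(F = f, p.value = pf(f, p - q, n - p, lower.tail = FALSE))
}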
Suppose there is one particular variable that we want to determine the significance of for our model.
Specifically, suppose we have a model,
\[ \begin{align} \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}, \end{align} \] with respect to some choice of variables \( \mathbf{X} \).
Our alternative hypothesis in this case is,
\[ \begin{align} H_1: \boldsymbol{\beta} \neq \boldsymbol{0}. \end{align} \]
Q: if we want to determine if \( \boldsymbol{\beta}_i \) specifically gives an appreciable difference in this model, what is our null hypothesis?
A: our null hypothesis takes the form,
\[ \begin{align} H_0: \boldsymbol{\beta}_i = 0 \end{align} \]
We will examine this on the gala data once again. We define the model lmods without Area as an explanatory variable for the null hypothesis. Then we compute the ANOVA table against the bigger model lmod, which contains Area:
require("faraway")
lmod <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, gala)
lmods <- lm(Species ~ Elevation + Nearest + Scruz + Adjacent, gala)
anova(lmods, lmod)
Analysis of Variance Table
Model 1: Species ~ Elevation + Nearest + Scruz + Adjacent
Model 2: Species ~ Area + Elevation + Nearest + Scruz + Adjacent
Res.Df RSS Df Sum of Sq F Pr(>F)
1 25 93469
2 24 89231 1 4237.7 1.1398 0.2963
The result of the \( F \) test says: “with probability 29.63%, a value drawn from the F distribution would be as large as or larger than the observed statistic.”
Q: do we reject or fail to reject the null hypothesis at \( 5\% \) significance here?
A: Here we fail to reject the null hypothesis because it is reasonable that the difference between the large model and the small model could be due to random variation.
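As a quick sanity check (a sketch, using the values printed in the ANOVA table above), we can recover the p-value directly from the F distribution with 1 and 24 degrees of freedom:
# upper-tail probability of the observed F value under the null
pf(1.1398, df1 = 1, df2 = 24, lower.tail = FALSE)   # approximately 0.2963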
Solution: suppose \( X \) follows a student t-distribution and the value \( L \) is chosen such that \( P(X \geq L) = \frac{\alpha}{2} \). We can then consider the two events, \[ \begin{align} A: X\geq L & & B: X \leq -L. \end{align} \]
Due to the symmetry of the student t-distribution about zero, we find,
\[ \begin{align} P(A)=P(B) = \frac{\alpha}{2}. \end{align} \]
Then, notice that,
\[ \begin{align} P(A\cup B) &= P(A) + P(B) - P(A\cap B) \\ &= \frac{\alpha}{2} + \frac{\alpha}{2} - 0 \end{align} \] as the intersection is empty.
Finally, the complement of the event \( A\cup B \) is the event that \( -L < X < L \), such that, \( P(-L < X < L) = 1 - \alpha \).
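We can verify this identity numerically (a small sketch; the degrees of freedom value 24 is an arbitrary choice for illustration):
alpha <- 0.05
df <- 24                      # hypothetical degrees of freedom
L <- qt(1 - alpha / 2, df)    # chosen so that P(X >= L) = alpha / 2
pt(L, df) - pt(-L, df)        # P(-L < X < L), returns 0.95 = 1 - alpha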
Suppose again, we have \( n \) independent random variables drawn from a Gaussian distribution, \( \left\{Y_i\right\}_{i=1}^n \), with unknown true mean \( \mu_Y \) and standard deviation \( \sigma \).
Q: if we want to test the hypothesis that \( \mu_Y \neq 0 \), what are the null and alternative hypotheses?
A:
\[ \begin{align} H_0 &: \mu_Y = 0 \\ H_1 &: \mu_Y \neq 0 \end{align} \]
To test the hypothesis, we assume the null hypothesis and refer to the quantity, \[ \begin{align} t^\ast = \frac{\overline{Y} - \mu_Y}{S/ \sqrt{n}} = \frac{\overline{Y}}{S / \sqrt{n}} \end{align} \] with respect to the null.
The value \( t^\ast \sim t_{n-1} \), so we can identify the critical value \( t^{\frac{\alpha}{2}} \) for which
\[ P(t \geq t^{\frac{\alpha}{2}}) = \frac{\alpha}{2}. \]
By symmetry, we see that
\[ \begin{align} P( \vert t \vert \geq \vert t^{\frac{\alpha}{2}}\vert) = \alpha. \end{align} \]
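As a sketch of the full procedure on simulated data (the sample below is hypothetical; t.test(y, mu = 0) reproduces the same computation):
set.seed(1)
n <- 20
y <- rnorm(n, mean = 0.5, sd = 1)          # simulated Gaussian sample
t_star <- mean(y) / (sd(y) / sqrt(n))      # t statistic under the null
p_val <- 2 * pt(-abs(t_star), df = n - 1)  # two-sided p-value by symmetry
c(t_star, p_val)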
If we want thus to test the hypothesis that a single parameter \( \boldsymbol{\beta}_i \neq 0 \), we can follow a similar procedure under the Gaussian assumption.
Recall, the standard error of a given parameter \( \hat{\boldsymbol{\beta}}_i \) is given by \[ \begin{align} se(\hat{\boldsymbol{\beta}}_{i}) \triangleq \hat{\sigma}\sqrt{(\mathbf{X}^\mathrm{T}\mathbf{X})^{-1}_{ii}} \end{align} \] where \( \hat{\sigma}^2 = \frac{RSS}{n-p} \).
The value \[ \begin{align} t_i = \frac{\hat{\boldsymbol{\beta}}_i}{se(\hat{\boldsymbol{\beta}}_i)} \end{align} \] has t-distribution in \( (n-p) \) degrees of freedom under the null hypothesis \( H_0:\boldsymbol{\beta}_i=0 \).
Particularly, we will determine the probability of obtaining a random value \( t \) where \( \vert t \vert \geq \vert t_i \vert \) with respect to the t-distribution to determine significance.
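Before turning to R's built-in summary, here is a minimal sketch of these computations performed by hand, assuming lmod is the full gala model defined earlier:
X <- model.matrix(lmod)
n <- nrow(X); p <- ncol(X)
sigma_hat <- sqrt(sum(residuals(lmod)^2) / (n - p))   # sqrt(RSS / (n - p))
se <- sigma_hat * sqrt(diag(solve(t(X) %*% X)))       # se(beta_hat_i)
t_vals <- coef(lmod) / se                             # t statistics
p_vals <- 2 * pt(-abs(t_vals), df = n - p)            # two-sided p-values
cbind(t_vals, p_vals)   # matches the t value and Pr(>|t|) columns below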
summary(lmod)
Call:
lm(formula = Species ~ Area + Elevation + Nearest + Scruz + Adjacent,
data = gala)
Residuals:
Min 1Q Median 3Q Max
-111.679 -34.898 -7.862 33.460 182.584
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.068221 19.154198 0.369 0.715351
Area -0.023938 0.022422 -1.068 0.296318
Elevation 0.319465 0.053663 5.953 3.82e-06 ***
Nearest 0.009144 1.054136 0.009 0.993151
Scruz -0.240524 0.215402 -1.117 0.275208
Adjacent -0.074805 0.017700 -4.226 0.000297 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 60.98 on 24 degrees of freedom
Multiple R-squared: 0.7658, Adjusted R-squared: 0.7171
F-statistic: 15.7 on 5 and 24 DF, p-value: 6.838e-07
summary(lm(Species ~ Area, gala))
Call:
lm(formula = Species ~ Area, data = gala)
Residuals:
Min 1Q Median 3Q Max
-99.495 -53.431 -29.045 3.423 306.137
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.78286 17.52442 3.640 0.001094 **
Area 0.08196 0.01971 4.158 0.000275 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 91.73 on 28 degrees of freedom
Multiple R-squared: 0.3817, Adjusted R-squared: 0.3596
F-statistic: 17.29 on 1 and 28 DF, p-value: 0.0002748
Q: what is the null hypothesis \( H_0: \boldsymbol{\beta}_{Area} = 0 \) in this simple model?
A: it is the same null hypothesis as saying the data is expressed only as random variation around the mean \( \overline{\mathbf{Y}} \). However, the significance of Area is not the same here as it was on the last slide, because this test is framed with respect to a different alternative hypothesis.
We now have two methods for comparing different models: the F-test and the t-test.
The F-test is defined to compare any two models, as long as one lies in a subspace of the other.
The t-test is defined for a single parameter, rather than a combination of parameters like the F-test.
In particular, the two tests are equivalent when the two models differ by a single parameter; in that case the F statistic is the square of the t statistic, and the two p-values agree exactly.
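For instance (a sketch using the Area row of the summary above and the earlier ANOVA table):
t_area <- -1.0676                  # t value for Area in the full model
t_area^2                           # 1.1398, the F statistic from anova()
2 * pt(-abs(t_area), df = 24)      # approximately 0.2963, matching Pr(>F)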
Suppose we wish to determine if the two variables Area and Adjacent have an effect on the response relative to the model with all other variables.
Particularly, we obtain the null hypothesis \( H_0 : \boldsymbol{\beta}_{area}= \boldsymbol{\beta}_{adjacent}=0 \).
lmods <- lm(Species ~ Elevation + Nearest + Scruz, gala)
anova(lmods, lmod)
Analysis of Variance Table
Model 1: Species ~ Elevation + Nearest + Scruz
Model 2: Species ~ Area + Elevation + Nearest + Scruz + Adjacent
Res.Df RSS Df Sum of Sq F Pr(>F)
1 26 158292
2 24 89231 2 69060 9.2874 0.00103 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The probability of drawing such an F value is around 1/1000, so we reject the null hypothesis.
This tells us that it is extremely unlikely that neither Area nor Adjacent has an effect on the response, when comparing the two models.
sumary(lmod)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.068221 19.154198 0.3690 0.7153508
Area -0.023938 0.022422 -1.0676 0.2963180
Elevation 0.319465 0.053663 5.9532 3.823e-06
Nearest 0.009144 1.054136 0.0087 0.9931506
Scruz -0.240524 0.215402 -1.1166 0.2752082
Adjacent -0.074805 0.017700 -4.2262 0.0002971
n = 30, p = 6, Residual SE = 60.97519, R-Squared = 0.77
There is no simple way of combining the above information to test a pair of explanatory variables simultaneously.
Testing a pair of variables needs to be performed with the F-test over the two nested models.
Suppose we think we can make a simpler model by combining two variables as some kind of linear combination,
Concretely, suppose we wish to take a new variable corresponding to the sum of the area of the island itself and the area of the adjacent island.
Our null hypothesis will be that \( \boldsymbol{\beta}_{area} = \boldsymbol{\beta}_{adjacent} \).
lmod <- lm(Species ~ Area + Adjacent + Elevation + Nearest + Scruz, gala)
lmods <- lm(Species ~ I(Area+Adjacent) + Elevation + Nearest + Scruz, gala)
anova(lmods,lmod)
Analysis of Variance Table
Model 1: Species ~ I(Area + Adjacent) + Elevation + Nearest + Scruz
Model 2: Species ~ Area + Adjacent + Elevation + Nearest + Scruz
Res.Df RSS Df Sum of Sq F Pr(>F)
1 25 109591
2 24 89231 1 20360 5.476 0.02793 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the above, notice the only difference is the I() function, which enforces that the “+” expression is interpreted as arithmetic addition rather than as part of the model formula.
Q: do we reject or fail to reject the null hypothesis?
A: in this case, we can reject the null hypothesis at the \( \alpha = 5\% \) significance level.
We have learned so far how to test the significance of a given parameter, i.e., whether \( \boldsymbol{\beta}_i=0 \), which would reduce the model space \( \boldsymbol{\Omega} \) to the subspace \( \boldsymbol{\omega} \) in which \( X_i \) has no effect on the response.
We may quite similarly test the significance of a parameter having a specific value (other than zero);
Consider the null hypothesis \( H_0: \boldsymbol{\beta}_{Elevation} = 0.5 \), versus the alternative hypothesis where we consider the space of models over
Species ~ Area + Adjacent + Elevation + Nearest + Scruz.
We use the offset function to fix the associated value \( \boldsymbol{\beta}_{Elevation} = 0.5 \):
lmods <- lm(Species ~ Area + offset(0.5*Elevation) + Nearest + Scruz + Adjacent, gala)
anova(lmods, lmod)
Analysis of Variance Table
Model 1: Species ~ Area + offset(0.5 * Elevation) + Nearest + Scruz +
Adjacent
Model 2: Species ~ Area + Adjacent + Elevation + Nearest + Scruz
Res.Df RSS Df Sum of Sq F Pr(>F)
1 25 131312
2 24 89231 1 42081 11.318 0.002574 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Q: do we reject or fail to reject the null hypothesis?
A: we reject the null hypothesis at \( 5\% \) significance.
Note: specifying a particular value for the parameter can also be seen in terms of specifying a non-zero mean for the parameter \( \hat{\boldsymbol{\beta}}_i \) in a t-test.
Suppose we want to test the following null hypothesis, \[ H_0: \boldsymbol{\beta}_{Elevation} = C \] for some constant \( C \) as discussed above.
If \( \hat{\boldsymbol{\beta}}_i \) is the solution by least squares, we can compute
\[ \begin{align} t_i = \frac{\hat{\boldsymbol{\beta}}_i - C}{se(\hat{\boldsymbol{\beta}}_i)}. \end{align} \]
By the same reasoning as before, this is t-distributed in \( n-p \) degrees of freedom with the assumption under the null hypothesis that
\[ \mathbb{E}\left[\hat{\boldsymbol{\beta}}_i\right] = C \]
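As a sketch, we can reproduce the offset-based F test above with this t statistic, using the full-model estimates copied from the earlier coefficient table:
beta_hat <- 0.319465               # estimate for Elevation
se_hat <- 0.053663                 # its standard error
t_i <- (beta_hat - 0.5) / se_hat   # approximately -3.364
t_i^2                              # approximately 11.32, the F value above
2 * pt(-abs(t_i), df = 24)         # approximately 0.0026, matching Pr(>F)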
We return now to the idea of confidence intervals as an alternative (and dual) idea to hypothesis testing.
Recall that under our Gaussian assumptions,
\[ \boldsymbol{\epsilon} \sim N\left(\boldsymbol{0}, \sigma^2 \mathbf{I} \right), \]
we have derived,
\[ \hat{\boldsymbol{\beta}} \sim N\left(\boldsymbol{\beta}, \left(\mathbf{X}^\mathrm{T}\mathbf{X}\right)^{-1} \sigma^2 \right). \]
In this context, we have an unbiased, sample-based estimate for the true mean \( \boldsymbol{\beta} \), with an estimated variance on \( n-p \) degrees of freedom.
We recall from the t-test that the value
\[ \begin{align} t_i = \frac{\hat{\boldsymbol{\beta}}_i - \boldsymbol{\beta}_i}{se(\hat{\boldsymbol{\beta}}_i)} \end{align} \]
is t-distributed in \( n-p \) degrees of freedom.
Suppose that \( t^{\frac{\alpha}{2}}_{n-p} \) is chosen such that \( P(X\geq t^{\frac{\alpha}{2}}_{n-p}) = \alpha/2 \);
Using this dual notion to the hypothesis test, we can create an interval centered at \( \hat{\boldsymbol{\beta}} \) (using the t-distribution) with some measure of confidence that the true \( \boldsymbol{\beta} \) lives within it.
Let us say (similarly to hypothesis testing) we wish to guarantee \( \alpha=5\% \) significance.
Our confidence interval will take the form of
\[ \begin{align} \left( \hat{\boldsymbol{\beta}}_i - t_{n-p}^{\frac{\alpha}{2}} se(\hat{\boldsymbol{\beta}}_i), \hat{\boldsymbol{\beta}}_i + t_{n-p}^{\frac{\alpha}{2}} se(\hat{\boldsymbol{\beta}}_i)\right) \end{align} \]
where we define \( t_{n-p}^{\frac{\alpha}{2}} \) to be the critical value for which \( P\left(X \geq t_{n-p}^{\frac{\alpha}{2}}\right) = \frac{\alpha}{2} \).
The symmetry of the student t-distribution guarantees that, with \( 100(1-\alpha)\% \) confidence, the true value
\[ \begin{align} \boldsymbol{\beta}_i \in \left( \hat{\boldsymbol{\beta}}_i - t_{n-p}^{\frac{\alpha}{2}} se(\hat{\boldsymbol{\beta}}_i), \hat{\boldsymbol{\beta}}_i + t_{n-p}^{\frac{\alpha}{2}} se(\hat{\boldsymbol{\beta}}_i)\right) \end{align} \]
lmod <- lm(Species ~ Elevation + Nearest + Scruz + Area + Adjacent, gala)
sumary(lmod)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.068221 19.154198 0.3690 0.7153508
Elevation 0.319465 0.053663 5.9532 3.823e-06
Nearest 0.009144 1.054136 0.0087 0.9931506
Scruz -0.240524 0.215402 -1.1166 0.2752082
Area -0.023938 0.022422 -1.0676 0.2963180
Adjacent -0.074805 0.017700 -4.2262 0.0002971
n = 30, p = 6, Residual SE = 60.97519, R-Squared = 0.77
We note that the standard error for Area is given as \( \approx 0.022422 \). We compute the \( 2.5\% \) critical value for \( t_{30 - 6} \) and obtain the confidence interval as,
t_crit <- qt(0.975, 30-6)
-0.02394 + c(-1,1) * t_crit * 0.02242
[1] -0.07021261 0.02233261
Notice that the interval for Area contains zero, consistent with our earlier failure to reject \( H_0: \boldsymbol{\beta}_{Area} = 0 \) at \( 5\% \) significance.
Similarly, we can compute the \( 95\% \) confidence interval for \( \boldsymbol{\beta}_{Adjacent} \) with,
-0.07480 + c(-1,1) * t_crit * 0.01770
[1] -0.111331 -0.038269
Q: Based on the above confidence interval, can we reject the null hypothesis \( H_0: \boldsymbol{\beta}_{Adjacent}=0 \) with \( 5\% \) significance?
A: Yes, since zero does not lie within the interval. We can also compare the width of the interval to the magnitude of the estimate:
(-0.038269 + 0.111331)/abs(-0.07480)
[1] 0.9767647
The interval is nearly as wide as the estimate itself, so while the sign of \( \boldsymbol{\beta}_{Adjacent} \) is well determined, its magnitude remains quite uncertain.
Confidence intervals, while dual to hypothesis tests, provide slightly more information than p-values.
Particularly, we can examine a range of equally plausible values for the parameter of interest.
confint(lmod)
2.5 % 97.5 %
(Intercept) -32.4641006 46.60054205
Elevation 0.2087102 0.43021935
Nearest -2.1664857 2.18477363
Scruz -0.6850926 0.20404416
Area -0.0702158 0.02233912
Adjacent -0.1113362 -0.03827344
confint(lmod, level = 0.99)
0.5 % 99.5 %
(Intercept) -46.5049119 60.64135329
Elevation 0.1693731 0.46955638
Nearest -2.9392105 2.95749844
Scruz -0.8429913 0.36194283
Area -0.0866523 0.03877562
Adjacent -0.1243112 -0.02529848
Q: why is the \( 99\% \) confidence interval wider than the \( 95\% \) confidence interval?
A: to be more confident that the interval contains the true value, we must widen the interval correspondingly.
The previous example only considered univariate confidence intervals, analogous to the t-test for a single parameter, i.e., an interval for each \( \boldsymbol{\beta}_i \) considered individually.
Suppose we want (analogous to the F-test) to find a multivariate confidence region.
For \( \alpha \) the significance level, we find the \( 100(1-\alpha)\% \) confidence region for \( \hat{\boldsymbol{\beta}} \) is given by the relationship
\[ \begin{align} \left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^\mathrm{T}\left(\mathbf{X}^\mathrm{T}\mathbf{X} \right) \left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \leq p \hat{\sigma}^2 F^{(\alpha)}_{p, n-p} \end{align} \]
The above can be interpreted as a weighted inner product of \( \left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \) with itself with respect to its estimated inverse covariance, i.e., a measure of distance squared, weighted by \( \frac{1}{\hat{\sigma}^2}\left(\mathbf{X}^\mathrm{T}\mathbf{X} \right) \).
We say that with \( 100(1-\alpha)\% \) confidence the true parameter \( \boldsymbol{\beta} \) will lie within the domain of the weighted distance function, for which the output is less than \[ p \hat{\sigma}^2 F^{(\alpha)}_{p, n-p}. \]
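A minimal sketch of checking whether a candidate value lies inside this region, assuming lmod is the full gala model (the candidate beta_0 = 0 is a hypothetical choice for illustration):
X <- model.matrix(lmod)
n <- nrow(X); p <- ncol(X)
beta_hat <- coef(lmod)
sigma2_hat <- sum(residuals(lmod)^2) / (n - p)
beta_0 <- rep(0, p)                                  # hypothetical candidate
d2 <- drop(t(beta_hat - beta_0) %*% crossprod(X) %*% (beta_hat - beta_0))
d2 <= p * sigma2_hat * qf(1 - 0.05, p, n - p)        # TRUE if inside region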
install.packages("ellipse"))
area
and adjacent
simultaneously with the following code:plot(ellipse(lmod,c(2,6)),type="l",ylim=c(-0.13,0))
points(coef(lmod)[5], coef(lmod)[6], pch=19)
abline(v=confint(lmod)[5,],lty=2)
abline(h=confint(lmod)[6,],lty=2)
The above has the effect of creating a joint confidence region for the fifth and sixth parameters of the model lmod, namely Area and Adjacent.
We plot additional points for the parameter estimates, as well as include lines in the plot for the univariate confidence intervals.
The result is pictured next…
Courtesy of: Faraway, J. Linear Models with R. 2nd Edition