Further hypothesis testing, confidence intervals and regions

10/07/2020

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

Outline

The following topics will be covered in this lecture:
- Testing a single predictor
- The t-test
- Testing a subspace
- Confidence intervals
- Confidence regions

Testing one predictor

As a general method, we can always use the F-statistic for nested models.
Specifically, whenever one model is given by a subspace of another:
- \( \boldsymbol{\omega} \) consists of models over variables \( x_1, \cdots, x_{q-1} \) and corresponds to \( q \) parameters (including the intercept);
- \( \boldsymbol{\Omega} \) consists of models over variables \( x_1, \cdots , x_{p-1} \) and corresponds to \( p \) parameters (including the itercept);
- such that \( q \text{<} p \).
Concretely, the null hypothesis must be \( H_0 : \boldsymbol{\beta}_i = \boldsymbol{0} \) for each \( i=q,\cdots, p-1 \).
The alternative hypothesis is that the larger model holds,

\[ H_1: \boldsymbol{\beta} \neq 0 \]
We compute the F statistic as, \[ \begin{align} F &\triangleq \frac{ \left( RSS_\boldsymbol{\omega} - RSS_\boldsymbol{\Omega}\right)/ (p-1)}{RSS_\boldsymbol{\Omega}/(n-p)} . \end{align} \]

Testing one predictor – continued

Suppose there is one particular variable that we want to determine the significance of for our model.
Specifically, suppose we have a model,

\[ \begin{align} \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}, \end{align} \] with respect to some choice of variables \( \mathbf{X} \).
Our alternative hypothesis in this case is,

\[ \begin{align} H_1: \boldsymbol{\beta} \neq \boldsymbol{0}. \end{align} \]
Q: if we want to determine if \( \boldsymbol{\beta}_i \) specifically gives an appreciable difference in this model, what is our null hypothesis?
A: our null hypothesis takes the form,

\[ \begin{align} H_0: \boldsymbol{\beta}_i = 0 \end{align} \]

Testing one predictor – continued

We will examine this on the gala data once again.
We define the model lmods without area as an explanatory variable for the null hypothesis. Then we compute the ANOVA table with the bigger model that contains area

require("faraway")
lmod <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, gala)
lmods <- lm(Species ~ Elevation + Nearest + Scruz + Adjacent, gala)
anova(lmods, lmod)

Analysis of Variance Table

Model 1: Species ~ Elevation + Nearest + Scruz + Adjacent
Model 2: Species ~ Area + Elevation + Nearest + Scruz + Adjacent
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1     25 93469                           
2     24 89231  1    4237.7 1.1398 0.2963

The result of the \( F \) test is to say, “With probability 29.63%, we will find a value drawn from the F distribution with this value or greater”.
Q: do we reject or fail to reject the null hypothesis at \( 5\% \) significance here?
A: Here we fail to reject the null hypothesis because it is reasonable that the difference between the large model and the small model could be due to random variation.
- Note: there may be some statistical relationship, but we haven't detected one that wouldn't be surpising if it was just noise.

Student t-distribution

Courtesy of Skbkekas CC BY-SA 3.0

The significance of a single variable can also be found with respect to the student t-test.
Generally, suppose that we have \( n \) independent random variables drawn from a Gaussian distribution \( \left\{Y_i\right\}_{i=1}^n \), with unknown true mean \( \mu_Y \) and standard deviation \( \sigma \).
As usual, our sample-based estimate of the mean is given by, \[ \begin{align} \overline{Y} = \frac{1}{n}\sum_{i=1}^n Y_i ; \end{align} \]
and our unbiased, sample-based estimate of the variance is given as, \[ \begin{align} S^2 = \frac{1}{n-1} \sum_{i=1}^n \left(Y_i - \overline{Y}\right)^2. \end{align} \]

It is a powerful and non-trivial result that, \[ \begin{align} \frac{\overline{Y} - \mu_Y}{S/ \sqrt{n}}, \end{align} \] is distributed according to a student t-distribution, in \( n-1 \) degrees of freedom.

Student t-distribution

Courtesy of Skbkekas CC BY-SA 3.0

The student t-distribution, pictured left, is similar to a Gaussian but with only polynomial decay of the probability in the tails.

Particularly, for one degree of freedom \( \nu = 1 \), the density function takes the form, \[ \begin{align} p(x) = \frac{1}{\pi \left( 1 + x^2\right)}. \end{align} \]

In general, the student t-distribution is defined for any number of degrees of freedom \( \nu \) as a parameter, where \[ \begin{align} p(x) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{x^2}{\nu} \right)^{-\frac{\nu+1}{2}}, \end{align} \] and \( \Gamma \) is special function known as the Gamma function.

Q: notice, the t-distribution is symmetric about zero – if \( L>0 \) and \( P(X\geq L) = \alpha/2 \) then what is \( P(-L < X < L) \)?

Student t-distribution

Solution: we can consider the two events, \[ \begin{align} A: X\geq L & & B: X \leq -L. \end{align} \]
Due to the symmetry of the student t-distribution about zero, we find,

\[ \begin{align} P(A)=P(B) = \frac{\alpha}{2}. \end{align} \]
Then, notice that,

\[ \begin{align} P(A\cup B) &= P(A) + P(B) - P(A\cap B) \\ &= \frac{\alpha}{2} + \frac{\alpha}{2} - 0 \end{align} \] as the intersection is empty.
Finally, the complement of the event \( A\cup B \) is the event that \( -L < X < L \), such that, \( P(-L < X < L) = 1 - \alpha \).

Testing one predictor – continued

Suppose again, we have \( n \) independent random varaibles drawn from a Gaussian distribution \( \left\{Y_i\right\}_{i=1}^n \), with unknown true mean \( \mu_Y \) and standard deviation \( \sigma \).

Q: if we want to test the hypothesis that \( \mu_Y \neq 0 \), what are the null and alternative hypotheses?
A:

\[ \begin{align} H_0 &: \mu_Y = 0 \\ H_1 &: \mu_Y \neq 0 \end{align} \]
To test the hypothesis, we assume the null hypothesis and refer to the quantity, \[ \begin{align} t^\ast = \frac{\overline{Y} - \mu_Y}{S/ \sqrt{n}} = \frac{\overline{Y}}{S / \sqrt{n}} \end{align} \] with respect to the null.
The value \( t^\ast \sim t_{n-1} \) so that we can identify the value \( t^\alpha \) for which

\[ P(t \geq t^\alpha) = \frac{\alpha}{2}. \]
By symmetry, we see that

\[ \begin{align} P( \vert t \vert \geq \vert t^\alpha\vert) = \alpha \end{align} \]

Testing one predictor – continued

If we want thus to test the hypothesis that a single parameter \( \boldsymbol{\beta}_i \neq 0 \), we can follow a similar procedure under the Gaussian assumption.
Recall, the standard error of a given parameter \( \hat{\boldsymbol{\beta}}_i \) is given \[ \begin{align} se(\hat{\boldsymbol{\beta}}_{i-1}) \triangleq \hat{\sigma}\sqrt{(\mathbf{X}^\mathrm{T}\mathbf{X})^{-1}_{ii}} \end{align} \] where \( \hat{\sigma}^2 = \frac{RSS}{n-p} \).
The value \[ \begin{align} t_i = \frac{\hat{\boldsymbol{\beta}}_i}{se(\hat{\boldsymbol{\beta}}_i)} \end{align} \] has t-distribution in \( (n-p) \) degrees of freedom under the null hypothesis \( H_0:\boldsymbol{\beta}_i=0 \).
Particularly, we will determine the probability of obtaining a random value \( t \) where \( \vert t \vert \geq \vert t_i \vert \) with respect to the t-distribution to determine significance.
- If the above probability falls below our pre-chosen \( \alpha \), or equivalently, \[ \vert t_i\vert > \vert t^\alpha \vert, \] we reject the null that \( \beta_i = 0 \) with \( \alpha \) significance.

Model summary

We can now interpret the entire summary table, except for adjusted \( R^2 \):

summary(lmod)


Call:
lm(formula = Species ~ Area + Elevation + Nearest + Scruz + Adjacent, 
    data = gala)

Residuals:
     Min       1Q   Median       3Q      Max 
-111.679  -34.898   -7.862   33.460  182.584 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.068221  19.154198   0.369 0.715351    
Area        -0.023938   0.022422  -1.068 0.296318    
Elevation    0.319465   0.053663   5.953 3.82e-06 ***
Nearest      0.009144   1.054136   0.009 0.993151    
Scruz       -0.240524   0.215402  -1.117 0.275208    
Adjacent    -0.074805   0.017700  -4.226 0.000297 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 60.98 on 24 degrees of freedom
Multiple R-squared:  0.7658,    Adjusted R-squared:  0.7171 
F-statistic:  15.7 on 5 and 24 DF,  p-value: 6.838e-07

NOTE: The null hypothesis will have a different interpretation if it is stated with respect to different alternative hypotheses.

summary(lm(Species ~ Area, gala))


Call:
lm(formula = Species ~ Area, data = gala)

Residuals:
    Min      1Q  Median      3Q     Max 
-99.495 -53.431 -29.045   3.423 306.137 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 63.78286   17.52442   3.640 0.001094 ** 
Area         0.08196    0.01971   4.158 0.000275 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 91.73 on 28 degrees of freedom
Multiple R-squared:  0.3817,    Adjusted R-squared:  0.3596 
F-statistic: 17.29 on 1 and 28 DF,  p-value: 0.0002748

Q: what do you notice about the p-value for the F-statistic and the p-value for the t-statistic for area?
- A: These correspond to the same test of the null versus alternative hypothesis in this case, i.e., leaving out area is the same null hypothesis as the data is expressed only as random variation around the mean \( \overline{\mathbf{Y}} \).
- The p-value for the t-test on Area is not the same here as it was on the last slide, because this is framed with respect to a different alternative hypothesis.

Review

We now have two methods for comparing different models: the F-test and the t-test.
The F-test is defined to compare the likelihood of any two models, as long as one lies in a subspace of another model.
- Particularly, this is defined as long as the small model \( \boldsymbol{\omega} \) uses some combination of the same explanatory variables of the large model \( \boldsymbol{\Omega} \).
- This measures the ratio of the maximum likelihoods of both models, which is distributed according to an F distribution under the null hypothesis (the smaller model is accurate).
The t-test is defined for a single parameter, rather than a combination of parameters like the F-test.
- This computes the probability of \[ \begin{align} t_i = \frac{\hat{\boldsymbol{\beta}}_i}{se(\hat{\boldsymbol{\beta}}_i)}, \end{align} \] under the null hypothesis where \( \boldsymbol{\beta}_i = 0 \).
In particular, the two tests are equivalent when we test the two models

\( \boldsymbol{\Omega} \) defined over \( X_1, \cdots, X_{p-1} \); and
\( \boldsymbol{\omega} \) defined over \( X_1,\cdots, X_{p-2} \)

Testing a pair of explanatory variables

Suppose we wish to determine if the two variables area and adjacent have an effect on the response relative to the model with all other variables.
Particularly, we obtain the null hypothesis \( H_0 : \boldsymbol{\beta}_{area}= \boldsymbol{\beta}_{adjacent}=0 \).

lmods <- lm(Species ~ Elevation + Nearest + Scruz, gala)
anova(lmods, lmod)

Analysis of Variance Table

Model 1: Species ~ Elevation + Nearest + Scruz
Model 2: Species ~ Area + Elevation + Nearest + Scruz + Adjacent
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)   
1     26 158292                               
2     24  89231  2     69060 9.2874 0.00103 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The probability of drawing such an F value is around 1/1000, so we reject the null hypothesis.
This tells us that it is extremely unlikely that there is no combination of the variables area and adjacent which have some effect on the response, when comparing the two models.

Testing a pair of explanatory variables

Rejecting the null hypothesis in the last example, testing a pair of variables, cannot be inferred from their respective t-tests.

sumary(lmod)

             Estimate Std. Error t value  Pr(>|t|)
(Intercept)  7.068221  19.154198  0.3690 0.7153508
Area        -0.023938   0.022422 -1.0676 0.2963180
Elevation    0.319465   0.053663  5.9532 3.823e-06
Nearest      0.009144   1.054136  0.0087 0.9931506
Scruz       -0.240524   0.215402 -1.1166 0.2752082
Adjacent    -0.074805   0.017700 -4.2262 0.0002971

n = 30, p = 6, Residual SE = 60.97519, R-Squared = 0.77

There is no simple way of combining the above information to test a pair of explanatory variables simultaneously.
Testing a pair of variables needs to be performed with the F-test over the two nested models.

Testing subspaces

Suppose we think we can make a simpler model by combining two variables as some kind of linear combination,
- e.g., what if we want to create a new variable \[ \begin{align} \tilde{X}_i \triangleq X_i + X_j \end{align} \] corresponding to two explanatory variables \( X_i,X_j \).
- This still makes sense with an F-test because we are again testing a null hypothesis as a nested subspace of the space of all variables.

Testing subspaces – continued

Concretely, suppose we wish to take a new variable corresponding to the sum of the area of the island itself and the area of the adjacent island.
Our null hypothesis will be that \( \boldsymbol{\beta}_{area} = \boldsymbol{\beta}_{adjacent} \)

lmod  <- lm(Species ~ Area + Adjacent + Elevation + Nearest + Scruz, gala)
lmods <- lm(Species ~ I(Area+Adjacent) + Elevation + Nearest + Scruz, gala)
anova(lmods,lmod)

Analysis of Variance Table

Model 1: Species ~ I(Area + Adjacent) + Elevation + Nearest + Scruz
Model 2: Species ~ Area + Adjacent + Elevation + Nearest + Scruz
  Res.Df    RSS Df Sum of Sq     F  Pr(>F)  
1     25 109591                             
2     24  89231  1     20360 5.476 0.02793 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the above, notice the only difference is the I() function, which enforces that the “+” expression isn't read as a formula.
Q: do we reject or fail to reject the null hypothesis?
A:In this case, we can reject the null hypothesis at the \( \alpha = 5\% \) significance level.

Testing subspaces – continued

We have learned so far how to test the significance of a given parameter \( \boldsymbol{\beta}_i=0 \), which would reduce the model space \( \boldsymbol{\Omega} \) to the subspace \( \boldsymbol{\omega} \) where \( X_i \) has no effect on the response.
We may quite similarly test the significance of a parameter having a specific value (other than zero);
- here \( \boldsymbol{\omega} \) will represent the subspace of the big model space \( \boldsymbol{\Omega} \) where \( X_i \) has a specified effect.
Consider the null hypothesis \( H_0: \boldsymbol{\beta}_{Elevation} = 0.5 \), versus the alternative hypothesis where we consider the space of models over

Species ~ Area + Adjacent + Elevation + Nearest + Scruz.

We use the function offset to fix the associated value \( \boldsymbol{\beta}_{Elevation} = 0.5 \)

lmods <- lm(Species ~ Area+ offset(0.5*Elevation) + Nearest + Scruz + Adjacent, gala)

Testing subspaces – continued

Comparing the nested sub-model with an ANOVA table,

anova(lmods, lmod)

Analysis of Variance Table

Model 1: Species ~ Area + offset(0.5 * Elevation) + Nearest + Scruz + 
    Adjacent
Model 2: Species ~ Area + Adjacent + Elevation + Nearest + Scruz
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     25 131312                                
2     24  89231  1     42081 11.318 0.002574 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Q: do we reject or fail to rejec the null hypothesis?
A: we reject the null hypothesis at \( 5\% \) significance.

Testing subspaces – continued

Note: specifying a particular value for the parameter can also be seen in terms of specifying a non-zero mean for the parameter \( \hat{\boldsymbol{\beta}}_i \) in a t-test.
Suppose we want to test the following null hypothesis, \[ H_0: \boldsymbol{\beta}_{Elevation} = C \] for some constant \( C \) as discussed above.
If \( \hat{\boldsymbol{\beta}}_i \) is the solution by least squares, we can compute
\[ \begin{align} t_i = \frac{\left(\hat{\boldsymbol{\beta}}_i - C\right)}{se(\boldsymbol{\beta})} \end{align} \]
By the same reasoning as before, this is t-distributed in \( n-p \) degrees of freedom with the assumption under the null hypothesis that

\[ \mathbb{E}\left[\hat{\boldsymbol{\beta}}_i\right] = C \]

Confidence intervals

We return now to the idea of confidence intervals as an alternative (and dual) idea to hypothesis testing.
Recall that under our Gaussian assumptions,

\[ \boldsymbol{\epsilon} \sim N\left(\boldsymbol{0}, \sigma^2 \mathbf{I} \right), \]
we have derived,

\[ \hat{\boldsymbol{\beta}} \sim N\left(\boldsymbol{\beta}, \left(\mathbf{X}^\mathrm{T}\mathbf{X}\right)^{-1} \sigma^2 \right). \]
In this context, we have an unbiased, sample based estimate for the true mean \( \boldsymbol{\beta} \), with a known variance and \( n-p \) degrees of freedom.

Confidence intervals – continued

We recall from the t-test that the value

\[ \begin{align} t_i = \frac{\hat{\boldsymbol{\beta}}_i - \boldsymbol{\beta}_i}{se(\hat{\boldsymbol{\beta}}_i)} \end{align} \]

is t-distributed in \( n-p \) degrees of freedom.

Suppose that \( t^\frac{\alpha}{2} \) is chosen such that \( P(X\geq t^\frac{\alpha}{2}_{n-p}) = \alpha/2 \);
- then recall that we derived in an earlier exercise that \( P(-t^\frac{\alpha}{2}_{n-p} < X < t^\frac{\alpha}{2}_{n-p})=1- \alpha \).
Using this dual notion to the hypothesis test, we can create an interval centered at \( \hat{\boldsymbol{\beta}} \) (using the t-distribution) with some measure of confidence that the true \( \boldsymbol{\beta} \) lives within it.
- Again, we do not say that the “true” \( \boldsymbol{\beta} \) lives within such an interval with probability \( 100(1-\alpha)\% \), it lies there or not.
- Rather, we perform a procedure that over many sets of observations, and with respect to the procedure, we have guaranteed that the true value will lie in our interval with certain confidence \( 100(1-\alpha)\% \).

Confidence intervals – continued

Let us say (similarly to hypothesis testing) we wish to guarantee \( \alpha=5\% \) significance.
Our confidence interval will take the form of

\[ \begin{align} \left( \hat{\boldsymbol{\beta}}_i - t_{n-p}^{^\frac{\alpha}{2}} se(\hat{\boldsymbol{\beta}}_i), \hat{\boldsymbol{\beta}}_i + t_{n-p}^{^\frac{\alpha}{2}} se(\hat{\boldsymbol{\beta}}_i)\right) \end{align} \]

Where we define \( \large{t_{n-p}^{\frac{\alpha}{2}} } \) to be the critical value for which,
- if \( t \sim t_{n-p} \),
- the probability of \[ \begin{align} t \geq t_{n-p}^{\alpha/2} \end{align} \] is equal to \( \frac{\alpha}{2} \).
The symmetry of the student t-distribution guarantees that, with \( 100(1-\alpha)\% \) confidence, the true value

\[ \begin{align} \boldsymbol{\beta}_i \in \left( \hat{\boldsymbol{\beta}}_i - t_{n-p}^{\frac{\alpha}{2}} se(\hat{\boldsymbol{\beta}}_i), \hat{\boldsymbol{\beta}}_i + t_{n-p}^{\frac{\alpha}{2}} se(\hat{\boldsymbol{\beta}}_i)\right) \end{align} \]

Confidence intervals – continued

Suppose that we wish to compute a confidence interval on \( \boldsymbol{\beta}_{Area} \), such that with \( 95\% \) confidence, we say the “true value” of \( \boldsymbol{\beta}_{Area} \) lives within it.

lmod <- lm(Species ~ Elevation + Nearest + Scruz + Area + Adjacent, gala)
sumary(lmod)

             Estimate Std. Error t value  Pr(>|t|)
(Intercept)  7.068221  19.154198  0.3690 0.7153508
Elevation    0.319465   0.053663  5.9532 3.823e-06
Nearest      0.009144   1.054136  0.0087 0.9931506
Scruz       -0.240524   0.215402 -1.1166 0.2752082
Area        -0.023938   0.022422 -1.0676 0.2963180
Adjacent    -0.074805   0.017700 -4.2262 0.0002971

n = 30, p = 6, Residual SE = 60.97519, R-Squared = 0.77

We note that the standard error for area is given as \( \approx 0.022422 \),
We compute the \( 2.5\% \) critical value for \( T_{30 - 6} \) and obtain the confidence interval as,

t_crit <- qt(0.975, 30-6)
-0.02394 + c(-1,1) * t_crit * 0.02242

[1] -0.07021261  0.02233261

Q: does this indicate we can reject the null hypothesis \( H_0: \boldsymbol{\beta}_{Area}=0 \) at \( 5\% \) significance?

Confidence intervals – continued

A: notice, the interval contains \( 0 \) so that we cannot reject the null at \( 5\% \) significance.

-0.02394 + c(-1,1) * t_crit * 0.02242

[1] -0.07021261  0.02233261

This can be seen dually through the p-value of the parameter for area

sumary(lmod)

             Estimate Std. Error t value  Pr(>|t|)
(Intercept)  7.068221  19.154198  0.3690 0.7153508
Elevation    0.319465   0.053663  5.9532 3.823e-06
Nearest      0.009144   1.054136  0.0087 0.9931506
Scruz       -0.240524   0.215402 -1.1166 0.2752082
Area        -0.023938   0.022422 -1.0676 0.2963180
Adjacent    -0.074805   0.017700 -4.2262 0.0002971

n = 30, p = 6, Residual SE = 60.97519, R-Squared = 0.77

Confidence intervals – continued

Similarly, we can compute the \( 95\% \) confidence interval for \( \boldsymbol{\beta}_{Adjacent} \) with,
- the \( t_{30-6}^{2.5} \) critical value; and
- the \( se(\boldsymbol{\beta}_{Adjacent}) \)

-0.07480 + c(-1,1) * t_crit * 0.01770

[1] -0.111331 -0.038269

Q: Based on the above confidence interval, can we reject the null hypothesis \( H_0: \boldsymbol{\beta}_{Adjacent}=0 \) with \( 5\% \) significance?
A: Yes.
- However, while we say (with \( 95\% \) confidence) that the effect of the adjacent island area is negative on the response, but the uncertainty is large, with the width of the interval on the same order as the coefficient itself,

(-0.038269 + 0.111331)/abs(-0.07480)

[1] 0.9767647

Confidence intervals – continued

Confidence intervals, while dual to hypothesis test, provide slightly more information than the p-values.
Particularly, we can examine a range of equally plausible values for the parameter of interest.
- If the confidence interval is relatively tight around the value of the parameter, we can provide more evidence of the actual effect of the predictor.
- However, if the range of values is large, this indicates a great deal of uncertainty of the effect of the predictor and the scale of the effect.
- When the confidence interval contains zero, then it is equally plausible that the effect of changing the predictor will be positive or negative (or nothing) on the response.

Confidence intervals – continued

We can find the confidence intervals of all parameters simultaneously (at the \( 95\% \) confidence level) as

 confint(lmod)

                  2.5 %      97.5 %
(Intercept) -32.4641006 46.60054205
Elevation     0.2087102  0.43021935
Nearest      -2.1664857  2.18477363
Scruz        -0.6850926  0.20404416
Area         -0.0702158  0.02233912
Adjacent     -0.1113362 -0.03827344

Setting another confidence level, e.g., \( 99\% \) can be done with a keyword argument,

 confint(lmod, level = 0.99)

                  0.5 %      99.5 %
(Intercept) -46.5049119 60.64135329
Elevation     0.1693731  0.46955638
Nearest      -2.9392105  2.95749844
Scruz        -0.8429913  0.36194283
Area         -0.0866523  0.03877562
Adjacent     -0.1243112 -0.02529848

Q: why is the \( 99\% \) confidence interval wider than the \( 95\% \) confidence interval?
A: to be more confident in the location of the true value, we need to widen the interval correspondingly.

Confidence regions

The previous example only considered the univariate confidence interval, analogous to the t-test for a single parameter, i.e.,
- \( H_0: \boldsymbol{\beta}_{p-1} = 0 \)
- \( H_1: \boldsymbol{\beta}_{i} \neq 0 \), \( i=0,\cdots, p-1 \).
Suppose we want (analogous to the F-test) to find a multivariate confidence region.
For \( \alpha \) the significance level, we find the \( 100(1-\alpha)\% \) confidence region for \( \hat{\boldsymbol{\beta}} \) is given by the relationship

\[ \begin{align} \left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)^\mathrm{T}\left(\mathbf{X}^\mathrm{T}\mathbf{X} \right) \left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \leq p \hat{\sigma}^2 F^{(\alpha)}_{p, n-p} \end{align} \]

The above can be interpreted as a weighted inner product of \( \left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \) with itself with respect to its estimated inverse covaraince, i.e., a measure of distance squared, weighted by \( \frac{1}{\hat{\sigma}^2}\left(\mathbf{X}^\mathrm{T}\mathbf{X} \right) \).
We say that with \( 100(1-\alpha)\% \) confidence the true parameter \( \boldsymbol{\beta} \) will lie within the domain of the weighted distance function, for which the output is less than \[ p \hat{\sigma}^2 F^{(\alpha)}_{p, n-p}. \]

Confidence regions – continued

We can visualize such a region in 2-D, but in higher dimensions it becomes a hyper-ellipsoid.
The package for creating confidence regions is not part of base R, and requires the installation of the package,

install.packages("ellipse"))

We will produce a confidence ellipse for the two parameters area and adjacent simultaneously with the following code:

plot(ellipse(lmod,c(2,6)),type="l",ylim=c(-0.13,0))
points(coef(lmod)[2], coef(lmod)[6], pch=19)
abline(v=confint(lmod)[2,],lty=2)
abline(h=confint(lmod)[6,],lty=2)

The above has the effect of creating a confidence interval for the second and sixth parameter of the model lmod.
We plot additional points for the parameter estimates, as well as include lines in the plot for the univariate confidence intervals.
The result is pictured next…

Confidence regions – continued

Confidence ellipsoid for adjacent and area variables

Courtesy of: Faraway, J. Linear Models with R. 2nd Edition

In the ellipsoid plot, we see the \( 95\% \) confidence region for \( \boldsymbol{\beta}_{area} \) and \( \boldsymbol{\beta}_{Adjacent} \) simultaneously.

the vertical lines represent the univariate \( 95\% \) confidence interval for “area”;
the horizontal lines represent the univariate \( 95\% \) confidence interval for “adjacent”;
the center dot represents the estimated value.

In this case, we see how the confidence region isn't simply the box of the combined intervals;
- this is similar to the fact that the individual t-tests for including or excluding a variable can’t be direclty combined.
- Particularly, the weighted distance must always define an ellipsoid, due to the properties of eigenvalues of positive-definite, symmetric matrices.