We have largely discussed the case where there is some dependence (i.e., correlation) among the explanatory variables.
Q: Qualitatively, what occurs when the explanatory variables are totally statistically independent?
A: each variable contributes unique information to the model that cannot be inferred from the values of the other variables.
Q: How does this aid our analysis?
A: in one sense, we get the most value out of each estimated parameter \( \beta_i \), as it corresponds to a statistically independent variable's unique contribution to the response.
This is closely related to the idea of orthogonality, in which the spaces spanned by the variables are perpendicular to one another.
Orthogonality can be loosely read as “perpendicular”.
Recall an equivalent description of the vector inner product,
\[ \begin{align} \mathbf{a} \cdot \mathbf{b} & = \parallel \mathbf{a} \parallel \parallel \mathbf{b} \parallel \cos(\theta) \\ &= \text{"length of } \mathbf{a}\text{"} \times \text{"length of } \mathbf{b}\text{"} \times \cos( \text{"the angle between"}) \end{align} \]
Q: If there are 90 degrees between the two vectors \( \mathbf{a} \) and \( \mathbf{b} \), then what does the inner product \( \mathbf{a} \cdot \mathbf{b} \) equal?
A: \( \cos(90^\circ)=0 \), such that the inner product must vanish – therefore, orthogonal vectors have a zero inner product.
This idea extends to matrices, when the columns of two matrices are orthogonal.
In particular, if \( \mathbf{A} \) is an orthogonal matrix then \( \mathbf{A}^\mathrm{T}\mathbf{A} = \mathbf{I} \).
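As a quick check of these two facts (a minimal sketch in base R, with the particular vectors and rotation angle chosen arbitrarily for illustration), we can verify that perpendicular vectors have a vanishing inner product and that an orthogonal matrix satisfies \( \mathbf{A}^\mathrm{T}\mathbf{A} = \mathbf{I} \):
# two perpendicular vectors in the plane
a <- c(1, 1)
b <- c(1, -1)
sum(a * b)                        # inner product -- equals zero
# a rotation matrix is an example of an orthogonal matrix
theta <- pi / 6
A <- matrix(c(cos(theta), -sin(theta),
              sin(theta),  cos(theta)), nrow = 2)
round(t(A) %*% A, 12)             # recovers the 2 x 2 identity matrix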
Suppose we can decompose the explanatory variables into two groups \( \mathbf{X}_1 \) and \( \mathbf{X}_2 \) which are orthogonal to each other,
\[ \begin{align} \mathbf{X} &\triangleq \begin{pmatrix} \mathbf{X}_1 \vert \mathbf{X}_2 \end{pmatrix} \end{align} \]
Notice that (regardless of orthogonality) we have the equality:
\[ \begin{align} \mathbf{X}\beta &= \begin{pmatrix} \mathbf{X}_1 \vert \mathbf{X}_2 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} \\ &= \mathbf{X}_1 \beta_1 + \mathbf{X}_2 \beta_2 \end{align} \]
so that we split the explanatory variables and parameters into two groups via the definition of the matrix product.
Q: Assume that \( \mathbf{X} \triangleq \begin{pmatrix}\mathbf{X}_1 \vert \mathbf{X}_2 \end{pmatrix} \) and \( \mathbf{X}_1 \) and \( \mathbf{X}_2 \) are orthogonal to each other. What does \( \mathbf{X}^\mathrm{T} \mathbf{X} \) equal block-wise?
Solution: we find that the matrix product yields,
\[ \begin{align} \mathbf{X}^\mathrm{T} \mathbf{X} &= \begin{pmatrix} \mathbf{X}^\mathrm{T}_1 \mathbf{X}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{X}^\mathrm{T}_2 \mathbf{X}_2 \end{pmatrix} \end{align} \] due to the orthogonality of the two columns.
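As a small numerical sketch of this block structure (the columns below are hand-picked orthogonal contrasts, chosen only for illustration), we can form \( \mathbf{X} = \begin{pmatrix}\mathbf{X}_1 \vert \mathbf{X}_2 \end{pmatrix} \) in R and inspect \( \mathbf{X}^\mathrm{T}\mathbf{X} \):
X1 <- cbind(c(1, 1, 1, 1), c(1, -1, 1, -1))   # first group of explanatory variables
X2 <- cbind(c(1, 1, -1, -1))                  # second group, orthogonal to both columns of X1
crossprod(X1, X2)                             # X1^T X2 is the zero matrix
X <- cbind(X1, X2)
crossprod(X)                                  # X^T X is block diagonal -- off-diagonal blocks vanish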
Q: using the above fact, can you derive how the orthogonal projection operator \[ \mathbf{H}= \mathbf{X}\left(\mathbf{X}^\mathrm{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathrm{T} \] decomposes in terms of \( \mathbf{X}_1 \) and \( \mathbf{X}_2 \)?
A: From this fact, we can now write the product
\[ \begin{align} \mathbf{H} &\triangleq \mathbf{X}\left(\mathbf{X}^\mathrm{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathrm{T} \\ &= \begin{pmatrix} \mathbf{X}_1 \vert \mathbf{X}_2 \end{pmatrix} \begin{pmatrix} \left(\mathbf{X}^\mathrm{T}_1 \mathbf{X}_1\right)^{-1} & \mathbf{0} \\ \mathbf{0} & \left(\mathbf{X}^\mathrm{T}_2 \mathbf{X}_2\right)^{-1} \end{pmatrix} \begin{pmatrix} \mathbf{X}_1^\mathrm{T} \\ \mathbf{X}_2^\mathrm{T} \end{pmatrix} \\ &= \mathbf{X}_1\left(\mathbf{X}^\mathrm{T}_1 \mathbf{X}_1\right)^{-1}\mathbf{X}_1^\mathrm{T} + \mathbf{X}_2\left(\mathbf{X}^\mathrm{T}_2 \mathbf{X}_2\right)^{-1}\mathbf{X}_2^\mathrm{T} \end{align} \] i.e., the hat matrix is the sum of the orthogonal projections onto the column spaces of \( \mathbf{X}_1 \) and \( \mathbf{X}_2 \) separately.
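Continuing the same sketch (re-defining the same hand-picked columns so the snippet stands alone), we can verify numerically that the hat matrix is the sum of the two sub-projections:
X1 <- cbind(c(1, 1, 1, 1), c(1, -1, 1, -1))   # same illustrative groups as above
X2 <- cbind(c(1, 1, -1, -1))
X  <- cbind(X1, X2)
H  <- X  %*% solve(crossprod(X))  %*% t(X)    # projection onto the column space of X
H1 <- X1 %*% solve(crossprod(X1)) %*% t(X1)   # projection onto the column space of X1
H2 <- X2 %*% solve(crossprod(X2)) %*% t(X2)   # projection onto the column space of X2
all.equal(H, H1 + H2)                         # TRUE, up to numerical precision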
In the previous question, we see that the prediction of the fitted values decomposes entirely into contributions from the two sub-sets of variables.
Likewise, we will find that \( \hat{\boldsymbol{\beta}} \) decomposes into two sets of parameters \( \hat{\boldsymbol{\beta}}_1 \) and \( \hat{\boldsymbol{\beta}}_2 \).
Q: recall that the estimated covariance of the parameter values is given as, \[ \begin{align} cov\left(\hat{\boldsymbol{\beta}}\right) &=\sigma^2 \left(\mathbf{X}^\mathrm{T}\mathbf{X}\right)^{-1} . \end{align} \] What does orthogonality of the columns of \( \mathbf{X} \) imply about the covariance of the parameters \( \hat{\boldsymbol{\beta}} \)?
A: in particular, if the columns of \( \mathbf{X} \) are orthogonal to each other, we find that the estimated parameters \( \hat{\boldsymbol{\beta}} \) are uncorrelated.
Qualitatively, we should understand that the value of one parameter estimate \( \hat{\beta}_i \) does not inform the value of the estimate \( \hat{\beta}_j \) for \( i\neq j \).
We want to relate the notion of orthogonality more directly to the correlation between variables in the statistical sense.
Let \( \overline{\mathbf{X}}^{(i)} \) be the mean of column \( i \) of the design matrix, i.e., \[ \overline{\mathbf{X}}^{(i)} \triangleq \frac{1}{n} \sum_{k=1}^n X_{k,i}, \] summing over the matrix entries \( X_{k,i} \) along the rows \( k=1,\cdots,n \).
We will then define the \( (k,i) \)-th anomaly as \[ a_{(k,i)} = X_{k,i} - \overline{\mathbf{X}}^{(i)}, \] such that the anomaly matrix \( \mathbf{A} \) is defined column-wise as \[ \begin{align} \mathbf{A}^{(i)} &\triangleq \mathbf{X}^{(i)} - \frac{1}{n} \boldsymbol{1}\boldsymbol{1}^\mathrm{T}\mathbf{X}^{(i)} \end{align} \] where \( \boldsymbol{1} \) is the vector of ones, \[ \begin{align} \boldsymbol{1}^\mathrm{T} \triangleq \begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}. \end{align} \]
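As a minimal sketch (using a small simulated design, so the particular values are arbitrary), the anomaly matrix can be computed directly from the formula above, or equivalently with R's scale() function:
set.seed(1)                                       # seed chosen arbitrarily for reproducibility
X <- matrix(rnorm(5 * 3), nrow = 5)               # a small simulated design matrix
ones <- rep(1, nrow(X))
A_formula <- X - (1 / nrow(X)) * ones %*% t(ones) %*% X   # X^(i) - (1/n) 1 1^T X^(i), all columns at once
A_scale   <- scale(X, center = TRUE, scale = FALSE)       # column-wise centering does the same thing
all.equal(A_formula, A_scale, check.attributes = FALSE)   # TRUE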
If we consider, as is standard, that the entries of \( \mathbf{X} \) are deterministic constants, we may then discuss the sample-based correlation of the predictors.
The sample-based correlation coefficient of the variables \( X_i \) and \( X_j \) can be written, \[ \begin{align} cor(X_i,X_j)\triangleq \frac{\left(\mathbf{A}^{(i)}\right)^\mathrm{T} \mathbf{A}^{(j)}}{\sqrt{\left[\left(\mathbf{A}^{(i)}\right)^\mathrm{T}\mathbf{A}^{(i)}\right] \left[\left(\mathbf{A}^{(j)}\right)^\mathrm{T}\mathbf{A}^{(j)}\right]}}. \end{align} \]
Q: if the variables \( X_i \) and \( X_j \) are uncorrelated, what does this say about their anomalies?
A: they must be orthogonal.
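The following sketch (again with simulated data, values arbitrary) confirms that the expression above reproduces R's built-in cor(), so a zero sample correlation is exactly the statement that the anomaly columns are orthogonal:
set.seed(2)                                   # arbitrary seed for reproducibility
X <- matrix(rnorm(100 * 2), ncol = 2)         # two simulated explanatory variables
A <- scale(X, center = TRUE, scale = FALSE)   # their anomalies
num <- sum(A[, 1] * A[, 2])                   # inner product of the anomaly columns
den <- sqrt(sum(A[, 1]^2) * sum(A[, 2]^2))    # product of their lengths
num / den                                     # the formula above ...
cor(X[, 1], X[, 2])                           # ... matches the sample correlation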
We can thus consider a change of variables for our standard model in the case where our variables are uncorrelated.
Let us suppose the form of the model is
\[ \begin{align} Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{p-1} X_{p-1} \\ &= \beta_0 + \sum_{i=1}^{p-1} \beta_i X_i\\ &= \beta_0 + \sum_{i=1}^{p-1} \beta_i \left(X_i- \overline{\mathbf{X}}^{(i)} + \overline{\mathbf{X}}^{(i)}\right)\\ &= \left[\beta_0 + \sum_{i=1}^{p-1} \beta_i\overline{\mathbf{X}}^{(i)}\right] + \sum_{i=1}^{p-1} \beta_i A_i \end{align} \] i.e., where we re-write the model in terms of the anomalies as the predictors.
Recall, by our assumptions, \( \left[\beta_0 + \sum_{i=1}^{p-1} \beta_i\overline{\mathbf{X}}^{(i)}\right] \) is just a constant, which can be renamed as \( \tilde{\beta}_0 \).
Q: for the above model, are the parameter estimates for \( \tilde{\beta}_0, \beta_1, \cdots \beta_{p-1} \) correlated?
A: no, by the orthogonality of the anomalies, the covariance of \( \hat{\boldsymbol{\beta}} \) in terms of the anomalies is given by,
\[ \begin{align} \mathrm{cov}\left(\hat{\boldsymbol{\beta}}\right) &= \sigma^2 \left(\mathbf{A}^\mathrm{T}\mathbf{A}\right)^{-1} , \end{align} \]
where \( \mathbf{A}^\mathrm{T}\mathbf{A} \) is a diagonal matrix – its off-diagonal entries are the inner products of distinct anomaly columns, which vanish by orthogonality – so that its inverse, and hence the covariance of the estimates, is also diagonal. (The anomalies have mean zero, so they are likewise orthogonal to the constant column associated with \( \tilde{\beta}_0 \).)
Thus, in the case that the predictors are uncorrelated, each parameter estimate in the model written in terms of the anomalies has zero covariance with every other.
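As a minimal sketch of the derivation above (with a hand-made balanced design and an arbitrary simulated response), we can check with vcov() that the estimated covariance matrix of the parameters is diagonal when the predictors are exactly uncorrelated:
set.seed(3)                                   # arbitrary seed for the simulated response
x1 <- rep(c(-1, 1), times = 4)                # a balanced design ...
x2 <- rep(c(-1, 1), each  = 4)                # ... so that cor(x1, x2) is exactly zero
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(8)          # simulated response; coefficients chosen arbitrarily
fit <- lm(y ~ x1 + x2)                        # x1 and x2 have mean zero, so they equal their own anomalies
zapsmall(vcov(fit))                           # diagonal -- the parameter estimates are uncorrelated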
We can thus see that having uncorrelated predictors allows us to construct a model (possibly in terms of the anomalies) in which the estimated parameters are also uncorrelated.
In this case, we can view the parameter estimates loosely as “close-to-independent”.
It tells us that we cannot infer information about one parameter from the value of another;
This is an extremely useful property that typically is only a product of good experimental design – in situ data from observations often have more complicated correlation structures.
require("faraway")
odor
odor temp gas pack
1 66 -1 -1 0
2 39 1 -1 0
3 43 -1 1 0
4 49 1 1 0
5 58 -1 0 -1
6 17 1 0 -1
7 -5 -1 0 1
8 -40 1 0 1
9 65 0 -1 -1
10 7 0 1 -1
11 43 0 -1 1
12 -22 0 1 1
13 -31 0 0 0
14 -35 0 0 0
15 -26 0 0 0
If we reverse the transformation for temp, we get
\[ \text{Fahrenheit} = \text{temp}\times 40 + 80 \]
Therefore,
farenheit <- odor$temp * 40 + 80
farenheit
[1] 40 120 40 120 40 120 40 120 80 80 80 80 80 80 80
mean(farenheit)
[1] 80
mean(odor$temp)
[1] 0
cov(odor[,-1])
temp gas pack
temp 0.5714286 0.0000000 0.0000000
gas 0.0000000 0.5714286 0.0000000
pack 0.0000000 0.0000000 0.5714286
lmod <- lm(odor ~ temp + gas + pack, odor)
summary(lmod,cor=T)
Call:
lm(formula = odor ~ temp + gas + pack, data = odor)
Residuals:
Min 1Q Median 3Q Max
-50.200 -17.137 1.175 20.300 62.925
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.200 9.298 1.635 0.130
temp -12.125 12.732 -0.952 0.361
gas -17.000 12.732 -1.335 0.209
pack -21.375 12.732 -1.679 0.121
Residual standard error: 36.01 on 11 degrees of freedom
Multiple R-squared: 0.3337, Adjusted R-squared: 0.1519
F-statistic: 1.836 on 3 and 11 DF, p-value: 0.1989
Correlation of Coefficients:
(Intercept) temp gas
temp 0.00
gas 0.00 0.00
pack 0.00 0.00 0.00
lmod <- lm(odor ~ temp + gas, odor)
summary(lmod,cor=T)
Call:
lm(formula = odor ~ temp + gas, data = odor)
Residuals:
Min 1Q Median 3Q Max
-50.20 -36.76 10.80 26.18 62.92
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.200 9.978 1.523 0.154
temp -12.125 13.663 -0.887 0.392
gas -17.000 13.663 -1.244 0.237
Residual standard error: 38.64 on 12 degrees of freedom
Multiple R-squared: 0.1629, Adjusted R-squared: 0.02342
F-statistic: 1.168 on 2 and 12 DF, p-value: 0.344
Correlation of Coefficients:
(Intercept) temp
temp 0.00
gas 0.00 0.00
Notice that the estimates for temp and gas are identical to those in the full model – because the predictors are orthogonal, removing pack does not change the remaining coefficient estimates, only the residual standard error (and hence the standard errors of the estimates).