We have now introduced the expected value for a RV \( \pmb{X} \) as the analog of the center of mass in multiple variables.
In one dimension, the notion of variance \( \mathrm{var}\left(X\right)=\sigma^2 \) and the standard deviation \( \sigma \) give us measures of the spread of the RV and the data derived from observations of it.
We define the variance of \( X \) once again as,
\[ \mathrm{var}\left(X\right) = \sigma^2 = \mathbb{E}\left[\left(X - \overline{x}\right)^2\right] \] which can be seen as the average squared deviation of the RV \( X \) from its mean.
When we have two RVs \( X \) and \( Y \), we will additionally need to consider how these variables co-vary, either together or oppositely, in their conditional probability.
Consider that for the univariate expectation, with the two RVs \( X \) and \( Y \), we have
\[ \begin{align} \mathbb{E}\left[ X + Y \right] &= \mathbb{E}\left[X \right] + \mathbb{E}\left[ Y\right] \\ &=\overline{x} + \overline{y} \end{align} \]
However, the same does not apply when we take the variance of the sum of the variables;
\[ \begin{align} \mathrm{var}\left( X+Y\right) &= \mathbb{E}\left[ \left(X + Y - \overline{x} - \overline{y}\right)^2\right] \\ &=\mathbb{E}\left[\left\{ \left( X - \overline{x} \right) +\left( Y - \overline{y} \right) \right\}^2\right]\\ & = \mathbb{E}\left[ \left( X - \overline{x} \right)^2 + \left( Y - \overline{y} \right)^2 + 2 \left(X - \overline{x} \right)\left(Y - \overline{y}\right)\right] \end{align} \]
Question: Using the linearity of the expectation, and the definition of the variance, how can the above be simplified?
\[ \begin{align} \mathrm{var}\left( X+Y\right) &= \mathrm{var}\left(X\right) + \mathrm{var}\left(Y\right) + 2 \mathbb{E}\left[\left(X - \overline{x} \right)\left(Y - \overline{y}\right)\right] \end{align} \]
Therefore, the combination of the RVs has a variance that is equal to the sum of the variances plus the newly identified cross terms.
We note that if \( X \) and \( Y \) are independent, i.e.,
\[ \begin{align} \mathcal{P}(X\vert Y) = \mathcal{P}(X) & & \mathcal{P}(Y \vert X) = \mathcal{P}(Y); \end{align} \]
then we have \[ \begin{align} \mathbb{E}\left[\left(X - \overline{x}\right) \left(Y - \overline{y} \right)\right] = \mathbb{E}\left[X - \overline{x}\right] \mathbb{E}\left[Y - \overline{y} \right] = 0. \end{align} \]
Therefore, we can consider the covariance,
\[ \mathrm{cov}\left(X,Y\right) = \sigma_{X,Y} = \mathbb{E}\left[\left(X - \overline{x} \right)\left(Y - \overline{y}\right)\right], \] to be a measure of how the variables \( X \) and \( Y \) co-vary together in their conditional probabilities.
We should note that while \( \mathrm{cov}\left(X,Y\right)=0 \) for any pair of independent variables, this condition is not the same as independence in general.
Particularly, we will denote
\[ \begin{align} \mathrm{cor}(X,Y) =\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}\left(X\right) \mathrm{var}\left(Y\right)}}=\frac{\sigma_{X,Y}}{\sqrt{\sigma_{X}^2 \sigma_{Y}^2}} = \frac{\sigma_{X,Y}}{\sigma_{X} \sigma_{Y}} \end{align} \] the correlation between the variables \( X \) and \( Y \).
If the correlation / covariance of the two variables \( X \) and \( Y \) is equal to zero, then
\[ \mathrm{var}\left( X+Y\right) = \mathrm{var}\left(X\right) + \mathrm{var}\left(Y\right), \] but this does not imply that they are independent, just that we cannot detect the dependence structure with this measure.
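As a quick numerical sanity check, the following minimal NumPy sketch (the synthetic samples and mixing coefficient are illustrative choices, not part of the formal development) verifies the variance-of-a-sum identity and the role of the cross term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated scalar samples, built by mixing independent draws.
z1, z2 = rng.standard_normal((2, 10_000))
x = z1
y = 0.5 * z1 + np.sqrt(1.0 - 0.5**2) * z2   # correlated with x by construction

# Unbiased sample statistics (denominator N - 1).
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)
cov_xy = np.cov(x, y)[0, 1]

# The identity var(X + Y) = var(X) + var(Y) + 2 cov(X, Y) also holds
# exactly for the sample statistics, up to floating-point error.
print(np.var(x + y, ddof=1))
print(var_x + var_y + 2.0 * cov_xy)
```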
Question: how can you use the above definition of the correlation to show that \( X \) always has correlation \( 1 \) with itself?
\[ \mathrm{cor}(X,X) =\rho_{X,X} = \frac{\mathrm{cov}(X,X)}{\sqrt{\mathrm{var}\left(X\right) \mathrm{var}\left(X\right)}}=\frac{\sigma_{X}^2}{\sqrt{\sigma_{X}^2 \sigma_{X}^2}}= 1 \]
More generally, we can say that for any two RVs \( X \) and \( Y \),
\[ -1 \leq \mathrm{cor}\left(X,Y\right)\leq 1. \]
This can be shown as follows, where
\[ \begin{align} 0 & \leq \mathrm{var}\left( \frac{X}{\sigma_X} + \frac{Y}{\sigma_Y} \right) \\ &=\mathrm{var}\left(\frac{X}{\sigma_X}\right) + \mathrm{var}\left(\frac{Y}{\sigma_Y}\right) + 2\mathrm{cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) \end{align} \] using the relationship we have just shown.
We note that when we divide a RV by its standard deviation, the variance becomes one;
\[ \begin{align} & 0 \leq 1 + 1 +2 \mathrm{cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) \\ \Leftrightarrow & -1\leq \mathrm{cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) \end{align} \]
Let's recall that we just showed,
\[ -1\leq \mathrm{cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) . \]
Let's note that, \( \mathbb{E}\left[ \frac{X}{\sigma_X} \right] = \frac{\overline{x}}{\sigma_X} \) so that \[ \begin{align} \mathrm{cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) &= \mathbb{E}\left[\left(\frac{X -\overline{x}}{\sigma_X}\right)\left(\frac{Y - \overline{y}}{\sigma_Y}\right)\right] \\ &= \frac{\mathbb{E}\left[\left(X -\overline{x}\right)\left(Y - \overline{y} \right) \right]}{\sigma_X \sigma_Y}\\ &= \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \\ &= \mathrm{cor}(X,Y) \end{align} \]
Using the two statements above, we have \[ \begin{align} -1 \leq \mathrm{cor}\left(X,Y\right). \end{align} \]
If we repeat the above argument with \( -X \) in the place of \( X \), we will get the statement \( \mathrm{cor}\left(X,Y\right) \leq 1 \) to complete the argument.
In the last slide we showed how we can identify,
\[ -1 \leq \mathrm{cor}\left(X,Y\right)\leq 1 \] for any pair of RVs \( X \) and \( Y \).
With the above range in mind, we say that a correlation of “close-to-one” means that the variables \( X \) and \( Y \) vary together almost identically;
Conversely, a correlation of “close-to-negative-one” means that the variables \( X \) and \( Y \) vary almost identically, but in opposite directions;
This can be understood similarly by taking the \( \mathrm{cov}\left(-X,X\right) \);
\[ \begin{align} \mathrm{cov}\left(-X, X\right) &= \mathbb{E}\left[\left(-X - (-\overline{x}) \right)\left( X - \overline{x}\right) \right] \\ &= - \mathbb{E}\left[\left( X - \overline{x}\right)^2\right]\\ &= - \mathrm{cov}(X,X) \end{align} \]
It is easy to show then that \( \mathrm{cor}(-X,X) = -1 \).
We suppose now that we have a RV \( \pmb{X}\sim P \), where each component is itself a RV,
\[ \begin{align} \pmb{X} = \begin{pmatrix} X_1 \\ \vdots \\ X_{N_x} \end{pmatrix} \in \mathbb{R}^{N_x}. \end{align} \]
For each component RV, we may similarly define,
\[ \begin{align} \mathrm{var}\left(X_i\right) &= \mathbb{E}\left[ \left(X_i - \overline{x}_i\right)^2 \right] \\ \mathrm{cov}\left(X_i, X_j \right) &= \mathbb{E}\left[ \left(X_i - \overline{x}_i\right) \left(X_j - \overline{x}_j\right) \right] \end{align} \] as we did for \( X \) and \( Y \).
The component-wise definition above is convenient in how it extends from the simple discussion before;
Recall that the Euclidean norm of an arbitrary vector,
\[ \parallel \pmb{v}\parallel = \sqrt{\pmb{v}^\mathrm{T} \pmb{v}}, \] gives the general form for a distance in arbitrarily large dimensions.
Notice it is defined in terms of the vector inner product, where
\[ \pmb{v}^\mathrm{T}\pmb{v} =\begin{pmatrix}v_1 & \cdots & v_{N_x} \end{pmatrix} \begin{pmatrix}v_1 \\ \vdots \\ v_{N_x}\end{pmatrix} = \sum_{i=1}^{N_x} v_i^2 \]
If we instead change the order of the transpose, we obtain the outer product as
\[ \begin{align} \pmb{v}\pmb{v}^\mathrm{T}& = \begin{pmatrix} v_1 \\ \vdots \\ v_{N_x} \end{pmatrix} \begin{pmatrix}v_1 & \cdots & v_{N_x}\end{pmatrix} \\ &= \begin{pmatrix} v_1 v_1 & v_1 v_2 & \cdots & v_1 v_{N_x} \\ v_2 v_1 & v_2 v_2 & \cdots & v_2 v_{N_x} \\ \vdots & \vdots & \ddots & \vdots \\ v_{N_x} v_1 & v_{N_x} v_2 & \cdots & v_{N_x} v_{N_x} \end{pmatrix}, \end{align} \] which is instead matrix valued in the output.
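For concreteness, a short NumPy sketch (the vector entries are arbitrary) contrasting the scalar-valued inner product with the matrix-valued outer product:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])

inner = v @ v            # scalar: sum_i v_i^2, so the Euclidean norm is sqrt(inner)
outer = np.outer(v, v)   # 3 x 3 matrix with entries v_i * v_j

print(np.sqrt(inner))    # Euclidean norm of v
print(outer)
```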
When we extend the notion of the covariance to a RV \( \pmb{X} \), finding the variances and the covariances of all of its entries, we arrive at the notion of covariance using the outer product.
Particularly, suppose that \( \mathbb{E}\left[\pmb{X}\right] = \overline{\pmb{x}} \); then we write
\[ \begin{align} \mathrm{cov}\left(\pmb{X}\right) = \mathbf{B} = \mathbb{E}\left[\left(\pmb{X}-\overline{\pmb{x}}\right) \left(\pmb{X} - \overline{\pmb{x}} \right)^\mathrm{T} \right] \end{align} \]
That is, if \( \{\pmb{X}_j\}_{j=1}^{N_e} \) is a random sample with parent distribution \( \pmb{X} \sim P \), the background covariance represents the population covariance according to which each \( \pmb{X}_j \) is distributed.
Using the previous outer product formula, we obtain the product
\[ \begin{align} \left(\pmb{X} - \overline{\pmb{x}}\right)\left(\pmb{X} - \overline{\pmb{x}}\right)^\mathrm{T} &= \begin{pmatrix} \left(X_1 - \overline{x}_1\right)\left(X_1 - \overline{x}_1 \right) & \cdots & \left(X_1 - \overline{x}_1 \right) \left(X_{N_x} - \overline{x}_{N_x} \right) \\ \vdots & \ddots & \vdots \\ \left(X_{N_x} - \overline{x}_{N_x} \right)\left(X_1 - \overline{x}_1 \right)& \cdots & \left(X_{N_x} - \overline{x}_{N_x} \right)\left(X_{N_x} - \overline{x}_{N_x} \right) \end{pmatrix}. \end{align} \]
The (background) covariance matrix
Let \( \pmb{X}\sim P \) be a RV with mean \( \mathbb{E}\left[\pmb{X}\right] = \overline{\pmb{x}} \). The (background) covariance matrix is defined \[ \begin{align} \mathbf{B} = \mathrm{cov}(\pmb{X}) := \mathbb{E}\left[\left(\pmb{X} - \overline{\pmb{x}}\right)\left(\pmb{X} - \overline{\pmb{x}}\right)^\top\right] & & & & \mathbf{B}_{ij} = \begin{cases} \mathrm{var}\left( X_i\right) & & \text{when }i=j \\ \mathrm{cov}\left(X_i,X_j\right) & & \text{when } i \neq j \end{cases} \end{align} \]
The above covariances and variances are to be understood in the same sense as in the univariate discussion, but for the component RVs \( X_i \) and \( X_j \).
Note, the covariance \( \mathrm{cov}\left(X_i, X_j\right) = \mathrm{cov}\left(X_j, X_i\right) \) is symmetric;
Furthermore, the eigenvalues of \( \mathbf{B} \) are all non-negative in general.
If the component RVs \( X_i, X_j \) are uncorrelated for all \( i \neq j \), then \( \mathbf{B} \) is also diagonal,
\[ \mathbf{B} = \begin{pmatrix} \mathrm{var}(X_1) & 0 & \cdots & 0 \\ 0 & \mathrm{var}(X_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & \mathrm{var}(X_{N_x}) \end{pmatrix} \] and the eigenvalues are identically the variances.
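For illustration, the following NumPy sketch builds a synthetic set of draws (the mixing matrix and sample sizes are arbitrary choices for the example) and checks the stated properties of the sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: columns are independent draws of a 3-dimensional RV with
# correlated components, obtained by mixing standard normal draws.
N_x, N_e = 3, 10_000
A = rng.standard_normal((N_x, N_x))
E = A @ rng.standard_normal((N_x, N_e))

B_hat = np.cov(E)                     # rows are treated as the variables X_i

print(np.diag(B_hat))                 # sample variances along the diagonal
print(np.allclose(B_hat, B_hat.T))    # symmetric
print(np.linalg.eigvalsh(B_hat).min() >= -1e-10)  # eigenvalues non-negative (numerically)
```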
Some basic properties of the covariance follow immediately from the linearity of the expectation over sums.
Suppose that \( \mathbf{A} \) is a constant valued matrix, \( \pmb{b} \) is a constant valued vector and \( \pmb{X} \) is a RV with expected value \( \overline{\pmb{x}} \) and covariance \( \mathbf{B} \).
Then notice that,
\[ \begin{align} \mathbb{E}\left[ \pmb{X} + \pmb{b} \right] &= \mathbb{E}\left[\pmb{X} \right] + \pmb{b}\\ &= \overline{\pmb{x}} + \pmb{b} \end{align} \]
Therefore, we have that,
\[ \begin{align} \mathrm{cov}\left(\pmb{X} + \pmb{b}\right) &= \mathbb{E}\left[\left(\pmb{X} + \pmb{b} - \overline{\pmb{x}} - \pmb{b}\right)\left(\pmb{X} + \pmb{b} - \overline{\pmb{x}} - \pmb{b}\right)^\mathrm{T} \right]\\ &= \mathbb{E}\left[\left(\pmb{X} - \overline{\pmb{x}}\right)\left(\pmb{X} - \overline{\pmb{x}}\right)^\mathrm{T} \right]\\ &= \mathrm{cov}\left(\pmb{X}\right) \end{align} \]
We have also discussed that
\[ \begin{align} \mathbb{E}\left[ \mathbf{A} \pmb{X} \right] &= \mathbf{A}\mathbb{E}\left[ \pmb{X}\right] \\ &= \mathbf{A} \overline{\pmb{x}} \end{align} \]
It follows as a direct consequence that,
\[ \begin{align} \mathrm{cov}\left(\mathbf{A}\pmb{X}\right)&= \mathbb{E}\left[\left(\mathbf{A}\pmb{X} - \mathbf{A}\overline{\pmb{x}} \right)\left(\mathbf{A}\pmb{X} - \mathbf{A}\overline{\pmb{x}} \right)^\mathrm{T} \right]\\ &=\mathbb{E}\left[\left\{ \mathbf{A} \left(\pmb{X} - \overline{\pmb{x}}\right)\right\} \left\{ \mathbf{A} \left(\pmb{X} - \overline{\pmb{x}} \right) \right\}^\mathrm{T} \right] \\ &= \mathbf{A}\mathbb{E}\left[\left(\pmb{X} - \overline{\pmb{x}} \right)\left(\pmb{X} - \overline{\pmb{x}} \right)^\mathrm{T}\right] \mathbf{A}^\mathrm{T} \\ &=\mathbf{A}\mathrm{cov}\left(\pmb{X}\right)\mathbf{A}^\mathrm{T} \end{align} \]
These two properties show that the covariance is invariant under translations of the RV \( \pmb{X} \), while a linear transformation \( \mathbf{A} \) acts on the covariance as \( \mathbf{A}\mathrm{cov}\left(\pmb{X}\right)\mathbf{A}^\mathrm{T} \).
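For illustration, a small NumPy sketch (the matrix, shift vector, and sample sizes are arbitrary choices) checking both properties on synthetic draws:

```python
import numpy as np

rng = np.random.default_rng(2)

N_x, N_e = 2, 200_000
E = rng.standard_normal((N_x, N_e))       # draws of X with covariance near the identity
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])                # an arbitrary constant matrix
b = np.array([[5.0], [-1.0]])             # an arbitrary constant vector

# cov(A X + b) should approximate A cov(X) A^T; the shift b drops out entirely.
print(np.cov(A @ E + b))
print(A @ np.cov(E) @ A.T)
```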
Recall our construction of the ensemble matrix \( \mathbf{E}\in\mathbb{R}^{N_x \times N_e} \), whose \( j \)-th column is the ensemble member \( \pmb{X}_j \).
The sample mean can then be computed from the row-average of the ensemble matrix as
\[ \hat{\pmb{X}} = \mathbf{E} \pmb{1} \frac{1}{N_e}. \]
We can thus define the sample covariance matrix in a way analogous to how we defined the sample mean.
Particularly, if we follow the above matrix product with multiplication by \( \pmb{1}^\top \), we find that
\[ \begin{align} \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} = \begin{pmatrix} \hat{X}_1 & \cdots & \hat{X}_{1} \\ \vdots & \ddots & \vdots \\ \hat{X}_{N_x} & \cdots &\hat{X}_{N_x} \end{pmatrix}\in\mathbb{R}^{N_x \times N_e} \end{align} \]
Particularly, this can be written column-wise as
\[ \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} = \begin{pmatrix}\hat{\pmb{X}}, \cdots, \hat{\pmb{X}}\end{pmatrix} \]
Subtracting the last identity from \( \mathbf{E} \) element-wise, we find that
\[ \begin{align} \mathbf{E} - \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} = \begin{pmatrix} X_{1,1} - \hat{X}_1 & \cdots & X_{1,N_e}- \hat{X}_1 \\ \vdots & \ddots & \vdots \\ X_{N_x,1} - \hat{X}_{N_x} & \cdots & X_{N_x,N_e} - \hat{X}_{N_x} \end{pmatrix} \end{align} \]
With a re-normalization, we will define the matrix of perturbations or anomalies of the ensemble about the mean.
The (normalized) anomaly matrix
Let \( \mathbf{E} \) be the ensemble matrix as defined before. We define the (normalized) anomaly matrix of the ensemble as \[ \begin{align} \mathbf{X} :&= \left(\mathbf{E} - \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{\sqrt{N_e -1}}\\ &=\mathbf{E}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{\sqrt{N_e -1}} \end{align} \] In particular, \( \left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{\sqrt{N_e -1}} \) is sometimes referred to as the centering matrix.
The anomaly matrix above plays a central role in data assimilation to produce dimensional reductions in the computation.
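For concreteness, a minimal NumPy sketch (with a synthetic ensemble of arbitrary size) of the anomaly matrix construction above:

```python
import numpy as np

rng = np.random.default_rng(3)

N_x, N_e = 4, 25
E = rng.standard_normal((N_x, N_e))       # a synthetic ensemble matrix

ones = np.ones((N_e, 1))
x_hat = E @ ones / N_e                    # sample mean as a column vector
print(x_hat.ravel())

# Normalized anomaly matrix X = (E - E 1 1^T / N_e) / sqrt(N_e - 1)
X = (E - E @ ones @ ones.T / N_e) / np.sqrt(N_e - 1)

# Each row of the anomalies sums to zero, i.e., the anomalies have sample mean zero.
print(np.allclose(X @ np.ones(N_e), 0.0))
```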
Now recall, the sample variance of a (scalar) random sample \( \{X_{i,j}\}_{j=1}^{N_e} \) can simply be written as
\[ \begin{align} \hat{\sigma}_{i}^2 = \frac{1}{N_e - 1 } \sum_{j=1}^{N_e} \left(X_{i,j} - \hat{X}_{i}\right)^2 \end{align} \]
Similarly, the sample covariance of two RVs can be written as
\[ \begin{align} \hat{\sigma}_{i,j} = \frac{1}{N_e - 1 } \sum_{l=1}^{N_e} \left(X_{i,l} - \hat{X}_{i}\right)\left(X_{j,l} - \hat{X}_{j}\right). \end{align} \]
It is easy to demonstrate, using the above relationships and the fact that \( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \) is an orthogonal projection (and therefore symmetric and idempotent), that the anomalies have the property
\[ \begin{align} \mathbf{P} :&= \mathbf{X} \mathbf{X}^\top \\ &= \mathbf{E}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{N_e -1}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\mathbf{E}^\top\\ &=\mathbf{E}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\mathbf{E}^\top\frac{1}{N_e -1} \end{align} \]
where \[ \begin{align} \mathbf{P}_{i,j} = \begin{cases} \hat{\sigma}^2_{i} &\text{ for }i=j\\ \hat{\sigma}_{i,j} &\text{ for }i\neq j \end{cases} \end{align} \]
The ensemble covariance matrix
Let \( \mathbf{X} \) be the anomalies matrix of the ensemble. The ensemble covariance matrix is defined by \[ \begin{align} \mathbf{P}:= \mathbf{X}\mathbf{X}^\top & & \mathbf{P}_{i,j} = \begin{cases} \hat{\sigma}^2_{i} &\text{ for }i=j\\ \hat{\sigma}_{i,j} &\text{ for }i\neq j \end{cases} \end{align} \] where \( \mathbb{E}\left[\mathbf{P}\right] = \mathbf{B} \), i.e., it is an unbiased sample estimator of the background covariance.
Note that the analogous definitions can be made for an observed ensemble matrix rather than a random ensemble matrix.
This is actually the standard, numerically stable / efficient means of computing a sample covariance matrix.
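As a quick check, the following NumPy sketch (again with a synthetic ensemble) verifies that the ensemble covariance \( \mathbf{X}\mathbf{X}^\top \) agrees with the usual unbiased sample covariance:

```python
import numpy as np

rng = np.random.default_rng(4)

N_x, N_e = 4, 25
E = rng.standard_normal((N_x, N_e))

# Normalized anomalies, equivalently computed by subtracting the row means.
X = (E - E.mean(axis=1, keepdims=True)) / np.sqrt(N_e - 1)

P = X @ X.T                          # ensemble covariance matrix
print(np.allclose(P, np.cov(E)))     # agrees with NumPy's unbiased sample covariance
```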
A key property we can see is that the anomalies are actually just the projection of the ensemble matrix into the orthogonal complement of the span of the vector of ones, \( \pmb{1} \).
This is the geometric interpretation of the anomalies having sample mean zero:
\[ \begin{align} \mathbf{X}\pmb{1} = \pmb{0} \end{align} \]
due to orthogonality.
Multivariate Gaussian
Let \( \pmb{X}\in\mathbb{R}^{N_x} \) be a RV with expected value \( \overline{\pmb{x}} \) and covariance \( \mathbf{B} \). The RV \( \pmb{X} \) is said to be distributed according to the multivariate Gaussian distribution \( N(\overline{\pmb{x}}, \mathbf{B}) \) if it has a PDF defined \[ \begin{align} p(\pmb{x}) = \vert 2 \pi \mathbf{B}\vert^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}\left(\pmb{x} - \overline{\pmb{x}}\right)^\top \mathbf{B}^{-1}\left(\pmb{x} - \overline{\pmb{x}}\right)\right\} \end{align} \] where for a square, non-singular matrix, \( \mathbf{A} \), \[ \begin{align} \vert \mathbf{A} \vert := \vert \mathrm{det}(\mathbf{A})\vert. \end{align} \]
Covariance matrices are by construction positive semi-definite; when \( \mathbf{B} \) is moreover non-singular,
\[ \begin{align} \parallel \pmb{v}\parallel_\mathbf{B} := \sqrt{\pmb{v}^\top \mathbf{B}^{-1} \pmb{v}} \end{align} \]
defines an alternative distance to the Euclidean distance, weighted inversely proportionally to the spread of the distribution.
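For illustration, a short NumPy sketch (the matrix \( \mathbf{B} \) and vector \( \pmb{v} \) are arbitrary examples) computing this weighted norm; a linear solve is used rather than forming \( \mathbf{B}^{-1} \) explicitly, which is generally the more stable choice:

```python
import numpy as np

# An illustrative non-singular covariance matrix and a test vector.
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])
v = np.array([1.0, -1.0])

# ||v||_B = sqrt(v^T B^{-1} v), computed with a linear solve.
weighted_norm = np.sqrt(v @ np.linalg.solve(B, v))

print(weighted_norm)
print(np.linalg.norm(v))   # the ordinary Euclidean norm, for comparison
```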
If a covariance is actually singular, we can define a similar distance, but restricted to a lower-dimensional space;
Multivariate central limit theorem
Let \( \pmb{X}_1 ,\cdots , \pmb{X}_{N_e} \) be i.i.d., each with expected value \( \overline{\pmb{x}} \) and covariance \( \mathbf{B} \). Then the limiting distribution of \[ \sqrt{N_e}\left(\hat{\pmb{X}} - \overline{\pmb{x}}\right) \] as \( N_e \rightarrow \infty \) is \( N(\pmb{0}, \mathbf{B}) \). In particular, if \( \hat{\mathbf{B}} \) is any consistent estimator for \( \mathbf{B} \), we have moreover that the limiting distribution of \[ \sqrt{N_e}\,\hat{\mathbf{B}}^{-\frac{1}{2}}\left(\hat{\pmb{X}} - \overline{\pmb{x}}\right) \] is the standard, multivariate normal \( N(\pmb{0},\mathbf{I}) \) as \( N_e \rightarrow \infty \).
The multivariate central limit theorem as above establishes the generality of the Gaussian approximation for the sampling distribution of the ensemble mean.
Likewise, this motivates the ubiquitous use of the multivariate Gaussian as an approximation.
This approximation may or may not be appropriate depending on the context;
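To close, a small NumPy sketch (the exponential components, sample size, and number of replicates are arbitrary choices for the example) illustrating the multivariate central limit theorem empirically:

```python
import numpy as np

rng = np.random.default_rng(5)

# Many replicates of the standardized sample mean of a decidedly non-Gaussian RV:
# a 2-dimensional vector with independent Exp(1) components, so that the true
# mean is (1, 1) and the true covariance is the identity.
N_x, N_e, n_reps = 2, 1_000, 5_000
x_bar = np.ones(N_x)
samples = rng.exponential(size=(n_reps, N_e, N_x))

X_hat = samples.mean(axis=1)              # one sample mean per replicate
Z = np.sqrt(N_e) * (X_hat - x_bar)        # the CLT scaling

print(Z.mean(axis=0))                     # close to the zero vector
print(np.cov(Z, rowvar=False))            # close to B = I
```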