We have now introduced the expected value for a RV \( \pmb{X} \) as the analog of the center of mass in multiple variables.
In one dimension, the notion of variance \( \mathrm{var}\left(X\right)=\sigma^2 \) and the standard deviation \( \sigma \) give us measures of the spread of the RV and the data derived from observations of it.
We define the variance of \( X \) once again as,
\[ \mathrm{var}\left(X\right) = \sigma^2 = \mathbb{E}\left[\left(X - \overline{x}\right)^2\right] \] which can be seen as the average squared deviation of the RV \( X \) from its mean.
When we have two RVs \( X \) and \( Y \), we will additionally need to consider how these variables co-vary, either together or oppositely, in their conditional probability.
Consider that for the univariate expectation, with the two RVs \( X \) and \( Y \), we have
\[ \begin{align} \mathbb{E}\left[ X + Y \right] &= \mathbb{E}\left[X \right] + \mathbb{E}\left[ Y\right] \\ &=\overline{x} + \overline{y} \end{align} \]
However, the same does not apply when we take the variance of the sum of the variables;
\[ \begin{align} \mathrm{var}\left( X+Y\right) &= \mathbb{E}\left[ \left(X + Y - \overline{x} - \overline{y}\right)^2\right] \\ &=\mathbb{E}\left[\left\{ \left( X - \overline{x} \right) +\left( Y - \overline{y} \right) \right\}^2\right]\\ & = \mathbb{E}\left[ \left( X - \overline{x} \right)^2 + \left( Y - \overline{y} \right)^2 + 2 \left(X - \overline{x} \right)\left(Y - \overline{y}\right)\right] \end{align} \]
Question: Using the linearity of the expectation, and the definition of the variance, how can the above be simplified?
\[ \begin{align} \mathrm{var}\left( X+Y\right) &= \mathrm{var}\left(X\right) + \mathrm{var}\left(Y\right) + 2 \mathbb{E}\left[\left(X - \overline{x} \right)\left(Y - \overline{y}\right)\right] \end{align} \]
Therefore, the combination of the RVs has a variance that is equal to the sum of the variances plus the newly identified cross terms.
We note that if \( X \) and \( Y \) are independent, i.e.,
\[ \begin{align} \mathcal{P}(X\vert Y) = \mathcal{P}(X) & & \mathcal{P}(Y \vert X) = \mathcal{P}(Y); \end{align} \]
then we have \[ \begin{align} \mathbb{E}\left[\left(X - \overline{x}\right) \left(Y - \overline{y} \right)\right] = \mathbb{E}\left[X - \overline{x}\right] \mathbb{E}\left[Y - \overline{y} \right] = 0. \end{align} \]
Therefore, we can consider the covariance,
\[ \mathrm{cov}\left(X,Y\right) = \sigma_{X,Y} = \mathbb{E}\left[\left(X - \overline{x} \right)\left(Y - \overline{y}\right)\right], \] to be a measure of how the variables \( X \) and \( Y \) co-vary together in their conditional probabilities.
We should note that while \( \mathrm{cov}\left(X,Y\right)=0 \) for any pair of independent variables, this condition is not the same as independence in general.
Particularly, we will denote
\[ \begin{align} \mathrm{cor}(X,Y) =\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}\left(X\right) \mathrm{var}\left(Y\right)}}=\frac{\sigma_{X,Y}}{\sqrt{\sigma_{X}^2 \sigma_{Y}^2}} = \frac{\sigma_{X,Y}}{\sigma_{X} \sigma_{Y}} \end{align} \] the correlation between the variables \( X \) and \( Y \).
If the correlation / covariance of the two variables \( X \) and \( Y \) is equal to zero, then
\[ \mathrm{var}\left( X+Y\right) = \mathrm{var}\left(X\right) + \mathrm{var}\left(Y\right), \] but this does not imply that they are independent, just that we cannot detect the dependence structure with this measure.
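As a quick numerical sanity check, the following minimal NumPy sketch (the synthetic samples and mixing coefficient are illustrative choices, not part of the formal development) verifies the variance-of-a-sum identity and the role of the cross term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated scalar samples, built by mixing independent draws.
z1, z2 = rng.standard_normal((2, 10_000))
x = z1
y = 0.5 * z1 + np.sqrt(1.0 - 0.5**2) * z2   # correlated with x by construction

# Unbiased sample statistics (denominator N - 1).
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)
cov_xy = np.cov(x, y)[0, 1]

# The identity var(X + Y) = var(X) + var(Y) + 2 cov(X, Y) also holds
# exactly for the sample statistics, up to floating-point error.
print(np.var(x + y, ddof=1))
print(var_x + var_y + 2.0 * cov_xy)
```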
Question: how can you use the above definition of the correlation to show that \( X \) always has correlation \( 1 \) with itself?
\[ \mathrm{cor}(X,X) =\rho_{X,X} = \frac{\mathrm{cov}(X,X)}{\sqrt{\mathrm{var}\left(X\right) \mathrm{var}\left(X\right)}}=\frac{\sigma_{X}^2}{\sqrt{\sigma_{X}^2 \sigma_{X}^2}}= 1 \]
More generally, we can say that for any two RVs \( X \) and \( Y \),
\[ -1 \leq \mathrm{cor}\left(X,Y\right)\leq 1. \]
This can be shown as follows, where
\[ \begin{align} 0 & \leq \mathrm{var}\left( \frac{X}{\sigma_X} + \frac{Y}{\sigma_Y} \right) \\ &=\mathrm{var}\left(\frac{X}{\sigma_X}\right) + \mathrm{var}\left(\frac{Y}{\sigma_Y}\right) + 2\mathrm{cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) \end{align} \] using the relationship we have just shown.
We note that when we divide a RV by its standard deviation, the variance becomes one;
\[ \begin{align} & 0 \leq 1 + 1 +2 \mathrm{cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) \\ \Leftrightarrow & -1\leq \mathrm{cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) \end{align} \]
Let's recall that we just showed,
\[ -1\leq \mathrm{cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) . \]
Let's note that, \( \mathbb{E}\left[ \frac{X}{\sigma_X} \right] = \frac{\overline{x}}{\sigma_X} \) so that \[ \begin{align} \mathrm{cov}\left(\frac{X}{\sigma_X},\frac{Y}{\sigma_Y}\right) &= \mathbb{E}\left[\left(\frac{X -\overline{x}}{\sigma_X}\right)\left(\frac{Y - \overline{y}}{\sigma_Y}\right)\right] \\ &= \frac{\mathbb{E}\left[\left(X -\overline{x}\right)\left(Y - \overline{y} \right) \right]}{\sigma_X \sigma_Y}\\ &= \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \\ &= \mathrm{cor}(X,Y) \end{align} \]
Using the two statements above, we have \[ \begin{align} -1 \leq \mathrm{cor}\left(X,Y\right). \end{align} \]
If we repeat the above argument with \( -X \) in the place of \( X \), we will get the statement \( \mathrm{cor}\left(X,Y\right) \leq 1 \) to complete the argument.
In the last slide we showed how we can identify,
\[ -1 \leq \mathrm{cor}\left(X,Y\right)\leq 1 \] for any pair of RVs \( X \) and \( Y \).
With the above range in mind, we say that a correlation of “close-to-one” means that the variables \( X \) and \( Y \) vary together almost identically;
Conversely, a correlation of “close-to-negative-one” means that the variables \( X \) and \( Y \) vary almost identically, but in opposite directions;
This can be understood similarly by taking the \( \mathrm{cov}\left(-X,X\right) \);
\[ \begin{align} \mathrm{cov}\left(-X, X\right) &= \mathbb{E}\left[\left(-X - (-\overline{x}) \right)\left( X - \overline{x}\right) \right] \\ &= - \mathbb{E}\left[\left( X - \overline{x}\right)^2\right]\\ &= - \mathrm{cov}(X,X) \end{align} \]
It is easy to show then that \( \mathrm{cor}(-X,X) = -1 \).
We suppose now that we have a RV \( \pmb{X}\sim P \), where each component is itself a RV,
\[ \begin{align} \pmb{X} = \begin{pmatrix} X_1 \\ \vdots \\ X_{N_x} \end{pmatrix} \in \mathbb{R}^{N_x}. \end{align} \]
For each component RV, we may similarly define,
\[ \begin{align} \mathrm{var}\left(X_i\right) &= \mathbb{E}\left[ \left(X_i - \overline{x}_i\right)^2 \right] \\ \mathrm{cov}\left(X_i, X_j \right) &= \mathbb{E}\left[ \left(X_i - \overline{x}_i\right) \left(X_j - \overline{x}_j\right) \right] \end{align} \] as we did for \( X \) and \( Y \).
The component-wise definition above is convenient in how it extends from the simple discussion before;
Recall that the Euclidean norm of an arbitrary vector,
\[ \parallel \pmb{v}\parallel = \sqrt{\pmb{v}^\mathrm{T} \pmb{v}}, \] gives the general form for a distance in arbitrarily large dimensions.
Notice it is defined in terms of the vector inner product, where
\[ \pmb{v}^\mathrm{T}\pmb{v} =\begin{pmatrix}v_1 & \cdots & v_{N_x} \end{pmatrix} \begin{pmatrix}v_1 \\ \vdots \\ v_{N_x}\end{pmatrix} = \sum_{i=1}^{N_x} v_i^2 \]
If we instead change the order of the transpose, we obtain the outer product as
\[ \begin{align} \pmb{v}\pmb{v}^\mathrm{T}& = \begin{pmatrix} v_1 \\ \vdots \\ v_{N_x} \end{pmatrix} \begin{pmatrix}v_1 & \cdots & v_{N_x}\end{pmatrix} \\ &= \begin{pmatrix} v_1 v_1 & v_1 v_2 & \cdots & v_1 v_{N_x} \\ v_2 v_1 & v_2 v_2 & \cdots & v_2 v_{N_x} \\ \vdots & \vdots & \ddots & \vdots \\ v_{N_x} v_1 & v_{N_x} v_2 & \cdots & v_{N_x} v_{N_x} \end{pmatrix}, \end{align} \] which is instead matrix valued in the output.
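For concreteness, a short NumPy sketch (the vector entries are arbitrary) contrasting the scalar-valued inner product with the matrix-valued outer product:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])

inner = v @ v            # scalar: sum_i v_i^2, so the Euclidean norm is sqrt(inner)
outer = np.outer(v, v)   # 3 x 3 matrix with entries v_i * v_j

print(np.sqrt(inner))    # Euclidean norm of v
print(outer)
```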
When we extend the notion of the covariance to a RV \( \pmb{X} \), finding the variances and the covariances of all of its entries, we arrive at the notion of covariance using the outer product.
Particularly, suppose that \( \mathbb{E}\left[\pmb{X}\right] = \overline{\pmb{x}} \); then we write
\[ \begin{align} \mathrm{cov}\left(\pmb{X}\right) = \mathbf{B} = \mathbb{E}\left[\left(\pmb{X}-\overline{\pmb{x}}\right) \left(\pmb{X} - \overline{\pmb{x}} \right)^\mathrm{T} \right] \end{align} \]
That is, if \( \{\pmb{X}_j\}_{j=1}^{N_e} \) is a random sample with parent distribution \( \pmb{X} \sim P \), the background covariance represents the population covariance according to which each \( \pmb{X}_j \) is distributed.
Using the previous outer product formula, we obtain the product
\[ \begin{align} \left(\pmb{X} - \overline{\pmb{x}}\right)\left(\pmb{X} - \overline{\pmb{x}}\right)^\mathrm{T} &= \begin{pmatrix} \left(X_1 - \overline{x}_1\right)\left(X_1 - \overline{x}_1 \right) & \cdots & \left(X_1 - \overline{x}_1 \right) \left(X_{N_x} - \overline{x}_{N_x} \right) \\ \vdots & \ddots & \vdots \\ \left(X_{N_x} - \overline{x}_{N_x} \right)\left(X_1 - \overline{x}_1 \right)& \cdots & \left(X_{N_x} - \overline{x}_{N_x} \right)\left(X_{N_x} - \overline{x}_{N_x} \right) \end{pmatrix}. \end{align} \]
The (background) covariance matrix
Let \( \pmb{X}\sim P \) be a RV with mean \( \mathbb{E}\left[\pmb{X}\right] = \overline{\pmb{x}} \). The (background) covariance matrix is defined \[ \begin{align} \mathbf{B} = \mathrm{cov}(\pmb{X}) := \mathbb{E}\left[\left(\pmb{X} - \overline{\pmb{x}}\right)\left(\pmb{X} - \overline{\pmb{x}}\right)^\top\right] & & & & \mathbf{B}_{ij} = \begin{cases} \mathrm{var}\left( X_i\right) & & \text{when }i=j \\ \mathrm{cov}\left(X_i,X_j\right) & & \text{when } i \neq j \end{cases} \end{align} \]
The above covariances and variances are to be understood in the same sense as in the univariate discussion, but for the component RVs \( X_i \) and \( X_j \).
Note, the covariance \( \mathrm{cov}\left(X_i, X_j\right) = \mathrm{cov}\left(X_j, X_i\right) \) is symmetric;
Furthermore, the eigenvalues of \( \mathbf{B} \) are all non-negative in general.
If the component RVs \( X_i, X_j \) are uncorrelated for all \( i \neq j \), then \( \mathbf{B} \) is also diagonal,
\[ \mathbf{B} = \begin{pmatrix} \mathrm{var}(X_1) & 0 & \cdots & 0 \\ 0 & \mathrm{var}(X_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & \mathrm{var}(X_{N_x}) \end{pmatrix} \] and the eigenvalues are identically the variances.
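For illustration, the following NumPy sketch builds a synthetic set of draws (the mixing matrix and sample sizes are arbitrary choices for the example) and checks the stated properties of the sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: columns are independent draws of a 3-dimensional RV with
# correlated components, obtained by mixing standard normal draws.
N_x, N_e = 3, 10_000
A = rng.standard_normal((N_x, N_x))
E = A @ rng.standard_normal((N_x, N_e))

B_hat = np.cov(E)                     # rows are treated as the variables X_i

print(np.diag(B_hat))                 # sample variances along the diagonal
print(np.allclose(B_hat, B_hat.T))    # symmetric
print(np.linalg.eigvalsh(B_hat).min() >= -1e-10)  # eigenvalues non-negative (numerically)
```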
Some basic properties of the covariance follow immediately from the linearity of the expectation over sums.
Suppose that \( \mathbf{A} \) is a constant valued matrix, \( \pmb{b} \) is a constant valued vector and \( \pmb{X} \) is a RV with expected value \( \overline{\pmb{x}} \) and covariance \( \mathbf{B} \).
Then notice that,
\[ \begin{align} \mathbb{E}\left[ \pmb{X} + \pmb{b} \right] &= \mathbb{E}\left[\pmb{X} \right] + \pmb{b}\\ &= \overline{\pmb{x}} + \pmb{b} \end{align} \]
Therefore, we have that,
\[ \begin{align} \mathrm{cov}\left(\pmb{X} + \pmb{b}\right) &= \mathbb{E}\left[\left(\pmb{X} + \pmb{b} - \overline{\pmb{x}} - \pmb{b}\right)\left(\pmb{X} + \pmb{b} - \overline{\pmb{x}} - \pmb{b}\right)^\mathrm{T} \right]\\ &= \mathbb{E}\left[\left(\pmb{X} - \overline{\pmb{x}}\right)\left(\pmb{X} - \overline{\pmb{x}}\right)^\mathrm{T} \right]\\ &= \mathrm{cov}\left(\pmb{X}\right) \end{align} \]
We have also discussed that
\[ \begin{align} \mathbb{E}\left[ \mathbf{A} \pmb{X} \right] &= \mathbf{A}\mathbb{E}\left[ \pmb{X}\right] \\ &= \mathbf{A} \overline{\pmb{x}} \end{align} \]
It follows as a direct consequence that,
\[ \begin{align} \mathrm{cov}\left(\mathbf{A}\pmb{X}\right)&= \mathbb{E}\left[\left(\mathbf{A}\pmb{X} - \mathbf{A}\overline{\pmb{x}} \right)\left(\mathbf{A}\pmb{X} - \mathbf{A}\overline{\pmb{x}} \right)^\mathrm{T} \right]\\ &=\mathbb{E}\left[\left\{ \mathbf{A} \left(\pmb{X} - \overline{\pmb{x}}\right)\right\} \left\{ \mathbf{A} \left(\pmb{X} - \overline{\pmb{x}} \right) \right\}^\mathrm{T} \right] \\ &= \mathbf{A}\mathbb{E}\left[\left(\pmb{X} - \overline{\pmb{x}} \right)\left(\pmb{X} - \overline{\pmb{x}} \right)^\mathrm{T}\right] \mathbf{A}^\mathrm{T} \\ &=\mathbf{A}\mathrm{cov}\left(\pmb{X}\right)\mathbf{A}^\mathrm{T} \end{align} \]
These two properties show that the covariance is invariant under translations of the RV \( \pmb{X} \), while a linear transformation \( \mathbf{A} \) acts on the covariance as \( \mathbf{A}\mathrm{cov}\left(\pmb{X}\right)\mathbf{A}^\mathrm{T} \).
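For illustration, a small NumPy sketch (the matrix, shift vector, and sample sizes are arbitrary choices) checking both properties on synthetic draws:

```python
import numpy as np

rng = np.random.default_rng(2)

N_x, N_e = 2, 200_000
E = rng.standard_normal((N_x, N_e))       # draws of X with covariance near the identity
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])                # an arbitrary constant matrix
b = np.array([[5.0], [-1.0]])             # an arbitrary constant vector

# cov(A X + b) should approximate A cov(X) A^T; the shift b drops out entirely.
print(np.cov(A @ E + b))
print(A @ np.cov(E) @ A.T)
```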
Recall our construction of the ensemble matrix \( \mathbf{E}\in\mathbb{R}^{N_x \times N_e} \), whose \( j \)-th column is the ensemble member \( \pmb{X}_j \).
The sample mean can then be computed from the row-average of the ensemble matrix as
\[ \hat{\pmb{X}} = \mathbf{E} \pmb{1} \frac{1}{N_e}. \]
We can thus define the sample covariance matrix in a way analogous to how we defined the sample mean.
Particularly, if we follow the above matrix product with multiplication by \( \pmb{1}^\top \), we find that
\[ \begin{align} \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} = \begin{pmatrix} \hat{X}_1 & \cdots & \hat{X}_{1} \\ \vdots & \ddots & \vdots \\ \hat{X}_{N_x} & \cdots &\hat{X}_{N_x} \end{pmatrix}\in\mathbb{R}^{N_x \times N_e} \end{align} \]
Particularly, this can be written column-wise as
\[ \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} = \begin{pmatrix}\hat{\pmb{X}}, \cdots, \hat{\pmb{X}}\end{pmatrix} \]
Subtracting the last identity from \( \mathbf{E} \) element-wise, we find that
\[ \begin{align} \mathbf{E} - \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} = \begin{pmatrix} X_{1,1} - \hat{X}_1 & \cdots & X_{1,N_e}- \hat{X}_1 \\ \vdots & \ddots & \vdots \\ X_{N_x,1} - \hat{X}_{N_x} & \cdots & X_{N_x,N_e} - \hat{X}_{N_x} \end{pmatrix} \end{align} \]
With a re-normalization, we will define the matrix of perturbations or anomalies of the ensemble about the mean.
The (normalized) anomaly matrix
Let \( \mathbf{E} \) be the ensemble matrix as defined before. We define the (normalized) anomaly matrix of the ensemble as \[ \begin{align} \mathbf{X} :&= \left(\mathbf{E} - \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{\sqrt{N_e -1}}\\ &=\mathbf{E}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{\sqrt{N_e -1}} \end{align} \] In particular, \( \left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{\sqrt{N_e -1}} \) is sometimes referred to as the centering matrix.
The anomaly matrix above plays a central role in data assimilation to produce dimensional reductions in the computation.
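For concreteness, a minimal NumPy sketch (with a synthetic ensemble of arbitrary size) of the anomaly matrix construction above:

```python
import numpy as np

rng = np.random.default_rng(3)

N_x, N_e = 4, 25
E = rng.standard_normal((N_x, N_e))       # a synthetic ensemble matrix

ones = np.ones((N_e, 1))
x_hat = E @ ones / N_e                    # sample mean as a column vector
print(x_hat.ravel())

# Normalized anomaly matrix X = (E - E 1 1^T / N_e) / sqrt(N_e - 1)
X = (E - E @ ones @ ones.T / N_e) / np.sqrt(N_e - 1)

# Each row of the anomalies sums to zero, i.e., the anomalies have sample mean zero.
print(np.allclose(X @ np.ones(N_e), 0.0))
```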
Now recall, the sample variance of a (scalar) random sample \( \{X_{i,j}\}_{j=1}^{N_e} \) can simply be written as
\[ \begin{align} \hat{\sigma}_{i}^2 = \frac{1}{N_e - 1 } \sum_{j=1}^{N_e} \left(X_{i,j} - \hat{X}_{i}\right)^2 \end{align} \]
Similarly, the sample covariance of two RVs can be written as
\[ \begin{align} \hat{\sigma}_{i,j} = \frac{1}{N_e - 1 } \sum_{l=1}^{N_e} \left(X_{i,l} - \hat{X}_{i}\right)\left(X_{j,l} - \hat{X}_{j}\right). \end{align} \]
It is easy to demonstrate, using the above relationships and the fact that \( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \) is an orthogonal projection (and therefore symmetric and idempotent), that the anomalies have the property
\[ \begin{align} \mathbf{P} :&= \mathbf{X} \mathbf{X}^\top \\ &= \mathbf{E}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{N_e -1}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\mathbf{E}^\top\\ &=\mathbf{E}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\mathbf{E}^\top\frac{1}{N_e -1} \end{align} \]
where \[ \begin{align} \mathbf{P}_{i,j} = \begin{cases} \hat{\sigma}^2_{i} &\text{ for }i=j\\ \hat{\sigma}_{i,j} &\text{ for }i\neq j \end{cases} \end{align} \]
The ensemble covariance matrix
Let \( \mathbf{X} \) be the anomalies matrix of the ensemble. The ensemble covariance matrix is defined by \[ \begin{align} \mathbf{P}:= \mathbf{X}\mathbf{X}^\top & & \mathbf{P}_{i,j} = \begin{cases} \hat{\sigma}^2_{i} &\text{ for }i=j\\ \hat{\sigma}_{i,j} &\text{ for }i\neq j \end{cases} \end{align} \] where \( \mathbb{E}\left[\mathbf{P}\right] = \mathbf{B} \), i.e., it is an unbiased sample estimator of the background covariance.
Note that the analogous definitions can be made for an observed ensemble matrix rather than a random ensemble matrix.
This is actually the standard, numerically stable / efficient means of computing a sample covariance matrix.
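As a quick check, the following NumPy sketch (again with a synthetic ensemble) verifies that the ensemble covariance \( \mathbf{X}\mathbf{X}^\top \) agrees with the usual unbiased sample covariance:

```python
import numpy as np

rng = np.random.default_rng(4)

N_x, N_e = 4, 25
E = rng.standard_normal((N_x, N_e))

# Normalized anomalies, equivalently computed by subtracting the row means.
X = (E - E.mean(axis=1, keepdims=True)) / np.sqrt(N_e - 1)

P = X @ X.T                          # ensemble covariance matrix
print(np.allclose(P, np.cov(E)))     # agrees with NumPy's unbiased sample covariance
```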
A key property we can see is that the anomalies are actually just the projection of the ensemble matrix into the orthogonal complement of the span of the vector of ones, \( \pmb{1} \).
This is the geometric interpretation of the anomalies having sample mean zero:
\[ \begin{align} \mathbf{X}\pmb{1} = \pmb{0} \end{align} \]
due to orthogonality.
Multivariate Gaussian
Let \( \pmb{X}\in\mathbb{R}^{N_x} \) be a RV with expected value \( \overline{\pmb{x}} \) and covariance \( \mathbf{B} \). The RV \( \pmb{X} \) is said to be distributed according to the multivariate Gaussian distribution \( N(\overline{\pmb{x}}, \mathbf{B}) \) if it has a PDF defined \[ \begin{align} p(\pmb{x}) = \vert 2 \pi \mathbf{B}\vert^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}\left(\pmb{x} - \overline{\pmb{x}}\right)^\top \mathbf{B}^{-1}\left(\pmb{x} - \overline{\pmb{x}}\right)\right\} \end{align} \] where for a square, non-singular matrix, \( \mathbf{A} \), \[ \begin{align} \vert \mathbf{A} \vert := \vert \mathrm{det}(\mathbf{A})\vert. \end{align} \]
Covariance matrices are by construction positive semi-definite; when \( \mathbf{B} \) is moreover non-singular,
\[ \begin{align} \parallel \pmb{v}\parallel_\mathbf{B} := \sqrt{\pmb{v}^\top \mathbf{B}^{-1} \pmb{v}} \end{align} \]
defines an alternative distance to the Euclidean distance, weighted inversely proportionally to the spread of the distribution.
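For illustration, a short NumPy sketch (the matrix \( \mathbf{B} \) and vector \( \pmb{v} \) are arbitrary examples) computing this weighted norm; a linear solve is used rather than forming \( \mathbf{B}^{-1} \) explicitly, which is generally the more stable choice:

```python
import numpy as np

# An illustrative non-singular covariance matrix and a test vector.
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])
v = np.array([1.0, -1.0])

# ||v||_B = sqrt(v^T B^{-1} v), computed with a linear solve.
weighted_norm = np.sqrt(v @ np.linalg.solve(B, v))

print(weighted_norm)
print(np.linalg.norm(v))   # the ordinary Euclidean norm, for comparison
```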
If a covariance is actually singular, we can define a similar distance, but restricted to a lower-dimensional space;
Multivariate central limit theorem
Let \( \pmb{X}_1 ,\cdots , \pmb{X}_{N_e} \) be i.i.d., each with expected value \( \overline{\pmb{x}} \) and covariance \( \mathbf{B} \). Then the limiting distribution of \[ \sqrt{N_e}\left(\hat{\pmb{X}} - \overline{\pmb{x}}\right) \] as \( N_e \rightarrow \infty \) is \( N(\pmb{0}, \mathbf{B}) \). In particular, if \( \hat{\mathbf{B}} \) is any consistent estimator for \( \mathbf{B} \), we have moreover that the limiting distribution of \[ \sqrt{N_e}\,\hat{\mathbf{B}}^{-\frac{1}{2}}\left(\hat{\pmb{X}} - \overline{\pmb{x}}\right) \] is the standard, multivariate normal \( N(\pmb{0},\mathbf{I}) \) as \( N_e \rightarrow \infty \).
The multivariate central limit theorem as above establishes the generality of the Gaussian approximation for the sampling distribution of the ensemble mean.
Likewise, this motivates the ubiquitous use of the multivariate Gaussian as an approximation.
This approximation may or may not be appropriate depending on the context;
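To close, a small NumPy sketch (the exponential components, sample size, and number of replicates are arbitrary choices for the example) illustrating the multivariate central limit theorem empirically:

```python
import numpy as np

rng = np.random.default_rng(5)

# Many replicates of the standardized sample mean of a decidedly non-Gaussian RV:
# a 2-dimensional vector with independent Exp(1) components, so that the true
# mean is (1, 1) and the true covariance is the identity.
N_x, N_e, n_reps = 2, 1_000, 5_000
x_bar = np.ones(N_x)
samples = rng.exponential(size=(n_reps, N_e, N_x))

X_hat = samples.mean(axis=1)              # one sample mean per replicate
Z = np.sqrt(N_e) * (X_hat - x_bar)        # the CLT scaling

print(Z.mean(axis=0))                     # close to the zero vector
print(np.cov(Z, rowvar=False))            # close to B = I
```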