A review of covariances and the multivariate Gaussian

Outline

  • The following topics will be covered in this lecture:
    • The covariance between two random variables
    • The correlation between two random variables
    • The covariance matrix of a random vector
    • The ensemble covariance matrix
    • The multivariate Gaussian and central limit theorem

The covariance between two random variables

  • We have now introduced the expected value for a RV $\pmb{X}$ as the analog of the center of mass in multiple variables.

  • In one dimension, the notion of variance $\mathrm{var}(X) = \sigma^2$ and the standard deviation $\sigma$ give us measures of the spread of the RV and the data derived from observations of it.

  • We define the variance of $X$ once again as

    $\mathrm{var}(X) = \sigma^2 = \mathbb{E}\left[\left(X - \bar{x}\right)^2\right]$, which can be seen as the average squared deviation of the RV $X$ from its mean.

  • When we have two RVs $X$ and $Y$, we will additionally need to consider how these variables co-vary, together or oppositely, in their conditional probability.

    • This is in the same sense as how each varies from its center of mass, but considered simultaneously in space.

The covariance between two random variables

  • Consider that for the univariate expectation, with the two RVs X and Y, we have

    $\mathbb{E}\left[X + Y\right] = \mathbb{E}\left[X\right] + \mathbb{E}\left[Y\right] = \bar{x} + \bar{y}$

  • However, the same does not apply when we take the variance of the sum of the variables;

    $\mathrm{var}(X + Y) = \mathbb{E}\left[\left(X + Y - \bar{x} - \bar{y}\right)^2\right] = \mathbb{E}\left[\left\{\left(X - \bar{x}\right) + \left(Y - \bar{y}\right)\right\}^2\right] = \mathbb{E}\left[\left(X - \bar{x}\right)^2 + \left(Y - \bar{y}\right)^2 + 2\left(X - \bar{x}\right)\left(Y - \bar{y}\right)\right]$

  • Question: Using the linearity of the expectation, and the definition of the variance, how can the above be simplified?

    • Answer: Using that $\mathrm{var}(X) = \mathbb{E}\left[\left(X - \bar{x}\right)^2\right]$ and similarly in $Y$,

    $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y) + 2\mathbb{E}\left[\left(X - \bar{x}\right)\left(Y - \bar{y}\right)\right]$

  • Therefore, the sum of the RVs has a variance equal to the sum of the individual variances plus the newly identified cross term, as checked numerically in the sketch below.
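  • Below is a minimal Monte Carlo sketch in NumPy (the sample size and the particular distributions are illustrative assumptions, not from the lecture) verifying this identity.

```python
# Monte Carlo check of var(X + Y) = var(X) + var(Y) + 2 * cov(X, Y).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Draw correlated samples so that the cross term is non-zero.
x = rng.normal(loc=1.0, scale=2.0, size=n)
y = 0.5 * x + rng.normal(loc=-1.0, scale=1.0, size=n)

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2.0 * np.cov(x, y, ddof=0)[0, 1]

print(lhs, rhs)  # the two values agree up to floating point error
```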

The ensemble covariance matrix

The ensemble covariance matrix
Let $\mathbf{X}$ be the anomalies matrix of the ensemble. The ensemble covariance matrix is defined by
$$\mathbf{P} := \mathbf{X}\mathbf{X}^\top, \qquad \mathbf{P}_{i,j} = \begin{cases} \hat{\sigma}_i^2 & \text{for } i = j \\ \hat{\sigma}_{i,j} & \text{for } i \neq j, \end{cases}$$
where $\mathbb{E}\left[\mathbf{P}\right] = \mathbf{B}$, i.e., it is an unbiased sample estimator of the background covariance.
  • Note that the analogous definitions can be made for an observed ensemble matrix rather than a random ensemble matrix.

  • This is actually the standard, numerically stable and efficient means of computing a sample covariance matrix (see the sketch following this list).

  • A key property we can see is that the anomalies are actually just the projection of the ensemble matrix into the orthogonal complement of the span of the vector of ones, $\pmb{1}$.

    • The operator $\pmb{1}\pmb{1}^\top / N_e$ is precisely the orthogonal projector onto $\mathrm{span}\{\pmb{1}\}$, such that $\left(\mathbf{I} - \pmb{1}\pmb{1}^\top / N_e\right)$ projects onto its orthogonal complement.
  • This is the geometric interpretation of setting the mean equal to zero for the anomalies.

    • In particular,

    $\mathbf{X}\pmb{1} = \pmb{0}$

    due to orthogonality.

    • Thus the rank (number of degrees of freedom) of the anomalies is actually $N_e - 1$, rather than the column dimension $N_e$.
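  • As a concrete illustration (a sketch under assumed dimensions, not part of the lecture), the following NumPy snippet forms the anomalies by projecting away $\mathrm{span}\{\pmb{1}\}$, computes $\mathbf{P} = \mathbf{X}\mathbf{X}^\top$, and checks the rank deficiency.

```python
# Sketch: ensemble covariance from anomalies, with Nx > Ne so the rank
# deficiency is visible; anomalies are normalized by sqrt(Ne - 1).
import numpy as np

rng = np.random.default_rng(0)
Nx, Ne = 10, 5                            # state dimension, ensemble size

E = rng.normal(size=(Nx, Ne))             # ensemble matrix, one member per column
ones = np.ones(Ne)

# (I - 1 1^T / Ne) projects onto the orthogonal complement of span{1},
# i.e., it removes the ensemble mean from each row of E.
proj = np.eye(Ne) - np.outer(ones, ones) / Ne
X = (E @ proj) / np.sqrt(Ne - 1)          # anomalies matrix

P = X @ X.T                               # ensemble covariance P = X X^T

print(np.allclose(X @ ones, 0.0))         # True: anomalies are orthogonal to 1
print(np.allclose(P, np.cov(E, ddof=1)))  # True: matches the sample covariance
print(np.linalg.matrix_rank(X))           # Ne - 1 = 4, not the column dimension Ne
```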

The multivariate Gaussian

  • With the definitions presented so far, we can now introduce the multivariate Gaussian distribution and the generalization of the central limit theorem.
Multivariate Gaussian
Let $\pmb{X} \in \mathbb{R}^{N_x}$ be a RV with expected value $\bar{\pmb{x}}$ and covariance $\mathbf{B}$. The RV $\pmb{X}$ is said to be distributed according to the multivariate Gaussian distribution $N\left(\bar{\pmb{x}}, \mathbf{B}\right)$ if it has the PDF defined by
$$p(\pmb{x}) = \left|2\pi \mathbf{B}\right|^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}\left(\pmb{x} - \bar{\pmb{x}}\right)^\top \mathbf{B}^{-1} \left(\pmb{x} - \bar{\pmb{x}}\right)\right\},$$
where, for a square, non-singular matrix $A$, $|A| := |\det(A)|$.
  • Covariance matrices are by construction positive semi-definite;

    • when a covariance is full rank as above,

    $\left\|\pmb{v}\right\|_{\mathbf{B}} := \sqrt{\pmb{v}^\top \mathbf{B}^{-1} \pmb{v}}$

    defines an alternative distance to the Euclidean distance, weighted inversely proportionally to the spread of the distribution (see the sketch following this list).

  • If a covariance is actually singular, we can define a similar distance, but restricted to a lower-dimensional space;

    • we will return to this at a later point.
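  • The following sketch (the mean $\bar{\pmb{x}}$ and covariance $\mathbf{B}$ chosen below are illustrative assumptions) evaluates the Gaussian PDF above directly, compares it against scipy.stats.multivariate_normal, and computes the $\mathbf{B}$-weighted distance alongside the Euclidean one.

```python
# Sketch: evaluating the multivariate Gaussian PDF and the B-weighted norm.
import numpy as np
from scipy.stats import multivariate_normal

x_bar = np.array([1.0, -2.0])             # illustrative mean
B = np.array([[2.0, 0.5],
              [0.5, 1.0]])                # symmetric, positive-definite covariance
B_inv = np.linalg.inv(B)

def gaussian_pdf(x):
    """p(x) = |2 pi B|^{-1/2} exp{-(1/2) (x - x_bar)^T B^{-1} (x - x_bar)}."""
    dev = x - x_bar
    norm_const = np.abs(np.linalg.det(2.0 * np.pi * B)) ** -0.5
    return norm_const * np.exp(-0.5 * dev @ B_inv @ dev)

def b_distance(v):
    """The covariance-weighted norm ||v||_B = sqrt(v^T B^{-1} v)."""
    return np.sqrt(v @ B_inv @ v)

x = np.array([0.0, 0.0])
print(gaussian_pdf(x))
print(multivariate_normal(mean=x_bar, cov=B).pdf(x))     # agrees with the direct formula
print(b_distance(x - x_bar), np.linalg.norm(x - x_bar))  # weighted vs. Euclidean distance
```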

The multivariate central limit theorem

  • We will finally introduce a fairly general form of the central limit theorem, extending the version presented earlier.
Multivariate central limit theorem
Let $\pmb{X}_1, \ldots, \pmb{X}_{N_e}$ be i.i.d. with expected value $\bar{\pmb{x}}$ and covariance $\mathbf{B}$ for all $j = 1, \ldots, N_e$. Then the limiting distribution of $\sqrt{N_e}\left(\hat{\pmb{X}} - \bar{\pmb{x}}\right)$ as $N_e \rightarrow \infty$ is $N\left(\pmb{0}, \mathbf{B}\right)$. In particular, if $\hat{\mathbf{B}}$ is any consistent estimator for $\mathbf{B}$, we moreover have that the limiting distribution of $\sqrt{N_e}\hat{\mathbf{B}}^{-\frac{1}{2}}\left(\hat{\pmb{X}} - \bar{\pmb{x}}\right)$ is the standard multivariate normal $N\left(\pmb{0}, \mathbf{I}\right)$ as $N_e \rightarrow \infty$.
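  • A minimal Monte Carlo sketch of this statement follows (the uniform distribution, dimensions, and trial count are illustrative assumptions, and the true $\mathbf{B}$ is used in place of an estimator $\hat{\mathbf{B}}$): the whitened, rescaled ensemble mean of non-Gaussian draws is approximately $N\left(\pmb{0}, \mathbf{I}\right)$.

```python
# Sketch: multivariate CLT check with uniform (non-Gaussian) random vectors.
import numpy as np

rng = np.random.default_rng(0)
Nx, Ne, trials = 2, 500, 10_000

# i.i.d. uniform RVs on [0, 1]^Nx have mean 1/2 and covariance (1/12) * I.
x_bar = 0.5 * np.ones(Nx)
B = np.eye(Nx) / 12.0
B_inv_sqrt = np.linalg.inv(np.linalg.cholesky(B))   # B^{-1/2} (true B, not an estimator)

samples = rng.uniform(size=(trials, Ne, Nx))
ens_means = samples.mean(axis=1)                    # the ensemble mean for each trial

# sqrt(Ne) * B^{-1/2} (ensemble mean - x_bar) should be close to N(0, I).
z = np.sqrt(Ne) * (ens_means - x_bar) @ B_inv_sqrt.T

print(z.mean(axis=0))   # approximately the zero vector
print(np.cov(z.T))      # approximately the identity matrix
```
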
  • The multivariate central limit theorem as above establishes the generality of the Gaussian approximation for the sampling distribution of the ensemble mean.

  • Likewise, this gives motivation for why the multivariate Gaussian will be used ubiquitously as an approximation.

    • Suppose we replicate an experiment that is driven by, e.g., a physical law;
    • however, suppose we believe that each result has variation due to sums of small perturbations of noise;
    • then we can approximate the noise in the system as Gaussian variation around our deterministic laws.
  • This approximation may or may not be appropriate depending on the context;

    • however, we will demonstrate how wide classes of estimators in data assimilation can use this approximation to derive highly scalable numerical estimators.