Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.
FAIR USE ACT DISCLAIMER: This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.
We will now introduce the basic tools of statistics and probability theory for multivariate analysis;
Some notions, like the expected value / center of mass, will translate directly over linear combinations of RVs.
Recall, for RVs \( X,Y \) and constant scalars \( a,b \) we have \[ \mathbb{E}\left[ a X + b Y\right] = a \mathbb{E}\left[X\right] + b \mathbb{E}\left[Y\right] \] by the linearity of the expectation.
Recall the vector notation \( \pmb{x} \in \mathbb{R}^{N_x} \), matrix notation \( \mathbf{A} \in \mathbb{R}^{N_x \times N_x} \), and matrix-slice notation \( \mathbf{A}^j \in \mathbb{R}^{N_x} \) where
\[ \begin{align} \pmb{x} := \begin{pmatrix} x_1 \\ \vdots \\ x_{N_x} \end{pmatrix} & & \mathbf{A} := \begin{pmatrix} a_{1,1} & \cdots & a_{1, N_x} \\ \vdots & \ddots & \vdots \\ a_{N_x,1} & \cdots & a_{N_x, N_x} \end{pmatrix} & & \mathbf{A}^j := \begin{pmatrix} a_{1,j} \\ \vdots \\ a_{N_x, j} \end{pmatrix} \end{align} \]
The linearity of the expectation then extends to random vectors \( \pmb{x}, \pmb{y} \) and constant matrices \( \mathbf{A},\mathbf{B} \); we can write \[ \mathbb{E}\left[\mathbf{A}\pmb{x} + \mathbf{B}\pmb{y} \right] = \mathbf{A}\mathbb{E}\left[ \pmb{x}\right] + \mathbf{B}\mathbb{E}\left[\pmb{y}\right]. \]
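As a quick sanity check, the following minimal Python/NumPy sketch illustrates this identity by comparing a Monte Carlo estimate of \( \mathbb{E}\left[\mathbf{A}\pmb{x} + \mathbf{B}\pmb{y}\right] \) against \( \mathbf{A}\mathbb{E}\left[\pmb{x}\right] + \mathbf{B}\mathbb{E}\left[\pmb{y}\right] \); the sampling distributions and matrices here are arbitrary choices for illustration only.

```python
# Hypothetical illustration: Monte Carlo check of E[Ax + By] = A E[x] + B E[y].
import numpy as np

rng = np.random.default_rng(0)
N_x, N_s = 3, 200_000                            # state dimension, number of samples

A = rng.normal(size=(N_x, N_x))                  # arbitrary constant matrices
B = rng.normal(size=(N_x, N_x))

x = rng.normal(loc=1.0, size=(N_s, N_x))         # samples of x, with E[x] = (1, ..., 1)
y = rng.exponential(scale=2.0, size=(N_s, N_x))  # samples of y, with E[y] = (2, ..., 2)

lhs = (x @ A.T + y @ B.T).mean(axis=0)           # Monte Carlo estimate of E[Ax + By]
rhs = A @ np.ones(N_x) + B @ (2.0 * np.ones(N_x))  # A E[x] + B E[y], computed exactly

print(np.max(np.abs(lhs - rhs)))                 # small, and shrinks as N_s grows
```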
While the concept of the center of mass has a direct generalization to vectors, we will need to make some additional considerations when we measure the spread of RVs and how they relate to others.
The second centered moment, on the other hand, generalizes to an object analogous to the rotational inertia tensor.
We will begin our consideration in \( N_x=2 \) dimensions, as all properties described in the following will extend (with minor modifications) to arbitrarily large but finite \( N_x \).
Let the random vector \( \pmb{X} \) be defined as
\[ \begin{align} \pmb{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \end{align} \] where each of the above components \( X_i \) is a RV.
We can define the cumulative distribution function in a similar way to the definition in one variable.
Let \( x_1,x_2 \) be two fixed real values forming a constant vector as
\[ \pmb{x} = \begin{pmatrix} x_1 \\ x_2\end{pmatrix}. \]
Define the comparison operator between two vectors \( \pmb{y}, \pmb{x} \) as
\[ \pmb{y} \leq \pmb{x} \Leftrightarrow y_i \leq x_i \text{ for each and every }i \]
Multivariate cumulative distribution function
The cumulative distribution function \( P \), describing the probability of realizations of \( \pmb{X} \), is defined \[ P(\pmb{x}) = \mathcal{P}(\pmb{X}\leq \pmb{x} ) = \mathcal{P}(X_i \leq x_i \quad \forall i=1,\cdots,N_x). \]
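To make the definition concrete, the following Python/NumPy sketch (a hypothetical example, with a bivariate standard normal chosen only for illustration) estimates \( P(\pmb{x}) \) as the fraction of Monte Carlo samples satisfying the componentwise inequality, and compares it with the product of univariate normal CDFs, which is the exact value when the components are independent.

```python
# Hypothetical illustration: estimate the multivariate CDF P(x) = P(X <= x componentwise)
# by the fraction of samples lying below x in every component.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
N_s = 500_000

# Bivariate standard normal with independent components, so the true CDF factors.
X = rng.normal(size=(N_s, 2))

def cdf_estimate(x):
    """Fraction of samples with every component <= the corresponding entry of x."""
    return np.mean(np.all(X <= x, axis=1))

def Phi(z):
    """Univariate standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

x = np.array([0.3, -0.5])
print(cdf_estimate(x), Phi(x[0]) * Phi(x[1]))  # the two values should nearly agree
```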
Recall that the CDF
\[ \begin{align} P:\mathbb{R}^2 & \rightarrow [0,1] \\ \pmb{x} & \rightarrow \mathcal{P}(\pmb{X}\leq \pmb{x}) \end{align} \] is a function of the variables \( (x_1,x_2) \).
Suppose then that \( P \) has continuous second partial derivatives, so that \( \partial_{x_1} \partial_{x_2}P = \partial_{x_2}\partial_{x_1}P \);
Multivariate probability density function
Let \( P\in \mathcal{C}^2\left(\mathbb{R}^2\right) \) , then the probability density function \( p \) is defined as \[ \begin{align} p:\mathbb{R}^2 & \rightarrow \mathbb{R}\\ \pmb{x} &\rightarrow \partial_{x_1}\partial_{x_2}P(\pmb{x}) \end{align} \]
In the above definition, we have constructed the density function in the same way as in one variable;
\[ \begin{align} P(\pmb{x}) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} p(s_1, s_2) \mathrm{d}s_1 \mathrm{d}s_2. \end{align} \]
Recall the relationship between the CDF and the PDF,
\[ \begin{align} P(\pmb{x}) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} p(s_1, s_2) \mathrm{d}s_1 \mathrm{d}s_2; \end{align} \]
this implies that
\[ \begin{align} \mathcal{P}(- \infty < \pmb{X} < \infty) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(s_1, s_2) \mathrm{d}s_1 \mathrm{d}s_2 = 1 \end{align} \]
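As a numerical illustration (an assumed example, not specific to any distribution discussed above), the following Python/NumPy sketch approximates the double integral of a correlated bivariate Gaussian density on a grid and recovers a value close to one.

```python
# Hypothetical illustration: a correlated bivariate Gaussian density integrates to
# (approximately) one over the plane, using a simple Riemann sum on a grid.
import numpy as np

rho = 0.6                                     # correlation between X_1 and X_2
s1, s2 = np.meshgrid(np.linspace(-8, 8, 801),
                     np.linspace(-8, 8, 801), indexing="ij")

# Standard bivariate Gaussian density with correlation rho (zero means, unit variances).
norm_const = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho**2))
p = norm_const * np.exp(-(s1**2 - 2.0 * rho * s1 * s2 + s2**2)
                        / (2.0 * (1.0 - rho**2)))

ds = 16.0 / 800                               # grid spacing in each direction
print(p.sum() * ds * ds)                      # -> approximately 1.0
```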
If we define this over \( N_x\geq 2 \) variables, all of the above extends identically whenever the mixed \( N_x \)-th order partial derivative of \( P \), taken once in each \( \partial_{x_i} \), is defined and is independent of the order of differentiation.
Specifically, this requires that, for all permutations \( \psi:(1, \cdots, N_x) \rightarrow (\psi(1), \cdots ,\psi(N_x)) \),
\[ \begin{align} \partial_{x_1} \cdots \partial_{x_{N_x}}P = \partial_{x_{\psi(1)}} \cdots \partial_{x_{\psi(N_x)}}P \end{align} \]
Using the integral relationship, we can similarly show
\[ \begin{align} P(\pmb{x}) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_{N_x}} p(s_1, \cdots, s_{N_x}) \mathrm{d}s_1\cdots \mathrm{d}s_{N_x}. \end{align} \]
Courtesy of: F.M. Dekking, et al. A Modern Introduction to Probability and Statistics. Springer Science & Business Media, 2005.
A new construction arises when we consider the joint probability of multiple variables, and how these relate back to univariate probability.
Consider if we write the CDF of \( \pmb{X} = (X_1,X_2)^\top \) in terms of its component arguments,
\[ \begin{align} P(x_1, x_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} p(s_1, s_2) \mathrm{d}s_1 \mathrm{d}s_2 \end{align} \] we can also define the limit as \( x_i \) goes to infinity.
This can be considered as an operation of integrating out the variable \( x_2 \), as represented by the integral
\[ \begin{align} \lim_{x_2\rightarrow \infty}\int_{-\infty}^{x_1}\int_{-\infty}^{x_2} p(s_1, s_2) \mathrm{d}s_1 \mathrm{d}s_2 = \int_{-\infty}^{x_1}\int_{-\infty}^{\infty} p(s_1, s_2) \mathrm{d}s_1 \mathrm{d}s_2 \end{align} \]
However, if we consider \( x_1 \) to still remain a free variable in the above equation, this can be written as,
\[ \begin{align} P(x_1) = P(x_1, \infty) = \lim_{x_2 \rightarrow \infty} P(x_1,x_2) \end{align} \] where \( P(x_1) \) is the marginal distribution function for \( x_1 \), integrating out \( x_2 \) in the same sense as above.
Likewise we can define the marginal distribution function symmetrically in \( x_2 \) as,
\[ \begin{align} P(x_2) = P(\infty , x_2) = \lim_{x_1 \rightarrow \infty} P(x_1,x_2) \end{align} \] where \( P(x_2) \) is the marginal distribution function for \( x_2 \), integrating out \( x_1 \) in the same sense as above.
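The following Python/NumPy sketch illustrates this marginalization numerically; it assumes, purely for illustration, a correlated bivariate Gaussian joint density, whose \( x_1 \)-marginal is known to be standard normal.

```python
# Hypothetical illustration: recover the marginal density of X_1 by integrating the
# joint density over x_2, for a correlated bivariate Gaussian.
import numpy as np

rho = 0.6
x1 = 0.7                                       # point at which we evaluate the marginal
s2 = np.linspace(-10, 10, 4001)
ds2 = s2[1] - s2[0]

# Joint density p(x1, s2) of the correlated bivariate Gaussian, as a function of s2.
joint = (1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho**2))
         * np.exp(-(x1**2 - 2.0 * rho * x1 * s2 + s2**2) / (2.0 * (1.0 - rho**2))))

marginal_numeric = joint.sum() * ds2           # integrate out x_2 on the grid
marginal_exact = np.exp(-0.5 * x1**2) / np.sqrt(2.0 * np.pi)  # standard normal pdf

print(marginal_numeric, marginal_exact)        # the two values nearly coincide
```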
Let us consider now the meaning of this object in terms of simple probability.
Let \( A =\{X_1 \leq x_1\} \) and \( B = \{X_2\leq x_2\} \) so that,
\[ \begin{align} P(x_1) &:= \lim_{x_2\rightarrow \infty} P(x_1, x_2) \\ &= \lim_{x_2\rightarrow \infty} \mathcal{P}(A \cap B) \\ &= \lim_{x_2 \rightarrow\infty} \mathcal{P}(A \vert B) \mathcal{P}(B)\\ &= \mathcal{P}(A) \end{align} \] as \( X_2 \leq \infty \) places no restriction on the event \( A \), and \( \mathcal{P}(B)=1 \) for \( x_2 \rightarrow \infty \).
Similarly, we can show this for the marginal distribution in \( x_2 \).
Therefore, we associate the marginal distribution of \( x_1 \) with its intrinsic probability distribution, after factoring out the outcomes of possibly related variables.
Geometrically, we can gain some understanding of the marginal probability distribution in terms of the marginal density function.
Courtesy of Bscan, CC0, via Wikimedia Commons
Let us recall now that \( \mathcal{P}(A \cap B) = \mathcal{P}(A)\mathcal{P}(B) \) if and only if \( A \) and \( B \) are independent events.
Let \( A =\{X_1 \leq x_1\} \) and \( B = \{X_2\leq x_2\} \) once again.
Consider thus, if \( X_1 \) and \( X_2 \) are independent RVs, we must have that
\[ \begin{align} P(x_1, x_2) &= \mathcal{P}(A \cap B) \\ &= \mathcal{P}(A) \mathcal{P}(B) \\ &=P(x_1) P(x_2) \end{align} \] from our earlier derivations.
If we take the derivatives \( \partial_{x_1}\partial_{x_2} \) of the above, assuming again the independence,
\[ \begin{align} \partial_{x_1}\partial_{x_2} P(x_1,x_2) &= p(x_1,x_2) \\ & = \partial_{x_1} P(x_1) \partial_{x_2} P(x_2)\\ &= p(x_1) p(x_2) \end{align} \] which can be shown by integrating out either \( x_i \) from the joint density.
Therefore, for independent RVs \( X_1,X_2 \) we recover the following identities on the marginal functions,
\[ \begin{align} P(x_1, x_2) = P(x_1)P(x_2) & & p(x_1,x_2) = p(x_1) p(x_2)\end{align} \]
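These identities can be checked empirically; the sketch below (a hypothetical example with arbitrarily chosen independent distributions) compares a 2D histogram estimate of the joint density against the outer product of the two 1D marginal histograms.

```python
# Hypothetical illustration: for independent RVs, the joint density factors as the
# product of the marginals; compare histogram estimates of both sides.
import numpy as np

rng = np.random.default_rng(2)
N_s = 1_000_000
x1 = rng.normal(size=N_s)              # X_1 ~ N(0, 1)
x2 = rng.uniform(-1.0, 1.0, size=N_s)  # X_2 ~ U(-1, 1), independent of X_1

edges1 = np.linspace(-4, 4, 41)
edges2 = np.linspace(-1, 1, 21)

joint, _, _ = np.histogram2d(x1, x2, bins=[edges1, edges2], density=True)
marg1, _ = np.histogram(x1, bins=edges1, density=True)
marg2, _ = np.histogram(x2, bins=edges2, density=True)

# Product of the marginal histograms, arranged on the same 2D grid of bins.
product = np.outer(marg1, marg2)
print(np.max(np.abs(joint - product)))  # small, up to sampling error in each bin
```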
The marginals are closely related to the notion of the conditional probability.
Recall that, by the definition of conditional probability, when \( \mathcal{P}(B)\neq0 \),
\[ \mathcal{P}(A\vert B) = \frac{\mathcal{P}(A\cap B)}{\mathcal{P}(B)}. \]
We will thus define the conditional density as,
\[ p(x_1 \vert x_2) = \frac{p(x_1, x_2)}{p(x_2)} \]
We can thus write equivalently,
\[ p(\pmb{x}) := p(x_1,x_2) = p(x_1 \vert x_2) p(x_2) \]
When \( X_1,X_2 \) are independent, we furthermore obtain \( p(x_1 \vert x_2) = p(x_1) \).
The following relationship between the marginal, conditional and joint densities
\[ p(\pmb{x}) = p(x_1 \vert x_2) p(x_2) \]
can be used to illustrate another connection.
Consider,
\[ \begin{align} p(x_1) &= \int_{-\infty}^\infty p(\pmb{x}) \mathrm{d}x_2 \\ &= \int_{-\infty}^\infty p(x_1 \vert x_2) p(x_2) \mathrm{d}x_2 \end{align} \]
Then we have the interpretation of the marginal \( p(x_1) \) as the average value of the conditional density \( p(x_1 \vert x_2) \), as we take the weighted average over all possible \( x_2 \).
This is equivalent to the earlier interpretation, summing the total frequency of all \( x_2 \) observed for a fixed value of \( x_1 \).
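The sketch below illustrates this weighted-average interpretation numerically; it assumes a correlated bivariate Gaussian purely for illustration, for which \( p(x_1 \vert x_2) \) is Gaussian with mean \( \rho x_2 \) and variance \( 1-\rho^2 \), and \( p(x_2) \) is standard normal.

```python
# Hypothetical illustration: the marginal p(x_1) as the average of the conditional
# density p(x_1 | x_2) over x_2, weighted by p(x_2), for a correlated bivariate Gaussian.
import numpy as np

rho = 0.6
x1 = -0.3
s2 = np.linspace(-10, 10, 4001)
ds2 = s2[1] - s2[0]

def gauss_pdf(z, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

conditional = gauss_pdf(x1, rho * s2, 1.0 - rho**2)   # p(x_1 | x_2 = s2)
weights = gauss_pdf(s2, 0.0, 1.0)                     # p(x_2 = s2)

marginal_from_conditional = np.sum(conditional * weights) * ds2
print(marginal_from_conditional, gauss_pdf(x1, 0.0, 1.0))  # nearly equal
```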
In multivariate statistics, a random sample is defined as a sequence of i.i.d. RVs
\[ \pmb{X}_1 , \cdots , \pmb{X}_{N_e} \in \mathbb{R}^{N_x}, \] where each \( \pmb{X}_i \) has the same distribution as a parent or population RV \( \pmb{X} \sim P \).
Observations of the random sample are the values the RVs \( \{\pmb{X}_j\}_{j=1}^{N_e} \) attain, i.e., fixed vectors of numerical outcomes \( \{\pmb{x}_j\}_{j=1}^{N_e} \).
In data assimilation, it is typical to collect the random sample into what is known as an ensemble matrix.
The (random) ensemble matrix
Let \( \{\pmb{X}_j\}_{j=1}^{N_e} \) be RVs independently and identically distributed (iid) according to a parent distribution \( \pmb{X}\sim P \). The (random) ensemble matrix is defined \[ \begin{align} \mathbf{E} := \begin{pmatrix} \pmb{X}_1, \cdots, \pmb{X}_{N_e}\end{pmatrix}, & & \mathbf{E}^j := \pmb{X}_j, & & \mathbf{E}\in \mathbb{R}^{N_x \times N_e}, \end{align} \] where \( N_x \) is referred to as the state dimension and \( N_e \) is referred to as the ensemble dimension.
Notice that the ensemble matrix places the distinct components of the random state vector in distinct rows, while the replicates (ensemble members) are positioned in distinct columns.
This can be thought of similarly to the transpose of the design matrix from linear regression.
The (observed) ensemble matrix
Let \( \{\pmb{X}_j\}_{j=1}^{N_e} \) be RVs independently and identically distributed (iid) according to a parent distribution \( \pmb{X}\sim P \), and \( \{\pmb{x}_j\}_{j=1}^{N_e} \) be some realization of the random sample. The (observed) ensemble matrix is defined \[ \begin{align} \mathbf{E} := \begin{pmatrix} \pmb{x}_1, \cdots, \pmb{x}_{N_e}\end{pmatrix}, & & \mathbf{E}^j := \pmb{x}_j, & & \mathbf{E}\in \mathbb{R}^{N_x \times N_e}. \end{align} \]
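The following Python/NumPy sketch shows one way an observed ensemble matrix might be formed in practice; the parent distribution (a multivariate Gaussian) and its parameters are arbitrary choices for illustration.

```python
# Hypothetical illustration: form an observed ensemble matrix E in R^{N_x x N_e},
# with each column holding one realization x_j of the random vector X.
import numpy as np

rng = np.random.default_rng(3)
N_x, N_e = 4, 25                                     # state and ensemble dimensions

mean = np.zeros(N_x)
cov = 0.5 * np.eye(N_x) + 0.5 * np.ones((N_x, N_x))  # an arbitrary valid covariance

# Draw N_e iid realizations; multivariate_normal returns them as rows, so transpose
# to place realizations in columns, matching the ensemble matrix convention above.
E = rng.multivariate_normal(mean, cov, size=N_e).T

print(E.shape)        # (N_x, N_e)
print(E[:, 0])        # E^1, the first ensemble member / realization x_1
```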
We will use the ensemble matrix notation interchangeably for the random and the observed versions;
We mentioned already that the expectation passes over linear combinations of random vectors in the same way as it does over linear combinations of scalar RVs.
Particularly, we consider the following definition:
\[ \begin{align} \mathbb{E}\left[\pmb{X}\right] = \int_{\mathbb{R}^{N_x}} \mathbf{s} p(\mathbf{s})\mathrm{d}\mathbf{s}= \begin{pmatrix} \mathbb{E}\left[X_1\right] \\ \vdots \\ \mathbb{E}\left[X_{N_x}\right] \end{pmatrix} = \begin{pmatrix} \int_{-\infty}^\infty s_1 p(s_1) \mathrm{d}s_1 \\ \vdots \\ \int_{-\infty}^\infty s_{N_x} p(s_{N_x}) \mathrm{d}s_{N_x} \end{pmatrix} = \begin{pmatrix} \overline{x}_1 \\ \vdots \\ \overline{x}_{N_x}\end{pmatrix} = \overline{\pmb{x}} \end{align} \] where in the above \( p(s_i) \) refers to the marginal density for \( X_i \), having integrated out the other variables.
In the form \( \mathbb{E}\left[\pmb{X}\right] = \int_{\mathbb{R}^{N_x}} \mathbf{s} p(\mathbf{s})\mathrm{d}\mathbf{s} \) this can be seen as the weighted average of the vector \( \mathbf{s} \), given the weights of the joint density.
Suppose again \( N_x=2 \), then we can then consider the first entry of \( \mathbb{E}\left[\pmb{X}\right] \) to be,
\[ \begin{align} \mathbb{E}\left[ X_1\right] = \int_{-\infty}^\infty s_1 p(s_1)\mathrm{d}s_1 &= \int_{-\infty}^\infty \int_{-\infty}^\infty s_1 p(s_1 \vert s_2) p(s_2) \mathrm{d}s_2 \mathrm{d}s_1 \end{align} \] so that this can be understood as the center of mass for \( X_1 \) conditional on \( X_2 \) after averaging over the range of \( X_2 \), weighted by its density.
Recall that the expected value of a random vector can be written as,
\[ \begin{align} \mathbb{E}\left[\pmb{X}\right] = \begin{pmatrix} \mathbb{E}\left[X_1\right] \\ \vdots \\ \mathbb{E}\left[X_{N_x}\right] \end{pmatrix} \end{align} \]
Using this definition, it is easy to show how we can apply our usual rules of distributing this expected value over linear combinations of RVs.
Particularly, simply following row-versus column multiplication, we can deduce directly that
\[ \mathbb{E}\left[\mathbf{A}\pmb{X}_1 + \mathbf{B}\pmb{X}_2 \right] = \mathbf{A}\mathbb{E}\left[ \pmb{X}_1\right] + \mathbf{B}\mathbb{E}\left[\pmb{X}_2\right]. \] for constant matrices \( \mathbf{A},\mathbf{B} \).
The above likewise holds for scalars \( a,b \) trivially.
Now, suppose for a sample size \( N_e \), with RVs \( \{\pmb{X}_j\}_{j=1}^{N_e} \) of length \( N_x \), we again define the ensemble-based mean of the random variable \( X_i \) corresponding to the \( i \)-th entry of each above random vector as
\[ \begin{align} \hat{X}_i = \frac{1}{N_e} \sum_{j=1}^{N_e} X_{ij}, \end{align} \] which is itself a random variable.
This is equivalent to taking a row-average of the random ensemble matrix over all columns \( j=1,\cdots, N_e \),
\[ \begin{align} \mathbf{E} & = \begin{pmatrix} X_{1,1} & \cdots & X_{1,N_e} \\ \vdots & \ddots & \vdots \\ X_{N_x,1} & \cdots & X_{N_x,N_e} \end{pmatrix} \end{align} \]
Particularly, for the vector \( \pmb{1} \in \mathbb{R}^{N_e} \) with all entries equal to one, we have,
\[ \begin{align} \hat{\pmb{X}} = \mathbf{E}\pmb{1}\frac{1}{N_e} = \begin{pmatrix} \hat{X}_1 \\ \vdots \\ \hat{X}_{N_x} \end{pmatrix} \end{align} \] where each entry \( \hat{X}_i \), for \( i=1,\cdots, N_x \), is the ensemble-based mean defined above.
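The identity \( \hat{\pmb{X}} = \mathbf{E}\pmb{1}\tfrac{1}{N_e} \) is easy to verify numerically; the short Python/NumPy sketch below (with an arbitrary random ensemble used purely for illustration) compares it against the row-average of the ensemble matrix.

```python
# Hypothetical illustration: the ensemble mean computed as E 1 / N_e coincides with
# the row-average of the ensemble matrix.
import numpy as np

rng = np.random.default_rng(4)
N_x, N_e = 4, 25
E = rng.normal(size=(N_x, N_e))                # any ensemble matrix will do here

ones = np.ones(N_e)
mean_via_ones = E @ ones / N_e                 # \hat{X} = E 1 / N_e
mean_via_rows = E.mean(axis=1)                 # row-average over the N_e columns

print(np.allclose(mean_via_ones, mean_via_rows))   # True
```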
Note that from the last slide, we had
\[ \begin{align} \hat{\pmb{X}} = \mathbf{E}\pmb{1}\frac{1}{N_e} \end{align} \]
Particularly, \( \hat{\pmb{X}} \) is an unbiased estimator for \( \overline{\pmb{x}} \), i.e., \( \mathbb{E}\left[ \hat{\pmb{X}}\right] = \overline{\pmb{x}} \).
For observations of the RVs \( \{\pmb{x}_j\}_{j=1}^{N_e} \), we will similarly denote the observed sample-based mean,
\[ \begin{align} \hat{x}_i = \frac{1}{N_e} \sum_{j=1}^{N_e} x_{ij} \end{align} \]
where this can also be seen as taking the row-average of
\[ \begin{align} \mathbf{E} & = \begin{pmatrix} x_{1,1} & \cdots & x_{1,N_e} \\ \vdots & \ddots & \vdots \\ x_{N_x,1} & \cdots & x_{N_x,N_e} \end{pmatrix} \end{align} \]
Using the previous relationships, we can similarly show that,
\[ \begin{align} \mathbf{E} \pmb{1} \frac{1}{N_e} = \begin{pmatrix} \hat{x}_1 \\ \vdots \\ \hat{x}_{N_x} \end{pmatrix} = \hat{\pmb{x}}. \end{align} \]
This tells us that the vector of the sample-based means of the observations provides a point estimate for the expected value of the random vector, \( \mathbb{E}\left[\pmb{X}\right] = \overline{\pmb{x}} \).
In the next part of this unit, we will consider how to measure spread of multiple RVs and how this relates to the notion of correlation between variables.
Finally, we will discuss briefly the importance of the multivariate Gaussian distribution, defined over random vectors.