The ensemble Kalman filter and smoother part I

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

Outline

The following topics will be covered in this lecture:
- The ensemble Kalman filter
- The classic ensemble Kalman smoother

Motivation

We have seen now how the linear-Gaussian analysis can be extended at first order to nonlinear systems in several classical ways.
Specifically, 3D-VAR and the extended Kalman filter both provide a means to produce an approximate filtering analysis in the space of perturbations.
4D-VAR extends the analysis of 3D-VAR, using a static “climatological” background covariance, to a smoothing formulation over an entire time series.
The primary issue with th 3D- / 4D-VAR approach is that the static background covariance doesn't capture the spread of the forecast as this changes in time.
- Rather, these approaches can be considered to predict from a persistence model, fitting the outcome to an observation or a time series of observations of the current process.
On the other hand, the extended Kalman filter provides a means to update the background covariance, but the propagation of the covariance in the tangent-linear model is often unstable / unfeasible.
An alternative formulation arises if we consider the sample-based estimates as with the particle filter and Metropolis-Hastings.

Motivation

In particular, let's recall our construction of the ensemble matrix \( \mathbf{E}\in\mathbb{R}^{N_x \times N_e} \):
- We will suppose that we have a random sample \( \pmb{X}_j \) following a parent distribution \( \pmb{X}\sim P \);
- The ensemble matrix is given such that \( \mathbf{E}^j = \pmb{X}_j \) for all \( j = 1,\cdots,N_e \).
Moreover, the sample mean can be computed from the row-average of the ensemble matrix as

\[ \hat{\pmb{X}} = \mathbf{E} \pmb{1} \frac{1}{N_e}. \]
We can thus define the sample covariance matrix in a way analogously to how we define the sample mean.
Particularly, if we follow the matrix multiplication with the transpose, we find that

\[ \begin{align} \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} = \begin{pmatrix} \hat{X}_1 & \cdots & \hat{X}_{1} \\ \vdots & \ddots & \vdots \\ \hat{X}_{N_x} & \cdots &\hat{X}_{N_x} \end{pmatrix}\in\mathbb{R}^{N_x \times N_e} \end{align} \]
Particularly, this can be written column-wise as

\[ \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} = \begin{pmatrix}\hat{\pmb{X}}, \cdots, \hat{\pmb{X}}\end{pmatrix} \]

Motivation

Using element-wise subtraction with the last identity, this says that,

\[ \begin{align} \mathbf{E} - \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} = \begin{pmatrix} X_{1,1} - \hat{X}_1 & \cdots &X_{1,n}- \hat{X}_1 \\ \vdots & \ddots & \vdots \\ X_{N_x,1} - \hat{X}_{N_X} & \cdots & X_{N_X,N_e} - \hat{X}_{N_x} \end{pmatrix} \end{align} \]
With a re-normalization, we will define the matrix of perturbations or anomalies of the ensemble about the mean.
We define the (normalized) anomaly matrix of the ensemble as

\[ \begin{align} \mathbf{X} :&= \left(\mathbf{E} - \mathbf{E}\pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{\sqrt{N_e -1}}\\ &=\mathbf{E}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{\sqrt{N_e -1}} \end{align} \]

Motivation

The anomalies have the property

\[ \begin{align} \mathbf{P} :&= \mathbf{X} \mathbf{X}^\top \\ &= \mathbf{E}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{N_e -1}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\mathbf{E}^\top\\ &=\mathbf{E}\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\mathbf{E}^\top\frac{1}{N_e -1} \end{align} \]

where \[ \begin{align} \mathbf{P}_{i,j} = \begin{cases} \hat{\sigma}^2_{i} &\text{ for }i=j\\ \hat{\sigma}_{i,j} &\text{ for }i\neq j \end{cases} \end{align} \]
Rather than dealing with the numerical challenges of propagating the background covariance \( \mathbf{B}_k^\mathrm{filt} \) through the tangent-linear model to form the next \( \mathbf{B}^\mathrm{fore}_k \) as with the extended Kalman filter,
- we can take the sampling approach of the particle filter and treat replicates as point volumes but with equal weights.
Forming such a particle cloud / ensemble this gives an estimator for the background \( \mathbf{P}_k \approx \mathbf{B}_k \) when the first order linear-Gaussian approximation is appropriate.
However, the linear-Gaussian assumption actually leads to a biased estimator, but which (by construction) eliminates the extremely high-variance of the particle filter weights.
This approach is the basis of the ensemble Kalman filter (EnKF).

The EnKF

One can see the EnKF to be a hybridization of the extended Kalman filter, variational cost function with a particle filter, using a sample-based covariance and sample-based mean estimate.
The resulting EnKF cost function can thus be written as

\[ \begin{align} \mathcal{J}_{\mathrm{EnKF}}(\pmb{x}) := \frac{1}{2} \parallel \hat{\pmb{x}}_k^\mathrm{fore} - \pmb{x} \parallel_{\mathbf{P}_k^\mathrm{fore}}^2 + \frac{1}{2} \parallel \pmb{y}_k - \mathcal{H}(\pmb{x})\parallel_{\mathbf{R}_k}^2, \end{align} \] where we take the ensemble-based, empirical mean and covariance as
- \( \hat{\pmb{x}}_k^\mathrm{fore} := \mathbf{E}_k^\mathrm{fore}\pmb{1} / N_e \),
- \( \mathbf{P}_k^\mathrm{fore} := \left(\mathbf{X}_k^\mathrm{fore}\right)\left(\mathbf{X}_k^\mathrm{fore}\right)^\top \), and
- \( \mathbf{X}^\mathrm{fore}_k :=\mathbf{E}^\mathrm{fore}_k\left( \mathbf{I} - \pmb{1}\pmb{1}^\top \frac{1}{N_e} \right)\frac{1}{\sqrt{N_e -1}} \).
The columns of the ensemble matrix are given by propagating the sample through the nonlinear model, so that if \( \pmb{x}_k^{i,\mathrm{filt}} \) is a replicate of the model state from the filtering density,

\[ \begin{align} \pmb{x}_k^{i,\mathrm{fore}} := \mathcal{M}_k\left(\pmb{x}_k^{i,\mathrm{filt}}\right) + \pmb{w}^i_k \end{align} \]
Therefore, sampling the forecast density is performed with the fully nonlinear state space model like the particle filter;
- the key of the method is in how one efficiently can resample the (approximate) filtering density given the maximum-a-posteriori analysis.

The EnKF

If we re-write the state vector as a linear combination of the replicates, we can devise this in the anomalies as

\[ \begin{align} \pmb{x} := \hat{\pmb{x}}_k + \mathbf{X}_k^\mathrm{fore} \pmb{w}. \end{align} \]
Notice that \( \pmb{w}\in \mathbb{R}^{N_e} \) so that this is an optimization over the ensemble dimension.
- If \( N_e \leq N_x \), then we should note that \( \parallel \cdot \parallel_{\mathbf{P}_k} \) refers to a pseudo-norm with respect to the pseudo-inverse of the anomaly matrix.
Revising the cost function, we can linearize the observation operator with Taylor's theorem as

\[ \begin{align} \mathcal{J}_{\mathrm{EnKF}}(\pmb{w}) := \frac{1}{2} \parallel \pmb{w} \parallel^2 + \frac{1}{2} \parallel \pmb{y}_k - \mathcal{H}\left(\hat{\pmb{x}}_k^\mathrm{fore}\right) - \mathbf{H}_k\mathbf{X}_k^\mathrm{fore} \pmb{w} \parallel_{\mathbf{R}_k}^2, \end{align} \] where we define the analysis with the linear approximation through the Hessian

\[ \begin{align} \mathbf{T}:= \mathbf{H}_{\mathcal{J}}^{-\frac{1}{2}} & & \mathbf{X}_k^\mathrm{filt} := \mathbf{X}_k^\mathrm{fore} \mathbf{T}. \end{align} \]
With the update to the anomalies defined as above, and the update to the mean defined for the optimal weights \( \overline{\pmb{w}} \) as

\[ \begin{align} \hat{\pmb{x}}^\mathrm{filt}_k := \hat{\pmb{x}}^\mathrm{fore}_k + \mathbf{X}_k^\mathrm{fore} \overline{\pmb{w}}, \end{align} \]
we can resample the entire ensemble from the approximate, best-fit Gaussian as

\[ \begin{align} \mathbf{E}_k^\mathrm{filt} := \hat{\pmb{x}}^\mathrm{filt}_k \pmb{1}^\top + \mathbf{X}_k^\mathrm{filt}\sqrt{N_e -1 }. \end{align} \]

The EnKF

In this formalism, we can appropriately define an ensemble right-transform \( \boldsymbol{\Psi}_k \) such that for any \( t_k \),

\[ \begin{align} \mathbf{E}^\mathrm{filt}_k = \mathbf{E}^\mathrm{fore}_k \boldsymbol{\Psi}_k \end{align} \] where in the above we would say that \[ \begin{align} \mathbf{E}^\mathrm{filt}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k}) \\ \mathbf{E}^\mathrm{fore}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k-1}) \end{align} \]
We will associate \( \mathbf{E}^\mathrm{filt}_k \equiv \mathbf{E}^\mathrm{smth}_{k|k} \);
- under the linear-Gaussian model, we furthermore have that
\[ \begin{align} \mathbf{E}^\mathrm{smth}_{k|L} = \mathbf{E}^\mathrm{smth}_{k|L-1}\boldsymbol{\Psi}_{L} & & \mathbf{E}^\mathrm{smth}_{k|K} \sim p(\pmb{x}_k \vert \pmb{y}_{1:K}). \end{align} \]
Then we can perform a retrospective smoothing analysis on all past states stored in memory by using the latest right-transform update from the filtering step.
This form of retrospective analysis is the basis of the ensemble Kalman smoother (EnKS).

The EnKS

The EnKS takes advantage of the simple form of the retrospective, right-transform analysis by including an additional, inner loop of the filtering cycle.

Diagram of the filter observation-analysis-forecast cycle.

In the above, time is the horizontal axis where right moves forward in time.
At each time, we produce the standard filtering estimate by computing \( \boldsymbol{\Psi}_L \) from the cost function, and updating the forecast \[ \mathbf{E}^\mathrm{filt}_L = \mathbf{E}_L^\mathrm{fore} \boldsymbol{\Psi}_L. \]
The information of incoming observations is passed backward in time using the right-transform to condition the ensemble at past times: \[ \begin{align} \mathbf{E}^\mathrm{smth}_{k|L} = \mathbf{E}^\mathrm{smth}_{k|L-1} \boldsymbol{\Psi}_L. \end{align} \]