Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:

This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

- The following topics will be covered in this lecture:
- 3D-VAR
- 3D-VAR versus the extended Kalman filter
- The extended Kalman filter

While

**pure Bayesian techniques**such as the**particle filter**and**Metropolis-Hastings**are theoretically capable of fully nonlinear / non-Gaussian estimation, i.e., in systems like\[ \begin{align} \pmb{x}_k &= \mathcal{M}_k (\pmb{x}_{k-1}) + \pmb{w}_k \\ \pmb{y}_k &= \mathcal{H}_k (\pmb{x}_k) + \pmb{v}_k \end{align} \] with

**arbitrary error distributions**,- in practice, for systems of even
**moderately large dimension**, these approaches can become**computationally intractable**. Particle filters and similar methods are currently an active research area, with a variety of techniques being developed to overcome the curse of dimensionality.

- Currently, however, this computational barrier has hindered their operational use in large-scale prediction problems.

For the rest of the course, we will consider a variety of ways that tools from the

**linear-Gaussian estimation problem**can be**adapted for prediction algorithms that can be used at-scale**.

**3D-VAR**and the**extended Kalman filter**are two closely related approaches for**extending the linear-Gaussian intuition**to large-scale estimation problems.While 3D-VAR is not widely used currently in large-scale operational prediction problems,

- and while the extended Kalman filter has only seen limited implementation for large-scale operational prediction,

these two approaches are

**foundational for understanding the challenges and the methods**devised for**large-scale nonlinear estimation**.Likewise,

**3D-VAR**forms an important component in understanding a variety of modern techniques such as**covariance hybridization**.For these reasons, we will provide a general, high-level discussion of these methods.

Let us recall the form of the

**penalized, state-space objective function**that we derived in**least-squares optimization**:\[ \begin{align} \mathcal{J}_p(\pmb{x}) := \frac{1}{2} \parallel \pmb{x} - \overline{\pmb{x}}\parallel_{\mathbf{B}}^2 + \frac{1}{2} \parallel \pmb{y} - \mathbf{H}\pmb{x}\parallel_{\mathbf{R}_k}^2. \end{align} \]

where, like in the

**Gaussian maximum a posteriori estimation**, we estimate an optimal state as a**balance between**:- the
**deviation from the background estimate**, weighted inverse proportionally to the uncertainty weights \( \mathbf{B} \); and - the
**deviation from the observed data**, weighted inverse proportionally to the uncertainty of the data \( \mathbf{R}_k \).

- the
The method of

**3D-VAR**is to extend this objective function directly to nonlinear dynamics where we may**replace \( \mathbf{H} \) with a nonlinear map \( \mathcal{H} \)**.The objective function therefore becomes

\[ \begin{align} \mathcal{J}_p(\pmb{x}) := \frac{1}{2} \parallel \pmb{x} - \overline{\pmb{x}}\parallel_{\mathbf{B}}^2 + \frac{1}{2} \parallel \pmb{y} - \mathcal{H}(\pmb{x})\parallel_{\mathbf{R}_k}^2, \end{align} \] which is no

**longer guaranteed to be globally convex**, and therefore**must be locally minimized**.This can be performed with, e.g., a Gauss-Newton or other quasi-Newton procedures for the minimization;

- the optimized state estimate \( \hat{\pmb{x}} \) can thus be propagated to find a subsequent forecast.

However, in the nonlinear setting, there are a variety of new considerations for how this optimization compares to the “optimal” Bayesian solution.

Specifically, the

**selection and interpretation**of the**background covariance / weights matrix**\( \mathbf{B} \) changes considerably.- Likewise, the
**interpretation of the optimized solution**\( \hat{\pmb{x}} \)**depends on the background**.

- Likewise, the
In 3D-VAR, the typical choice of the background weights \( \mathbf{B} \) is given by taking the

**covariance matrix for the model state**over a**long time series of past observations / model simulations**.In weather and climate prediction, this is known as the “climatological covariance”, estimated over a long past record of the state variables.

This has the benefit that this gives a

**cost effective analysis**, as the background is taken as \( \mathbf{B}_k \equiv \mathbf{B} \)**static-in-time**.Similarly, the initial guess \( \overline{\pmb{x}} \) is given by the model simulation of the last optimized solution \( \hat{\pmb{x}} \),

\[ \begin{align} \overline{\pmb{x}} := \mathcal{M}_k \left(\hat{\pmb{x}}\right). \end{align} \]

This is understood then to

**optimize the last forecast solution versus the observed data**, where the**background is using “static persistence weights”**.That is, the static persistence weights do not include the information of the time-evolving covariance, only the long-time static information about the variables.

If we suppose that \( \mathbf{B} \) and \( \mathbf{R}_k \) are selected accurately, and the long-time distribution for \( \pmb{x} \) is Gaussian, this can be interpreted as the

**maximum-a-posteriori estimate from the invariant distribution of the dynamics**.

The primary benefit of the

**3D-VAR**approach is its**simplicity in the estimation problem**.- However, this is also the
**primary weakness**, as by not considering the time-varying covariance of the forecast, - this
**lacks information about the time-varying evolution of the state**.

- However, this is also the
The

**extended Kalman filter**seeks to rectify this by explicitly constructing a procedure to produce a time-varying background as with the Kalman filter.This is performed by using the

**tangent-linear model for Gaussian perturbations of the mean state**.

In particular, suppose that the equations of motion are generated by a nonlinear function, independent of time for simplicity,

\[ \begin{align} \frac{\mathrm{d}}{\mathrm{d}t} \pmb{x} = \pmb{f}(\pmb{x}) & &\pmb{x}_k= \mathcal{M}_k(\pmb{x}_{k-1}):= \int_{t_{k-1}}^{t_k} \pmb{f}(\pmb{x}) \mathrm{d}t + \pmb{x}_{k-1} \end{align} \]

We can

**extend the linear-Gaussian approximation**for the forecast density as\[ \begin{align} &\pmb{x}_{k-1} := \overline{\pmb{x}}_{k-1} + \pmb{\delta}_{k-1} \sim N(\overline{\pmb{x}}_{k-1}, \mathbf{B}_{k-1})\\ \Leftrightarrow &\pmb{\delta}_{k-1} \sim N(\pmb{0}, \mathbf{B}_{k-1}) \end{align} \]

The evolution of the perturbation \( \pmb{\delta} \) can thus be

**approximated via Taylor's theorem**as\[ \begin{align} \frac{\mathrm{d}}{\mathrm{d}t} \pmb{\delta}&:= \frac{\mathrm{d}}{\mathrm{d}t}\left( \pmb{x} - \overline{\pmb{x}}\right)\\ &=\pmb{f}(\pmb{x}) - \pmb{f}(\overline{\pmb{x}})\\ &=\nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\pmb{\delta} + \mathcal{O}\left(\parallel \pmb{\delta}\parallel^2\right) \end{align} \] where \( \nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}}) \) is the

**Jacobian equation**with**dependence on the underlying mean state**.In particular, the linear evolution defined by the truncated Taylor expansion

\[ \begin{align} \frac{\mathrm{d}}{\mathrm{d}t} \pmb{\delta}:= \nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\pmb{\delta} \end{align} \] is known as the

**tangent-linear model**.

Making the approximation of the tangent-linear model,

\[ \begin{align} &\frac{\mathrm{d}}{\mathrm{d}t} \pmb{x} \approx \pmb{f}(\overline{\pmb{x}}) + \nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\pmb{\delta}\\ \\ \Rightarrow&\int_{t_{k-1}}^{t_k}\frac{\mathrm{d}}{\mathrm{d}t}\pmb{x}\mathrm{d}t \approx \int_{t_{k-1}}^{t_k} \pmb{f}(\overline{\pmb{x}}) \mathrm{d}t + \int_{t_{k-1}}^{t_k}\nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\pmb{\delta}\mathrm{d}t \\ \\ \Rightarrow & \pmb{x}_{k} \approx \mathcal{M}_{k}\left(\overline{\pmb{x}}_{k-1}\right) + \mathbf{M}_k\pmb{\delta}_{k-1} \end{align} \] where \( \mathbf{M}_k \) is the

**resolvent of the tangent-linear model**.**Gaussians are closed under affine transformations**, approximating the evolution under the tangent-linear model as\[ \begin{align} \pmb{x}_{k} \sim N\left(\mathcal{M}_k\left(\overline{\pmb{x}}_{k-1}\right), \mathbf{M}_k \mathbf{B}_{k-1}\mathbf{M}^\top_{k}\right) \end{align} \]

Therefore, this provides a means, in principle, to approximate the evolution of the background covariance in time.

The

**extended Kalman filter**can be understood then as turning the 3D-VAR objective function into a**time-varying objective function**as follows.Let the previous analysis solution be given by

**approximating the distribution of the model state**as\[ \begin{align} \pmb{x}_k &\sim N\left(\overline{\pmb{x}}^{\mathrm{filt}}_{k-1} , \mathbf{B}_{k-1}^{\mathrm{filt}}\right). \end{align} \]

We

**approximate the forecast distribution**as,\[ \begin{align} \pmb{x}_k &\sim N\left( \overline{\pmb{x}}_k^\mathrm{fore}, \mathbf{B}_k^\mathrm{fore} \right),\\ \pmb{x}_k^\mathrm{fore}&:= \mathcal{M}_k\left(\overline{\pmb{x}}_{k-1}^{\mathrm{filt}}\right),\\ \mathbf{B}_k^\mathrm{fore}&:=\mathbf{M}_k \mathbf{B}_{k-1}^{\mathrm{filt}}\mathbf{M}_k^\top. \end{align} \]

The resulting

**extended Kalman filter cost function**can thus be written as\[ \begin{align} \mathcal{J}_{\mathrm{EKF}}(\pmb{x}) := \frac{1}{2} \parallel \pmb{x} - \overline{\pmb{x}}_k^\mathrm{fore}\parallel_{\mathbf{B}_k^\mathrm{fore}}^2 + \frac{1}{2} \parallel \pmb{y} - \mathcal{H}(\pmb{x})\parallel_{\mathbf{R}_k}^2, \end{align} \] which can similarly be minimized with quasi-Newton methods to find a local minima.

The

**extended Kalman filter**thus gives the improvement over 3D-VAR in that it uses the approximate,**time-varying background covariance using the tangent-linear model**.However, this comes with the cost of constructing and computing the evolution of a large covariance matrix in the tangent-linear model.

For large-scale, nonlinear systems,

**explicitly constructing such a covariance can be prohibitively expensive**.For this reason, one usually needs to make an additional approximation using the square root form of the equation in the weight-space.

\[ \begin{align} \mathcal{J}_{\mathrm{EKF}}(\pmb{w}) &:= \frac{1}{2} \parallel \overline{\pmb{x}}_k^\mathrm{fore} - \boldsymbol{\Sigma}_{k}^\mathrm{fore} \pmb{w} - \overline{\pmb{x}}_k^\mathrm{fore} \parallel_{\mathbf{B}^\mathrm{fore}}^2 + \frac{1}{2} \parallel \pmb{y} - \mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore} + \boldsymbol{\Sigma}_k^\mathrm{fore}\pmb{w}\right) \parallel_{\mathbf{R}_k}^2\\ &\approx \frac{1}{2}\parallel \pmb{w}\parallel^2 + \frac{1}{2}\parallel \pmb{y} - \mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore}\right) + \mathbf{H}_k \boldsymbol{\Sigma}_k^\mathrm{fore}\pmb{w} \parallel_{\mathbf{R}_k}^2, \end{align} \] where

- \( \left(\boldsymbol{\Sigma}_k^\mathrm{i}\right)\left(\boldsymbol{\Sigma}_k^\mathrm{i}\right)^\top \approx \mathbf{B}_k^{\mathrm{i}} \) is a
**reduced rank approximation of the covariance using**, e.g., a truncated SVD; - \( \mathbf{H}_k:= \nabla_{\pmb{x}} \mathcal{H}_k \) is the
**Jacobian of the nonlinear observation operator**; and - the second term above is given by taking the
**Taylor expansion of the perturbation to the mean state through the observation operator**.

- \( \left(\boldsymbol{\Sigma}_k^\mathrm{i}\right)\left(\boldsymbol{\Sigma}_k^\mathrm{i}\right)^\top \approx \mathbf{B}_k^{\mathrm{i}} \) is a

Notice that the approximate cost function

\[ \begin{align} \mathcal{J}_{\mathrm{EKF}}(\pmb{w}) &:=\frac{1}{2}\parallel \pmb{w}\parallel^2 + \frac{1}{2}\parallel \pmb{y} - \mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore}\right) + \mathbf{H}_k \boldsymbol{\Sigma}_k^\mathrm{fore}\pmb{w} \parallel_{\mathbf{R}_k}^2,\\ \end{align} \]

is**actually quadratic in \( \pmb{w} \)**, as \( \mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore}\right) \) is a**constant with respect to the optimization**.Therefore, this represents a

**fully linearized system**, performing an**approximate Kalman filtering in the space of perturbations**.The usual right-transform analysis can be computed from the objective function above so that

\[ \begin{align} \overline{\pmb{x}}^\mathrm{filt}_k := \overline{\pmb{x}}_k^\mathrm{fore} + \boldsymbol{\Sigma}_k^\mathrm{fore} \overline{\pmb{w}} & & \mathbf{H}_{\mathcal{J}}&:= \nabla_{\pmb{w}}^2 \mathcal{J} & & \mathbf{T}:=\mathbf{H}_{\mathcal{J}}^{-\frac{1}{2}} & & \boldsymbol{\Sigma}_{k}^\mathrm{filt}:= \boldsymbol{\Sigma}_k^\mathrm{fore}\mathbf{T} & & \end{align} \] an entire recursion can be derived as above (and extended to include model error in the evolution of \( \boldsymbol{\Sigma}_k^\mathrm{filt} \)).

The

**accuracy of the approximation**in the extended Kalman filter depends strongly on:- the
**nonlinearity of the state evolution**\( \mathcal{M}_k \) and if this can be well-approximated at first order by the tangent-linear model; and - the
**nonlinearity of the observation operator**\( \mathcal{H} \) and if this can be well-approximated at first order by the Jacobian.

- the
For the former, this usually depends on

**how long the forecast length is in time**, where for short forecast horizons this is often adequate.- For longer forecast horizons, the evolution of Gaussian perturbations is no longer appropriate, as the forecast begins to converge to the invariant “climatological” statistics.

For this reason, one may intuitively consider

**interpolating between the long-time-average background covariance**, and**a time dependent covariance**in practice for the optimization weights.- This is part of the intuition behind
**covariance hybridization**mentioned at the beginning of the lecture, - The other consideration is that if \( \boldsymbol{\Sigma}_k^{\mathrm{fore}} \) is of
**reduced rank**, this implies that**correction in the optimized solution is restricted to a low-dimensional subspace**. - If the subspace is too small, this may not be able to correct the growth of errors in the estimate, and
**interpolating with a static, full-rank background**can**rectify this sparse correction**.

- This is part of the intuition behind