While pure Bayesian techniques such as the particle filter and Metropolis-Hastings are theoretically capable of fully nonlinear / non-Gaussian estimation, i.e., in systems like
\[ \begin{align} \pmb{x}_k &= \mathcal{M}_k (\pmb{x}_{k-1}) + \pmb{w}_k \\ \pmb{y}_k &= \mathcal{H}_k (\pmb{x}_k) + \pmb{v}_k \end{align} \] with arbitrary error distributions, these methods generally do not scale to the high-dimensional systems encountered in operational prediction.
Particle filters and similar methods are currently an active research area, with a variety of techniques being developed to overcome the curse of dimensionality.
For the rest of the course, we will consider a variety of ways that tools from the linear-Gaussian estimation problem can be adapted for prediction algorithms that can be used at scale.
3D-VAR and the extended Kalman filter are two closely related approaches for extending the linear-Gaussian intuition to large-scale estimation problems.
While 3D-VAR is not currently widely used in large-scale operational prediction problems,
these two approaches are foundational for understanding the challenges and the methods devised for large-scale nonlinear estimation.
Likewise, 3D-VAR forms an important component in understanding a variety of modern techniques such as covariance hybridization.
For these reasons, we will provide a general, high-level discussion of these methods.
Let us recall the form of the penalized, state-space objective function that we derived in least-squares optimization:
\[ \begin{align} \mathcal{J}_p(\pmb{x}) := \frac{1}{2} \parallel \pmb{x} - \overline{\pmb{x}}\parallel_{\mathbf{B}}^2 + \frac{1}{2} \parallel \pmb{y} - \mathbf{H}\pmb{x}\parallel_{\mathbf{R}_k}^2, \end{align} \]
where, as in Gaussian maximum a posteriori estimation, we estimate an optimal state as a balance between the deviation from the background mean \( \overline{\pmb{x}} \), weighted by the background covariance \( \mathbf{B} \), and the deviation of the mapped state from the observations \( \pmb{y} \), weighted by the observation error covariance \( \mathbf{R}_k \).
The method of 3D-VAR extends this objective function directly to the nonlinear setting, where we may replace the linear observation operator \( \mathbf{H} \) with a nonlinear map \( \mathcal{H} \).
The objective function therefore becomes
\[ \begin{align} \mathcal{J}_p(\pmb{x}) := \frac{1}{2} \parallel \pmb{x} - \overline{\pmb{x}}\parallel_{\mathbf{B}}^2 + \frac{1}{2} \parallel \pmb{y} - \mathcal{H}(\pmb{x})\parallel_{\mathbf{R}_k}^2, \end{align} \] which is no longer guaranteed to be globally convex, and therefore must be locally minimized.
This minimization can be performed with, e.g., Gauss-Newton or other quasi-Newton procedures.
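As a concrete illustration, the following is a minimal NumPy sketch of such a Gauss-Newton iteration for the nonlinear 3D-VAR cost function; the function and argument names (e.g., `threed_var_analysis`, `obs_op_jac`) are illustrative assumptions rather than part of any particular library.

```python
import numpy as np

def threed_var_analysis(x_bg, y_obs, obs_op, obs_op_jac, B_inv, R_inv, n_iter=10):
    """Gauss-Newton minimization of the 3D-VAR cost
    J(x) = 0.5*||x - x_bg||_B^2 + 0.5*||y - H(x)||_R^2."""
    x = x_bg.copy()
    for _ in range(n_iter):
        Hk = obs_op_jac(x)                                # Jacobian of the observation map at x
        grad = B_inv @ (x - x_bg) - Hk.T @ R_inv @ (y_obs - obs_op(x))
        hess = B_inv + Hk.T @ R_inv @ Hk                  # Gauss-Newton Hessian approximation
        x = x - np.linalg.solve(hess, grad)               # Newton-type update step
    return x
```

In practice one would add a convergence check and possibly a line search, but the structure above captures the local minimization of the non-convex cost.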
However, in the nonlinear setting, there are a variety of new considerations for how this optimization compares to the “optimal” Bayesian solution.
Specifically, the selection and interpretation of the background covariance / weights matrix \( \mathbf{B} \) changes considerably.
In 3D-VAR, the typical choice of the background weights \( \mathbf{B} \) is given by taking the covariance matrix for the model state over a long time series of past observations / model simulations.
In weather and climate prediction, this is known as the “climatological covariance”, estimated over a long past record of the state variables.
This has the benefit of giving a cost-effective analysis, as the background covariance is taken to be static in time, \( \mathbf{B}_k \equiv \mathbf{B} \).
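For instance, such a climatological covariance might be estimated from a long free run of the model; the sketch below is a minimal assumption of how this could look, with `trajectory` standing in for a long record of past model states.

```python
import numpy as np

def climatological_covariance(trajectory):
    """Estimate a static background covariance from a long record of model states.

    trajectory : array of shape (n_times, n_state), e.g., a long free run of the model.
    """
    return np.cov(trajectory, rowvar=False)   # sample covariance over the long record
```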
Similarly, the initial guess \( \overline{\pmb{x}} \) is given by the model simulation of the last optimized solution \( \hat{\pmb{x}} \),
\[ \begin{align} \overline{\pmb{x}} := \mathcal{M}_k \left(\hat{\pmb{x}}\right). \end{align} \]
This is then understood as optimizing the last forecast versus the newly observed data, where the background uses “static persistence weights”.
That is, the static persistence weights do not include information about the time-evolving covariance, only the long-time, static statistics of the state variables.
If we suppose that \( \mathbf{B} \) and \( \mathbf{R}_k \) are selected accurately, and the long-time distribution for \( \pmb{x} \) is Gaussian, this can be interpreted as the maximum-a-posteriori estimate from the invariant distribution of the dynamics.
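Putting these pieces together, a single 3D-VAR cycle might look like the following sketch, reusing the hypothetical `threed_var_analysis` helper above; `model_step` stands in for the nonlinear forecast map \( \mathcal{M}_k \), and all names are illustrative.

```python
def threed_var_cycle(x_analysis_prev, y_obs, model_step, obs_op, obs_op_jac, B_inv, R_inv):
    # Background mean: forecast of the last analysis with the nonlinear model
    x_bg = model_step(x_analysis_prev)
    # Optimize against the new observations with the static climatological weights B
    return threed_var_analysis(x_bg, y_obs, obs_op, obs_op_jac, B_inv, R_inv)
```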
The primary benefit of the 3D-VAR approach is its simplicity in the estimation problem; its primary drawback is that the static background weights cannot represent the time-dependent uncertainty of the current forecast.
The extended Kalman filter seeks to rectify this by explicitly constructing a procedure to produce a time-varying background, as with the Kalman filter.
This is performed by using the tangent-linear model for Gaussian perturbations of the mean state.
In particular, suppose that the equations of motion are generated by a nonlinear function, independent of time for simplicity,
\[ \begin{align} \frac{\mathrm{d}}{\mathrm{d}t} \pmb{x} = \pmb{f}(\pmb{x}) & &\pmb{x}_k= \mathcal{M}_k(\pmb{x}_{k-1}):= \int_{t_{k-1}}^{t_k} \pmb{f}(\pmb{x}) \mathrm{d}t + \pmb{x}_{k-1} \end{align} \]
We can extend the linear-Gaussian approximation for the forecast density as
\[ \begin{align} &\pmb{x}_{k-1} := \overline{\pmb{x}}_{k-1} + \pmb{\delta}_{k-1} \sim N(\overline{\pmb{x}}_{k-1}, \mathbf{B}_{k-1})\\ \Leftrightarrow &\pmb{\delta}_{k-1} \sim N(\pmb{0}, \mathbf{B}_{k-1}) \end{align} \]
The evolution of the perturbation \( \pmb{\delta} \) can thus be approximated via Taylor's theorem as
\[ \begin{align} \frac{\mathrm{d}}{\mathrm{d}t} \pmb{\delta}&:= \frac{\mathrm{d}}{\mathrm{d}t}\left( \pmb{x} - \overline{\pmb{x}}\right)\\ &=\pmb{f}(\pmb{x}) - \pmb{f}(\overline{\pmb{x}})\\ &=\nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\pmb{\delta} + \mathcal{O}\left(\parallel \pmb{\delta}\parallel^2\right) \end{align} \] where \( \nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}}) \) is the Jacobian of \( \pmb{f} \), evaluated along the underlying mean state.
In particular, the linear evolution defined by the truncated Taylor expansion
\[ \begin{align} \frac{\mathrm{d}}{\mathrm{d}t} \pmb{\delta}:= \nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\pmb{\delta} \end{align} \] is known as the tangent-linear model.
Making the approximation of the tangent-linear model,
\[ \begin{align} &\frac{\mathrm{d}}{\mathrm{d}t} \pmb{x} \approx \pmb{f}(\overline{\pmb{x}}) + \nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\pmb{\delta}\\ \\ \Rightarrow&\int_{t_{k-1}}^{t_k}\frac{\mathrm{d}}{\mathrm{d}t}\pmb{x}\mathrm{d}t \approx \int_{t_{k-1}}^{t_k} \pmb{f}(\overline{\pmb{x}}) \mathrm{d}t + \int_{t_{k-1}}^{t_k}\nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\pmb{\delta}\mathrm{d}t \\ \\ \Rightarrow & \pmb{x}_{k} \approx \mathcal{M}_{k}\left(\overline{\pmb{x}}_{k-1}\right) + \mathbf{M}_k\pmb{\delta}_{k-1} \end{align} \] where \( \mathbf{M}_k \) is the resolvent of the tangent-linear model.
Because Gaussians are closed under affine transformations, we can approximate the evolution under the tangent-linear model as
\[ \begin{align} \pmb{x}_{k} \sim N\left(\mathcal{M}_k\left(\overline{\pmb{x}}_{k-1}\right), \mathbf{M}_k \mathbf{B}_{k-1}\mathbf{M}^\top_{k}\right) \end{align} \]
Therefore, this provides a means, in principle, to approximate the evolution of the background covariance in time.
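As a rough illustration of how this propagation might be computed, the following NumPy sketch integrates the mean state and the tangent-linear resolvent with a simple explicit Euler scheme and then forms \( \mathbf{M}_k \mathbf{B}_{k-1}\mathbf{M}_k^\top \); the names `f`, `f_jac`, and `tangent_linear_forecast` are illustrative assumptions, not from any particular library.

```python
import numpy as np

def tangent_linear_forecast(x_mean, B, f, f_jac, dt, n_steps):
    """Evolve the mean and the resolvent M_k of the tangent-linear model with explicit
    Euler steps, then propagate the covariance as B_fore = M_k B M_k^T."""
    n = x_mean.size
    M = np.eye(n)                        # resolvent initialized at the identity
    x = x_mean.copy()
    for _ in range(n_steps):
        M = M + dt * (f_jac(x) @ M)      # d/dt M = grad_f(x_bar) M, the tangent-linear model
        x = x + dt * f(x)                # evolve the mean state alongside
    return x, M @ B @ M.T                # forecast mean and forecast covariance
```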
The extended Kalman filter can be understood then as turning the 3D-VAR objective function into a time-varying objective function as follows.
Let the previous analysis solution be given by approximating the distribution of the model state as
\[ \begin{align} \pmb{x}_{k-1} &\sim N\left(\overline{\pmb{x}}^{\mathrm{filt}}_{k-1} , \mathbf{B}_{k-1}^{\mathrm{filt}}\right). \end{align} \]
We then approximate the forecast distribution as
\[ \begin{align} \pmb{x}_k &\sim N\left( \overline{\pmb{x}}_k^\mathrm{fore}, \mathbf{B}_k^\mathrm{fore} \right),\\ \overline{\pmb{x}}_k^\mathrm{fore}&:= \mathcal{M}_k\left(\overline{\pmb{x}}_{k-1}^{\mathrm{filt}}\right),\\ \mathbf{B}_k^\mathrm{fore}&:=\mathbf{M}_k \mathbf{B}_{k-1}^{\mathrm{filt}}\mathbf{M}_k^\top. \end{align} \]
The resulting extended Kalman filter cost function can thus be written as
\[ \begin{align} \mathcal{J}_{\mathrm{EKF}}(\pmb{x}) := \frac{1}{2} \parallel \pmb{x} - \overline{\pmb{x}}_k^\mathrm{fore}\parallel_{\mathbf{B}_k^\mathrm{fore}}^2 + \frac{1}{2} \parallel \pmb{y} - \mathcal{H}(\pmb{x})\parallel_{\mathbf{R}_k}^2, \end{align} \] which can similarly be minimized with quasi-Newton methods to find a local minimum.
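Combining the hypothetical helpers sketched above, one cycle of the extended Kalman filter mean update might then look as follows, where `x_filt`, `B_filt`, `y_obs`, and `R` are assumed to be carried over from the previous cycle, and the step sizes are purely illustrative.

```python
# One cycle of the extended Kalman filter mean update, reusing the sketches above
x_fore, B_fore = tangent_linear_forecast(x_filt, B_filt, f, f_jac, dt=0.01, n_steps=100)
x_filt = threed_var_analysis(x_fore, y_obs, obs_op, obs_op_jac,
                             np.linalg.inv(B_fore), np.linalg.inv(R), n_iter=5)
```

The corresponding update of the covariance needed to close the recursion is most conveniently expressed through the square-root, weight-space form discussed below.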
The extended Kalman filter thus improves on 3D-VAR in that it uses an approximate, time-varying background covariance obtained from the tangent-linear model.
However, this comes at the cost of constructing and evolving a large covariance matrix under the tangent-linear model.
For large-scale, nonlinear systems, explicitly constructing such a covariance can be prohibitively expensive.
For this reason, one usually makes an additional approximation, using a square-root form of the covariance and writing the cost function in the weight space:
\[ \begin{align} \mathcal{J}_{\mathrm{EKF}}(\pmb{w}) &:= \frac{1}{2} \parallel \overline{\pmb{x}}_k^\mathrm{fore} + \boldsymbol{\Sigma}_{k}^\mathrm{fore} \pmb{w} - \overline{\pmb{x}}_k^\mathrm{fore} \parallel_{\mathbf{B}_k^\mathrm{fore}}^2 + \frac{1}{2} \parallel \pmb{y} - \mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore} + \boldsymbol{\Sigma}_k^\mathrm{fore}\pmb{w}\right) \parallel_{\mathbf{R}_k}^2\\ &\approx \frac{1}{2}\parallel \pmb{w}\parallel^2 + \frac{1}{2}\parallel \pmb{y} - \mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore}\right) - \mathbf{H}_k \boldsymbol{\Sigma}_k^\mathrm{fore}\pmb{w} \parallel_{\mathbf{R}_k}^2, \end{align} \] where the state is written as \( \pmb{x} = \overline{\pmb{x}}_k^\mathrm{fore} + \boldsymbol{\Sigma}_k^\mathrm{fore} \pmb{w} \) in terms of the weight vector \( \pmb{w} \), \( \boldsymbol{\Sigma}_k^\mathrm{fore} \) is a matrix square root of the forecast covariance, \( \mathbf{B}_k^\mathrm{fore} = \boldsymbol{\Sigma}_k^\mathrm{fore}\left(\boldsymbol{\Sigma}_k^\mathrm{fore}\right)^\top \), and \( \mathbf{H}_k := \nabla_{\pmb{x}}\mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore}\right) \) is the Jacobian of the observation operator at the forecast mean.
Notice that the approximate cost function
\[ \begin{align}
\mathcal{J}_{\mathrm{EKF}}(\pmb{w}) &:=\frac{1}{2}\parallel \pmb{w}\parallel^2 + \frac{1}{2}\parallel \pmb{y} - \mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore}\right) - \mathbf{H}_k \boldsymbol{\Sigma}_k^\mathrm{fore}\pmb{w} \parallel_{\mathbf{R}_k}^2
\end{align} \]
is actually quadratic in \( \pmb{w} \), as \( \mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore}\right) \) is a constant with respect to the optimization.
Therefore, this represents a fully linearized system, performing an approximate Kalman filter analysis in the space of perturbations.
The usual right-transform analysis can then be computed from the objective function above: writing \( \overline{\pmb{w}} \) for the minimizing weights,
\[ \begin{align} \overline{\pmb{x}}^\mathrm{filt}_k := \overline{\pmb{x}}_k^\mathrm{fore} + \boldsymbol{\Sigma}_k^\mathrm{fore} \overline{\pmb{w}} & & \mathbf{H}_{\mathcal{J}}&:= \nabla_{\pmb{w}}^2 \mathcal{J} & & \mathbf{T}:=\mathbf{H}_{\mathcal{J}}^{-\frac{1}{2}} & & \boldsymbol{\Sigma}_{k}^\mathrm{filt}:= \boldsymbol{\Sigma}_k^\mathrm{fore}\mathbf{T} \end{align} \] and an entire recursion can be derived as above (and extended to include model error in the evolution of \( \boldsymbol{\Sigma}_k^\mathrm{filt} \)).
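A minimal sketch of this right-transform analysis in the weight space, based on the linearized, quadratic cost above; `Sigma_fore` is a square root of the forecast covariance, the Hessian square root is formed via an eigendecomposition, and all names are illustrative assumptions.

```python
import numpy as np

def right_transform_analysis(x_fore, Sigma_fore, y_obs, obs_op, obs_op_jac, R_inv):
    """Weight-space analysis: minimize the quadratic cost in w, then update the mean
    and the square root of the covariance with the right transform T = H_J^{-1/2}."""
    Hk_Sigma = obs_op_jac(x_fore) @ Sigma_fore                 # linearized observation map on weights
    innovation = y_obs - obs_op(x_fore)
    hess = np.eye(Sigma_fore.shape[1]) + Hk_Sigma.T @ R_inv @ Hk_Sigma   # Hessian of J in w
    w_bar = np.linalg.solve(hess, Hk_Sigma.T @ R_inv @ innovation)       # minimizing weights
    evals, evecs = np.linalg.eigh(hess)                        # hess is symmetric positive definite
    T = evecs @ np.diag(evals ** -0.5) @ evecs.T               # right transform, T = H_J^{-1/2}
    x_filt = x_fore + Sigma_fore @ w_bar                       # analysis mean
    Sigma_filt = Sigma_fore @ T                                # analysis square root
    return x_filt, Sigma_filt
```

Here the weight-space Hessian is formed and factorized explicitly for clarity; the point of the weight-space formulation is that these operations act on the weight dimension rather than requiring the full covariance matrix to be constructed explicitly.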
The accuracy of the approximation in the extended Kalman filter depends strongly on the validity of the tangent-linear and Gaussian approximations for the evolution of the perturbations.
This usually depends on how long the forecast window is; for short forecast horizons, the approximation is often adequate.
For this reason, in practice one may intuitively consider interpolating between the long-time-average background covariance and a time-dependent covariance for the optimization weights.
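A minimal sketch of such an interpolation (covariance hybridization), where `alpha` is a tuning parameter and `B_clim` and `B_fore` are assumed to come from the climatological and tangent-linear sketches above.

```python
# Hybridized background weights: interpolate between the static climatological covariance
# and the time-dependent forecast covariance; alpha in [0, 1] is a tuning parameter.
alpha = 0.5
B_hybrid = alpha * B_clim + (1.0 - alpha) * B_fore
```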