4D-VAR and weak-constraint 4D-VAR

Outline

  • The following topics will be covered in this lecture:
    • Incremental 4D-VAR
    • Weak-constraint 4D-VAR

Motivation

  • We saw in the last lecture how the linear-Gaussian model for data assimilation can be formally extended to a nonlinear system.

    • Particularly, the two classic approaches that perform such an analysis are 3D-VAR (in weather prediction) and, more generally, the extended Kalman filter.
  • The primary difference between these estimators lies in how they treat the background weights in a least-squares-style optimization.

  • 3D-VAR can be viewed as a recursive least-squares estimate in which the model state is taken as a random draw from the invariant, long-time-average statistics of the dynamics.

    • This is computationally cheap to perform, but it lacks the time-dependent structure of the forecast spread encoded in the covariance.
  • The extended Kalman filter seeks to include this time-dependent information by making a first-order approximation of the evolution of the background covariance in time.

  • While this approximation can be very successful,

    • particularly when the dimensionality of the model state is not exceptionally large and when the forecast model is not exceptionally nonlinear,
  • the extended Kalman filter has typically not seen widespread use, due to the numerical cost and stability issues of the estimator.

Motivation

  • Another classic approach to extend linear-Gaussian methods to nonlinear estimation follows the motivation of 3D-VAR.

  • Rather than fitting the model state to the data relative to the time-independent, long-time background weights,

    • we can alternatively introduce the time-dependence through the time series of observations.
  • This follows directly from formally extending the 4D-smoothing cost function of the linear-Gaussian analysis with a locally linearized, quadratic cost function approximating nonlinear least-squares.

  • 4D-VAR is one of the most important scalable data assimilation algorithms due to its strong performance, and it forms the basis of many widely used operational data assimilation systems.

  • 4D-VAR refers to extending the “three dimensional” state-space cost function to include the time variable, and performing a global analysis of a time series to optimize an initial condition.

    • However, we again consider a square-root analysis, as this translates well to a variety of modern techniques such as hybrid ensemble-variational methods and the \( \alpha \) trick.

Incremental 4D-VAR

  • Recall that when we introduced the extended Kalman filter cost function, we derived this by the local linearization of the nonlinear cost function:

    \[ \begin{align} \mathcal{J}_{\mathrm{EKF}}(\pmb{w}) &:=\frac{1}{2}\parallel \pmb{w}\parallel^2 + \frac{1}{2}\parallel \pmb{y}_k - \mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore}\right) - \mathbf{H}_k \boldsymbol{\Sigma}_k^\mathrm{fore}\pmb{w} \parallel_{\mathbf{R}_k}^2, \end{align} \]
    which is actually quadratic in \( \pmb{w} \), as \( \mathcal{H}\left(\overline{\pmb{x}}_k^\mathrm{fore}\right) \) is a constant with respect to the optimization.

  • Therefore, this represents a fully linearized system, performing an approximate conditional Gaussian analysis in the space of perturbations.

  • If we take a constant \( \mathbf{B}_0 \) as with 3D-VAR, and define the matrix factor again

    \[ \begin{align} \mathbf{B}_0 := \boldsymbol{\Sigma}_0\boldsymbol{\Sigma}_0^\top & & \pmb{x}_0 := \overline{\pmb{x}}_0 + \boldsymbol{\Sigma}_{0}\pmb{w} \end{align} \]

  • we can apply the same tangent-linear approximation as with the extended Kalman filter to optimize the initial state against a time series of observations, globally and all at once; a minimal numerical sketch of the resulting weight-space cost follows below.
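
To make the change of variables concrete, the following is a minimal sketch that evaluates the quadratic cost in the weight space, with \( \pmb{x} = \overline{\pmb{x}} + \boldsymbol{\Sigma}_0 \pmb{w} \) and \( \boldsymbol{\Sigma}_0 \) taken as a Cholesky factor of \( \mathbf{B}_0 \). The dimensions, the linear observation operator standing in for the linearized \( \mathbf{H}_k \), and the synthetic data are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dimensions: state size n, observation size d
n, d = 10, 5

# Background covariance B_0 and a Cholesky factor Sigma_0, so B_0 = Sigma_0 Sigma_0^T
B0 = 0.5 * np.eye(n)
Sigma0 = np.linalg.cholesky(B0)

# Assumed linear observation operator and observation error covariance
H = rng.standard_normal((d, n))
R = 0.25 * np.eye(d)

x_bar = rng.standard_normal(n)                            # background mean
y = H @ x_bar + rng.multivariate_normal(np.zeros(d), R)   # synthetic observation

def J(w):
    """Quadratic cost in the weight variable w, where x = x_bar + Sigma0 @ w;
    the weighted norm ||v||_R^2 is v^T R^{-1} v, as in the lecture notation."""
    innovation = y - H @ x_bar - H @ (Sigma0 @ w)
    return 0.5 * w @ w + 0.5 * innovation @ np.linalg.solve(R, innovation)

print(J(np.zeros(n)))  # cost of the zero perturbation (the background itself)
```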

Incremental 4D-VAR

  • Making the approximation of the tangent-linear model,

    \[ \begin{align} &\frac{\mathrm{d}}{\mathrm{d}t} \pmb{x} \approx \pmb{f}(\overline{\pmb{x}}) + \nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\pmb{\delta}\\ \\ \Rightarrow&\int_{t_{k-1}}^{t_k}\frac{\mathrm{d}}{\mathrm{d}t}\pmb{x}\mathrm{d}t \approx \int_{t_{k-1}}^{t_k} \pmb{f}(\overline{\pmb{x}}) \mathrm{d}t + \int_{t_{k-1}}^{t_k}\nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\pmb{\delta}\mathrm{d}t \\ \\ \Rightarrow & \pmb{x}_{k} \approx \mathcal{M}_{k}\left(\overline{\pmb{x}}_{k-1}\right) + \mathbf{M}_k\pmb{\delta}_{k-1} \end{align} \] where \( \mathbf{M}_k \) is the resolvent of the tangent-linear model.

  • Since Gaussians are closed under affine transformations, the evolution under the tangent-linear model is approximated as

    \[ \begin{align} \pmb{x}_{k} \sim N\left(\mathcal{M}_k\left(\overline{\pmb{x}}_{k-1}\right), \mathbf{M}_k \mathbf{B}_{k-1}\mathbf{M}^\top_{k}\right) \end{align} \]

  • Therefore, the 4D-quadratic cost function is approximated by an incremental linearization along the background mean

    \[ \begin{alignat}{2} & & {\color{#d95f02} {\mathcal{J} (\pmb{w})} } &= {\color{#d95f02} {\frac{1}{2} \parallel \overline{\pmb{x}}_0 - \overline{\pmb{x}}_0 - \boldsymbol{\Sigma}_0 \pmb{w} \parallel^2_{\mathbf{B}_0}} } + {\color{#7570b3} {\sum_{k=1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathcal{H}_k\circ {\color{#1b9e77} { \mathcal{M}_{k:1} \left( {\color{#d95f02} {\overline{\pmb{x}}_{0} } }\right)}} - \mathbf{H}_k {\color{#1b9e77} {\mathbf{M}_{k:1}}} {\color{#d95f02} {\boldsymbol{\Sigma}_{0} \pmb{w} } } \parallel_{\mathbf{R}_k}^2 } } \end{alignat} \] describing an approximate linear-Gaussian model / cost function in the space of perturbations.
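
As a concrete sketch of obtaining the resolvent \( \mathbf{M}_k \), the following jointly integrates the nonlinear model and the tangent-linear equation \( \tfrac{\mathrm{d}}{\mathrm{d}t}\mathbf{M} = \nabla_{\pmb{x}}\pmb{f}(\overline{\pmb{x}})\mathbf{M} \) along the background trajectory. The Lorenz-96 model, the explicit Euler scheme, and the parameter values are stand-ins for illustration only:

```python
import numpy as np

def l96(x, F=8.0):
    """Lorenz-96 vector field, standing in for the nonlinear model f."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def l96_jacobian(x):
    """Jacobian grad_x f(x) of the Lorenz-96 vector field."""
    n = x.size
    J = -np.eye(n)
    for i in range(n):
        J[i, (i + 1) % n] += x[(i - 1) % n]                   # d f_i / d x_{i+1}
        J[i, (i - 2) % n] -= x[(i - 1) % n]                   # d f_i / d x_{i-2}
        J[i, (i - 1) % n] += x[(i + 1) % n] - x[(i - 2) % n]  # d f_i / d x_{i-1}
    return J

def resolvent(x0, dt=0.001, steps=100):
    """Integrate d/dt x = f(x) and d/dt M = grad_x f(x) M jointly with explicit
    Euler steps (a crude scheme, for illustration), returning the end state
    (the nonlinear flow applied to x0) and the resolvent M_k."""
    x, M = x0.copy(), np.eye(x0.size)
    for _ in range(steps):
        # evaluate both right-hand sides at the current x before stepping
        x, M = x + dt * l96(x), M + dt * (l96_jacobian(x) @ M)
    return x, M

x1, M1 = resolvent(np.sin(np.arange(10)))
```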

Incremental 4D-VAR

  • The incremental 4D-VAR cost function from the last slide is composed as follows:

    \[ \begin{alignat}{2} & & {\color{#d95f02} {\mathcal{J} (\pmb{w})} } &= {\color{#d95f02} {\frac{1}{2} \parallel \overline{\pmb{x}}_0 - \overline{\pmb{x}}_0 - \boldsymbol{\Sigma}_0 \pmb{w} \parallel^2_{\mathbf{B}_0}} } + {\color{#7570b3} {\sum_{k=1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathcal{H}_k\circ {\color{#1b9e77} { \mathcal{M}_{k:1} \left( {\color{#d95f02} {\overline{\pmb{x}}_{0} } }\right)}} - \mathbf{H}_k {\color{#1b9e77} {\mathbf{M}_{k:1}}} {\color{#d95f02} {\boldsymbol{\Sigma}_{0} \pmb{w} } } \parallel_{\mathbf{R}_k}^2 } }\\ & & &= {\color{#d95f02} {\frac{1}{2} \parallel \pmb{w} \parallel^2} } + {\color{#7570b3} {\sum_{k=1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathcal{H}_k\circ {\color{#1b9e77} { \mathcal{M}_{k:1} \left( {\color{#d95f02} {\overline{\pmb{x}}_{0} } }\right)}} - \mathbf{H}_k {\color{#1b9e77} {\mathbf{M}_{k:1}}} {\color{#d95f02} {\boldsymbol{\Sigma}_{0} \pmb{w} } } \parallel_{\mathbf{R}_k}^2 } } \end{alignat} \] where

  1. \( {\color{#1b9e77} { \mathcal{M}_{k:1} \left( {\color{#d95f02} {\overline{\pmb{x}}_{0} } } \right) } } \) represents the fully nonlinear evolution of the initial background mean;
  2. \( {\color{#7570b3} {\mathcal{H}_k \circ {\color{#1b9e77} { \mathcal{M}_{k:1} \left( {\color{#d95f02} {\overline{\pmb{x}}_{0} } } \right) } } } } \) is the nonlinear evolution of the background mean, pushed into the observation variables;
  3. \( {\color{#7570b3} { \mathcal{H}_k\circ {\color{#1b9e77} { \mathcal{M}_{k:1} \left( {\color{#d95f02} {\overline{\pmb{x}}_{0} } }\right)}} + \mathbf{H}_k {\color{#1b9e77} {\mathbf{M}_{k:1}}} {\color{#d95f02} {\boldsymbol{\Sigma}_{0} \pmb{w} } } } } \) is the Taylor expansion of the perturbation of the mean, \( \overline{\pmb{x}}_0 + \boldsymbol{\Sigma}_0 \pmb{w} \), through the composition of the dynamical and observation models.
  4. the cost function in total then represents the error-free (\( \pmb{w}_k\equiv \pmb{0} \)) model evolution of a perturbation to an initial proposal state, together with the total cost of the perturbation in its mismatch with the observations and the proposal.
  • This then extends the locally quadratic objective function that was derived earlier to include the derivative of the dynamical model with respect to the model state.

  • This is precisely how the adjoint is used to compute the gradient, as discussed with variational least squares.

  • The incremental linearization along the background provides the approximation of the gradient with the adjoint; a sketch of this forward/backward gradient computation follows below.
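
A minimal sketch of this gradient computation, assuming the resolvents \( \mathbf{M}_k \), the linearized observation operators \( \mathbf{H}_k \), and the innovations have been precomputed as dense arrays (operationally these are applied matrix-free; all names here are hypothetical):

```python
import numpy as np

def incremental_grad(w, Sigma0, Ms, Hs, Rs, ds):
    """Gradient of the incremental 4D-VAR cost
        J(w) = 0.5 ||w||^2 + sum_k 0.5 ||d_k - H_k M_{k:1} Sigma_0 w||^2_{R_k},
    where d_k = y_k - H_k(M_{k:1}(x_bar_0)) is the innovation, via one forward
    (tangent-linear) pass and one backward (adjoint) pass."""
    # Forward pass: propagate the perturbation delta_k = M_k delta_{k-1},
    # storing the weighted residuals r_k = R_k^{-1} (d_k - H_k delta_k)
    delta = Sigma0 @ w
    rs = []
    for M, H, R, d in zip(Ms, Hs, Rs, ds):
        delta = M @ delta
        rs.append(np.linalg.solve(R, d - H @ delta))

    # Backward pass: back-propagate the sensitivities with the adjoints M_k^T,
    # accumulating lam = sum_k M_{k:1}^T H_k^T r_k
    lam = np.zeros_like(delta)
    for M, H, r in zip(reversed(Ms), reversed(Hs), reversed(rs)):
        lam = M.T @ (H.T @ r + lam)

    return w - Sigma0.T @ lam

# Synthetic usage with random stand-in operators (hypothetical sizes)
rng = np.random.default_rng(1)
n, d, L = 8, 4, 3
grad = incremental_grad(
    rng.standard_normal(n), np.linalg.cholesky(0.5 * np.eye(n)),
    [rng.standard_normal((n, n)) for _ in range(L)],
    [rng.standard_normal((d, n)) for _ in range(L)],
    [np.eye(d)] * L, [rng.standard_normal(d) for _ in range(L)])
```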

Incremental 4D-VAR

  • In particular, the adjoint variables are defined by a backward-in-time solution to the linear equation

    \[ \begin{align} \frac{\mathrm{d}}{\mathrm{d}t} \tilde{\pmb{\delta}} = -\left(\nabla_{\pmb{x}} \pmb{f}(\overline{\pmb{x}})\right)^\top \tilde{\pmb{\delta}}, \end{align} \] with the underlying dependence on the nonlinear solution over the time interval.

  • Therefore, in incremental 4D-VAR, one constructs the gradient for the objective function, differentiating the nonlinear model, via:

    • a forward pass of the nonlinear solution, while computing the tangent-linear evolution in the space of the perturbations; with
    • a second backward pass only in the adjoint equations, back-propagating the sensitivities along this solution to find the gradient.
  • This is a very effective and efficient solution, but it relies on the construction of the tangent-linear and adjoint models for the dynamics.

    • For large-scale nonlinear models, this can be extremely challenging, though increasingly it can be performed using automatic differentiation techniques, computing these operators formally from a computer program alone; a standard consistency check for such operators is sketched below.
  • The above discussion is the basis of the traditional incremental 4D-VAR, though modern formulations of 4D-VAR seek to include the effect of model error in the estimation.

  • The standard approach to include the dependence of model error is known as weak-constraint 4D-VAR.
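
Whether the tangent-linear and adjoint codes are hand-derived or generated by automatic differentiation, their mutual consistency is commonly verified with the dot-product (adjoint) test, \( \langle \mathbf{M}\pmb{\delta}, \tilde{\pmb{\delta}}\rangle = \langle \pmb{\delta}, \mathbf{M}^\top\tilde{\pmb{\delta}}\rangle \). A minimal sketch, with a random matrix standing in for the two separately implemented operators:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10

# Stand-ins for separately implemented tangent-linear and adjoint operators;
# in practice these are distinct codes whose consistency must be checked.
A = rng.standard_normal((n, n))
tlm = lambda delta: A @ delta      # tangent-linear map:  delta -> M delta
adj = lambda dtil: A.T @ dtil      # adjoint map:         dtil  -> M^T dtil

# Dot-product test: <M delta, dtil> must equal <delta, M^T dtil>
delta, dtil = rng.standard_normal(n), rng.standard_normal(n)
assert np.isclose(tlm(delta) @ dtil, delta @ adj(dtil))
```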

Weak-constraint 4D-VAR

  • The idea behind weak-constraint 4D-VAR is to use the form of the hidden Markov model with additive noise to allow for a non-exact model evolution.

  • Particularly, we consider that

    \[ \begin{align} & \pmb{x}_k = \mathcal{M}_k(\pmb{x}_{k-1}) + \pmb{w}_k \\ \Leftrightarrow & \pmb{x}_k - \mathcal{M}_{k}(\pmb{x}_{k-1}) = \pmb{w}_k. \end{align} \]

  • If we again suppose a linear-Gaussian approximation is appropriate at first order, we can say that the transition density is given by

    \[ \begin{align} p\left(\pmb{x}_k | \pmb{x}_{k-1}\right) = N\left(\pmb{x}_k - \mathcal{M}_k (\pmb{x}_{k-1}) | \pmb{0}, \mathbf{Q}_k \right), \end{align} \] where the above refers to the Gaussian density with mean zero and covariance \( \mathbf{Q}_k \).

  • The fully nonlinear weak-constraint 4D-VAR objective function is then given as,

    \[ \begin{align} \mathcal{J}(\pmb{x}_{L:0}) := \frac{1}{2}\parallel \overline{\pmb{x}}_0 - \pmb{x}_0 \parallel^2_{\mathbf{B}_0} + \frac{1}{2} \sum_{k=1}^L \left\{ \parallel \pmb{x}_k - \mathcal{M}_k(\pmb{x}_{k-1}) \parallel^2_{\mathbf{Q}_k} + \parallel \pmb{y}_k - \mathcal{H}_k(\pmb{x}_k) \parallel^2_{\mathbf{R}_k} \right\}, \end{align} \] where we extend the former objective function by simultaneously penalizing the mismatch between the model evolution of each past state and the next optimized state, weighted by the model error uncertainty.

  • The weak-constraint cost function is likewise then typically approximated with a locally quadratic cost function via incremental linearization.
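
A minimal sketch of evaluating the weak-constraint cost over a candidate trajectory \( \pmb{x}_{L:0} \), with the model and observation maps supplied as callables and the weighted norms \( \parallel \pmb{v} \parallel^2_{\mathbf{A}} = \pmb{v}^\top \mathbf{A}^{-1}\pmb{v} \) as in the lecture (all names and the toy usage are hypothetical):

```python
import numpy as np

def J_weak(xs, x_bar0, B0, Ms, Hs, Qs, Rs, ys):
    """Weak-constraint 4D-VAR cost over the trajectory xs = [x_0, ..., x_L];
    Ms[k-1] and Hs[k-1] are the model and observation maps at time k, with
    model / observation error covariances Qs[k-1] / Rs[k-1]."""
    def wnorm2(v, A):
        # weighted norm ||v||_A^2 = v^T A^{-1} v
        return v @ np.linalg.solve(A, v)

    J = 0.5 * wnorm2(x_bar0 - xs[0], B0)
    for k in range(1, len(xs)):
        J += 0.5 * wnorm2(xs[k] - Ms[k - 1](xs[k - 1]), Qs[k - 1])  # model error
        J += 0.5 * wnorm2(ys[k - 1] - Hs[k - 1](xs[k]), Rs[k - 1])  # observation error
    return J

# Toy usage: persistence model, direct observations, identity covariances
L, n = 3, 4
xs = [np.ones(n)] * (L + 1)
print(J_weak(xs, np.zeros(n), np.eye(n),
             [lambda x: x] * L, [lambda x: x] * L,
             [np.eye(n)] * L, [np.eye(n)] * L, [np.ones(n)] * L))  # -> 2.0
```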