A fast, single-iteration ensemble Kalman smoother for online, sequential data assimilation

Authors:

Colin Grudzien1, Marc Bocquet2

Special Thanks:

Eric Olson1 & Mihye Ahn1 for HPC logistics and support

  1. University of Nevada, Reno, Department of Mathematics and Statistics
  2. CEREA, a joint laboratory of École des Ponts ParisTech and EDF R&D, Université Paris-Est, Champs-sur-Marne, France.

Outline

  • Motivation
  • A review of the ensemble transform Kalman filter (ETKF) in perfect, hidden Markov models
  • Fixed-lag smoothing
  • The classic ensemble transform Kalman smoother (ETKS)
  • The 4D smoothing cost function
    • Iterative analysis and the iterative ensemble Kalman smoother (IEnKS)
  • An alternative formulation of iterative smoothing for weakly nonlinear forecast error dynamics:
    • The single-iteration ensemble transform Kalman smoother (SIETKS)
  • Numerical simulations
    • Optimizing hyper-parameters: adaptive inflation
    • Nonlinear observation operators with SDA / MDA
    • Varying lag / shift sizes with SDA / MDA

Motivation

  • Iterative, ensemble-variational methods form the basis for the state-of-the-art for nonlinear, scalable data assimilation (DA)1

    • Methods following an ensemble Kalman filter (EnKF) style analysis include the ensemble randomized maximum likelihood method (EnRML)2 and the iterative ensemble Kalman smoother (IEnKS)3
  • These iterative, ensemble-variational methods combine the benefits of:
    1. the high-accuracy of the iterative solution of the Bayesian maximum a posteriori (MAP) solution to the DA problem;
    2. the relative simplicity of numerical model development and maintenance in ensemble-based DA;
    3. the ensemble-based analysis of time-dependent errors and possibly discontinuous, physical model parameters; and
    4. the variational analysis, optimizing hyper-parameters for, e.g., inflation4, localization5 and surrogate models6 to augment the estimation scheme.
1. Bannister, R. (2017). A review of operational methods of variational and ensemble‐variational data assimilation. Quarterly Journal of the Royal Meteorological Society, 143(703), 607-633.
2. Chen, Y., & Oliver, D. (2012). Ensemble randomized maximum likelihood method as an iterative ensemble smoother. Mathematical Geosciences, 44(1), 1-26.
3. Bocquet, M., & Sakov, P. (2014). An iterative ensemble Kalman smoother. Quarterly Journal of the Royal Meteorological Society, 140(682), 1521-1535.
4. Bocquet, M., Raanes, P., & Hannart, A. (2015). Expanding the validity of the ensemble Kalman filter without the intrinsic need for inflation. Nonlinear Processes in Geophysics, 22(6), 645-662.
5. Lorenc, A. C. (2003). The potential of the ensemble Kalman filter for NWP—A comparison with 4D‐Var. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 129(595), 3183-3203.
6. Bocquet, M., Brajard, J., Carrassi, A., & Bertino, L. (2020). Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization. Foundations of Data Science, 2(1), 55-80.

Motivation

  • However, a barrier to the use of iterative, ensemble-variational schemes in online, sequential DA still lies in the computational bottleneck of ensemble simulation in the nonlinear, physics-based model.

  • Many iterative smoothing procedures require multiple runs of the ensemble simulations to produce the forecast, filtering and re-analyzed smoothing statistics.

  • For online applications in which there is a short operational time-window to produce a forecast for future states, the iterative ensemble simulations may be prohibitively expensive.

  • Particularly in the case where nonlinearity in the DA cycle is not dominated by the forecast error dynamics,

    • i.e., if the tangent-linear evolution of Gaussian error distributions is an adequate approximation,
  • the cost of an iterative, ensemble simulation may not be justified by the improvement in forecast accuracy.

  • For short-range forecasting, nonlinearity in the DA cycle may instead be dominated by:

    • nonlinearity in the observation operator,
    • nonlinearity in hyper-parameter optimization, and / or
    • nonlinearity in temporally interpolating a re-analyzed solution over the lagged states.
  • For sequential forecast cycles in which the linear-Gaussian approximation of the forecast error may be appropriate, like synoptic meteorology,

    • a different form of iterative, ensemble-variational smoother may have substantial advantages in a computational cost / forecast accuracy tradeoff.

Motivation

  • We consider a simple hybridization of the classic ensemble transform Kalman smoother (ETKS) with the iterative ensemble Kalman smoother (IEnKS) to produce a single-iteration, fixed-lag, sequential smoother.

  • We combine:

    1. the filtering solution and retrospective analysis as in the classic EnKS; and
    2. a single iteration of the ensemble simulation over the lagged states with the re-analyzed prior;
  • to improve the EnKF filter and forecast statistics in the subsequent DA cycle.

  • The resulting scheme is a “single-iteration”, ensemble Kalman smoother that sequentially solves the filtering, Bayesian MAP cost-function;

    • the scheme produces the forecast, filtering and re-analyzed smoothing statistics within a single iteration of the ensemble simulation of the lagged states.
  • This scheme seeks to minimize the leading order cost of ensemble-variational smoothing, i.e., the ensemble simulation in the nonlinear forecast model.

    • However, we are free to iteratively solve the filtering cost function for any single observation without iterations of the ensemble simulation.
  • This is an outer-loop optimization of the cost of fixed-lag, sequential smoothing, designed for online applications with weakly nonlinear forecast error evolution.

  • We will denote the general framework as a “single-iteration” smoother, while the specific implementation presented here is denoted the single-iteration ensemble transform Kalman smoother (SIETKS).

Motivation

  • For linear-Gaussian systems with the perfect model hypothesis, the SIETKS is a fully consistent Bayesian estimator, albeit one that uses redundant model simulations.

  • When the forecast error dynamics are weakly nonlinear, yet other aspects of the DA cycle are strongly nonlinear,

    • e.g., the filtering cost function is nonlinear due to hyper-parameters or a nonlinear observation operator,
  • we demonstrate the SIETKS has predictive performance comparable with fully iterative methods.

    • However, the SIETKS has a numerical cost that scales with matrix inversions in the ensemble dimension, rather than the cost of the ensemble simulation.
  • We furthermore derive a method of multiple data assimilation (MDA) for this hybrid smoother scheme;

    • the MDA scheme handles the increasing nonlinearity of the temporal interpolation of the posterior over long lag windows or large shifts of the smoothing solution.
  • The result is a two-stage MDA algorithm that estimates the forecast, filtering and re-analyzed smoothing statistics, and shifts the smoother forward in time, with two sweeps of the ensemble simulation over the lagged states.

  • The two stages:

    1. sampling the posterior density, balancing the MDA weights to produce the ensemble estimates;
    2. shifting the window of lagged states, updating the MDA weights for the shifted window.

Perfect hidden Markov models

  • We use the perfect model assumption; generalizing the techniques for the presence of model errors is ongoing work.
  • Define a perfect physical process model and observation model, \[ \begin{align} \pmb{x}_k &= \mathcal{M}_{k} \left(\pmb{x}_{k-1}\right) & & \pmb{x}_k\in\mathbb{R}^{N_x}\\ \pmb{y}_k &= \mathcal{H}_k \left(\pmb{x}_k\right) + \boldsymbol{\epsilon}_k & & \pmb{y}_k\in\mathbb{R}^{N_y} \end{align} \]
  • Let us denote the sequence of the process model states and observation model states as, \[ \begin{align} \pmb{x}_{m:k} &:= \left\{\pmb{x}_m, \pmb{x}_{m+1}, \cdots, \pmb{x}_k\right\}, \\ \pmb{y}_{m:k} &:= \left\{\pmb{y}_m, \pmb{y}_{m+1}, \cdots, \pmb{y}_k\right\}. \end{align} \]
  • We assume that the sequence of observation error vectors \[ \begin{align} \{\boldsymbol{\epsilon}_k : k=1,\cdots, L\} \end{align} \] is independent in time.
  • In the above configuration, we will say that the transition “density” is given in terms of \[ \begin{align} p(\pmb{x}_k \vert \pmb{x}_{k-1} ) \propto \pmb{\delta} \left\{\pmb{x}_k - \mathcal{M}_k\left(\pmb{x}_{k-1}\right)\right\}\label{eq:dirac_distribution} \end{align} \] where \( \pmb{\delta} \) represents the Dirac distribution.
  • The filtering problem is to sequentially and recursively estimate, \[ \begin{align} p(\pmb{x}_L \vert \pmb{y}_{1:L}) \end{align} \] or some statistic of this density, given:
    • an initial prior \( p(\pmb{x}_0) \); and
    • the observation likelihoods \( p(\pmb{y}_k \vert \pmb{x}_k ) \).
  • This can be described as an observation-analysis-forecast cycle.
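For concreteness, the following is a minimal numpy sketch of simulating such a perfect hidden Markov model, using the Lorenz-96 process model and linear observations that appear later in the numerical simulations; the function names, integration scheme and forcing value are illustrative choices, not details taken from this presentation.

```python
import numpy as np

def lorenz96(x, F=8.0):
    """Lorenz-96 tendency dx/dt for the state vector x (F = 8 is the usual forcing)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def step_rk4(x, dt=0.05):
    """Fourth-order Runge-Kutta step standing in for the perfect process model M_k."""
    k1 = lorenz96(x)
    k2 = lorenz96(x + 0.5 * dt * k1)
    k3 = lorenz96(x + 0.5 * dt * k2)
    k4 = lorenz96(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

rng = np.random.default_rng(0)
N_x = 40
x = rng.standard_normal(N_x)            # x_0 from an illustrative prior (no spin-up here)

truth, obs = [], []
for _ in range(10):                      # generate a short sequence x_{1:10}, y_{1:10}
    x = step_rk4(x)                      # x_k = M_k(x_{k-1}): perfect, deterministic model
    y = x + rng.standard_normal(N_x)     # y_k = H_k(x_k) + eps_k, with H = I and R = I
    truth.append(x.copy())
    obs.append(y)
```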

The observation-analysis-forecast cycle

Diagram of the filter observation-analysis-forecast cycle.

  • Using Bayes' law, this can be written as \[ \begin{align} {\color{#d95f02} {p\left(\pmb{x}_{L} \vert \pmb{y}_{L:1}\right)} } &\propto {\color{#7570b3} { p\left(\pmb{y}_L \vert \pmb{x}_L\right) } } {\color{#1b9e77} { p\left(\pmb{x}_L\vert \pmb{y}_{1:L-1}\right) } } \end{align} \]
  • When the models are linear, i.e., \[ \begin{align} \pmb{x}_k &= \mathbf{M}_k \pmb{x}_{k-1};\\ \pmb{y}_k &= \mathbf{H}_k \pmb{x}_k + \pmb{\epsilon}_k; \end{align} \]
  • and the error densities are Gaussian, i.e., \[ \begin{align} p(\pmb{x}_0) &= N\left(\overline{\pmb{x}}_0, \mathbf{B}_0\right);\\ p\left(\pmb{y}_k \vert \pmb{x}_k \right) &= N\left(\mathbf{H}_k\pmb{x}_k, \mathbf{R}_k\right); \end{align} \]
  • the posterior filtering density is Gaussian at all times, \[ {\color{#d95f02} {p\left(\pmb{x}_{L} \vert \pmb{y}_{L:1}\right)\equiv N\left(\overline{\pmb{x}}^\mathrm{filt}_L, \mathbf{B}_L^\mathrm{filt}\right)}} \] and can be computed analytically in a recursive formulation by the Kalman filter.
  • The ensemble Kalman filter (EnKF) uses an empirical, ensemble-based forecast mean and covariance, along with the parametric update of the Kalman filter, to approximate the Bayesian posterior.
  • Many different techniques have been developed to efficiently re-sample an ensemble from the posterior, using the above approximation.

The ETKF

  • We will consider the right ensemble transform version.

  • Specifically, with Gaussian error densities,

    \[ \begin{align} p(\pmb{x}_L\vert \pmb{y}_{1:L-1}) = N\left(\overline{\pmb{x}}_L^\mathrm{fore}, \mathbf{B}_L^\mathrm{fore}\right) & & p\left(\pmb{y}_L \vert \pmb{x}_L \right) = N\left(\mathbf{H}_L \pmb{x}^\mathrm{fore}_L, \mathbf{R}_L\right) \end{align} \] this can be written as a least-squares optimization as follows.

  • Suppose that a forecast ensemble \( \mathbf{E}_{L}^\mathrm{fore}\in \mathbb{R}^{N_x \times N_e} \) is drawn iid from \( p(\pmb{x}_L\vert \pmb{y}_{1:L-1}) \).

  • Define the ensemble-based, empirical estimates,

    \[ \begin{align} & & \hat{\pmb{x}}_L^\mathrm{fore} &:= \mathbf{E}_L^\mathrm{fore} \pmb{1} / N_e ; & \hat{\pmb{\delta}}_L &:= \mathbf{R}_L^{-\frac{1}{2}}\left(\pmb{y}_L - \mathbf{H}_L \hat{\pmb{x}}_L^\mathrm{fore}\right)\\ &&\mathbf{X}_L^\mathrm{fore} &:= \mathbf{E}_L^\mathrm{fore} - \hat{\pmb{x}}^\mathrm{fore}_L \pmb{1}^\top ; & \mathbf{P}^\mathrm{fore}_L &:= \mathbf{X}_L^\mathrm{fore} \left(\mathbf{X}_L^\mathrm{fore}\right)^\top / \left(N_e - 1\right);\\ & &\mathbf{S}_L &:=\mathbf{R}_L^{-\frac{1}{2}}\mathbf{H}_L \mathbf{X}_L^\mathrm{fore}; & \pmb{x}&:= \hat{\pmb{x}}_L^\mathrm{fore} + \mathbf{X}^\mathrm{fore}_L \pmb{w} . \end{align} \]

  • Maximizing the empirical posterior density is equivalent to minimizing the cost function \( \mathcal{J}(\pmb{x}) \equiv \widetilde{\mathcal{J}}(\pmb{w}) \),

    \[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2} \parallel \hat{\pmb{x}}_L^\mathrm{fore} - \mathbf{X}^\mathrm{fore}_L \pmb{w}- \hat{\pmb{x}}^\mathrm{fore}_L \parallel_{\mathbf{P}^\mathrm{fore}_L}^2} } + {\color{#7570b3} {\frac{1}{2} \parallel \pmb{y}_L - \mathbf{H}_L\hat{\pmb{x}}^\mathrm{fore}_L - \mathbf{H}_L \mathbf{X}^\mathrm{fore}_L \pmb{w} \parallel_{\mathbf{R}_L}^2 } }\\ \Leftrightarrow& & {\color{#d95f02} {\widetilde{\mathcal{J}}(\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2} } + {\color{#7570b3} {\frac{1}{2}\parallel \hat{\pmb{\delta}}_L - \mathbf{S}_L \pmb{w}\parallel^2 } } \end{alignat} \] which is an optimization over a weight vector \( \pmb{w} \) in the ensemble dimension \( N_e \ll N_x \).

The ETKF

  • Setting the gradient \( \nabla_{\pmb{w}} \widetilde{\mathcal{J}} \) to zero and solving for \( \overline{\pmb{w}} \), we find \[ \begin{align} \overline{\pmb{w}} = \pmb{0} - {\widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}}^{-1} \nabla_{\pmb{w}} \widetilde{\mathcal{J}}\big\vert_{\pmb{w}=\pmb{0}}, \end{align} \] where \( \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}:= \nabla^2_{\pmb{w}} \widetilde{\mathcal{J}} \) is the Hessian of the cost function.
  • This corresponds to a single iteration of Newton’s descent algorithm, yielding \[ \overline{\pmb{x}}_L^\mathrm{filt} := \overline{\pmb{x}}_{L}^\mathrm{fore} + \mathbf{X}_L^\mathrm{fore} \overline{\pmb{w}}. \]
  • If we define a right-transform matrix, \[ \begin{align} \mathbf{T}:= \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}^{-\frac{1}{2}}, \end{align} \]
  • we similarly have the update for the covariance as \[ \begin{align} \mathbf{P}^\mathrm{filt}_L = \left(\mathbf{X}_L^\mathrm{fore} \mathbf{T} \right)\left(\mathbf{X}_L^\mathrm{fore} \mathbf{T} \right)^\top /(N_e - 1). \end{align} \]
  • The numerical cost of the Newton step and the transform \( \mathbf{T} \) is subordinate to the cost of the eigenvalue decomposition of \( \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}} \in \mathbb{R}^{N_e \times N_e} \); a sketch of this analysis step is given below.
  • This sketches the right ensemble transform Kalman filter recursion as in the derivation of the local ensemble transform Kalman filter (LETKF).7
7. Hunt, B.R., Kostelich, E.J., & Szunyogh, I. (2007). Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D: Nonlinear Phenomena, 230(1-2), 112-126.
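To make the recursion concrete, here is a minimal numpy sketch of the right-transform analysis above under the stated linear-Gaussian assumptions; the helper name `etkf_analysis` and the mean-preserving form of the returned right transform are our own conventions, introduced only for illustration.

```python
import numpy as np

def etkf_analysis(E_fore, y, H, R):
    """Right-transform ETKF analysis step (a sketch under linear-Gaussian assumptions).

    E_fore : forecast ensemble, shape (N_x, N_e)
    y      : observation vector, shape (N_y,)
    H      : linear observation operator, shape (N_y, N_x)
    R      : observation error covariance, shape (N_y, N_y)

    Returns the filtering ensemble and the full right transform Psi, which can be
    reused later for the retrospective smoothing analysis of the EnKS.
    """
    N_e = E_fore.shape[1]
    x_mean = E_fore.mean(axis=1)
    X = E_fore - x_mean[:, None]                        # forecast anomalies
    R_inv_sqrt = np.linalg.inv(np.linalg.cholesky(R))   # one valid choice of R^{-1/2}
    S = R_inv_sqrt @ H @ X                              # normalized observed anomalies
    delta = R_inv_sqrt @ (y - H @ x_mean)               # normalized innovation

    # Hessian of the weight-space cost function: (N_e - 1) I + S^T S
    hess = (N_e - 1) * np.eye(N_e) + S.T @ S
    w_bar = np.linalg.solve(hess, S.T @ delta)          # single Newton step from w = 0

    # Symmetric square root T = hess^{-1/2} via the eigenvalue decomposition
    vals, U = np.linalg.eigh(hess)
    T = U @ np.diag(vals ** -0.5) @ U.T

    # Mean-preserving right transform combining the mean update and anomaly update
    ones = np.ones((N_e, N_e)) / N_e
    Psi = ones + (np.eye(N_e) - ones) @ (np.outer(w_bar, np.ones(N_e))
                                         + np.sqrt(N_e - 1) * T)
    return E_fore @ Psi, Psi
```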

Smoothing

Diagram of the filter observation-analysis-forecast cycle.

  • This recursively estimates the density of the current state, but a related question regards the past states.
  • E.g., the observation at time \( t_3 \) can be used to update the conditional estimate for the model state at times \( t_0 \), \( t_1 \) and \( t_2 \).
  • This produces a smoothing posterior estimate for the past states.
  • The smoothing estimate conditions all past states within the data assimilation window (DAW), i.e., over the time series of all lagged states.
  • A smoothing estimate may be produced in a variety of ways, exploiting different formulations of the Bayesian inference problem as a joint or marginal smoother.8
  • The implementation also depends on whether the DAW is fixed in time, or if this window is shifted in time sequentially.
    • For online, observation-analysis-forecast cycles, we will sequentially shift the DAW.
8. Cosme, E., Verron, J., Brasseur, P., Blum, J., & Auroux, D. (2012). Smoothing problems in a Bayesian framework and their linear Gaussian solutions. Monthly Weather Review, 140(2), 683-695.

Sequential Smoothing

Diagram of the filter observation-analysis-forecast cycle.

  • In the above, we see a schematic of how the DAW is sequentially shifted in time for a general fixed-lag smoother.
  • Generally, we will consider a DAW of a lag length \( L \times \Delta t \), with a sequential shift of length \( S\times \Delta t \).
  • After the conditional estimate is formed, we will shift the DAW by \( S \times \Delta t \);
    • this discards the oldest \( S \) states from the DAW, while \( S \) new states and observations enter the DAW.
  • In the figure above, the \( S \) new states and observations that enter the DAW are filled in black.
  • The above fixed-lag formalism allows us to treat the smoothing problem like sequential filtering.
  • This can come with large benefits to the precision of estimation by utilizing a full time series to form conditional estimates;
    • however, it can also become extremely costly to produce conditional estimates this way when \( S \) is small and \( L \) is large.
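As a small illustration of this bookkeeping, the following sketch holds the lagged states and observations in fixed-length buffers so that shifting the DAW by \( S \) automatically discards the oldest entries; the container choice is ours and purely illustrative.

```python
from collections import deque

L, S = 10, 2                          # lag and shift, in units of the forecast window
daw_states = deque(maxlen=L)          # lagged state estimates held in memory over the DAW
daw_obs = deque(maxlen=L)             # observations currently within the DAW

def shift_daw(new_states, new_obs):
    """Shift the DAW by S: appending S new entries automatically discards the S oldest."""
    assert len(new_states) == S and len(new_obs) == S
    daw_states.extend(new_states)
    daw_obs.extend(new_obs)
```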

The ETKS

  • In the ETKF formalism, we can define an ensemble right-transform \( \boldsymbol{\Psi}_k \) such that for any \( t_k \),

    \[ \begin{align} \mathbf{E}^\mathrm{filt}_k = \mathbf{E}^\mathrm{fore}_k \boldsymbol{\Psi}_k \end{align} \] where we say that the columns of the matrices are distributed iid as \[ \begin{align} \mathbf{E}^\mathrm{filt}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k}) \\ \mathbf{E}^\mathrm{fore}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k-1}) \end{align} \]

  • We will associate \( \mathbf{E}^\mathrm{filt}_k \equiv \mathbf{E}^\mathrm{post}_{k|k} \), where we define

    \[ \begin{align} \mathbf{E}^\mathrm{post}_{k \vert K} \sim p(\pmb{x}_k \vert \pmb{y}_{1:K}) \end{align} \] for \( K>k \).

    • Under the linear-Gaussian model, we furthermore have that

    \[ \begin{align} \mathbf{E}^\mathrm{post}_{k |L} = \mathbf{E}^\mathrm{post}_{k|L-1}\boldsymbol{\Psi}_{L}. \end{align} \]

  • A retrospective smoothing analysis can be performed on all past states stored in memory by using the latest right-transform update from the filtering step.

  • This form of retrospective analysis is the basis of the classic ensemble Kalman smoother (EnKS).9

9. Evensen, G., & Van Leeuwen, P. J. (2000). An ensemble Kalman smoother for nonlinear dynamics. Monthly Weather Review, 128(6), 1852-1867.

The ETKS

  • The ensemble transform Kalman smoother (ETKS) takes advantage of the simple form of the retrospective, right-transform analysis by including an additional, inner loop of the filtering cycle.
Diagram of the filter observation-analysis-forecast cycle.
  • In the above, time is the horizontal axis where right moves forward in time.
  • At each time, we produce the standard filtering estimate by computing \( \boldsymbol{\Psi}_L \) from the cost function, and updating the forecast \[ \mathbf{E}^\mathrm{filt}_L = \mathbf{E}_L^\mathrm{fore} \boldsymbol{\Psi}_L. \]
  • The information from incoming observations is passed backward in time using the right-transform to condition the ensemble at past times: \[ \begin{align} \mathbf{E}^\mathrm{post}_{k|L} = \mathbf{E}^\mathrm{post}_{k|L-1} \boldsymbol{\Psi}_L. \end{align} \]
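Putting the filtering step and the retrospective analysis together, the sketch below outlines one ETKS cycle; it reuses the hypothetical `etkf_analysis` helper from the earlier ETKF sketch, and the names are again illustrative.

```python
def etks_cycle(lagged_post, E_fore, y, H, R):
    """One ETKS observation-analysis cycle with the retrospective smoothing analysis.

    lagged_post : list of posterior ensembles E^post_{k|L-1}, each of shape (N_x, N_e),
                  for the lagged states currently held in memory
    E_fore      : forecast ensemble E^fore_L at the new observation time t_L
    """
    # Filtering step: the right transform Psi_L comes from the filtering cost function
    E_filt, Psi = etkf_analysis(E_fore, y, H, R)

    # Retrospective analysis: pass the observation's information backward in time by
    # applying the same right transform to every lagged posterior ensemble
    lagged_post = [E_past @ Psi for E_past in lagged_post]
    return E_filt, lagged_post
```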

The ETKS

  • The information in the posterior estimate thus flows in reverse time to the lagged states stored in memory, but the information flow is unidirectional in this scheme.

  • The improved posterior estimate for the lagged states does not improve the filtering estimate in the perfect, linear-Gaussian model:

    \[ \begin{align} \mathbf{M}_{k} \mathbf{E}_{k-1|L}^\mathrm{post}& = \mathbf{M}_{k} \mathbf{E}_{k-1|k-1}^\mathrm{post} \prod_{j=k}^{L} \boldsymbol{\Psi}_j \\ & =\mathbf{E}_{k|k}^\mathrm{post}\prod_{j=k+1}^L \boldsymbol{\Psi}_j\\ & = \mathbf{E}_{k|L}^\mathrm{post}. \end{align} \]

  • This demonstrates that the effect of the conditioning on the information from the observation is covariant with the dynamics.

  • In fact, in the case of the perfect, linear-Gaussian model, the backward-in-time conditioning is equivalent to the reverse time model forecast \( \mathbf{M}_k^{-1} \), as demonstrated by Rauch, Tung and Striebel (RTS).10

10. Jazwinski, A. H. (2007). Stochastic processes and filtering theory. Courier Corporation. See example 7.8, chapter 7.

The 4D cost function

  • The conditioning is not covariant with the model dynamics in the presence of nonlinearity or model error.
  • Re-initializing the DA cycle in a perfect, nonlinear model with the smoothed conditional ensemble estimate \( \mathbf{E}^\mathrm{post}_{0|L} \) can dramatically improve the performance of the subsequent forecast and filtering statistics.
  • Let us denote the composition of the model forecast as, \[ \begin{align} \mathcal{M}_{k:m} : = \mathcal{M}_k \circ \cdots \circ \mathcal{M}_{m} & & \mathbf{M}_{k:m} := \mathbf{M}_{k}\cdots \mathbf{M}_{m} \end{align} \]
  • Particularly, this exploits the mismatch in perfect, nonlinear dynamics between \[ \begin{align} \mathcal{M}_{L:1}\left(\mathbf{E}_{0|L}^\mathrm{post}\right):= \mathcal{M}_L \circ \cdots \circ \mathcal{M}_1\left(\mathbf{E}_{0|L}^\mathrm{post}\right) \neq \mathbf{E}_{L}^\mathrm{filt}. \end{align} \]
  • The effectiveness of the linear-Gaussian approximation strongly depends on the length of the forecast window \( \Delta t \);
    • for sufficiently small \( \Delta t \), the densities are well-approximated with Gaussians, yet there are deformations induced due to nonlinearity.
  • If the dynamics for the evolution of the densities are weakly nonlinear, re-initializing the model forecast with the posterior estimate can bring new information into the forecast states in the next DAW.
  • This has been exploited to a great extent by utilizing the 4D-cost function,11 in which the filtering MAP cost function is extended over multiple observations simultaneously, and can be written in terms of a lagged state directly.
11. Hunt, B.R., et al. (2004). Four-dimensional ensemble Kalman filtering. Tellus A: Dynamic Meteorology and Oceanography, 56(4), 273-277.

The 4D cost function

  • Suppose now we want to write \( p(\pmb{x}_{1:L}\vert \pmb{y}_{1:L}) \), the joint smoothing posterior over the current DAW, as a recursive update to the last smoothing posterior.

  • We will suppose that there is an arbitrary shift \( S\geq 1 \).

  • Using a Bayesian analysis like before, we can write

    \[ \begin{align} {\color{#d95f02}{p(\pmb{x}_{1:L} \vert \pmb{y}_{0:L})}} &\propto \int \mathrm{d}\pmb{x}_0 \underbrace{{\color{#d95f02}{p(\pmb{x}_0 \vert \pmb{y}_{0:L-1})}}}_{(1)} \underbrace{{\color{#7570b3}{\left[\prod_{k=L-S+1}^Lp(\pmb{y}_k \vert \pmb{x}_k)\right]}}}_{(2)} \underbrace{{\color{#1b9e77}{\left[\prod_{k=1}^L \pmb{\delta}\left\{\pmb{x}_k - \mathcal{M}_{k} \left(\pmb{x}_{k-1}\right)\right\}\right]}}}_{(3)} \end{align} \] where

    1. is the marginal for \( \pmb{x}_0 \) of the last joint smoothing posterior \( p(\pmb{x}_{0:L-1}\vert\pmb{y}_{0:L-1}) \);
    2. is the joint likelihood of the incoming observations to the current DAW, given the background forecast;
    3. is the free-forecast with the perfect model \( \mathcal{M}_k \).
  • Using the fact that \( p(\pmb{x}_{1:L} \vert \pmb{y}_{1:L} ) \propto p(\pmb{x}_{1:L} \vert \pmb{y}_{0:L}) \), we can chain together a recursive fixed-lag smoother, sequentially across DAWs.

The 4D cost function

  • Under the linear-Gaussian assumption, the resulting cost function takes the form

    \[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#d95f02} {\frac{1}{2} \parallel \hat{\pmb{x}}_{0|L-1}^\mathrm{post} - \mathbf{X}^\mathrm{post}_{0|L-1} \pmb{w}- \hat{\pmb{x}}^\mathrm{post}_{0|L-1} \parallel_{\mathbf{P}^\mathrm{post}_{0|L-1}}^2} } + {\color{#7570b3} {\sum_{k=L-S+1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathbf{H}_k {\color{#1b9e77} { \mathbf{M}_{k:1}} }\hat{\pmb{x}}^\mathrm{post}_{0|L-1} - \mathbf{H}_k {\color{#1b9e77} {\mathbf{M}_{k:1}}} \mathbf{X}^\mathrm{post}_{0|L-1} \pmb{w} \parallel_{\mathbf{R}_k}^2 } } \\ \Leftrightarrow& & \widetilde{\mathcal{J}}(\pmb{w}) &= \frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2 + \sum_{k=L-S+1}^L \frac{1}{2}\parallel \hat{\pmb{\delta}}_k - \mathbf{S}_k \pmb{w}\parallel^2 \end{alignat} \] where the weights \( \pmb{w} \) give the update to the initial state \( \pmb{x}^\mathrm{post}_{0|L-1} \).

  • In the linear-Gaussian case, the solution can again be found by a single iteration of Newton's descent for the cost function above,

    \[ \begin{alignat}{2} \overline{\pmb{w}} &= \pmb{0} - \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}^{-1}\nabla_{\pmb{w}} \widetilde{\mathcal{J}} ; \qquad & \mathbf{T} &= \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}^{-\frac{1}{2}}; \\ \hat{\pmb{x}}^\mathrm{post}_{0|L}&= \hat{\pmb{x}}_{0|L-1}^\mathrm{post} + \mathbf{X}_{0|L-1}^\mathrm{post}\overline{\pmb{w}}; \qquad & \mathbf{P}_{0|L}^\mathrm{post} &= \left(\mathbf{X}^\mathrm{post}_{0|L-1}\mathbf{T}\right)\left(\mathbf{X}^\mathrm{post}_{0|L-1}\mathbf{T}\right)^\top / \left(N_e - 1\right). \end{alignat} \]

The 4D cost function

  • When the state and observation models are nonlinear, i.e., using \( \mathcal{H}_k \) and \( \mathcal{M}_{k:1} \) in the cost function, the cost function can be solved iteratively to find a local minimum with recursive Newton steps as on the last slide (a sketch follows at the end of this slide).

  • This approach is at the basis of the iterative ensemble Kalman smoother (IEnKS),12 which produces a more accurate solution than the linear approximation when the models are highly nonlinear;

    • this also allows an easy generalization to optimizing hyper-parameters for the empirical density represented by the ensemble as a Gaussian mixture.
  • Ensemble Gaussian mixture models as above have been used variously to:

    • correct for sampling error due to nonlinearity,13
    • correct model error in the form of stochasticity and parametric error14 and
    • optimize localization coefficients for analyzing the spatially extended state.15
12. Bocquet, M., & Sakov, P. (2014). An iterative ensemble Kalman smoother. Quarterly Journal of the Royal Meteorological Society, 140(682), 1521-1535.
13. Bocquet, M., Raanes, P., & Hannart, A. (2015). Expanding the validity of the ensemble Kalman filter without the intrinsic need for inflation. Nonlinear Processes in Geophysics, 22(6), 645-662.
14. Raanes, P. N., Bocquet, M., & Carrassi, A. (2019). Adaptive covariance inflation in the ensemble Kalman filter by Gaussian scale mixtures. Quarterly Journal of the Royal Meteorological Society, 145(718), 53-75.
15. Lorenc, A. C. (2003). The potential of the ensemble Kalman filter for NWP—A comparison with 4D‐Var. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography, 129(595), 3183-3203.
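For reference, the following is a Gauss-Newton sketch of the iterative solution of this 4D cost function, in the finite-difference "bundle" variant and in the simplest SDA configuration with \( S = L \); the helper names, the bundle parameter eps and the fixed iteration count are illustrative assumptions, not the presentation's implementation.

```python
import numpy as np

def ienks_bundle(E_0, obs_list, model, obs_op, R, n_iter=5, eps=1e-4):
    """Gauss-Newton sketch of the IEnKS ('bundle' variant) for the 4D cost function above.

    E_0      : prior ensemble for the initial state of the DAW, shape (N_x, N_e)
    obs_list : observations y_1, ..., y_L entering the DAW (here S = L for simplicity),
               assumed to be one forecast window apart
    model    : ensemble forecast over one window, i.e., M_k applied column-wise
    obs_op   : (possibly nonlinear) observation operator H_k applied column-wise
    """
    N_e = E_0.shape[1]
    x_0 = E_0.mean(axis=1)
    X_0 = E_0 - x_0[:, None]
    R_inv_sqrt = np.linalg.inv(np.linalg.cholesky(R))
    w = np.zeros(N_e)

    for _ in range(n_iter):
        # Bundle ensemble: mean shifted by X_0 w, anomalies rescaled by eps for a
        # finite-difference approximation of the tangent-linear sensitivities
        E = (x_0 + X_0 @ w)[:, None] + eps * X_0
        grad = (N_e - 1) * w
        hess = (N_e - 1) * np.eye(N_e)
        for y in obs_list:
            E = model(E)                    # each iteration re-runs the ensemble forecast
            Y = obs_op(E)
            y_mean = Y.mean(axis=1)
            S = R_inv_sqrt @ (Y - y_mean[:, None]) / eps
            delta = R_inv_sqrt @ (y - y_mean)
            grad -= S.T @ delta
            hess += S.T @ S
        w -= np.linalg.solve(hess, grad)    # Gauss-Newton update of the weight vector

    # Posterior right transform and re-analyzed initial ensemble, as in the linear case
    vals, U = np.linalg.eigh(hess)
    T = U @ np.diag(vals ** -0.5) @ U.T
    return (x_0 + X_0 @ w)[:, None] + np.sqrt(N_e - 1) * (X_0 @ T)
```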

The single-iteration ensemble transform Kalman smoother (SIETKS)

  • Every iteration of the 4D cost function comes at the cost of an ensemble forecast over the DAW.

  • In synoptic meteorology, e.g., with \( \Delta t \approx 6 \) hours, the linear-Gaussian approximation of the evolution of the densities is often adequate;

    • iterating over the nonlinear dynamics may not be justified by the improvement in the forecast statistics.
  • However, the iterative optimization over hyper-parameters in the filtering step of the classic ETKS can be run without the additional cost of model forecasts.

    • This is likewise the case for a nonlinear observation operator in the filtering step.
  • A right transform \( \boldsymbol{\Psi}_L \) can be found with iterative optimization of the filtering cost function

    \[ \begin{align} {\color{#d95f02} {\widetilde{\mathcal{J}}(\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2} } + {\color{#7570b3} {\frac{1}{2}\parallel \hat{\pmb{\delta}}_L - \mathbf{S}_L \pmb{w}\parallel^2 } } \end{align} \] with a cost that expands in the eigenvalue decomposition of the Hessian \( \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}} \in \mathbb{R}^{N_e \times N_e} \).

  • Subsequently, the retrospective analysis in the form of the filtering right-transform can be applied to condition the initial ensemble

    \[ \begin{align} \mathbf{E}^\mathrm{post}_{0|L} = \mathbf{E}_{0|L-1}^\mathrm{post} \boldsymbol{\Psi}_L. \end{align} \]

  • We initialize the next DA cycle in terms of the retrospective analysis, and gain the benefit of the improved initial estimate.

    • This leads to a single iteration of the model forecast \( {\color{#1b9e77} {\mathcal{M}_{L:1} } } \), while still optimizing over the nonlinear filtering cost function.
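A sketch of one such cycle is given below for the simplest configuration, a perfect model with shift \( S = 1 \); the `analysis` argument stands in for any filtering analysis returning the right transform \( \boldsymbol{\Psi}_L \), e.g. the hypothetical `etkf_analysis` sketched earlier, possibly iterated for a nonlinear observation operator, and the structure shown is an illustration rather than a reference implementation.

```python
def sietks_cycle(E_init, y, H, R, model, analysis, L):
    """Sketch of one SIETKS cycle with shift S = 1 in a perfect model (names illustrative).

    E_init   : re-analyzed ensemble for the first lagged state of the current DAW
    y, H, R  : the new observation, observation operator and error covariance at t_L
    model    : ensemble forecast over one window
    analysis : a filtering analysis returning (E_filt, Psi), e.g. the etkf_analysis
               sketched earlier, possibly iterated MLEF-style for nonlinear H
    """
    # 1. Single sweep of the ensemble simulation over the lagged states of the DAW;
    #    forecast, filtering and smoothing statistics all derive from this one sweep
    lagged = [E_init]
    E = E_init
    for _ in range(L):
        E = model(E)
        lagged.append(E)

    # 2. Filtering analysis at the end of the DAW; iterating this inner analysis
    #    (for hyper-parameters or a nonlinear observation operator) needs no model runs
    E_filt, Psi = analysis(lagged[-1], y, H, R)

    # 3. Retrospective analysis: condition every lagged state on the new observation
    lagged = [E_past @ Psi for E_past in lagged]

    # 4. Shift the DAW by S = 1: the re-analyzed state one window into the DAW
    #    initializes the next cycle, carrying the smoothed information forward
    return lagged[1], lagged
```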

The single-iteration ensemble transform Kalman smoother (SIETKS)

  • Compared to the classic ETKS, this adds an outer-loop to the filtering cycle to produce the posterior, smoothing analysis.
Diagram of the filter observation-analysis-forecast cycle.
  • The information flows from the filtered state back in time from the retrospective analysis.
  • This re-analyzed state becomes the initialization for the next cycle over the shifted DAW, carrying this information forward.
  • The iterative cost function is only solved in the filtering estimate for the new observations entering the DAW.
    • Combined with the retrospective analysis, this comes without the cost of iterating the model forecast over the DAW.

The single-iteration ensemble transform Kalman smoother (SIETKS)

  • By framing the development of this method in the Bayesian MAP formalism for fixed-lag smoothing, we can discuss its novelty.
  • Similar to the IEnKS, this estimates the joint posterior over the DAW via the marginal for \( \pmb{x}_{0|L-1}^\mathrm{post} \).
  • However, instead of solving the 4D cost function, this sequentially solves the filtering cost function and applies a retrospective analysis.
  • A similar retrospective analysis and re-initialization has been performed in some notable examples:
    • The Running In Place (RIP) smoother16
      • The RIP is an iterative scheme over the model forecast, used to spin up the LETKF;
      • this uses a lag of \( L=1 \) and multiple iterations of the filtering step to cold-start a filtering cycle.
    • The One Step Ahead (OSA) smoother17
      • The OSA is a single iteration scheme, using a lag of \( L=1 \) and performing a retrospective analysis followed by a second analysis of the re-initialized state.
      • Without model error, the second analysis of the re-initialized state reduces to the free forecast forward-in-time.
  • With a single iteration, a perfect model and a lag of \( L=1 \) all these schemes coincide.
  • The SIETKS generalizes these ideas to arbitrary lags \( L>1 \) and shifts \( S\leq L \), to MDA in perfect models, and to iterative optimization of the filtering cost function alone for ensemble hyper-parameters.
16. Kalnay, E., & Yang, S. C. (2010). Accelerating the spin‐up of ensemble Kalman filtering. Quarterly Journal of the Royal Meteorological Society, 136(651), 1644-1651.
17. Desbouvries, F., Petetin, Y., & Ait-El-Fquih, B. (2011). Direct, prediction-and smoothing-based Kalman and particle filter algorithms. Signal processing, 91(8), 2064-2077.

Single-layer Lorenz-96 model, linear observations

  • We simulate the performance of the SIETKS versus:
    • the classic ETKS,
    • the “linear” IEnKS (LIEnKS) using a single iteration, and
    • the fully iterative IEnKS.
  • Our study system is the \( N_x=40 \) dimensional, single-layer Lorenz-96 model18.
    • The system is fully observed with observation error \[ \boldsymbol{\epsilon}_k \sim N(\pmb{0}, \mathbf{I}_{40}) \]
    • The forecast window is defined as \( \Delta t = 0.05 \), comparable to a 6 hour atmospheric forecast range.
    • The observation map will be defined as \( \mathcal{H}(\pmb{x}) = \pmb{x} \).
    • Tuned multiplicative inflation is selected for minimum smoothing RMSE, with discretization \( \lambda \in \{1.0 + j \times 0.01\}_{j=0}^{10} \).
  • The ensemble size ranges over \( N_e \in \{15, 17, \cdots, 41, 43\} \), where the unstable-neutral dimension of the model is \( n_0 = 14 \), making localization unnecessary.
  • Adaptive inflation is implemented with the finite-size formalism,19 with schemes denoted with “-N”.
  • RMSE and spread are taken as an average over \( 2\times 10^{4} \) observation times;
  • an additional \( 5\times 10^{3} \) observation times are discarded as an initial spin-up to the steady-state statistics (the tuning grid is collected in the sketch below).
18. Lorenz, E. N. (1996, September). Predictability: A problem partly solved. In Proc. Seminar on predictability (Vol. 1, No. 1).
19. Bocquet, M., Raanes, P. N., & Hannart, A. (2015). Expanding the validity of the ensemble Kalman filter without the intrinsic need for inflation. Nonlinear Processes in Geophysics, 22(6), 645-662.
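For reference, the experiment grid above can be written out as a small configuration block; the driver `run_experiment` indicated in the comment is hypothetical and only suggests how the tuned inflation values would be selected.

```python
# Experiment grid as stated above (the driver run_experiment is hypothetical and not shown)
config = {
    "N_x": 40,                                       # single-layer Lorenz-96 state dimension
    "Delta_t": 0.05,                                 # forecast window between observations
    "ensemble_sizes": list(range(15, 44, 2)),        # N_e in {15, 17, ..., 41, 43}
    "inflation_values": [1.0 + 0.01 * j for j in range(11)],  # multiplicative inflation grid
    "analysis_times": 2 * 10**4,                     # averaging window for RMSE and spread
    "spin_up": 5 * 10**3,                            # discarded before computing statistics
}

# Tuned inflation keeps, for each ensemble size, the value minimizing smoothing RMSE:
# best_inflation = {
#     N_e: min(config["inflation_values"],
#              key=lambda lam: run_experiment(N_e=N_e, inflation=lam, **config))
#     for N_e in config["ensemble_sizes"]
# }
```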

Numerical demonstrations – tuned inflation

Heat plot.

Numerical demonstrations – adaptive inflation

Heat plot.

Numerical demonstrations – tuned inflation

Line plot.

Numerical demonstrations – adaptive inflation

Line plot.

Single-layer Lorenz-96 model, nonlinear observations

  • For a nonlinear observation map in the Lorenz-96 model, we choose

    \[ \begin{align} \mathcal{H}(\pmb{x}) = \frac{\pmb{x}}{2} \circ\left[ \pmb{1} + \left(\frac{\vert \pmb{x}\vert}{10} \right)^{1-\gamma} \right] \end{align} \]

    where:

    • \( \circ \) denotes the Hadamard product;
    • \( \gamma=1 \) defines a linear map; and
    • the nonlinearity of the map \( \mathcal{H} \) increases for larger \( \gamma \).20
  • The ETKF Newton step can be applied iteratively as an extension of the maximum likelihood ensemble filter (MLEF).21

  • Both the ETKS and the SIETKS can include an iterative analysis of the filtering cost function directly in their filtering steps.

  • The (L)IEnKS does not need to be modified in any way to handle the nonlinear observation operator, but the LIEnKS will again only use a single iteration of the 4D cost function.

  • We compare the performance of these schemes versus \( \gamma \in [1.0, 11.0] \), with SDA and MDA.

20. Asch, M., Bocquet, M., & Nodet, M. (2016). Data assimilation: methods, algorithms, and applications. Society for Industrial and Applied Mathematics. See section 6.7.
21. Zupanski, M. (2005). Maximum likelihood ensemble filter: Theoretical aspects. Monthly Weather Review, 133, 1710-1726.
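For concreteness, the observation map above can be transcribed directly into numpy, mirroring the formula exactly as written on this slide; the function name `obs_operator` is ours, and it would be applied to the truth before adding the observation noise, e.g. `obs_operator(x_truth, gamma=5.0)`.

```python
import numpy as np

def obs_operator(x, gamma=1.0):
    """Nonlinear observation map as written above: H(x) = x/2 * [1 + (|x|/10)^(1 - gamma)].

    gamma = 1 recovers the linear (identity) map, and the nonlinearity grows with gamma.
    The map acts element-wise, i.e., the Hadamard product of the slide.
    """
    return 0.5 * x * (1.0 + (np.abs(x) / 10.0) ** (1.0 - gamma))
```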

Numerical demonstrations – nonlinear observations (SDA)

Heat plot.

Numerical demonstrations – nonlinear observations (MDA)

Heat plot.

Numerical demonstrations – nonlinear observations (SDA)

Heat plot.

Numerical demonstrations – nonlinear observations (MDA)

Heat plot.

Numerical demonstrations – lag versus shift (SDA)

Heat plot.

Numerical demonstrations – lag versus shift (MDA)

Heat plot.

Numerical demonstrations – lag versus shift (SDA)

Heat plot.

Numerical demonstrations – lag versus shift (MDA)

Heat plot.

Summary of results and open questions

  • The single-iteration ensemble transform Kalman smoother (SIETKS) is an outer loop optimization of iterative, fixed-lag smoothing for weakly nonlinear forecast error dynamics.

  • This uses the lagged, marginal posterior of 4D iterative schemes, but iterates in the filtering step of the classic EnKS.

    • This transform is then applied as a retrospective analysis on the states at the beginning of the DAW.
  • The SIETKS sequentially solves the filtering cost function instead of the 4D cost function.

    • One benefit is that iteratively solving the filtering cost function does not depend on the nonlinear forecast model \( \mathcal{M}_{L:1} \)
    • This provided a more precise forecast with adaptive inflation than the LIEnKS, in the weakly nonlinear regime.
    • Another benefit is that this makes long MDA windows more stable than the LIEnKS.
    • This is more costly than the LIEnKS, but the cost expands in eigenvalue decompositions of the Hessian \( \widetilde{\mathbf{H}}_{\widetilde{\mathcal{J}}}\in \mathbb{R}^{N_e \times N_e} \).
    • The fully iterative IEnKS, by contrast, has a cost that expands in ensemble simulations over the DAW.
  • Key open questions include:

    • can single-iteration smoothing be effectively combined with localization / hybridization as in an operational setting?
    • what is the general Bayesian formulation for single-iteration smoothing with stochastic and parametric model error?
  • Results in this presentation are projected to be submitted to an open access journal late June / July for open review and discussion.22

22. Grudzien, C. & Bocquet, M. (2021). A fast, single-iteration ensemble Kalman smoother for online, sequential data assimilation. In Preparation.