Colin Grudzien1, Marc Bocquet2
Eric Olson1 & Mihye Ahn1 for HPC logistics and support
Iterative, ensemble-variational methods form the basis of the state-of-the-art in nonlinear, scalable data assimilation (DA).1
However, a barrier to the use of iterative, ensemble-variational schemes in online, sequential DA still lies in the computational bottleneck of ensemble simulation in the nonlinear, physics-based model.
Many iterative smoothing procedures require multiple runs of the ensemble simulations to produce the forecast, filtering and re-analyzed smoothing statistics.
For online applications in which there is a short operational time-window to produce a forecast for future states, the iterative ensemble simulations may be prohibitively expensive.
Particularly in the case where nonlinearity in the DA cycle is not dominated by the forecast error dynamics,
the cost of an iterative, ensemble simulation may not be justified by the improvement in forecast accuracy.
For short-range forecasting, nonlinearity in the DA cycle may instead be dominated by:
For sequential forecast cycles in which the linear-Gaussian approximation of the forecast error may be appropriate, as in synoptic meteorology, we consider a simple hybridization of the classic ensemble transform Kalman smoother (ETKS) with the iterative ensemble Kalman smoother (IEnKS) to produce a single-iteration, fixed-lag, sequential smoother.
We combine the retrospective smoothing analysis of the classic EnKS with a re-initialization of the DA cycle from the smoothed initial condition, to improve the EnKF filter and forecast statistics in the subsequent DA cycle.
The resulting scheme is a “single-iteration” ensemble Kalman smoother that sequentially solves the filtering Bayesian MAP cost function.
This scheme seeks to minimize the leading order cost of ensemble-variational smoothing, i.e., the ensemble simulation in the nonlinear forecast model.
This is an outer-loop optimization of the cost of fixed-lag, sequential smoothing, designed for online applications with weakly nonlinear forecast error evolution.
We will denote the general framework as a “single-iteration” smoother, while the specific implementation presented here is denoted the single-iteration ensemble transform Kalman smoother (SIETKS).
For linear-Gaussian systems with the perfect model hypothesis, the SIETKS is a fully consistent Bayesian estimator, albeit one that uses redundant model simulations.
When the forecast error dynamics are weakly nonlinear, yet other aspects of the DA cycle are strongly nonlinear,
we demonstrate that the SIETKS has predictive performance comparable to fully iterative methods.
We furthermore derive a method of multiple data assimilation (MDA) for this hybrid smoother scheme; the result is a two-stage MDA algorithm that estimates the forecast, filtering, and re-analyzed smoothing statistics, and shifts the smoother forward in time with two ensemble-simulation sweeps over the lagged states.
We will consider the right-transform version of the ensemble transform Kalman filter (ETKF) analysis.
Specifically, with Gaussian error densities,
\[ \begin{align} p(\pmb{x}_L\vert \pmb{y}_{1:L-1}) = N\left(\overline{\pmb{x}}_L^\mathrm{fore}, \mathbf{B}_L^\mathrm{fore}\right) & & p\left(\pmb{y}_L \vert \pmb{x}_L \right) = N\left(\mathbf{H}_L \pmb{x}_L, \mathbf{R}_L\right) \end{align} \] this can be written as a least-squares optimization as follows.
Suppose that a forecast ensemble \( \mathbf{E}_{L}^\mathrm{fore}\in \mathbb{R}^{N_x \times N_e} \) is drawn iid from \( p(\pmb{x}_L\vert \pmb{y}_{1:L-1}) \).
Define the ensemble-based, empirical estimates,
\[ \begin{align} & & \hat{\pmb{x}}_L^\mathrm{fore} &:= \mathbf{E}_L^\mathrm{fore} \pmb{1} / N_e ; & \hat{\pmb{\delta}}_L &:= \mathbf{R}_L^{-\frac{1}{2}}\left(\pmb{y}_L - \mathbf{H}_L \hat{\pmb{x}}_L^\mathrm{fore}\right)\\ &&\mathbf{X}_L^\mathrm{fore} &:= \mathbf{E}_L^\mathrm{fore} - \hat{\pmb{x}}^\mathrm{fore}_L \pmb{1}^\top ; & \mathbf{P}^\mathrm{fore}_L &:= \mathbf{X}_L^\mathrm{fore} \left(\mathbf{X}_L^\mathrm{fore}\right)^\top / \left(N_e - 1\right);\\ & &\mathbf{S}_L &:=\mathbf{R}_L^{-\frac{1}{2}}\mathbf{H}_L \mathbf{X}_L^\mathrm{fore}; & \pmb{x}&:= \hat{\pmb{x}}_L^\mathrm{fore} + \mathbf{X}^\mathrm{fore}_L \pmb{w} . \end{align} \]
Maximizing the empirical posterior density is equivalent to minimizing the cost function \( \mathcal{J}(\pmb{x}) \equiv \widetilde{\mathcal{J}}(\pmb{w}) \),
\[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2} \parallel \hat{\pmb{x}}_L^\mathrm{fore} - \mathbf{X}^\mathrm{fore}_L \pmb{w}- \hat{\pmb{x}}^\mathrm{fore}_L \parallel_{\mathbf{P}^\mathrm{fore}_L}^2} } + {\color{#7570b3} {\frac{1}{2} \parallel \pmb{y}_L - \mathbf{H}_L\hat{\pmb{x}}^\mathrm{fore}_L - \mathbf{H}_L \mathbf{X}^\mathrm{fore}_L \pmb{w} \parallel_{\mathbf{R}_L}^2 } }\\ \Leftrightarrow& & {\color{#d95f02} {\widetilde{\mathcal{J}}(\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2} } + {\color{#7570b3} {\frac{1}{2}\parallel \hat{\pmb{\delta}}_L - \mathbf{S}_L \pmb{w}\parallel^2 } } \end{alignat} \] which is an optimization over a weight vector \( \pmb{w} \) in the ensemble dimension \( N_e \ll N_x \).
In the ETKF formalism, we can define an ensemble right-transform \( \boldsymbol{\Psi}_k \) such that for any \( t_k \),
\[ \begin{align} \mathbf{E}^\mathrm{filt}_k = \mathbf{E}^\mathrm{fore}_k \boldsymbol{\Psi}_k \end{align} \] where we say that the columns of the matrices are distributed iid as \[ \begin{align} \mathbf{E}^\mathrm{filt}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k}) \\ \mathbf{E}^\mathrm{fore}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k-1}) \end{align} \]
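As a concrete illustration, the following is a minimal numpy sketch of this filtering analysis and of the associated right-transform, assuming a linear observation operator supplied as a matrix and using the symmetric square root of the Hessian; the function and variable names are illustrative only.

```python
import numpy as np

def etkf_analysis(E_fore, H, R, y):
    """ETKF filtering analysis in the ensemble weight space.

    E_fore : (N_x, N_e) forecast ensemble, columns are members
    H      : (N_y, N_x) linear observation operator
    R      : (N_y, N_y) observation error covariance
    y      : (N_y,)     observation vector

    Returns the filtering ensemble and the right-transform Psi_L,
    with E_filt = E_fore @ Psi_L.
    """
    N_e = E_fore.shape[1]
    x_mean = E_fore.mean(axis=1)                      # \hat{x}^fore_L
    X = E_fore - x_mean[:, None]                      # anomalies X^fore_L
    R_inv_sqrt = np.linalg.inv(np.linalg.cholesky(R))
    S = R_inv_sqrt @ H @ X                            # S_L
    delta = R_inv_sqrt @ (y - H @ x_mean)             # normalized innovation

    # Hessian of the weight-space cost function; single Newton step from w = 0
    hess = (N_e - 1) * np.eye(N_e) + S.T @ S
    w_bar = np.linalg.solve(hess, S.T @ delta)
    vals, vecs = np.linalg.eigh(hess)
    T = vecs @ np.diag(vals ** -0.5) @ vecs.T         # symmetric square root of hess^{-1}

    # pack the mean update and the anomaly transform into one right-transform
    ones = np.ones((N_e, N_e)) / N_e
    Psi = ones + (np.eye(N_e) - ones) @ (
        np.outer(w_bar, np.ones(N_e)) + np.sqrt(N_e - 1) * T
    )
    return E_fore @ Psi, Psi
```

The symmetric square root above preserves the ensemble mean, since \( \mathbf{S}_L \pmb{1} = \pmb{0} \) makes \( \pmb{1} \) an eigenvector of the Hessian.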
We will associate \( \mathbf{E}^\mathrm{filt}_k \equiv \mathbf{E}^\mathrm{post}_{k|k} \), where we define
\[ \begin{align} \mathbf{E}^\mathrm{post}_{k \vert K} \sim p(\pmb{x}_k \vert \pmb{y}_{1:K}) \end{align} \] for \( K \geq k \).
\[ \begin{align} \mathbf{E}^\mathrm{post}_{k |L} = \mathbf{E}^\mathrm{post}_{k|L-1}\boldsymbol{\Psi}_{L}. \end{align} \]
A retrospective smoothing analysis can be performed on all past states stored in memory by using the latest right-transform update from the filtering step.
This form of retrospective analysis is the basis of the classic ensemble Kalman smoother (EnKS).9
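A minimal sketch of this retrospective analysis, assuming the lagged posterior ensembles stored in memory are kept in a list and that the filtering right-transform \( \boldsymbol{\Psi}_L \) has been computed as in the sketch above; names are illustrative.

```python
def enks_retrospective_update(lagged_ensembles, Psi_L):
    """Condition all lagged states on the newest observation.

    lagged_ensembles : list of (N_x, N_e) arrays, E^post_{k|L-1} for each k in memory
    Psi_L            : (N_e, N_e) right-transform from the filtering analysis at t_L

    Returns the ensembles E^post_{k|L} = E^post_{k|L-1} @ Psi_L.
    """
    return [E @ Psi_L for E in lagged_ensembles]
```

No additional model simulation is required; the retrospective update is a single matrix product per lagged state.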
The information in the posterior estimate thus flows in reverse time to the lagged states stored in memory, but the information flow is unidirectional in this scheme.
The improved posterior estimate for the lagged states does not improve the filtering estimate in the perfect, linear-Gaussian model:
\[ \begin{align} \mathbf{M}_{k} \mathbf{E}_{k-1|L}^\mathrm{post}& = \mathbf{M}_{k} \mathbf{E}_{k-1|k-1}^\mathrm{post} \prod_{j=k}^{L} \boldsymbol{\Psi}_j \\ & =\mathbf{E}_{k|k}^\mathrm{post}\prod_{j=k+1}^L \boldsymbol{\Psi}_j\\ & = \mathbf{E}_{k|L}^\mathrm{post}. \end{align} \]
This demonstrates that the effect of the conditioning on the information from the observation is covariant with the dynamics.
In fact, in the case of the perfect, linear-Gaussian model, the backward-in-time conditioning is equivalent to the reverse time model forecast \( \mathbf{M}_k^{-1} \), as demonstrated by Rauch, Tung and Striebel (RTS).10
Suppose now we want to write \( p(\pmb{x}_{1:L}\vert \pmb{y}_{1:L}) \), the joint smoothing posterior over the current DAW, as a recursive update to the last smoothing posterior.
We will suppose that the DAW is shifted forward by an arbitrary number of analysis times \( S\geq 1 \) between successive cycles.
Using a Bayesian analysis like before, we can write
\[ \begin{align} {\color{#d95f02}{p(\pmb{x}_{1:L} \vert \pmb{y}_{0:L})}} &\propto \int \mathrm{d}\pmb{x}_0 \underbrace{{\color{#d95f02}{p(\pmb{x}_0 \vert \pmb{y}_{0:L-1})}}}_{(1)} \underbrace{{\color{#7570b3}{\left[\prod_{k=L-S+1}^Lp(\pmb{y}_k \vert \pmb{x}_k)\right]}}}_{(2)} \underbrace{{\color{#1b9e77}{\left[\prod_{k=1}^L \pmb{\delta}\left\{\pmb{x}_k - \mathcal{M}_{k} \left(\pmb{x}_{k-1}\right)\right\}\right]}}}_{(3)} \end{align} \] where (1) is the marginal smoothing density for the initial condition of the DAW, conditioned on all previously assimilated observations; (2) is the joint likelihood of the \( S \) observations newly entering the DAW; and (3) represents the perfect, deterministic model evolution as Dirac distributions.
Using the fact that \( p(\pmb{x}_{1:L} \vert \pmb{y}_{1:L} ) \propto p(\pmb{x}_{1:L} \vert \pmb{y}_{0:L}) \), we can chain together a recursive fixed-lag smoother, sequentially across DAWs.
Under the linear-Gaussian assumption, the resulting cost function takes the form
\[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#d95f02} {\frac{1}{2} \parallel \hat{\pmb{x}}_{0|L-1}^\mathrm{post} - \mathbf{X}^\mathrm{post}_{0|L-1} \pmb{w}- \hat{\pmb{x}}^\mathrm{post}_{0|L-1} \parallel_{\mathbf{P}^\mathrm{post}_{0|L-1}}^2} } + {\color{#7570b3} {\sum_{k=L-S+1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathbf{H}_k {\color{#1b9e77} { \mathbf{M}_{k:1}} }\hat{\pmb{x}}^\mathrm{post}_{0|L-1} - \mathbf{H}_k {\color{#1b9e77} {\mathbf{M}_{k:1}}} \mathbf{X}^\mathrm{post}_{0|L-1} \pmb{w} \parallel_{\mathbf{R}_k}^2 } } \\ \Leftrightarrow& & \widetilde{\mathcal{J}}(\pmb{w}) &= \frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2 + \sum_{k=L-S+1}^L \frac{1}{2}\parallel \hat{\pmb{\delta}}_k - \mathbf{S}_k \pmb{w}\parallel^2 \end{alignat} \] where the weights \( \pmb{w} \) give the update to the initial state \( \pmb{x}^\mathrm{post}_{0|L-1} \).
In the linear-Gaussian case, the solution can again be found by a single iteration of Newton's descent for the cost function above,
\[ \begin{alignat}{2} \overline{\pmb{w}} &= 0 - \mathbf{H}_{\widetilde{\mathcal{J}}}^{-1}\nabla_{\pmb{w}} \widetilde{\mathcal{J}} ; \quad\quad & \mathbf{T} &= \mathbf{H}_{\widetilde{\mathcal{J}}}^{-\frac{1}{2}}; \\ \hat{\pmb{x}}^\mathrm{post}_{0|L}&= \hat{\pmb{x}}_{0|L-1}^\mathrm{post} + \mathbf{X}_{0|L-1}^\mathrm{post}\overline{\pmb{w}};\quad \quad& \mathbf{P}_{0|L}^\mathrm{post} &= \left(\mathbf{X}^\mathrm{post}_{0|L-1}\mathbf{T}\right)\left(\mathbf{X}^\mathrm{post}_{0|L-1}\mathbf{T}\right)^\top. \end{alignat} \]
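A compact numpy sketch of this single step, assuming the matrices \( \mathbf{S}_k \) and normalized innovations \( \hat{\pmb{\delta}}_k \) for the newly entering observations have already been formed from the linear model and observation operators; names are illustrative.

```python
import numpy as np

def single_newton_4d(x0_mean, X0, S_list, d_list):
    """One Newton step of the quadratic 4D cost function in weight space.

    x0_mean : (N_x,)     prior smoothing mean for the initial condition
    X0      : (N_x, N_e) prior smoothing anomalies for the initial condition
    S_list  : list of (N_y, N_e) matrices S_k for k = L-S+1, ..., L
    d_list  : list of (N_y,)     innovations delta_k for k = L-S+1, ..., L
    """
    N_e = X0.shape[1]
    hess = (N_e - 1) * np.eye(N_e) + sum(S.T @ S for S in S_list)
    grad = -sum(S.T @ d for S, d in zip(S_list, d_list))   # gradient at w = 0
    w_bar = -np.linalg.solve(hess, grad)
    vals, vecs = np.linalg.eigh(hess)
    T = vecs @ np.diag(vals ** -0.5) @ vecs.T               # T = Hessian^{-1/2}

    x0_post = x0_mean + X0 @ w_bar                          # smoothed mean for x_0
    X0_post = np.sqrt(N_e - 1) * (X0 @ T)                   # anomalies whose empirical
    return x0_post, X0_post                                 # covariance is P^post_{0|L}
```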
When the state and observation models are nonlinear, with \( \mathcal{H}_k \) and \( \mathcal{M}_{k:1} \) replacing their linear counterparts in the cost function, a local minimum can be found iteratively with a recursive Newton step as on the last slide.
This approach is the basis of the iterative ensemble Kalman smoother (IEnKS),12 which produces a more accurate solution than the linear approximation when the models are highly nonlinear.
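For contrast, the sketch below shows the outer-loop structure of such an iterative scheme in a bundle-type Gauss-Newton formulation; `M` and `H` are user-supplied stand-ins for the nonlinear ensemble propagator and observation operator, and every pass of the loop repeats the ensemble simulation over the DAW.

```python
import numpy as np

def ienks_bundle(x0_mean, X0, M, H, R_inv_sqrt, ys, eps=1e-4, n_iter=10, tol=1e-6):
    """Gauss-Newton (bundle-type) iteration of the 4D cost function in weight space.

    x0_mean    : (N_x,)      prior mean for the initial condition of the DAW
    X0         : (N_x, N_e)  prior anomalies for the initial condition
    M          : callable, M(E, k) propagates an ensemble from time k-1 to k
    H          : callable, H(E, k) maps an ensemble to observation space at time k
    R_inv_sqrt : (N_y, N_y)  inverse square root of the observation error covariance
    ys         : list of observations entering the cost function, ordered in time
    """
    N_e = X0.shape[1]
    w = np.zeros(N_e)
    for _ in range(n_iter):
        # each iteration requires a fresh ensemble simulation over the DAW
        E = x0_mean[:, None] + X0 @ (w[:, None] + eps * np.eye(N_e))
        grad = (N_e - 1) * w
        hess = (N_e - 1) * np.eye(N_e)
        for k, y in enumerate(ys, start=1):
            E = M(E, k)                                    # nonlinear forecast to t_k
            Y = H(E, k)
            y_mean = Y.mean(axis=1)
            S = R_inv_sqrt @ (Y - y_mean[:, None]) / eps   # finite-difference sensitivities
            d = R_inv_sqrt @ (y - y_mean)
            grad -= S.T @ d
            hess += S.T @ S
        dw = np.linalg.solve(hess, grad)
        w -= dw
        if np.linalg.norm(dw) < tol:
            break
    return w, hess
```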
Ensemble Gaussian mixture models as above have been used variously to:
Every iteration of the 4D cost function comes at the cost of an additional ensemble simulation over the DAW.
In synoptic meteorology, e.g., \( \Delta t \approx 6 \) hours, the linear-Gaussian approximation of the evolution of the densities is often adequate.
However, the iterative optimization over hyper-parameters in the filtering step of the classic ETKS can be run without the additional cost of model forecasts.
A right transform \( \boldsymbol{\Psi}_L \) can be found with iterative optimization of the filtering cost function
\[ \begin{align} {\color{#d95f02} {\widetilde{\mathcal{J}}(\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2} } + {\color{#7570b3} {\frac{1}{2}\parallel \hat{\pmb{\delta}}_L - \mathbf{S}_L \pmb{w}\parallel^2 } } \end{align} \] where the leading-order cost of each iteration lies in the eigenvalue decomposition of the Hessian \( \mathbf{H}_{\widetilde{\mathcal{J}}} \in \mathbb{R}^{N_e \times N_e} \).
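As one instance, the sketch below iterates the filtering cost function with a nonlinear observation operator, in the spirit of the MLEF extension discussed later; each pass requires only \( N_e \)-dimensional linear algebra and a re-evaluation of the observation operator, with no additional model forecasts. Names are illustrative.

```python
import numpy as np

def iterative_filter_analysis(E_fore, h, R_inv_sqrt, y, n_iter=10, tol=1e-6):
    """Gauss-Newton iteration of the filtering cost function in weight space.

    E_fore     : (N_x, N_e) forecast ensemble
    h          : callable mapping an (N_x, N_e) ensemble to (N_y, N_e) observed values
    R_inv_sqrt : (N_y, N_y) inverse square root of the observation error covariance
    y          : (N_y,)     observation
    """
    N_e = E_fore.shape[1]
    x_mean = E_fore.mean(axis=1)
    X = E_fore - x_mean[:, None]
    w = np.zeros(N_e)
    for _ in range(n_iter):
        # re-linearize the observation operator about the current iterate
        E_i = x_mean[:, None] + X @ (w[:, None] + np.eye(N_e))
        Y = h(E_i)
        y_mean = Y.mean(axis=1)
        S = R_inv_sqrt @ (Y - y_mean[:, None])
        d = R_inv_sqrt @ (y - y_mean)
        hess = (N_e - 1) * np.eye(N_e) + S.T @ S
        grad = (N_e - 1) * w - S.T @ d
        dw = np.linalg.solve(hess, grad)
        w -= dw
        if np.linalg.norm(dw) < tol:
            break
    # ensemble right-transform built from the Hessian at the optimum
    vals, vecs = np.linalg.eigh(hess)
    T = vecs @ np.diag(vals ** -0.5) @ vecs.T
    ones = np.ones((N_e, N_e)) / N_e
    Psi = ones + (np.eye(N_e) - ones) @ (np.outer(w, np.ones(N_e)) + np.sqrt(N_e - 1) * T)
    return E_fore @ Psi, Psi
```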
Subsequently, the retrospective analysis in the form of the filtering right-transform can be applied to condition the initial ensemble
\[ \begin{align} \mathbf{E}^\mathrm{post}_{0|L} = \mathbf{E}_{0|L-1}^\mathrm{post} \boldsymbol{\Psi}_L \end{align} \]
We initialize the next DA cycle in terms of the retrospective analysis, and gain the benefit of the improved initial estimate.
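Putting these pieces together, the following is a minimal sketch of one cycle of such a single-iteration smoother, for lag \( L \) and shift \( S = 1 \) under the perfect model hypothesis; the callables `model` and `analysis` stand in for the nonlinear ensemble propagator and for a filtering analysis returning the right-transform (e.g., one of the sketches above).

```python
def single_iteration_smoother_cycle(E0_smooth, model, analysis, y_new, lag):
    """One DA cycle of a single-iteration smoother (lag `lag`, shift 1, perfect model).

    E0_smooth : (N_x, N_e) smoothed ensemble at the start of the DAW, E^post_{0|L-1}
    model     : callable, model(E, k) propagates an ensemble from time k-1 to k
    analysis  : callable, analysis(E_fore, y) -> (E_filt, Psi)
    y_new     : observation at the new end point of the DAW
    lag       : number of forecast steps spanning the DAW
    """
    # 1. forecast: one ensemble simulation over the DAW to the new observation time
    E = E0_smooth
    for k in range(1, lag + 1):
        E = model(E, k)
    E_fore = E

    # 2. filtering analysis at the end of the DAW, yielding the right-transform
    E_filt, Psi = analysis(E_fore, y_new)

    # 3. retrospective analysis: condition the initial ensemble on the new observation
    E0_smooth_new = E0_smooth @ Psi

    # 4. shift the DAW: re-initialize the next cycle from the smoothed ensemble,
    #    propagated one step forward (a partially redundant, second simulation)
    E0_next = model(E0_smooth_new, 1)

    return E_fore, E_filt, E0_smooth_new, E0_next
```

Each cycle thus produces the forecast, filtering, and re-analyzed smoothing statistics with a single analysis of the filtering cost function, at the price of the redundant re-simulation of the shifted portion of the DAW noted above.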
For a nonlinear observation map in the Lorenz-96 model, we choose
\[ \begin{align} \mathcal{H}(\pmb{x}) = \frac{\pmb{x}}{2} \circ\left[ \pmb{1} + \left(\frac{\vert \pmb{x}\vert}{10} \right)^{1-\gamma} \right] \end{align} \]
where the absolute value, exponentiation, and Schur product \( \circ \) are applied componentwise, and \( \gamma \geq 1 \) controls the strength of the nonlinearity, with \( \gamma = 1 \) recovering the linear observation map \( \mathcal{H}(\pmb{x}) = \pmb{x} \).
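For reference, the map can be implemented componentwise as follows; \( \gamma = 1 \) recovers the linear case.

```python
import numpy as np

def nonlinear_obs_map(x, gamma):
    """Componentwise observation map H(x) = x/2 * [1 + (|x|/10)^(1 - gamma)]."""
    return 0.5 * x * (1.0 + (np.abs(x) / 10.0) ** (1.0 - gamma))
```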
The ETKF Newton step can be applied iteratively as an extension of the maximum likelihood ensemble filter (MLEF).21
Both the ETKS and the SIETKS can include an iterative analysis of the filtering cost function directly in their filtering steps.
The (L)IEnKS does not need to be modified in any way to handle the nonlinear observation operator, but the LIEnKS will again only use a single iteration of the 4D cost function.
We compare the performance of these schemes versus \( \gamma \in [1.0, 11.0] \), with SDA and MDA.
The single-iteration ensemble transform Kalman smoother (SIETKS) is an outer loop optimization of iterative, fixed-lag smoothing for weakly nonlinear forecast error dynamics.
This uses the lagged, marginal posterior re-initialization of 4D iterative schemes, but iterates only in the filtering step of the classic EnKS.
The SIETKS sequentially solves the filtering cost function instead of the 4D cost function.
Key open questions include:
Results in this presentation are projected to be submitted to an open-access journal in late June / July for open review and discussion.22