Colin Grudzien1, Marc Bocquet2
Eric Olson1 & Mihye Ahn1 for HPC logistics and support
Iterative, ensemble-variational methods form the basis of the state-of-the-art in nonlinear, scalable data assimilation (DA).1
However, a barrier to the use of iterative, ensemble-variational schemes in online, sequential DA still lies in the computational bottleneck of ensemble simulation in the nonlinear, physics-based model.
Many iterative smoothing procedures require multiple runs of the ensemble simulations to produce the forecast, filtering and re-analyzed smoothing statistics.
For online applications in which there is a short operational time-window to produce a forecast for future states, the iterative ensemble simulations may be prohibitively expensive.
Particularly in the case where nonlinearity in the DA cycle is not dominated by the forecast error dynamics,
the cost of an iterative, ensemble simulation may not be justified by the improvement in forecast accuracy.
For short-range forecasting, nonlinearity in the DA cycle may instead be dominated by:
For sequential forecast cycles in which the linear-Gaussian approximation of the forecast error may be appropriate, as in synoptic meteorology, we consider a simple hybridization of the classic ensemble transform Kalman smoother (ETKS) with the iterative ensemble Kalman smoother (IEnKS) to produce a single-iteration, fixed-lag, sequential smoother.
We combine the retrospective smoothing analysis of the classic EnKS with a re-initialization of the DA cycle from the smoothed initial condition, to improve the EnKF filter and forecast statistics in the subsequent DA cycle.
The resulting scheme is a “single-iteration” ensemble Kalman smoother that sequentially solves the filtering Bayesian MAP cost function.
This scheme seeks to minimize the leading order cost of ensemble-variational smoothing, i.e., the ensemble simulation in the nonlinear forecast model.
This is an outer-loop optimization of the cost of fixed-lag, sequential smoothing, designed for online applications with weakly nonlinear forecast error evolution.
We will denote the general framework as a “single-iteration” smoother, while the specific implementation presented here is denoted the single-iteration ensemble transform Kalman smoother (SIETKS).
For linear-Gaussian systems with the perfect model hypothesis, the SIETKS is a fully consistent Bayesian estimator, albeit one that uses redundant model simulations.
When the forecast error dynamics are weakly nonlinear, yet other aspects of the DA cycle are strongly nonlinear,
we demonstrate that the SIETKS has predictive performance comparable to fully iterative methods.
We furthermore derive a method of multiple data assimilation (MDA) for this hybrid smoother scheme; the result is a two-stage MDA algorithm that estimates the forecast, filtering, and re-analyzed smoothing statistics, and shifts the smoother forward in time with two ensemble-simulation sweeps over the lagged states.
We will consider the right-transform version of the ensemble transform Kalman filter (ETKF) analysis.
Specifically, with Gaussian error densities,
\[ \begin{align} p(\pmb{x}_L\vert \pmb{y}_{1:L-1}) = N\left(\overline{\pmb{x}}_L^\mathrm{fore}, \mathbf{B}_L^\mathrm{fore}\right) & & p\left(\pmb{y}_L \vert \pmb{x}_L \right) = N\left(\mathbf{H}_L \pmb{x}_L, \mathbf{R}_L\right) \end{align} \] this can be written as a least-squares optimization as follows.
Suppose that a forecast ensemble \( \mathbf{E}_{L}^\mathrm{fore}\in \mathbb{R}^{N_x \times N_e} \) is drawn iid from \( p(\pmb{x}_L\vert \pmb{y}_{1:L-1}) \).
Define the ensemble-based, empirical estimates,
\[ \begin{align} & & \hat{\pmb{x}}_L^\mathrm{fore} &:= \mathbf{E}_L^\mathrm{fore} \pmb{1} / N_e ; & \hat{\pmb{\delta}}_L &:= \mathbf{R}_L^{-\frac{1}{2}}\left(\pmb{y}_L - \mathbf{H}_L \hat{\pmb{x}}_L^\mathrm{fore}\right)\\ &&\mathbf{X}_L^\mathrm{fore} &:= \mathbf{E}_L^\mathrm{fore} - \hat{\pmb{x}}^\mathrm{fore}_L \pmb{1}^\top ; & \mathbf{P}^\mathrm{fore}_L &:= \mathbf{X}_L^\mathrm{fore} \left(\mathbf{X}_L^\mathrm{fore}\right)^\top / \left(N_e - 1\right);\\ & &\mathbf{S}_L &:=\mathbf{R}_L^{-\frac{1}{2}}\mathbf{H}_L \mathbf{X}_L^\mathrm{fore}; & \pmb{x}&:= \hat{\pmb{x}}_L^\mathrm{fore} + \mathbf{X}^\mathrm{fore}_L \pmb{w} . \end{align} \]
Maximizing the empirical posterior density is equivalent to minimizing the cost function \( \mathcal{J}(\pmb{x}) \equiv \widetilde{\mathcal{J}}(\pmb{w}) \),
\[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2} \parallel \hat{\pmb{x}}_L^\mathrm{fore} - \mathbf{X}^\mathrm{fore}_L \pmb{w}- \hat{\pmb{x}}^\mathrm{fore}_L \parallel_{\mathbf{P}^\mathrm{fore}_L}^2} } + {\color{#7570b3} {\frac{1}{2} \parallel \pmb{y}_L - \mathbf{H}_L\hat{\pmb{x}}^\mathrm{fore}_L - \mathbf{H}_L \mathbf{X}^\mathrm{fore}_L \pmb{w} \parallel_{\mathbf{R}_L}^2 } }\\ \Leftrightarrow& & {\color{#d95f02} {\widetilde{\mathcal{J}}(\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2} } + {\color{#7570b3} {\frac{1}{2}\parallel \hat{\pmb{\delta}}_L - \mathbf{S}_L \pmb{w}\parallel^2 } } \end{alignat} \] which is an optimization over a weight vector \( \pmb{w} \) in the ensemble dimension \( N_e \ll N_x \).
In the ETKF formalism, we can define an ensemble right-transform \( \boldsymbol{\Psi}_k \) such that for any \( t_k \),
\[ \begin{align} \mathbf{E}^\mathrm{filt}_k = \mathbf{E}^\mathrm{fore}_k \boldsymbol{\Psi}_k \end{align} \] where we say that the columns of the matrices are distributed iid as \[ \begin{align} \mathbf{E}^\mathrm{filt}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k}) \\ \mathbf{E}^\mathrm{fore}_k &\sim p(\pmb{x}_k \vert \pmb{y}_{1:k-1}) \end{align} \]
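As a concrete illustration, the following is a minimal numpy sketch of this filtering analysis and of the associated right-transform, assuming a linear observation operator supplied as a matrix and using the symmetric square root of the Hessian; the function and variable names are illustrative only.

```python
import numpy as np

def etkf_analysis(E_fore, H, R, y):
    """ETKF filtering analysis in the ensemble weight space.

    E_fore : (N_x, N_e) forecast ensemble, columns are members
    H      : (N_y, N_x) linear observation operator
    R      : (N_y, N_y) observation error covariance
    y      : (N_y,)     observation vector

    Returns the filtering ensemble and the right-transform Psi_L,
    with E_filt = E_fore @ Psi_L.
    """
    N_e = E_fore.shape[1]
    x_mean = E_fore.mean(axis=1)                      # \hat{x}^fore_L
    X = E_fore - x_mean[:, None]                      # anomalies X^fore_L
    R_inv_sqrt = np.linalg.inv(np.linalg.cholesky(R))
    S = R_inv_sqrt @ H @ X                            # S_L
    delta = R_inv_sqrt @ (y - H @ x_mean)             # normalized innovation

    # Hessian of the weight-space cost function; single Newton step from w = 0
    hess = (N_e - 1) * np.eye(N_e) + S.T @ S
    w_bar = np.linalg.solve(hess, S.T @ delta)
    vals, vecs = np.linalg.eigh(hess)
    T = vecs @ np.diag(vals ** -0.5) @ vecs.T         # symmetric square root of hess^{-1}

    # pack the mean update and the anomaly transform into one right-transform
    ones = np.ones((N_e, N_e)) / N_e
    Psi = ones + (np.eye(N_e) - ones) @ (
        np.outer(w_bar, np.ones(N_e)) + np.sqrt(N_e - 1) * T
    )
    return E_fore @ Psi, Psi
```

The symmetric square root above preserves the ensemble mean, since \( \mathbf{S}_L \pmb{1} = \pmb{0} \) makes \( \pmb{1} \) an eigenvector of the Hessian.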
We will associate \( \mathbf{E}^\mathrm{filt}_k \equiv \mathbf{E}^\mathrm{post}_{k|k} \), where we define
\[ \begin{align} \mathbf{E}^\mathrm{post}_{k \vert K} \sim p(\pmb{x}_k \vert \pmb{y}_{1:K}) \end{align} \] for \( K \geq k \).
\[ \begin{align} \mathbf{E}^\mathrm{post}_{k |L} = \mathbf{E}^\mathrm{post}_{k|L-1}\boldsymbol{\Psi}_{L}. \end{align} \]
A retrospective smoothing analysis can be performed on all past states stored in memory by using the latest right-transform update from the filtering step.
This form of retrospective analysis is the basis of the classic ensemble Kalman smoother (EnKS).9
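A minimal sketch of this retrospective analysis, assuming the lagged posterior ensembles stored in memory are kept in a list and that the filtering right-transform \( \boldsymbol{\Psi}_L \) has been computed as in the sketch above; names are illustrative.

```python
def enks_retrospective_update(lagged_ensembles, Psi_L):
    """Condition all lagged states on the newest observation.

    lagged_ensembles : list of (N_x, N_e) arrays, E^post_{k|L-1} for each k in memory
    Psi_L            : (N_e, N_e) right-transform from the filtering analysis at t_L

    Returns the ensembles E^post_{k|L} = E^post_{k|L-1} @ Psi_L.
    """
    return [E @ Psi_L for E in lagged_ensembles]
```

No additional model simulation is required; the retrospective update is a single matrix product per lagged state.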
The information in the posterior estimate thus flows in reverse time to the lagged states stored in memory, but the information flow is unidirectional in this scheme.
The improved posterior estimate for the lagged states does not improve the filtering estimate in the perfect, linear-Gaussian model:
\[ \begin{align} \mathbf{M}_{k} \mathbf{E}_{k-1|L}^\mathrm{post}& = \mathbf{M}_{k} \mathbf{E}_{k-1|k-1}^\mathrm{post} \prod_{j=k}^{L} \boldsymbol{\Psi}_j \\ & =\mathbf{E}_{k|k}^\mathrm{post}\prod_{j=k+1}^L \boldsymbol{\Psi}_j\\ & = \mathbf{E}_{k|L}^\mathrm{post}. \end{align} \]
This demonstrates that the effect of the conditioning on the information from the observation is covariant with the dynamics.
In fact, in the case of the perfect, linear-Gaussian model, the backward-in-time conditioning is equivalent to the reverse time model forecast \( \mathbf{M}_k^{-1} \), as demonstrated by Rauch, Tung and Striebel (RTS).10
Suppose now we want to write \( p(\pmb{x}_{1:L}\vert \pmb{y}_{1:L}) \), the joint smoothing posterior over the current DAW, as a recursive update to the last smoothing posterior.
We will suppose that the DAW is shifted forward by an arbitrary number of analysis times \( S\geq 1 \) between successive cycles.
Using a Bayesian analysis like before, we can write
\[ \begin{align} {\color{#d95f02}{p(\pmb{x}_{1:L} \vert \pmb{y}_{0:L})}} &\propto \int \mathrm{d}\pmb{x}_0 \underbrace{{\color{#d95f02}{p(\pmb{x}_0 \vert \pmb{y}_{0:L-1})}}}_{(1)} \underbrace{{\color{#7570b3}{\left[\prod_{k=L-S+1}^Lp(\pmb{y}_k \vert \pmb{x}_k)\right]}}}_{(2)} \underbrace{{\color{#1b9e77}{\left[\prod_{k=1}^L \pmb{\delta}\left\{\pmb{x}_k - \mathcal{M}_{k} \left(\pmb{x}_{k-1}\right)\right\}\right]}}}_{(3)} \end{align} \] where (1) is the marginal smoothing density for the initial condition of the DAW, conditioned on all previously assimilated observations; (2) is the joint likelihood of the \( S \) observations newly entering the DAW; and (3) represents the perfect, deterministic model evolution as Dirac distributions.
Using the fact that \( p(\pmb{x}_{1:L} \vert \pmb{y}_{1:L} ) \propto p(\pmb{x}_{1:L} \vert \pmb{y}_{0:L}) \), we can chain together a recursive fixed-lag smoother, sequentially across DAWs.
Under the linear-Gaussian assumption, the resulting cost function takes the form
\[ \begin{alignat}{2} & & {\color{#d95f02} {\widetilde{\mathcal{J}} (\pmb{w})} } &= {\color{#d95f02} {\frac{1}{2} \parallel \hat{\pmb{x}}_{0|L-1}^\mathrm{post} - \mathbf{X}^\mathrm{post}_{0|L-1} \pmb{w}- \hat{\pmb{x}}^\mathrm{post}_{0|L-1} \parallel_{\mathbf{P}^\mathrm{post}_{0|L-1}}^2} } + {\color{#7570b3} {\sum_{k=L-S+1}^L \frac{1}{2} \parallel \pmb{y}_k - \mathbf{H}_k {\color{#1b9e77} { \mathbf{M}_{k:1}} }\hat{\pmb{x}}^\mathrm{post}_{0|L-1} - \mathbf{H}_k {\color{#1b9e77} {\mathbf{M}_{k:1}}} \mathbf{X}^\mathrm{post}_{0|L-1} \pmb{w} \parallel_{\mathbf{R}_k}^2 } } \\ \Leftrightarrow& & \widetilde{\mathcal{J}}(\pmb{w}) &= \frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2 + \sum_{k=L-S+1}^L \frac{1}{2}\parallel \hat{\pmb{\delta}}_k - \mathbf{S}_k \pmb{w}\parallel^2 \end{alignat} \] where the weights \( \pmb{w} \) give the update to the initial state \( \pmb{x}^\mathrm{post}_{0|L-1} \).
In the linear-Gaussian case, the solution can again be found by a single iteration of Newton's descent for the cost function above,
\[ \begin{alignat}{2} \overline{\pmb{w}} &= 0 - \mathbf{H}_{\widetilde{\mathcal{J}}}^{-1}\nabla_{\pmb{w}} \widetilde{\mathcal{J}} ; \quad\quad & \mathbf{T} &= \mathbf{H}_{\widetilde{\mathcal{J}}}^{-\frac{1}{2}}; \\ \hat{\pmb{x}}^\mathrm{post}_{0|L}&= \hat{\pmb{x}}_{0|L-1}^\mathrm{post} + \mathbf{X}_{0|L-1}^\mathrm{post}\overline{\pmb{w}};\quad \quad& \mathbf{P}_{0|L}^\mathrm{post} &= \left(\mathbf{X}^\mathrm{post}_{0|L-1}\mathbf{T}\right)\left(\mathbf{X}^\mathrm{post}_{0|L-1}\mathbf{T}\right)^\top. \end{alignat} \]
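A compact numpy sketch of this single step, assuming the matrices \( \mathbf{S}_k \) and normalized innovations \( \hat{\pmb{\delta}}_k \) for the newly entering observations have already been formed from the linear model and observation operators; names are illustrative.

```python
import numpy as np

def single_newton_4d(x0_mean, X0, S_list, d_list):
    """One Newton step of the quadratic 4D cost function in weight space.

    x0_mean : (N_x,)     prior smoothing mean for the initial condition
    X0      : (N_x, N_e) prior smoothing anomalies for the initial condition
    S_list  : list of (N_y, N_e) matrices S_k for k = L-S+1, ..., L
    d_list  : list of (N_y,)     innovations delta_k for k = L-S+1, ..., L
    """
    N_e = X0.shape[1]
    hess = (N_e - 1) * np.eye(N_e) + sum(S.T @ S for S in S_list)
    grad = -sum(S.T @ d for S, d in zip(S_list, d_list))   # gradient at w = 0
    w_bar = -np.linalg.solve(hess, grad)
    vals, vecs = np.linalg.eigh(hess)
    T = vecs @ np.diag(vals ** -0.5) @ vecs.T               # T = Hessian^{-1/2}

    x0_post = x0_mean + X0 @ w_bar                          # smoothed mean for x_0
    X0_post = np.sqrt(N_e - 1) * (X0 @ T)                   # anomalies whose empirical
    return x0_post, X0_post                                 # covariance is P^post_{0|L}
```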
When the state and observation models are nonlinear, with \( \mathcal{H}_k \) and \( \mathcal{M}_{k:1} \) replacing their linear counterparts in the cost function, a local minimum can be found iteratively with a recursive Newton step as on the last slide.
This approach is the basis of the iterative ensemble Kalman smoother (IEnKS),12 which produces a more accurate solution than the linear approximation when the models are highly nonlinear.
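For contrast, the sketch below shows the outer-loop structure of such an iterative scheme in a bundle-type Gauss-Newton formulation; `M` and `H` are user-supplied stand-ins for the nonlinear ensemble propagator and observation operator, and every pass of the loop repeats the ensemble simulation over the DAW.

```python
import numpy as np

def ienks_bundle(x0_mean, X0, M, H, R_inv_sqrt, ys, eps=1e-4, n_iter=10, tol=1e-6):
    """Gauss-Newton (bundle-type) iteration of the 4D cost function in weight space.

    x0_mean    : (N_x,)      prior mean for the initial condition of the DAW
    X0         : (N_x, N_e)  prior anomalies for the initial condition
    M          : callable, M(E, k) propagates an ensemble from time k-1 to k
    H          : callable, H(E, k) maps an ensemble to observation space at time k
    R_inv_sqrt : (N_y, N_y)  inverse square root of the observation error covariance
    ys         : list of observations entering the cost function, ordered in time
    """
    N_e = X0.shape[1]
    w = np.zeros(N_e)
    for _ in range(n_iter):
        # each iteration requires a fresh ensemble simulation over the DAW
        E = x0_mean[:, None] + X0 @ (w[:, None] + eps * np.eye(N_e))
        grad = (N_e - 1) * w
        hess = (N_e - 1) * np.eye(N_e)
        for k, y in enumerate(ys, start=1):
            E = M(E, k)                                    # nonlinear forecast to t_k
            Y = H(E, k)
            y_mean = Y.mean(axis=1)
            S = R_inv_sqrt @ (Y - y_mean[:, None]) / eps   # finite-difference sensitivities
            d = R_inv_sqrt @ (y - y_mean)
            grad -= S.T @ d
            hess += S.T @ S
        dw = np.linalg.solve(hess, grad)
        w -= dw
        if np.linalg.norm(dw) < tol:
            break
    return w, hess
```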
Ensemble Gaussian mixture models as above have been used variously to:
Every iteration of the 4D cost function comes at the cost of an additional ensemble simulation over the DAW.
In synoptic meteorology, e.g., \( \Delta t \approx 6 \) hours, the linear-Gaussian approximation of the evolution of the densities is often adequate.
However, the iterative optimization over hyper-parameters in the filtering step of the classic ETKS can be run without the additional cost of model forecasts.
A right transform \( \boldsymbol{\Psi}_L \) can be found with iterative optimization of the filtering cost function
\[ \begin{align} {\color{#d95f02} {\widetilde{\mathcal{J}}(\pmb{w})} } &= {\color{#1b9e77} {\frac{1}{2}(N_e - 1) \parallel\pmb{w}\parallel^2} } + {\color{#7570b3} {\frac{1}{2}\parallel \hat{\pmb{\delta}}_L - \mathbf{S}_L \pmb{w}\parallel^2 } } \end{align} \] where the leading-order cost of each iteration lies in the eigenvalue decomposition of the Hessian \( \mathbf{H}_{\widetilde{\mathcal{J}}} \in \mathbb{R}^{N_e \times N_e} \).
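As one instance, the sketch below iterates the filtering cost function with a nonlinear observation operator, in the spirit of the MLEF extension discussed later; each pass requires only \( N_e \)-dimensional linear algebra and a re-evaluation of the observation operator, with no additional model forecasts. Names are illustrative.

```python
import numpy as np

def iterative_filter_analysis(E_fore, h, R_inv_sqrt, y, n_iter=10, tol=1e-6):
    """Gauss-Newton iteration of the filtering cost function in weight space.

    E_fore     : (N_x, N_e) forecast ensemble
    h          : callable mapping an (N_x, N_e) ensemble to (N_y, N_e) observed values
    R_inv_sqrt : (N_y, N_y) inverse square root of the observation error covariance
    y          : (N_y,)     observation
    """
    N_e = E_fore.shape[1]
    x_mean = E_fore.mean(axis=1)
    X = E_fore - x_mean[:, None]
    w = np.zeros(N_e)
    for _ in range(n_iter):
        # re-linearize the observation operator about the current iterate
        E_i = x_mean[:, None] + X @ (w[:, None] + np.eye(N_e))
        Y = h(E_i)
        y_mean = Y.mean(axis=1)
        S = R_inv_sqrt @ (Y - y_mean[:, None])
        d = R_inv_sqrt @ (y - y_mean)
        hess = (N_e - 1) * np.eye(N_e) + S.T @ S
        grad = (N_e - 1) * w - S.T @ d
        dw = np.linalg.solve(hess, grad)
        w -= dw
        if np.linalg.norm(dw) < tol:
            break
    # ensemble right-transform built from the Hessian at the optimum
    vals, vecs = np.linalg.eigh(hess)
    T = vecs @ np.diag(vals ** -0.5) @ vecs.T
    ones = np.ones((N_e, N_e)) / N_e
    Psi = ones + (np.eye(N_e) - ones) @ (np.outer(w, np.ones(N_e)) + np.sqrt(N_e - 1) * T)
    return E_fore @ Psi, Psi
```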
Subsequently, the retrospective analysis in the form of the filtering right-transform can be applied to condition the initial ensemble
\[ \begin{align} \mathbf{E}^\mathrm{post}_{0|L} = \mathbf{E}_{0|L-1}^\mathrm{post} \boldsymbol{\Psi}_L \end{align} \]
We initialize the next DA cycle in terms of the retrospective analysis, and gain the benefit of the improved initial estimate.
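Putting these pieces together, the following is a minimal sketch of one cycle of such a single-iteration smoother, for lag \( L \) and shift \( S = 1 \) under the perfect model hypothesis; the callables `model` and `analysis` stand in for the nonlinear ensemble propagator and for a filtering analysis returning the right-transform (e.g., one of the sketches above).

```python
def single_iteration_smoother_cycle(E0_smooth, model, analysis, y_new, lag):
    """One DA cycle of a single-iteration smoother (lag `lag`, shift 1, perfect model).

    E0_smooth : (N_x, N_e) smoothed ensemble at the start of the DAW, E^post_{0|L-1}
    model     : callable, model(E, k) propagates an ensemble from time k-1 to k
    analysis  : callable, analysis(E_fore, y) -> (E_filt, Psi)
    y_new     : observation at the new end point of the DAW
    lag       : number of forecast steps spanning the DAW
    """
    # 1. forecast: one ensemble simulation over the DAW to the new observation time
    E = E0_smooth
    for k in range(1, lag + 1):
        E = model(E, k)
    E_fore = E

    # 2. filtering analysis at the end of the DAW, yielding the right-transform
    E_filt, Psi = analysis(E_fore, y_new)

    # 3. retrospective analysis: condition the initial ensemble on the new observation
    E0_smooth_new = E0_smooth @ Psi

    # 4. shift the DAW: re-initialize the next cycle from the smoothed ensemble,
    #    propagated one step forward (a partially redundant, second simulation)
    E0_next = model(E0_smooth_new, 1)

    return E_fore, E_filt, E0_smooth_new, E0_next
```

Each cycle thus produces the forecast, filtering, and re-analyzed smoothing statistics with a single analysis of the filtering cost function, at the price of the redundant re-simulation of the shifted portion of the DAW noted above.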
For a nonlinear observation map in the Lorenz-96 model, we choose
\[ \begin{align} \mathcal{H}(\pmb{x}) = \frac{\pmb{x}}{2} \circ\left[ \pmb{1} + \left(\frac{\vert \pmb{x}\vert}{10} \right)^{1-\gamma} \right] \end{align} \]
where the absolute value, exponentiation, and Schur product \( \circ \) are applied componentwise, and \( \gamma \geq 1 \) controls the strength of the nonlinearity, with \( \gamma = 1 \) recovering the linear observation map \( \mathcal{H}(\pmb{x}) = \pmb{x} \).
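For reference, the map can be implemented componentwise as follows; \( \gamma = 1 \) recovers the linear case.

```python
import numpy as np

def nonlinear_obs_map(x, gamma):
    """Componentwise observation map H(x) = x/2 * [1 + (|x|/10)^(1 - gamma)]."""
    return 0.5 * x * (1.0 + (np.abs(x) / 10.0) ** (1.0 - gamma))
```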
The ETKF Newton step can be applied iteratively as an extension of the maximum likelihood ensemble filter (MLEF).21
Both the ETKS and the SIETKS can include an iterative analysis of the filtering cost function directly in their filtering steps.
The (L)IEnKS does not need to be modified in any way to handle the nonlinear observation operator, but the LIEnKS will again only use a single iteration of the 4D cost function.
We compare the performance of these schemes versus \( \gamma \in [1.0, 11.0] \), with SDA and MDA.
The single-iteration ensemble transform Kalman smoother (SIETKS) is an outer loop optimization of iterative, fixed-lag smoothing for weakly nonlinear forecast error dynamics.
This uses the lagged, marginal posterior re-initialization of 4D iterative schemes, but iterates only in the filtering step of the classic EnKS.
The SIETKS sequentially solves the filtering cost function instead of the 4D cost function.
Key open questions include:
Results in this presentation are projected to be submitted to an open-access journal in late June / July for open review and discussion.22