Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:

This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

- The following topics will be covered in this lecture:
- Statistical smoothing with retrospective analysis
- Retrospective smoothing in perfect models
- Global smoothing in perfect models
- Sequential smoothing as filtering

As discussed in the case of joint state-parameter estimation, there are many times when we want to

**estimate a past state****with future information**.Using the future information, we can often

**recover a more accurate estimate of the past state**, where initial conditions are distinguished by their time series of evolution.We suppose that there is a fixed lag length of time \( L \) that determines the time series of information under consideration

\[ \begin{align} \pmb{y}_{L:1} := \{\pmb{y}_1, \pmb{y}_2, \cdots, \pmb{y}_L\}. \end{align} \]

The time indices \( t_1, \cdots, t_L \) correspond to what is known as a

**data assimilation window (DAW)**, with the convention that there is no observation at time \( t_0 \).In a

**Bayesian view**, the smoothing problem has two distinct formulations, i.e., we may estimate either or both of\[ \begin{align} p(\pmb{x}_k | \pmb{y}_{L:1}) & & \text{ or } & & p(\pmb{x}_{L:0} | \pmb{y}_{L:1}) \end{align} \] for arbitrary \( k \leq l \).

The density \( p(\pmb{x}_{L:0} | \pmb{y}_{L:1}) \) is known as the

**joint posterior over the DAW**;- respectively, \( p(\pmb{x}_k | \pmb{y}_{L:1}) \) is known as the
**marginal smoothing density**, related as

\[ \begin{align} p(\pmb{x}_k | \pmb{y}_{L:1}) := \int \mathrm{d}\pmb{x}_{L:k+1,k-1:0} p(\pmb{x}_{L:0}|\pmb{y}_{L:1}), \end{align} \] by

**integrating out the other lagged states**.- respectively, \( p(\pmb{x}_k | \pmb{y}_{L:1}) \) is known as the

Notice then, the

**filtering density**\[ \begin{align} p(\pmb{x}_L | \pmb{y}_{L:1}) = \int \mathrm{d} \pmb{x}_{L-1:0} p(\pmb{x}_{L:0}|\pmb{y}_{L:1}) \end{align} \] is a

**marginal smoothing density of the joint posterior**for \( \pmb{x}_L \), the current time.Therefore, the

**joint posterior**can be considered a**more general formulation of the filtering problem**.Actually, by not marginalizing out all past information (taking an average) the joint posterior (smoothing) density tends to be more accurate for the estimate of the current state in practice.

- The issue lies in that for large lags \( L \), we won't be even be able to store all lagged states in memory.

However, there are various ways that one can balance the cost of memory and computation for the more accurate smoothing problem, with e.g., fixed-lag smoothing.

We will consider several formulations of smoothing, where the smoothing is performed entirely offline for a fixed time series \( \pmb{y}_{L:1} \);

- likewise, we will consider when this is performed sequentially in time, over moving windows of fixed length \( L \), e.g., \( \pmb{y}_{L+k:k+1} \) where \( k \) varies, like a filtering problem itself.

We will start with the

**marginal smoothing problem**, where a classical Bayesian analysis leads to the**Rauch-Tung-Striebel (RTS) smoother**.This formulation reveals some important properties about the smoothing problem, which we can exploit in different circumstances.

Let's suppose that we

**have already obtained the standard filtering density**, i.e.,\[ \begin{align} p(\pmb{x}_L | \pmb{y}_{L:1}); \end{align} \]

we will want to

**derive a recursive solution**like the filter solution to give a retrospective analysis of the last time,\[ \begin{align} p(\pmb{x}_{L:L-1}|\pmb{y}_{L:1}). \end{align} \]

We write

\[ \begin{align} p(\pmb{x}_{L:L-1}| \pmb{y}_{L:1}) &= \frac{p(\pmb{x}_{L:L-1}, \pmb{y}_{L:1})}{p(\pmb{y}_{L:1})} \\ &= \frac{p(\pmb{x}_{L:L-1}, \pmb{y}_{L}| \pmb{y}_{L-1:1}) p(\pmb{y}_{L-1:1})}{p(\pmb{y}_{L:1})}\propto p(\pmb{x}_{L:L-1}, \pmb{y}_{L}| \pmb{y}_{L-1:1}). \end{align} \]

From the last slide we had the proportionality that, with respect to

**Bayes' law**, gives\[ \begin{align} p(\pmb{x}_{L:L-1}| \pmb{y}_{L:1}) &\propto p(\pmb{x}_{L:L-1}, \pmb{y}_{L}| \pmb{y}_{L-1:1})\\ & \propto p(\pmb{y}_{L}|\pmb{x}_{L:L-1}, \pmb{y}_{L-1:1}) p(\pmb{x}_{L:L-1}|\pmb{y}_{L-1:1}). \end{align} \]

However, the

**Gauss-Markov model**assumption implies that each of the following hold\[ \begin{align} p(\pmb{y}_{L}|\pmb{x}_{L:L-1}, \pmb{y}_{L-1:1})&=p(\pmb{y}_{L}| \pmb{x}_{L})\\ p(\pmb{x}_{L:L-1}|\pmb{y}_{L-1:1})&= p(\pmb{x}_{L}|\pmb{x}_{L-1})p(\pmb{x}_{L-1}|\pmb{y}_{L-1:1}). \end{align} \]

Therefore, we have that,

\[ \begin{align} p(\pmb{x}_{L:L-1}| \pmb{y}_{L:1}) &\propto p(\pmb{y}_{L}| \pmb{x}_{L})p(\pmb{x}_{L}|\pmb{x}_{L-1})p(\pmb{x}_{L-1}|\pmb{y}_{L-1:1}) \end{align} \]

where the

**joint marginal**for \( \pmb{x}_{L:L-1|L} \) is given in terms of the**likelihood of the data**given the state at time \( t_L \), i.e., \( p(\pmb{y}_{L}| \pmb{x}_{L}) \); and- the
**transition probability**for \( \pmb{x}_{L|L-1} \)**given the last filtered estimate**\( p(\pmb{x}_{L-1}|\pmb{y}_{L-1:1}) \).

Note, we are assuming that we have already completed a filtering cycle to obtain the conditional mean \( \overline{\pmb{x}}_{L|L} \).

Therefore, the

**Bayesian maximum-a-posteriori estimate**for \( \pmb{x}_{L-1|L} \)**minimizes the following objective function**,\[ \begin{align} \mathcal{J}(\pmb{x}):= \frac{1}{2}\parallel \overline{\pmb{x}}_{L-1|L-1} - \pmb{x} \parallel^2_{\mathbf{B}_{L-1|L-1}} + \frac{1}{2}\parallel \overline{\pmb{x}}_{L|L} - \mathbf{M}_{L}\pmb{x} \parallel^2_{\mathbf{Q}_{L}}, \end{align} \]

where this measures

- the
**discrepancy from the previous posterior**relative to the previous posterior uncertainty; and - the
**discrepancy of the model evolution**of the choice of state from the next posterior, relative to the model uncertainty.

- the
This is a quadratic cost function, so that with the standard arguments, and matrix inversion lemmas, we find that

\[ \begin{align} \overline{\pmb{x}}_{L-1|L} &= \overline{\pmb{x}}_{L-1|L-1} + \mathbf{S}_{L}\left(\overline{\pmb{x}}_{L|L} - \mathbf{M}_L \overline{\pmb{x}}_{L-1|L-1} \right) \\ \\ \mathbf{S}_L& := \mathbf{B}_{L-1|L-1} \mathbf{M}_L^\top \mathbf{B}_{L|L}^{-1}\\ &= \mathrm{cov}\left(\pmb{x}_{L-1|L-1}, \mathbf{M}_L \pmb{x}_{L-1|L-1}\right)\left[\mathrm{cov}(\pmb{x}_{L|L}) \right]^{-1} \end{align} \]

which can be recognized directly as the

**BLUE estimate for the last state**, regressed upon with respect to the current posterior.Therefore, we obtain the

**recursion for the mean conditional on the future information**via the above, which can be generalized for any past state.

- Continuing in this way, one can compute the covariance directly to find,
\[ \begin{align}
\mathbf{B}_{L-1|L} = \mathbf{B}_{L-1|L-1} + \mathbf{S}_{L}\left(\mathbf{B}_{L|L} - \mathbf{B}_{L|L-1} \right)\mathbf{S}_{L}^\top,
\end{align} \]
so that the
**posterior covariances can be computed**directly via a**backward pass with regression**as well.

- Once again, these recursions generalize for arbitrary steps back in time;
- therefore, one recovers an estimation / re-analysis cycle with the RTS smoother as in the following diagram:

Courtesy of: Raanes, P. N. (2016). On the ensemble Rauch‐Tung‐Striebel smoother and its equivalence to the ensemble Kalman smoother. QJRMS.

- Time moves forward from left to right in the horizontal axis, where:
- a
**filtering step**runs sequentially and recursively over new observation data; and - a
**backwards pass of the retrospective analysis**using the transforms above estimates the marginal posterior densities \[ \begin{align} p(\pmb{x}_k| \pmb{y}_{L:1}). \end{align} \] - A key part of this recursion is that, if the
**model is perfect**, i.e., \( \pmb{w}_k \equiv \pmb{0} \) at all times, the**backward analysis is simply a hind-cast estimate**, \[ \begin{align} \overline{\pmb{x}}_{k|L} &= \mathbf{M}_{L:k+1}^{-1} \overline{\pmb{x}}_{L|L} \\ \mathbf{B}_{k|L} &= \mathbf{M}_{L:k+1}^{-1} \mathbf{B}_{L|L}\mathbf{M}_{L:k+1}^{-\top} \end{align} \]

- That is, this simply reduces to the reverse-time model of the posterior mean and covariance in order to directly interpolate the joint posterior over all past times.

Note, while it is an extremely important theoretical idea that the hind-cast of the posterior can be used to give a smoothed prior;

**reverse-time modeling and simulation**is not feasible in most scenarios, where time-reversal can**lead to many numerical instabilities**.

Therefore, an important reduction to the retrospective analysis in perfect models comes as follows.

Recall the

**square root Kalman filter equations**, but using the Newton-based recursion,\[ \begin{align} \overline{\pmb{x}}_{k|k} &= \overline{\pmb{x}}_{k|k-1} + \boldsymbol{\Sigma}_{k|k-1}\overline{\pmb{w}}_k\\ \boldsymbol{\Sigma}_{k|k} &= \boldsymbol{\Sigma}_{k|k-1}\mathbf{T}_k \mathbf{U}\\ \mathbf{T}_k&:= \mathbf{H}_{\mathcal{J}}^{-\frac{1}{2}} \end{align} \] where \( \mathbf{H}_{\mathcal{J}} \) is the Hessian of the weight-space objective function.

It follows readily from a similar analysis that one can write,

\[ \begin{align} \overline{\pmb{x}}_{k|L}& = \pmb{x}_{k|L-1} + \boldsymbol{\Sigma}_{L|L-1} \overline{\pmb{w}}_L,\\ \mathbf{B}_{k|L} &= \mathbf{B}_{k|L-1} \mathbf{T}_L \mathbf{U}, \end{align} \]

so that when using the square root Kalman filter, one can simply make a

**backward pass over the mean and covariance**of past states with the**filter weights and right transform analysis**to condition the joint posterior.

One may also consider the

**joint posterior directly**, giving the statistical analogue of the optimization cost function.In particular, recursively applying the Markov assumption and independence assumptions, we can write

\[ \begin{align} p(\pmb{x}_{L:0} \vert \pmb{y}_{L:1})& \propto \left[ \prod_{k=1}^L p(\pmb{y}_k \vert \pmb{x}_k ) \right]\left[\prod_{k=1}^{L} p(\pmb{x}_k \vert \pmb{x}_{k-1})\right]p\left(\pmb{x}_0\right). \end{align} \]

where in the above the

**joint posterior is proportional**to- the
**product of the likelihoods of the time series data**; with - the
**product of the transition probabilities**; with - the
**prior for the initial condition**.

- the
In the case without model error, we obtain the

**traditional extended objective function**from the variational approach,\[ \begin{align} \mathcal{J}(\pmb{x}) &= \frac{1}{2} \parallel \overline{\pmb{x}}_0 - \pmb{x}\parallel_{\mathbf{B}_0}^2 + \frac{1}{2} \sum_{k=1}^L \parallel \pmb{y}_k - \mathbf{H}_k\pmb{x}_k \parallel_{\mathbf{R}_k}^2\\ &= \frac{1}{2} \parallel \overline{\pmb{x}}_0 - \pmb{x}\parallel_{\mathbf{B}_0}^2 + \frac{1}{2} \sum_{k=1}^L \parallel \pmb{y}_k - \mathbf{H}_k\mathbf{M}_{k:1}\pmb{x} \parallel_{\mathbf{R}_k}^2, \end{align} \] where we optimize on the initial condition.

Using the square root Newton approach, we can again find a global solution.

- We set a lag \( L \) and a shift size \( 1\leq S \leq L \) so that we produce estimates in “smoothing cycles”.
- We will perform the smoothing analysis as in the fixed DAW for any given cycle, but
- we initialize, e.g., an optimization where the background is the last smoothed estimate for time \( t_S \),
- to produce a new smoothing estimate for \( p(\pmb{x}_{L+S:S}|\pmb{y}_{L+S:S}) \).
- Subsequently, we start a new cycle by shifting this window.