Filtering, smoothing and sequential smoothing

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

Outline

The following topics will be covered in this lecture:
- Statistical smoothing with retrospective analysis
- Retrospective smoothing in perfect models
- Global smoothing in perfect models
- Sequential smoothing as filtering

Motivation

As discussed in the case of joint state-parameter estimation, there are many times when we want to estimate a past state with future information.
Using the future information, we can often recover a more accurate estimate of the past state, where initial conditions are distinguished by their time series of evolution.
We suppose that there is a fixed lag length of time \( L \) that determines the time series of information under consideration

\[ \begin{align} \pmb{y}_{L:1} := \{\pmb{y}_1, \pmb{y}_2, \cdots, \pmb{y}_L\}. \end{align} \]
The time indices \( t_1, \cdots, t_L \) correspond to what is known as a data assimilation window (DAW), with the convention that there is no observation at time \( t_0 \).
In a Bayesian view, the smoothing problem has two distinct formulations, i.e., we may estimate either or both of

\[ \begin{align} p(\pmb{x}_k | \pmb{y}_{L:1}) & & \text{ or } & & p(\pmb{x}_{L:0} | \pmb{y}_{L:1}) \end{align} \] for arbitrary \( k \leq l \).
The density \( p(\pmb{x}_{L:0} | \pmb{y}_{L:1}) \) is known as the joint posterior over the DAW;
- respectively, \( p(\pmb{x}_k | \pmb{y}_{L:1}) \) is known as the marginal smoothing density, related as
\[ \begin{align} p(\pmb{x}_k | \pmb{y}_{L:1}) := \int \mathrm{d}\pmb{x}_{L:k+1,k-1:0} p(\pmb{x}_{L:0}|\pmb{y}_{L:1}), \end{align} \] by integrating out the other lagged states.

Motivation

Notice then, the filtering density

\[ \begin{align} p(\pmb{x}_L | \pmb{y}_{L:1}) = \int \mathrm{d} \pmb{x}_{L-1:0} p(\pmb{x}_{L:0}|\pmb{y}_{L:1}) \end{align} \] is a marginal smoothing density of the joint posterior for \( \pmb{x}_L \), the current time.
Therefore, the joint posterior can be considered a more general formulation of the filtering problem.
Actually, by not marginalizing out all past information (taking an average) the joint posterior (smoothing) density tends to be more accurate for the estimate of the current state in practice.
- The issue lies in that for large lags \( L \), we won't be even be able to store all lagged states in memory.
However, there are various ways that one can balance the cost of memory and computation for the more accurate smoothing problem, with e.g., fixed-lag smoothing.
We will consider several formulations of smoothing, where the smoothing is performed entirely offline for a fixed time series \( \pmb{y}_{L:1} \);
- likewise, we will consider when this is performed sequentially in time, over moving windows of fixed length \( L \), e.g., \( \pmb{y}_{L+k:k+1} \) where \( k \) varies, like a filtering problem itself.

Statistical smoothing with a retrospective analysis

We will start with the marginal smoothing problem, where a classical Bayesian analysis leads to the Rauch-Tung-Striebel (RTS) smoother.
This formulation reveals some important properties about the smoothing problem, which we can exploit in different circumstances.
Let's suppose that we have already obtained the standard filtering density, i.e.,

\[ \begin{align} p(\pmb{x}_L | \pmb{y}_{L:1}); \end{align} \]
we will want to derive a recursive solution like the filter solution to give a retrospective analysis of the last time,

\[ \begin{align} p(\pmb{x}_{L:L-1}|\pmb{y}_{L:1}). \end{align} \]
We write

\[ \begin{align} p(\pmb{x}_{L:L-1}| \pmb{y}_{L:1}) &= \frac{p(\pmb{x}_{L:L-1}, \pmb{y}_{L:1})}{p(\pmb{y}_{L:1})} \\ &= \frac{p(\pmb{x}_{L:L-1}, \pmb{y}_{L}| \pmb{y}_{L-1:1}) p(\pmb{y}_{L-1:1})}{p(\pmb{y}_{L:1})}\propto p(\pmb{x}_{L:L-1}, \pmb{y}_{L}| \pmb{y}_{L-1:1}). \end{align} \]

Statistical smoothing with a retrospective analysis

From the last slide we had the proportionality that, with respect to Bayes' law, gives

\[ \begin{align} p(\pmb{x}_{L:L-1}| \pmb{y}_{L:1}) &\propto p(\pmb{x}_{L:L-1}, \pmb{y}_{L}| \pmb{y}_{L-1:1})\\ & \propto p(\pmb{y}_{L}|\pmb{x}_{L:L-1}, \pmb{y}_{L-1:1}) p(\pmb{x}_{L:L-1}|\pmb{y}_{L-1:1}). \end{align} \]
However, the Gauss-Markov model assumption implies that each of the following hold

\[ \begin{align} p(\pmb{y}_{L}|\pmb{x}_{L:L-1}, \pmb{y}_{L-1:1})&=p(\pmb{y}_{L}| \pmb{x}_{L})\\ p(\pmb{x}_{L:L-1}|\pmb{y}_{L-1:1})&= p(\pmb{x}_{L}|\pmb{x}_{L-1})p(\pmb{x}_{L-1}|\pmb{y}_{L-1:1}). \end{align} \]
Therefore, we have that,

\[ \begin{align} p(\pmb{x}_{L:L-1}| \pmb{y}_{L:1}) &\propto p(\pmb{y}_{L}| \pmb{x}_{L})p(\pmb{x}_{L}|\pmb{x}_{L-1})p(\pmb{x}_{L-1}|\pmb{y}_{L-1:1}) \end{align} \]
where the joint marginal for \( \pmb{x}_{L:L-1|L} \) is given in terms of the
- likelihood of the data given the state at time \( t_L \), i.e., \( p(\pmb{y}_{L}| \pmb{x}_{L}) \); and
- the transition probability for \( \pmb{x}_{L|L-1} \) given the last filtered estimate \( p(\pmb{x}_{L-1}|\pmb{y}_{L-1:1}) \).

Statistical smoothing with a retrospective analysis

Note, we are assuming that we have already completed a filtering cycle to obtain the conditional mean \( \overline{\pmb{x}}_{L|L} \).
Therefore, the Bayesian maximum-a-posteriori estimate for \( \pmb{x}_{L-1|L} \) minimizes the following objective function,

\[ \begin{align} \mathcal{J}(\pmb{x}):= \frac{1}{2}\parallel \overline{\pmb{x}}_{L-1|L-1} - \pmb{x} \parallel^2_{\mathbf{B}_{L-1|L-1}} + \frac{1}{2}\parallel \overline{\pmb{x}}_{L|L} - \mathbf{M}_{L}\pmb{x} \parallel^2_{\mathbf{Q}_{L}}, \end{align} \]
where this measures
- the discrepancy from the previous posterior relative to the previous posterior uncertainty; and
- the discrepancy of the model evolution of the choice of state from the next posterior, relative to the model uncertainty.
This is a quadratic cost function, so that with the standard arguments, and matrix inversion lemmas, we find that

\[ \begin{align} \overline{\pmb{x}}_{L-1|L} &= \overline{\pmb{x}}_{L-1|L-1} + \mathbf{S}_{L}\left(\overline{\pmb{x}}_{L|L} - \mathbf{M}_L \overline{\pmb{x}}_{L-1|L-1} \right) \\ \\ \mathbf{S}_L& := \mathbf{B}_{L-1|L-1} \mathbf{M}_L^\top \mathbf{B}_{L|L}^{-1}\\ &= \mathrm{cov}\left(\pmb{x}_{L-1|L-1}, \mathbf{M}_L \pmb{x}_{L-1|L-1}\right)\left[\mathrm{cov}(\pmb{x}_{L|L}) \right]^{-1} \end{align} \]
which can be recognized directly as the BLUE estimate for the last state, regressed upon with respect to the current posterior.
Therefore, we obtain the recursion for the mean conditional on the future information via the above, which can be generalized for any past state.

Statistical smoothing with a retrospective analysis

Continuing in this way, one can compute the covariance directly to find, \[ \begin{align} \mathbf{B}_{L-1|L} = \mathbf{B}_{L-1|L-1} + \mathbf{S}_{L}\left(\mathbf{B}_{L|L} - \mathbf{B}_{L|L-1} \right)\mathbf{S}_{L}^\top, \end{align} \] so that the posterior covariances can be computed directly via a backward pass with regression as well.

Once again, these recursions generalize for arbitrary steps back in time;
therefore, one recovers an estimation / re-analysis cycle with the RTS smoother as in the following diagram:

Courtesy of: Raanes, P. N. (2016). On the ensemble Rauch‐Tung‐Striebel smoother and its equivalence to the ensemble Kalman smoother. QJRMS.

Time moves forward from left to right in the horizontal axis, where:
a filtering step runs sequentially and recursively over new observation data; and
a backwards pass of the retrospective analysis using the transforms above estimates the marginal posterior densities \[ \begin{align} p(\pmb{x}_k| \pmb{y}_{L:1}). \end{align} \]

A key part of this recursion is that, if the model is perfect, i.e., \( \pmb{w}_k \equiv \pmb{0} \) at all times, the backward analysis is simply a hind-cast estimate, \[ \begin{align} \overline{\pmb{x}}_{k|L} &= \mathbf{M}_{L:k+1}^{-1} \overline{\pmb{x}}_{L|L} \\ \mathbf{B}_{k|L} &= \mathbf{M}_{L:k+1}^{-1} \mathbf{B}_{L|L}\mathbf{M}_{L:k+1}^{-\top} \end{align} \]

That is, this simply reduces to the reverse-time model of the posterior mean and covariance in order to directly interpolate the joint posterior over all past times.

Retrospective smoothing in perfect models

Note, while it is an extremely important theoretical idea that the hind-cast of the posterior can be used to give a smoothed prior;
- reverse-time modeling and simulation is not feasible in most scenarios, where time-reversal can lead to many numerical instabilities.
Therefore, an important reduction to the retrospective analysis in perfect models comes as follows.
Recall the square root Kalman filter equations, but using the Newton-based recursion,

\[ \begin{align} \overline{\pmb{x}}_{k|k} &= \overline{\pmb{x}}_{k|k-1} + \boldsymbol{\Sigma}_{k|k-1}\overline{\pmb{w}}_k\\ \boldsymbol{\Sigma}_{k|k} &= \boldsymbol{\Sigma}_{k|k-1}\mathbf{T}_k \mathbf{U}\\ \mathbf{T}_k&:= \mathbf{H}_{\mathcal{J}}^{-\frac{1}{2}} \end{align} \] where \( \mathbf{H}_{\mathcal{J}} \) is the Hessian of the weight-space objective function.
It follows readily from a similar analysis that one can write,

\[ \begin{align} \overline{\pmb{x}}_{k|L}& = \pmb{x}_{k|L-1} + \boldsymbol{\Sigma}_{L|L-1} \overline{\pmb{w}}_L,\\ \mathbf{B}_{k|L} &= \mathbf{B}_{k|L-1} \mathbf{T}_L \mathbf{U}, \end{align} \]
so that when using the square root Kalman filter, one can simply make a backward pass over the mean and covariance of past states with the filter weights and right transform analysis to condition the joint posterior.

Global smoothing in perfect models

One may also consider the joint posterior directly, giving the statistical analogue of the optimization cost function.
In particular, recursively applying the Markov assumption and independence assumptions, we can write

\[ \begin{align} p(\pmb{x}_{L:0} \vert \pmb{y}_{L:1})& \propto \left[ \prod_{k=1}^L p(\pmb{y}_k \vert \pmb{x}_k ) \right]\left[\prod_{k=1}^{L} p(\pmb{x}_k \vert \pmb{x}_{k-1})\right]p\left(\pmb{x}_0\right). \end{align} \]
where in the above the joint posterior is proportional to
- the product of the likelihoods of the time series data; with
- the product of the transition probabilities; with
- the prior for the initial condition.
In the case without model error, we obtain the traditional extended objective function from the variational approach,

\[ \begin{align} \mathcal{J}(\pmb{x}) &= \frac{1}{2} \parallel \overline{\pmb{x}}_0 - \pmb{x}\parallel_{\mathbf{B}_0}^2 + \frac{1}{2} \sum_{k=1}^L \parallel \pmb{y}_k - \mathbf{H}_k\pmb{x}_k \parallel_{\mathbf{R}_k}^2\\ &= \frac{1}{2} \parallel \overline{\pmb{x}}_0 - \pmb{x}\parallel_{\mathbf{B}_0}^2 + \frac{1}{2} \sum_{k=1}^L \parallel \pmb{y}_k - \mathbf{H}_k\mathbf{M}_{k:1}\pmb{x} \parallel_{\mathbf{R}_k}^2, \end{align} \] where we optimize on the initial condition.
Using the square root Newton approach, we can again find a global solution.

Sequential smoothing as filtering

The methods considered already cover the idea of smoothing, when we have a fixed data assimilation window \( \pmb{y}_{L:1} \).

However, a common technique for a time-varying system is to have a window of fixed length \( L \) shift over time.

This is what is known as fixed lag smoothing.

We set a lag \( L \) and a shift size \( 1\leq S \leq L \) so that we produce estimates in “smoothing cycles”.
We will perform the smoothing analysis as in the fixed DAW for any given cycle, but
we initialize, e.g., an optimization where the background is the last smoothed estimate for time \( t_S \),
to produce a new smoothing estimate for \( p(\pmb{x}_{L+S:S}|\pmb{y}_{L+S:S}) \).
Subsequently, we start a new cycle by shifting this window.