As discussed in the case of joint state-parameter estimation, there are many times when we want to estimate a past state with future information.
Using the future information, we can often recover a more accurate estimate of the past state, as different initial conditions are distinguished by their subsequent time series of evolution.
We suppose that there is a fixed lag length of time \( L \) that determines the time series of information under consideration
\[ \begin{align} \pmb{y}_{L:1} := \{\pmb{y}_1, \pmb{y}_2, \cdots, \pmb{y}_L\}. \end{align} \]
The time indices \( t_1, \cdots, t_L \) correspond to what is known as a data assimilation window (DAW), with the convention that there is no observation at time \( t_0 \).
In a Bayesian view, the smoothing problem has two distinct formulations, i.e., we may estimate either or both of
\[ \begin{align} p(\pmb{x}_k | \pmb{y}_{L:1}) & & \text{ or } & & p(\pmb{x}_{L:0} | \pmb{y}_{L:1}) \end{align} \] for arbitrary \( k \leq L \).
The density \( p(\pmb{x}_{L:0} | \pmb{y}_{L:1}) \) is known as the joint posterior over the DAW; the marginal smoothing density for any single state \( \pmb{x}_k \) is obtained from it as
\[ \begin{align} p(\pmb{x}_k | \pmb{y}_{L:1}) := \int \mathrm{d}\pmb{x}_{L:k+1,k-1:0}\, p(\pmb{x}_{L:0}|\pmb{y}_{L:1}), \end{align} \] by integrating out the other lagged states.
Notice, then, that the filtering density
\[ \begin{align} p(\pmb{x}_L | \pmb{y}_{L:1}) = \int \mathrm{d} \pmb{x}_{L-1:0} p(\pmb{x}_{L:0}|\pmb{y}_{L:1}) \end{align} \] is a marginal smoothing density of the joint posterior for \( \pmb{x}_L \), the current time.
Therefore, the joint posterior can be considered a more general formulation of the filtering problem.
In practice, by not marginalizing out the past information (i.e., averaging over it), the joint posterior (smoothing) density tends to give a more accurate estimate of the current state than the filtering density.
However, there are various ways to balance the additional memory and computation costs of the more accurate smoothing problem, e.g., with fixed-lag smoothing.
We will consider several formulations of smoothing, where the smoothing is performed entirely offline for a fixed time series \( \pmb{y}_{L:1} \).
We will start with the marginal smoothing problem, where a classical Bayesian analysis leads to the Rauch-Tung-Striebel (RTS) smoother.
This formulation reveals some important properties about the smoothing problem, which we can exploit in different circumstances.
Let's suppose that we have already obtained the standard filtering density, i.e.,
\[ \begin{align} p(\pmb{x}_L | \pmb{y}_{L:1}); \end{align} \]
we wish to derive a recursive solution, analogous to the filtering solution, that gives a retrospective analysis of the previous state,
\[ \begin{align} p(\pmb{x}_{L:L-1}|\pmb{y}_{L:1}). \end{align} \]
We write
\[ \begin{align} p(\pmb{x}_{L:L-1}| \pmb{y}_{L:1}) &= \frac{p(\pmb{x}_{L:L-1}, \pmb{y}_{L:1})}{p(\pmb{y}_{L:1})} \\ &= \frac{p(\pmb{x}_{L:L-1}, \pmb{y}_{L}| \pmb{y}_{L-1:1}) p(\pmb{y}_{L-1:1})}{p(\pmb{y}_{L:1})}\propto p(\pmb{x}_{L:L-1}, \pmb{y}_{L}| \pmb{y}_{L-1:1}). \end{align} \]
From the last slide we have the proportionality which, applying Bayes' law, gives
\[ \begin{align} p(\pmb{x}_{L:L-1}| \pmb{y}_{L:1}) &\propto p(\pmb{x}_{L:L-1}, \pmb{y}_{L}| \pmb{y}_{L-1:1})\\ & \propto p(\pmb{y}_{L}|\pmb{x}_{L:L-1}, \pmb{y}_{L-1:1}) p(\pmb{x}_{L:L-1}|\pmb{y}_{L-1:1}). \end{align} \]
However, the Gauss-Markov model assumption implies that each of the following holds:
\[ \begin{align} p(\pmb{y}_{L}|\pmb{x}_{L:L-1}, \pmb{y}_{L-1:1})&=p(\pmb{y}_{L}| \pmb{x}_{L})\\ p(\pmb{x}_{L:L-1}|\pmb{y}_{L-1:1})&= p(\pmb{x}_{L}|\pmb{x}_{L-1})p(\pmb{x}_{L-1}|\pmb{y}_{L-1:1}). \end{align} \]
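The second identity follows by first conditioning on \( \pmb{x}_{L-1} \) and then applying the Markov property of the hidden states, i.e.,
\[ \begin{align} p(\pmb{x}_{L:L-1}|\pmb{y}_{L-1:1}) &= p(\pmb{x}_{L}|\pmb{x}_{L-1}, \pmb{y}_{L-1:1})\, p(\pmb{x}_{L-1}|\pmb{y}_{L-1:1})\\ &= p(\pmb{x}_{L}|\pmb{x}_{L-1})\, p(\pmb{x}_{L-1}|\pmb{y}_{L-1:1}). \end{align} \]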
Therefore, we have that,
\[ \begin{align} p(\pmb{x}_{L:L-1}| \pmb{y}_{L:1}) &\propto p(\pmb{y}_{L}| \pmb{x}_{L})p(\pmb{x}_{L}|\pmb{x}_{L-1})p(\pmb{x}_{L-1}|\pmb{y}_{L-1:1}) \end{align} \]
where the joint marginal for \( \pmb{x}_{L:L-1|L} \) is given in terms of the likelihood of the current observation, the Markov transition density, and the previous filtering density.
Note, we are assuming that we have already completed a filtering cycle to obtain the conditional mean \( \overline{\pmb{x}}_{L|L} \).
Therefore, the Bayesian maximum-a-posteriori estimate for \( \pmb{x}_{L-1|L} \) minimizes the following objective function,
\[ \begin{align} \mathcal{J}(\pmb{x}):= \frac{1}{2}\parallel \overline{\pmb{x}}_{L-1|L-1} - \pmb{x} \parallel^2_{\mathbf{B}_{L-1|L-1}} + \frac{1}{2}\parallel \overline{\pmb{x}}_{L|L} - \mathbf{M}_{L}\pmb{x} \parallel^2_{\mathbf{Q}_{L}}, \end{align} \]
where the first term measures the deviation of the estimated state from the previously filtered mean, weighted by the inverse background covariance, and the second term measures the deviation of its forecast under \( \mathbf{M}_{L} \) from the current filtered mean, weighted by the inverse model error covariance.
This is a quadratic cost function, so that, with the standard arguments and matrix inversion lemmas, we find that
\[ \begin{align} \overline{\pmb{x}}_{L-1|L} &= \overline{\pmb{x}}_{L-1|L-1} + \mathbf{S}_{L}\left(\overline{\pmb{x}}_{L|L} - \mathbf{M}_L \overline{\pmb{x}}_{L-1|L-1} \right) \\ \\ \mathbf{S}_L& := \mathbf{B}_{L-1|L-1} \mathbf{M}_L^\top \mathbf{B}_{L|L}^{-1}\\ &= \mathrm{cov}\left(\pmb{x}_{L-1|L-1}, \mathbf{M}_L \pmb{x}_{L-1|L-1}\right)\left[\mathrm{cov}(\pmb{x}_{L|L}) \right]^{-1} \end{align} \]
which can be recognized directly as the BLUE estimate of the previous state, regressed upon the current posterior estimate.
Therefore, we obtain the recursion for the mean conditional on the future information via the above, which can be generalized for any past state.
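To make the recursion concrete, the following is a minimal NumPy sketch of a single retrospective step, implementing the gain and mean update defined above; all argument names are illustrative rather than taken from any particular library.

```python
import numpy as np

def retrospective_step(x_prev_filt, B_prev_filt, x_curr_post, B_curr_post, M):
    """One backward (retrospective) analysis step, following the gain S_L above.
    Illustrative arguments: the filtered mean/covariance at time L-1, the
    posterior mean/covariance at time L, and the transition matrix M_L."""
    # smoother gain S_L = B_{L-1|L-1} M_L^T B_{L|L}^{-1}
    S = B_prev_filt @ M.T @ np.linalg.inv(B_curr_post)
    # conditional mean update for x_{L-1|L}
    x_smooth = x_prev_filt + S @ (x_curr_post - M @ x_prev_filt)
    return x_smooth
```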
Courtesy of: Raanes, P. N. (2016). On the ensemble Rauch‐Tung‐Striebel smoother and its equivalence to the ensemble Kalman smoother. QJRMS.
Note that, while it is an extremely important theoretical idea that the hind-cast of the posterior can be used to give a smoothed prior, an important reduction of the retrospective analysis is available in perfect models, as follows.
Recall the square root Kalman filter equations, but using the Newton-based recursion,
\[ \begin{align} \overline{\pmb{x}}_{k|k} &= \overline{\pmb{x}}_{k|k-1} + \boldsymbol{\Sigma}_{k|k-1}\overline{\pmb{w}}_k\\ \boldsymbol{\Sigma}_{k|k} &= \boldsymbol{\Sigma}_{k|k-1}\mathbf{T}_k \mathbf{U}\\ \mathbf{T}_k&:= \mathbf{H}_{\mathcal{J}}^{-\frac{1}{2}} \end{align} \] where \( \mathbf{H}_{\mathcal{J}} \) is the Hessian of the weight-space objective function.
It follows readily from a similar analysis that one can write,
\[ \begin{align} \overline{\pmb{x}}_{k|L}& = \overline{\pmb{x}}_{k|L-1} + \boldsymbol{\Sigma}_{k|L-1} \overline{\pmb{w}}_L,\\ \boldsymbol{\Sigma}_{k|L} &= \boldsymbol{\Sigma}_{k|L-1} \mathbf{T}_L \mathbf{U}, \end{align} \]
so that, when using the square root Kalman filter, one can simply make a backward pass over the means and covariance square roots of past states, re-using the filter weights and right-transform analysis, to condition them on the joint posterior.
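As a minimal sketch, assuming the filter weights \( \overline{\pmb{w}}_L \), transform \( \mathbf{T}_L \), and a mean-preserving orthogonal matrix \( \mathbf{U} \) are already available from the analysis at time \( t_L \), the backward pass over a stored past mean and anomaly (square root) pair could read as follows; the names are illustrative.

```python
import numpy as np

def condition_past_state(x_mean_past, Sigma_past, w_filt, T_filt, U=None):
    """Condition a stored past mean / anomaly (square root) pair on the newest
    observation by re-using the filter weights and right transform from the
    analysis at time t_L.  All names are illustrative."""
    if U is None:
        U = np.eye(T_filt.shape[1])                 # mean-preserving orthogonal matrix
    x_smooth = x_mean_past + Sigma_past @ w_filt    # mean update with filter weights
    Sigma_smooth = Sigma_past @ T_filt @ U          # right transform of the anomalies
    return x_smooth, Sigma_smooth
```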
One may also consider the joint posterior directly, giving the statistical analogue of the optimization cost function.
In particular, recursively applying the Markov assumption and independence assumptions, we can write
\[ \begin{align} p(\pmb{x}_{L:0} \vert \pmb{y}_{L:1})& \propto \left[ \prod_{k=1}^L p(\pmb{y}_k \vert \pmb{x}_k ) \right]\left[\prod_{k=1}^{L} p(\pmb{x}_k \vert \pmb{x}_{k-1})\right]p\left(\pmb{x}_0\right). \end{align} \]
where, in the above, the joint posterior is proportional to the product of the observation likelihoods, the Markov transition densities, and the prior on the initial condition.
In the case without model error, we obtain the traditional extended objective function from the variational approach,
\[ \begin{align} \mathcal{J}(\pmb{x}) &= \frac{1}{2} \parallel \overline{\pmb{x}}_0 - \pmb{x}\parallel_{\mathbf{B}_0}^2 + \frac{1}{2} \sum_{k=1}^L \parallel \pmb{y}_k - \mathbf{H}_k\pmb{x}_k \parallel_{\mathbf{R}_k}^2\\ &= \frac{1}{2} \parallel \overline{\pmb{x}}_0 - \pmb{x}\parallel_{\mathbf{B}_0}^2 + \frac{1}{2} \sum_{k=1}^L \parallel \pmb{y}_k - \mathbf{H}_k\mathbf{M}_{k:1}\pmb{x} \parallel_{\mathbf{R}_k}^2, \end{align} \] where we optimize on the initial condition.
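For linear \( \mathbf{H}_k \) and \( \mathbf{M}_{k:1} \), the cost function above can be evaluated directly; the following sketch assumes the weighted norm convention \( \parallel \pmb{v} \parallel_{\mathbf{A}}^2 := \pmb{v}^\top \mathbf{A}^{-1} \pmb{v} \), and all argument names are illustrative.

```python
import numpy as np

def four_d_cost(x, x_prior, B0, obs, H_list, M_list, R_list):
    """Evaluate the perfect-model cost function J(x) above for linear operators.
    M_list[k] propagates the state from time t_k to t_{k+1}; names are illustrative."""
    dx = x_prior - x
    J = 0.5 * dx @ np.linalg.solve(B0, dx)              # background term
    xk = np.array(x, dtype=float)
    for y, H, M, R in zip(obs, H_list, M_list, R_list):
        xk = M @ xk                                     # forecast to the next observation time
        innov = y - H @ xk
        J += 0.5 * innov @ np.linalg.solve(R, innov)    # observation misfit term
    return J
```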
Using the square root Newton approach, we can again find a global solution.
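As a minimal sketch in the linear case (a plain Newton step on the quadratic cost, not the full square root, weight-space formulation), a single step of the form below reaches the global minimizer; \( \mathbf{G}_k := \mathbf{H}_k \mathbf{M}_{k:1} \) is formed explicitly here only for illustration, and all names are assumptions of this example.

```python
import numpy as np

def newton_step(x, x_prior, B0, obs, H_list, Mk1_list, R_list):
    """One Newton step x <- x - Hess^{-1} grad for the quadratic cost above;
    Mk1_list[k] holds the composed propagator M_{k+1:1}.  Names are illustrative."""
    B0_inv = np.linalg.inv(B0)
    grad = -B0_inv @ (x_prior - x)                  # gradient of the background term
    hess = B0_inv.copy()
    for y, H, Mk1, R in zip(obs, H_list, Mk1_list, R_list):
        G = H @ Mk1                                 # composed observation operator H_k M_{k:1}
        r = y - G @ x
        grad -= G.T @ np.linalg.solve(R, r)         # accumulate observation-term gradient
        hess += G.T @ np.linalg.solve(R, G)         # accumulate Gauss-Newton Hessian
    return x - np.linalg.solve(hess, grad)
```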