In the last lecture, we saw a general derivation of the Kalman filter equations for a discrete Gauss-Markov model.
We also derived a number of guarantees for the optimality of the solution to the state estimation:
\[ \begin{align} p(\pmb{x}_k|\pmb{y}_{k:1}) = \int p(\pmb{x}_{k:0}|\pmb{y}_{k:1})\mathrm{d}\pmb{x}_{k-1:0} \end{align} \] having averaged out all the past states from the joint posterior in time.
When the error distributions are non-Gaussian, this still remains the best linear unbiased estimator (BLUE), but it may neither parameterize the posterior nor be the maximum a posteriori estimate.
However, even when the governing mechanistic law \( \mathbf{M}_k \) and the observation operator \( \mathbf{H}_k \) are both linear, with Gaussian error distributions
\[ \begin{align} \pmb{x}_0 \sim N(\overline{\pmb{x}}_0 , \mathbf{B}_0), & & \pmb{w}_k \sim N(\pmb{0}, \mathbf{Q}_k), & & \pmb{v}_k \sim N(\pmb{0}, \mathbf{R}_k), \end{align} \]
Two important related questions emerge then:
In the case that \( \mathbf{Q}_k \) and \( \mathbf{R}_k \) are known,
it turns out that the initialization of the prior covariance doesn't imperil the long-term performance, either in the sense of boundedness or stability.
When these parameters are unknown, a variety of techniques have been developed to estimate these parameters;
Additionally, we will consider the issue of a biased first prior and empirical means of handling this.
Recall the discrete Gauss-Markov model,
\[ \begin{align} \pmb{x}_k &= \mathbf{M}_k \pmb{x}_{k-1} + \pmb{w}_k, \\ \pmb{y}_k &= \mathbf{H}_k \pmb{x}_k + \pmb{v}_k. \end{align} \]
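As a concrete illustration, we can simulate a sample path of this model numerically; the following sketch (Python with NumPy, all model matrices chosen purely for illustration) draws a trajectory of the states and observations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state model; all matrices are chosen purely for illustration.
M = np.array([[1.0, 0.1], [0.0, 0.9]])   # mechanistic model M_k (time-invariant here)
H = np.array([[1.0, 0.0]])               # observe only the first state component
Q = 0.01 * np.eye(2)                     # model error covariance Q_k
R = np.array([[0.25]])                   # observation error covariance R_k

x = rng.multivariate_normal(np.zeros(2), np.eye(2))  # draw x_0 ~ N(0, B_0)
xs, ys = [], []
for _ in range(100):
    x = M @ x + rng.multivariate_normal(np.zeros(2), Q)   # x_k = M_k x_{k-1} + w_k
    y = H @ x + rng.multivariate_normal(np.zeros(1), R)   # y_k = H_k x_k + v_k
    xs.append(x)
    ys.append(y)
```

Each pass of the loop applies the mechanistic model and then generates a noisy, partial observation of the hidden state.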
To introduce the fundamental boundedness / stability result of the linear Kalman filter, we need to introduce the following definitions.
The information matrix
For the model defined above, the time-varying information matrix is defined as, \[ \begin{align} \boldsymbol{\Phi}_{k:j} := \sum_{l=j}^k \mathbf{M}_{k:l}^{-\top} \mathbf{H}_l^\top \mathbf{R}_l^{-1} \mathbf{H}_l\mathbf{M}_{k:l}^{-1} \end{align} \]
The controllability matrix
For the model defined above, the time-varying controllability matrix is defined as, \[ \begin{align} \boldsymbol{\Upsilon}_{k:j}:= \sum_{l=j}^k \mathbf{M}_{k:l}\mathbf{Q}_l \mathbf{M}_{k:l}^\top \end{align} \]
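For a time-invariant model, both of these Gramians can be accumulated directly from their definitions; the sketch below (with the same kind of hypothetical, illustrative matrices) computes \( \boldsymbol{\Phi}_{k:k-N} \) and \( \boldsymbol{\Upsilon}_{k:k-N} \) and checks that their eigenvalues are bounded away from zero:

```python
import numpy as np

# Hypothetical time-invariant system matrices, for illustration only.
M = np.array([[1.0, 0.1], [0.0, 0.9]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])

def gramians(N):
    """Information matrix Phi_{k:k-N} and controllability matrix Upsilon_{k:k-N}
    for the time-invariant model; M_{k:l} is the state transition from time l to k."""
    Phi = np.zeros((2, 2))
    Ups = np.zeros((2, 2))
    M_kl = np.eye(2)                 # M_{k:k} = I; each loop pass appends one factor of M
    Rinv = np.linalg.inv(R)
    for _ in range(N + 1):
        M_inv = np.linalg.inv(M_kl)
        Phi += M_inv.T @ H.T @ Rinv @ H @ M_inv   # summand of the information matrix
        Ups += M_kl @ Q @ M_kl.T                  # summand of the controllability matrix
        M_kl = M_kl @ M
    return Phi, Ups

Phi, Ups = gramians(5)
# Uniform complete observability / controllability requires both Gramians to have
# eigenvalues bounded away from 0 and infinity, independently of k.
```

For this illustrative choice both minimum eigenvalues are strictly positive, so the system is observable and controllable over the window.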
Two key concepts about the observation model and the mechanistic dynamic model then determine the boundedness and stability properties of the filter.
In order to understand this, we need to introduce the partial ordering on symmetric, positive semi-definite matrices.
Partial ordering on symmetric, positive semi-definite matrices
Let \( \mathbf{A} \) and \( \mathbf{B} \) be symmetric, positive semi-definite matrices. Then we declare \[ \begin{align} \mathbf{A} \leq \mathbf{B} \end{align} \] if and only if the difference \( \mathbf{B} - \mathbf{A} \) is itself positive semi-definite, i.e., all of the eigenvalues of \( \mathbf{B} - \mathbf{A} \) are greater than or equal to zero.
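This ordering is easy to check numerically: \( \mathbf{A} \leq \mathbf{B} \) exactly when the smallest eigenvalue of \( \mathbf{B} - \mathbf{A} \) is non-negative. A minimal sketch with hypothetical matrices, which also shows that the ordering is only partial:

```python
import numpy as np

def psd_leq(A, B, tol=1e-12):
    """Return True if A <= B in the partial ordering, i.e. B - A is
    positive semi-definite (up to a numerical tolerance)."""
    return np.linalg.eigvalsh(B - A).min() >= -tol

A = np.array([[1.0, 0.0], [0.0, 2.0]])
B = np.array([[2.0, 0.5], [0.5, 2.5]])
C = np.array([[3.0, 0.0], [0.0, 0.5]])

print(psd_leq(A, B))                 # True: B - A is positive definite
print(psd_leq(A, C), psd_leq(C, A))  # False False: A and C are not comparable
```

The pair \( \mathbf{A}, \mathbf{C} \) illustrates why the ordering is partial rather than total: \( \mathbf{C} - \mathbf{A} \) is indefinite, so neither direction of the inequality holds.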
The above ordering allows us to consider a variety of properties about the covariance of the estimator, including how we mean to bound the covariance.
Uniform complete observability / controllability
We say that the system is uniformly completely observable (respectively, controllable) if and only if there exist constants \( 0 < a < b < \infty \) independent of \( k \), and some \( N\geq 1 \), such that \[ \begin{align} a\mathbf{I} \leq \boldsymbol{\Phi}_{k, k-N} \leq b \mathbf{I} \\ a \mathbf{I} \leq \boldsymbol{\Upsilon}_{k, k- N} \leq b \mathbf{I} \end{align} \] for all sufficiently large \( k \).
The previous uniform complete observability and controllability conditions respectively guarantee that:
The model error controllability condition thus describes a kind of memoryless condition similar to ergodicity;
Put together, this gives the fundamental result of the classical Kalman filter,
Filter boundedness and stability
Let \( \mathbf{B}_0 > 0\mathbf{I} \) be any initialization of the prior covariance satisfying this lower bound in the partial ordering. There exist constants \( 0 < a < b < \infty \) and a universal sequence \( \overline{\mathbf{B}}_k \) for which, if \( \mathbf{B}_k \) is generated by the Kalman filtering equations with \( \mathbf{B}_0 \) as the initialization, \[ \begin{align} \parallel \mathbf{B}_k - \overline{\mathbf{B}}_k \parallel \rightarrow 0 \end{align} \] exponentially fast in \( k \), and \( a\mathbf{I} < \overline{\mathbf{B}}_k < b\mathbf{I} \) for all \( k \).
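We can observe this forgetting of the initialization numerically by iterating the Kalman filter covariance (Riccati) recursion from two very different prior covariances; the model matrices below are purely illustrative:

```python
import numpy as np

# Illustrative, hypothetical model matrices.
M = np.array([[1.0, 0.1], [0.0, 0.9]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])
I2 = np.eye(2)

def riccati_step(B):
    """One forecast/analysis cycle of the Kalman filter covariance recursion."""
    B_f = M @ B @ M.T + Q                               # forecast covariance
    K = B_f @ H.T @ np.linalg.inv(H @ B_f @ H.T + R)    # Kalman gain
    return (I2 - K @ H) @ B_f                           # analysis covariance

# Two very different positive definite initializations of B_0
B1, B2 = 100.0 * np.eye(2), 0.01 * np.eye(2)
gaps = []
for _ in range(200):
    B1, B2 = riccati_step(B1), riccati_step(B2)
    gaps.append(np.linalg.norm(B1 - B2))

# The two covariance sequences forget their initializations exponentially fast,
# converging to the same bounded universal sequence.
print(gaps[0], gaps[-1])
```

Because this illustrative system is uniformly completely observable and controllable, the gap between the two sequences collapses toward machine precision, in line with the theorem.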
We should just remark that it is also possible to derive filter boundedness and stability results in the case where the system is sufficiently observed but is noiseless.
This type of system is sometimes denoted a “perfect model”, as the mechanistic process \( \mathbf{M}_k \) completely describes the evolution of the uncertain initial data.
This again is in relation to, e.g., an initial value problem with a linear system of ODEs, or with a nonlinear system of ODEs in the space of perturbations (the tangent space), when the tangent-linear model is sufficiently accurate.
Under a generic ergodicity assumption (that holds almost surely for the tangent-linear model);
all covariances converge to a universal sequence \( \overline{\mathbf{B}}_k \) which has a column span identical to the unstable and neutral covariant / backward Lyapunov vectors for the system.
This is to say that the system's predictive uncertainty is asymptotically low-rank, and the only non-zero variances are in directions of the dynamic instability of the mechanistic model sequence \( \mathbf{M}_k \).
This is a modern result that provides some additional extensions to the classical filter boundedness / stability analysis for systems defined by a “perfect model” as above.
Consider how we earlier defined the Kalman filter innovation and the Kalman filter residual, but let us now write the conditional mean as the conditional expectation, which we will denote \( \hat{\pmb{x}}_{k|j} \).
\[ \begin{align} \pmb{\delta}_{k|k-1} &:= \pmb{y}_k - \mathbf{H}_k \hat{\pmb{x}}_{k|k-1},\\ \pmb{\epsilon}_{k|k} &:= \pmb{x}_{k} - \hat{\pmb{x}}_{k|k}. \end{align} \]
In the above, we are considering the conditional mean as a conditional expectation, depending on the outcomes of \( \pmb{y}_{k:1} \).
Important properties about these variables are actually their orthogonality properties, and their independence properties, which we discuss as follows.
Properties of the innovations / residuals
The innovations and residuals defined above satisfy the following general properties of least-squares estimators: \[ \begin{align} \mathbb{E}\left[\pmb{\epsilon}_{k|k} \hat{\pmb{x}}_{k|k}^\top \right] &= \pmb{0} & & \mathbb{E}\left[\pmb{\delta}_{k|k-1} \pmb{\delta}_{j|j-1}^\top\right] = \delta_{k,j} \left( \mathbf{H}_k \mathbf{B}_{k|k-1}\mathbf{H}_k^\top + \mathbf{R}_k\right) \\ \mathbb{E}\left[\pmb{\epsilon}_{k|k} \pmb{y}_k^\top \right]&= \pmb{0} & & \mathbb{E}\left[\pmb{\delta}_{k|k} \pmb{\delta}_{j|j-1}^\top\right] = \delta_{k,j} \mathbf{R}_k \end{align} \] where \( \delta_{k,j} \) above is the Kronecker delta, and \( \pmb{\delta}_{k|k} := \pmb{y}_k - \mathbf{H}_k \hat{\pmb{x}}_{k|k} \) denotes the post-analysis residual of the observation.
Particularly, the estimator and its error, and the error and the observations, are uncorrelated.
Moreover, the innovations are white-in-time, with the known non-zero covariance given above only for matching time indices.
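This whiteness can be checked empirically from a single filter run. The sketch below (a scalar model with hypothetical parameter values) runs the Kalman filter with correctly specified \( \mathbf{Q} \) and \( \mathbf{R} \) and computes the sample lag-1 autocorrelation of the innovations, which should be near zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar model with hypothetical parameter values.
m, h, q, r = 0.9, 1.0, 0.04, 0.25

x, xh, b = 0.0, 0.0, 1.0          # truth, filter mean, prior variance
innovations = []
for _ in range(5000):
    x = m * x + rng.normal(0.0, np.sqrt(q))       # truth:       x_k = M x_{k-1} + w_k
    y = h * x + rng.normal(0.0, np.sqrt(r))       # observation: y_k = H x_k + v_k
    xh_f, b_f = m * xh, m * b * m + q             # forecast mean and variance
    d = y - h * xh_f                              # innovation  y_k - H xhat_{k|k-1}
    innovations.append(d)
    k_gain = b_f * h / (h * b_f * h + r)          # Kalman gain
    xh, b = xh_f + k_gain * d, (1.0 - k_gain * h) * b_f   # analysis update

d = np.array(innovations)
lag1 = np.corrcoef(d[:-1], d[1:])[0, 1]  # sample lag-1 autocorrelation
# For a correctly specified filter the innovations are white, so lag1 is near 0
# and the innovation variance matches H B_{k|k-1} H^T + R.
```

A lag-1 autocorrelation significantly different from zero would instead flag misspecified error statistics, anticipating the diagnostics discussed next.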
The importance of these last properties lies in the fact that they give a criterion for checking the accurate specification of the error statistics in the algorithm.
Particularly, if we suppose that \( \mathbf{R}_k \) is time-invariant, or slowly varying, we can use the innovation statistics to estimate \( \mathbf{R}_k \).
For simplicity, suppose \( \mathbf{R}_k\equiv \mathbf{R} \) is constant;
\[ \begin{align} \hat{\mathbf{R}} := \frac{1}{L} \sum_{k=1}^L \left[\pmb{y}_k - \mathbf{H}_k \hat{\pmb{x}}_{k|k} \right]\left[\pmb{y}_k - \mathbf{H}_k \hat{\pmb{x}}_{k|k-1} \right]^\top \end{align} \] i.e., the cross product of the post-analysis residual with the innovation, can be shown to be an unbiased estimator for \( \mathbf{R} \), though it will be rank-deficient when the number of lagged residuals \( L < N_y \).
A mismatch between this estimate and the specified \( \mathbf{R} \) used in the Kalman filter equations is evidence of an incorrectly specified \( \mathbf{R} \).
Alternatively, various techniques can then be used to specify the observation error covariance adaptively, such as expectation maximization using the above relationship.
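A minimal sketch of this kind of diagnostic, for a scalar model with hypothetical parameter values: we accumulate the cross products of the post-analysis residuals with the innovations, whose expectation equals \( \mathbf{R} \) for the optimally tuned filter:

```python
import numpy as np

rng = np.random.default_rng(2)

# Scalar model with hypothetical parameter values.
m, h, q, r = 0.9, 1.0, 0.04, 0.25

x, xh, b = 0.0, 0.0, 1.0
cross_sum = 0.0
L = 5000
for _ in range(L):
    x = m * x + rng.normal(0.0, np.sqrt(q))       # truth
    y = h * x + rng.normal(0.0, np.sqrt(r))       # observation
    xh_f, b_f = m * xh, m * b * m + q             # forecast
    d = y - h * xh_f                              # innovation  y_k - H xhat_{k|k-1}
    k_gain = b_f * h / (h * b_f * h + r)          # Kalman gain
    xh, b = xh_f + k_gain * d, (1.0 - k_gain * h) * b_f   # analysis
    cross_sum += (y - h * xh) * d                 # post-analysis residual times innovation

r_hat = cross_sum / L
# With correctly specified error statistics, r_hat should be close to r = 0.25;
# a systematic departure from the specified r would flag a misspecified R.
```

Here the filter is tuned with the true statistics, so \( \hat{\mathbf{R}} \) agrees with the specified value up to sampling error.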
As with the observation error covariance, we can similarly estimate the model error covariance in the case in which \( \mathbf{Q}_k \) is time-invariant or slowly varies in time.
For simplicity, suppose that \( \mathbf{Q}_k \equiv \mathbf{Q} \) fixed in time.
\[ \begin{align} \hat{\mathbf{Q} } := \frac{1}{L} \sum_{k=1}^L \left[\hat{\pmb{x}}_{k|k} - \mathbf{M}_k \hat{\pmb{x}}_{k-1|k-1} \right]\left[\hat{\pmb{x}}_{k|k} - \mathbf{M}_k \hat{\pmb{x}}_{k-1|k-1} \right]^\top \end{align} \] gives an empirical statistic whose systematic departure from its expected value under the specified error statistics can diagnose a misspecified \( \mathbf{Q} \).
As with the last estimator, \( \hat{\mathbf{Q}} \) will be rank-deficient if the number of lagged states \( L < N_x \).
This similarly gives a criterion to check whether the model error is specified correctly in the simulations;
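The statistic \( \hat{\mathbf{Q}} \) can likewise be accumulated over a filter run. The sketch below (with illustrative, hypothetical matrices) collects the analysis increments \( \hat{\pmb{x}}_{k|k} - \mathbf{M}_k \hat{\pmb{x}}_{k-1|k-1} \) and forms their empirical second moment; we make no claim here about its exact sampling distribution, only that it is a symmetric, positive semi-definite diagnostic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative 2-state model; all matrices are hypothetical.
M = np.array([[1.0, 0.1], [0.0, 0.9]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])

x = np.zeros(2); xh = np.zeros(2); B = np.eye(2)
increments = []
for _ in range(2000):
    x = M @ x + rng.multivariate_normal(np.zeros(2), Q)   # truth
    y = H @ x + rng.multivariate_normal(np.zeros(1), R)   # observation
    xh_prev = xh
    xh_f, B_f = M @ xh, M @ B @ M.T + Q                   # forecast
    K = B_f @ H.T @ np.linalg.inv(H @ B_f @ H.T + R)      # Kalman gain
    xh = xh_f + K @ (y - H @ xh_f)                        # analysis mean
    B = (np.eye(2) - K @ H) @ B_f                         # analysis covariance
    increments.append(xh - M @ xh_prev)   # xhat_{k|k} - M_k xhat_{k-1|k-1}

V = np.array(increments)
Q_hat = (V.T @ V) / len(V)   # empirical statistic Q-hat
# Q_hat is symmetric positive semi-definite by construction; a systematic
# discrepancy from its expected value flags a misspecified Q.
```

As a sum of outer products, \( \hat{\mathbf{Q}} \) is automatically symmetric and positive semi-definite, with rank at most the number of lagged increments.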
You may note that the variety of results we have given have relied on a critical assumption that the prior
\[ \begin{align} N(\overline{\pmb{x}}_0 ,\mathbf{B}_0) \end{align} \] is actually unbiased, i.e., \( \mathbb{E}\left[\pmb{x}_0\right] = \overline{\pmb{x}}_0 \).
This is actually a non-trivial criterion to satisfy, and it isn't easily dealt with in practice.
In principle, if we gather enough data, we may be able to find an unbiased estimate for the initialization of a simulation.
Unlike the background covariances, whose convergence to a universal sequence is guaranteed, biased priors are not generally guaranteed to lose their initial bias, and the bias may have long-term effects in the prediction cycle.
Various techniques are used in practice, including estimating the biases of predictions;
If we inflate our background uncertainty (increase the variances), we put less importance on our prior knowledge and the algorithm is more receptive to the data.
Particularly, this reflects the trade-off in the optimal weights between the relative uncertainty of the observations and that of the background state.
As a general rule, it is better to overestimate our background uncertainty than to underestimate it – the latter can often lead to what is known as filter divergence in real problems.
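The effect of inflation on the update can be seen directly in the Kalman gain. In the sketch below (hypothetical values throughout), multiplying the background covariance by a multiplicative inflation factor \( \alpha > 1 \) increases the gain, shifting weight from the prior toward the observation:

```python
import numpy as np

# Hypothetical background covariance and observation model, for illustration.
B_f = np.array([[0.5, 0.1], [0.1, 0.3]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])

def gain(B):
    """Kalman gain for background covariance B."""
    return B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

alpha = 1.1               # multiplicative inflation factor (illustrative choice)
B_infl = alpha * B_f      # inflated background covariance

# Inflating B increases the gain on the observed component: the analysis
# puts more weight on the observation and less on the prior.
k0, k1 = gain(B_f)[0, 0], gain(B_infl)[0, 0]
print(k0, k1)
```

The inflated gain is strictly larger on the observed component, which is exactly the sense in which inflation makes the algorithm more receptive to the data.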