
FAIR USE ACT DISCLAIMER:

This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. This Fair Use Copyright Disclaimer is made under section 107 of the Copyright Act of 1976, under which allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

The following topics will be covered in this lecture:
- Observability and controllability
- Filter boundedness and stability
- Innovation and residual statistics
- Estimating \( \mathbf{R}_k \)
- Estimating \( \mathbf{Q}_k \)
- Biased priors

In the last lecture, we saw a general derivation of the Kalman filter equations for a discrete Gauss-Markov model.

- This includes both the classic approach and the more **numerically stable** **square root covariance update equations**.

We also have a number of **guarantees of the optimality** of the solution in the state estimation:
- for a **linear-Gaussian system**, the **conditional mean is the minimum variance unbiased estimator** (among all estimators, not only linear ones);
- it is the **maximum a posteriori estimate**; and
- the mean and covariance **parameterize the Bayesian marginal posterior** directly, knowing that this is a Gaussian, derived as

\[ \begin{align} p(\pmb{x}_k|\pmb{y}_{k:1}) = \int p(\pmb{x}_{k:0}|\pmb{y}_{k:1})\mathrm{d}\pmb{x}_{k-1:0} \end{align} \] having averaged out all the past states from the joint posterior in time.

When the error distributions are **non-Gaussian**, this still **remains the BLUE**, but may **not parameterize the posterior nor be the maximum a posteriori estimate**.

However, even if the governing mechanistic laws are linear, \( \mathbf{M}_k \), and the observation operator is linear, \( \mathbf{H}_k \), with Gaussian error distributions

\[ \begin{align} \pmb{x}_0 \sim N(\overline{\pmb{x}}_0 , \mathbf{B}_0), & & \pmb{w}_k \sim N(\pmb{0}, \mathbf{Q}_k), & & \pmb{v}_k \sim N(\pmb{0}, \mathbf{R}_k), \end{align} \]

- we **generally do not know any of the above parameters** \( \overline{\pmb{x}}_0, \mathbf{B}_0, \mathbf{Q}_k,\mathbf{R}_k \) in practice…

Two important **related questions** emerge then:
- The question of how we guarantee that the background error covariance \( \mathbf{B}_k \) **does not grow to infinite variances is known as filter boundedness**.
- The question of how we guarantee “optimal” performance of a linear Kalman filter with **uncertain parameters is known as filter stability**.
In the case that \( \mathbf{Q}_k \) and \( \mathbf{R}_k \) are known,
- and the system satisfies **“observability”** and **“controllability”** conditions,

it turns out that the **initialization of the prior covariance doesn't imperil the long-term performance**, either in the sense of boundedness or stability.

When these parameters are unknown, a variety of techniques have been developed to estimate them;

- we will consider some **classical results** based on **“innovation” and “residual” statistics**, though more modern approaches may consider, e.g., Bayesian hierarchical models.
Additionally, we will consider the issue of a **biased first prior** and empirical means of handling this.

Recall the discrete Gauss-Markov model,

\[ \begin{align} \pmb{x}_k &= \mathbf{M}_k \pmb{x}_{k-1} + \pmb{w}_k, \\ \pmb{y}_k &= \mathbf{H}_k \pmb{x}_k + \pmb{v}_k. \end{align} \]
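To make the model concrete, the following is a minimal NumPy sketch that draws one realization of the states and observations. It assumes time-invariant \( \mathbf{M}, \mathbf{H}, \mathbf{Q}, \mathbf{R} \) (the model above allows them to vary with \( k \)); the function name `simulate_gauss_markov` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_gauss_markov(M, H, Q, R, x0_mean, B0, n_steps, rng):
    """Draw one realization of x_k = M x_{k-1} + w_k, y_k = H x_k + v_k.

    Time-invariant M, H, Q, R are an illustrative assumption.
    """
    n_x, n_y = M.shape[0], H.shape[0]
    x = rng.multivariate_normal(x0_mean, B0)            # x_0 ~ N(x0_mean, B0)
    states, obs = [], []
    for _ in range(n_steps):
        w = rng.multivariate_normal(np.zeros(n_x), Q)   # model error w_k
        v = rng.multivariate_normal(np.zeros(n_y), R)   # observation error v_k
        x = M @ x + w
        states.append(x)
        obs.append(H @ x + v)
    return np.array(states), np.array(obs)


rng = np.random.default_rng(0)
M = np.array([[0.9, 0.1], [0.0, 0.8]])
H = np.array([[1.0, 0.0]])
xs, ys = simulate_gauss_markov(M, H, 0.1 * np.eye(2), np.array([[0.5]]),
                               np.zeros(2), np.eye(2), 100, rng)
print(xs.shape, ys.shape)  # (100, 2) (100, 1)
```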

To introduce the **fundamental boundedness / stability result** of the linear Kalman filter, we need the following definitions.

The information matrix

For the model defined above, the time-varying **information matrix** is defined as \[ \begin{align} \boldsymbol{\Phi}_{k:j} := \sum_{l=j}^k \mathbf{M}_{k:l}^{-\top} \mathbf{H}_l^\top \mathbf{R}_l^{-1} \mathbf{H}_l\mathbf{M}_{k:l}^{-1} \end{align} \]

- The information matrix above can be considered to be a representation of **how much information is transmitted backward-in-time** from time \( t_k \) to time \( t_l \) through the observations over this window.

The controllability matrix

For the model defined above, the time-varying **controllability matrix** is defined as \[ \begin{align} \boldsymbol{\Upsilon}_{k:j}:= \sum_{l=j}^k \mathbf{M}_{k:l}\mathbf{Q}_l \mathbf{M}_{k:l}^\top \end{align} \]

- The controllability matrix above represents how an **arbitrary initial state can be driven to another state by the sequence of noise realizations** combined with the mechanistic laws.

Two key concepts about the observation model and the mechanistic dynamic model then determine the boundedness and stability properties of the filter.

In order to understand this, we need to introduce the partial ordering on symmetric, positive semi-definite matrices.

Partial ordering on symmetric, positive semi-definite matrices

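Let \( \mathbf{A} \) and \( \mathbf{B} \) be symmetric, positive semi-definite matrices. Then we declare \[ \begin{align} \mathbf{A} \leq \mathbf{B} \end{align} \] if and only if \( \mathbf{B} - \mathbf{A} \) is positive semi-definite, i.e., \( \pmb{v}^\top \mathbf{A} \pmb{v} \leq \pmb{v}^\top \mathbf{B} \pmb{v} \) for every vector \( \pmb{v} \); similarly, \( \mathbf{A} < \mathbf{B} \) if and only if \( \mathbf{B} - \mathbf{A} \) is positive definite. In particular, if \( \mathbf{A} \leq \mathbf{B} \), then each eigenvalue of \( \mathbf{B} \), sorted in decreasing order, is greater than or equal to the corresponding eigenvalue of \( \mathbf{A} \); the converse does not hold in general.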

The above ordering allows us to consider a variety of properties about the covariance of the estimator, including **how we mean to bound the covariance**.
- Similarly, this allows us to place lower and upper bounds on the information and controllability matrices.
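The difference between the partial ordering and a plain comparison of eigenvalues can be checked numerically. The following is a small NumPy sketch; the helper names and the round-off tolerance `tol` are assumptions for illustration. The two example matrices have identical spectra, so a sorted-eigenvalue comparison accepts them, while their difference is indefinite.

```python
import numpy as np


def loewner_leq(A, B, tol=1e-12):
    """True when A <= B in the partial ordering, i.e. B - A is positive semi-definite.

    `tol` absorbs floating-point round-off in the computed eigenvalues.
    """
    return bool(np.linalg.eigvalsh(B - A).min() >= -tol)


def sorted_eigs_leq(A, B, tol=1e-12):
    """Compare the sorted eigenvalues of A and B (necessary, but not sufficient)."""
    return bool(np.all(np.linalg.eigvalsh(A) <= np.linalg.eigvalsh(B) + tol))


A = np.diag([2.0, 0.0])
B = np.diag([0.0, 2.0])
print(sorted_eigs_leq(A, B))  # True: both spectra are {0, 2}
print(loewner_leq(A, B))      # False: B - A = diag(-2, 2) is indefinite
```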

Uniform complete observability / controllability

We say that the system is **uniformly completely observable (respectively controllable)** if and only if there exist constants \( 0 < a < b < \infty \), independent of \( k \), and some \( N\geq 1 \) for which \[ \begin{align} a\mathbf{I} &\leq \boldsymbol{\Phi}_{k:k-N} \leq b \mathbf{I}, \\ a \mathbf{I} &\leq \boldsymbol{\Upsilon}_{k:k-N} \leq b \mathbf{I}, \end{align} \] for all sufficiently large \( k \).
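The information and controllability matrices, and the constants in the uniform condition, can be evaluated directly for a small system. This sketch assumes a time-invariant, invertible \( \mathbf{M} \) and takes \( \mathbf{M}_{k:l} = \mathbf{M}^{k-l} \) as the propagator from \( t_l \) to \( t_k \) (with \( \mathbf{M}_{k:k} = \mathbf{I} \)); the helper names and matrices are hypothetical. The tightest constants in \( a\mathbf{I} \leq \mathbf{S} \leq b\mathbf{I} \) are the extreme eigenvalues of \( \mathbf{S} \).

```python
import numpy as np


def information_matrix(M, H, R, N):
    """Phi_{k:k-N} for a time-invariant model, assuming M_{k:l} = M^(k-l)."""
    n_x = M.shape[0]
    R_inv = np.linalg.inv(R)
    Phi = np.zeros((n_x, n_x))
    for lag in range(N + 1):                      # lag = k - l
        M_inv = np.linalg.inv(np.linalg.matrix_power(M, lag))
        Phi += M_inv.T @ H.T @ R_inv @ H @ M_inv
    return Phi


def controllability_matrix(M, Q, N):
    """Upsilon_{k:k-N} for a time-invariant model, same convention."""
    n_x = M.shape[0]
    Ups = np.zeros((n_x, n_x))
    for lag in range(N + 1):
        M_kl = np.linalg.matrix_power(M, lag)
        Ups += M_kl @ Q @ M_kl.T
    return Ups


def eig_bounds(S):
    """Smallest and largest eigenvalues: the tightest a, b with a I <= S <= b I."""
    w = np.linalg.eigvalsh(S)
    return w[0], w[-1]


M = np.array([[0.9, 0.2], [0.0, 0.8]])
H = np.array([[1.0, 0.0]])        # observe the first coordinate only
R = np.array([[1.0]])
Q = np.diag([0.0, 1.0])           # noise enters the second coordinate only
for N in (0, 1):
    print(N, eig_bounds(information_matrix(M, H, R, N)),
          eig_bounds(controllability_matrix(M, Q, N)))
```

In this example, a window with a single time (\( N = 0 \)) leaves a zero eigenvalue in both matrices, while with \( N = 1 \) the dynamics mix the two coordinates and both lower bounds become positive.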

The previous **uniform complete observability and controllability** conditions respectively guarantee that:
- given finitely many observations, the **initial state of the system** (\( N \) steps back in time) **can be reconstructed from this information as a linear combination**; and
- the system can be moved from any initial state to a desired state given a finite sequence of control actions; in our case, the moves are the realizations of model error.
The model error controllability condition thus describes a kind of **memoryless condition similar to ergodicity**;
- particularly, **no state of the system remains completely time-invariant with respect to the dynamics**, and the model is free to explore the entire state space.
Put together, this gives the **fundamental result of the classical Kalman filter**:

Filter boundedness and stability

Suppose the system is uniformly completely observable and controllable, and let \( \mathbf{B}_0 > 0\mathbf{I} \) be any initialization of the prior covariance satisfying this lower bound on the partial ordering. There exist constants \( 0 < a < b < \infty \) and a **universal sequence** \( \overline{\mathbf{B}}_k \) for which, if \( \mathbf{B}_k \) is generated by the Kalman filtering equations with \( \mathbf{B}_0 \) as the initialization, \[ \begin{align} \parallel \mathbf{B}_k - \overline{\mathbf{B}}_k \parallel \rightarrow 0 \end{align} \] exponentially fast in \( k \), and \( a\mathbf{I} < \overline{\mathbf{B}}_k < b\mathbf{I} \) for all \( k \).

- The above means that even for **any first prior covariance** (background uncertainty), the **system exponentially forgets about the prior and reaches a unique, bounded variance, optimal sequence of posterior estimates**.
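A quick way to see the forgetting of the first prior covariance is to run the covariance recursion from two very different \( \mathbf{B}_0 \) and track their distance. The following NumPy sketch assumes time-invariant matrices, a diagonal toy model with one unstable and one stable mode, and the standard (non-square-root) update; `riccati_step` and `covariance_gap` are hypothetical names.

```python
import numpy as np


def riccati_step(B_prev, M, H, Q, R):
    """One forecast/analysis cycle of the covariance: B_{k-1|k-1} -> B_{k|k}."""
    B_f = M @ B_prev @ M.T + Q                    # forecast covariance
    S = H @ B_f @ H.T + R                         # innovation covariance
    K = B_f @ H.T @ np.linalg.inv(S)              # Kalman gain
    B_a = (np.eye(M.shape[0]) - K @ H) @ B_f      # analysis covariance
    return 0.5 * (B_a + B_a.T)                    # re-symmetrize round-off


def covariance_gap(B0_first, B0_second, M, H, Q, R, n_steps):
    """Spectral-norm distance between two covariance sequences over time."""
    B1, B2, gaps = B0_first, B0_second, []
    for _ in range(n_steps):
        B1 = riccati_step(B1, M, H, Q, R)
        B2 = riccati_step(B2, M, H, Q, R)
        gaps.append(np.linalg.norm(B1 - B2, 2))
    return np.array(gaps)


M = np.diag([1.2, 0.5])      # one unstable, one stable mode
H = np.eye(2)
Q = 0.1 * np.eye(2)
R = np.eye(2)
gaps = covariance_gap(1e-3 * np.eye(2), 1e3 * np.eye(2), M, H, Q, R, 100)
print(gaps[[0, 9, 99]])
```

In this toy case the gap shrinks roughly geometrically, consistent with the exponential convergence stated above for a uniformly completely observable and controllable system.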

We should briefly remark that it is also possible to derive **filter boundedness and stability** results in the case where the **system is sufficiently observed but is noiseless**.

This type of system is sometimes called a **“perfect model”**, as the **mechanistic process** \( \mathbf{M}_k \) **completely describes the evolution of the uncertain initial data**.

This again is in relation to, e.g., an initial value problem with a linear system of ODEs, or with a nonlinear system of ODEs in the space of perturbations (the tangent space), when the tangent-linear model is sufficiently accurate.

Under the following assumptions:
- a **generic ergodicity assumption** (that holds almost surely for the tangent-linear model);
- an **assumption of the uniform complete observability** of the system's dynamical instabilities; and
- a **sufficient rank of the initial covariance**;

all covariances converge to a universal sequence \( \overline{\mathbf{B}}_k \) which has a **column span identical to the span of the unstable and neutral covariant / backward Lyapunov vectors** for the system.

This is to say that the system's **predictive uncertainty** is **asymptotically low-rank**, and the only **non-zero variances are in directions of the dynamic instability** of the mechanistic model sequence \( \mathbf{M}_k \).

This is a modern result that extends the classical filter boundedness / stability analysis to **systems defined by a “perfect model” as above**.
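A toy illustration of the perfect-model case, assuming a constant diagonal \( \mathbf{M} \) with one expanding and one contracting direction (so the backward Lyapunov vectors are simply the coordinate axes) and observations of the expanding direction only. It only illustrates the statement above and does not prove the general result; the function name and parameter values are hypothetical.

```python
import numpy as np


def perfect_model_covariance(M, H, R, B0, n_steps):
    """Iterate the analysis covariance with Q = 0 (a "perfect model")."""
    B = B0
    for _ in range(n_steps):
        B_f = M @ B @ M.T                        # no model error term
        S = H @ B_f @ H.T + R
        K = B_f @ H.T @ np.linalg.inv(S)
        B = (np.eye(M.shape[0]) - K @ H) @ B_f
        B = 0.5 * (B + B.T)                      # guard against round-off asymmetry
    return B


M = np.diag([2.0, 0.5])        # one expanding, one contracting direction
H = np.array([[1.0, 0.0]])     # observe the expanding direction only
B = perfect_model_covariance(M, H, np.eye(1), np.eye(2), n_steps=60)
eigvals, eigvecs = np.linalg.eigh(B)
print(eigvals)  # one eigenvalue tends to 0.75, the other decays like 0.25**k
```

The surviving variance lies along the unstable axis, so the asymptotic covariance is rank one in this example.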

Consider how we earlier defined the Kalman filter innovation and the Kalman filter residual, but let us replace the conditional mean with the **conditional expectation**, which we will denote \( \hat{\pmb{x}}_{k|j} \). Together with the estimation error, these read \[ \begin{align} \pmb{\delta}_{k|k-1} &:= \pmb{y}_k - \mathbf{H}_k \hat{\pmb{x}}_{k|k-1},\\ \pmb{\delta}_{k|k} &:= \pmb{y}_k - \mathbf{H}_k \hat{\pmb{x}}_{k|k},\\ \pmb{\epsilon}_{k|k} &:= \pmb{x}_{k} - \hat{\pmb{x}}_{k|k}. \end{align} \]

In the above, we are considering the **conditional mean** as a **conditional expectation, depending on the outcomes of \( \pmb{y}_{k:1} \)**, i.e., as a random vector rather than a fixed value. Important properties of these variables are their orthogonality and independence properties, which we discuss as follows.

Properties of the innovations / residuals

The innovations, residuals and estimation errors defined above satisfy the following general properties of least-squares estimators: \[ \begin{align} \mathbb{E}\left[\pmb{\epsilon}_{k|k} \hat{\pmb{x}}_{k|k}^\top \right] &= \pmb{0} & & \mathbb{E}\left[\pmb{\delta}_{k|k-1} \pmb{\delta}_{j|j-1}^\top\right] = \delta_{k,j} \left( \mathbf{H}_k \mathbf{B}_{k|k-1}\mathbf{H}_k^\top + \mathbf{R}_k\right) \\ \mathbb{E}\left[\pmb{\epsilon}_{k|k} \pmb{y}_k^\top \right]&= \pmb{0} & & \mathbb{E}\left[\pmb{\delta}_{k|k} \pmb{\delta}_{j|j-1}^\top\right] = \delta_{k,j} \mathbf{R}_k \end{align} \] where \( \delta_{k,j} \) above is the Kronecker delta.

Particularly, the **estimator and its error, and the error and the observations**, are **uncorrelated**.

Moreover, the **innovations and residuals are white-in-time**, with the **known non-zero (cross-)covariances given above only for matching time indices**. The right-hand column follows from the identity \( \pmb{\delta}_{k|k} = \mathbf{R}_k \left(\mathbf{H}_k \mathbf{B}_{k|k-1}\mathbf{H}_k^\top + \mathbf{R}_k\right)^{-1} \pmb{\delta}_{k|k-1} \); note that the residual alone therefore has covariance \( \mathbf{R}_k \left(\mathbf{H}_k \mathbf{B}_{k|k-1}\mathbf{H}_k^\top + \mathbf{R}_k\right)^{-1} \mathbf{R}_k \), not \( \mathbf{R}_k \).
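Under the standard definitions \( \pmb{\delta}_{k|k-1} = \pmb{y}_k - \mathbf{H}_k \hat{\pmb{x}}_{k|k-1} \) and \( \pmb{\delta}_{k|k} = \pmb{y}_k - \mathbf{H}_k \hat{\pmb{x}}_{k|k} \), the residual is a fixed linear map of the innovation, \( \pmb{\delta}_{k|k} = \mathbf{R}_k \mathbf{S}_k^{-1} \pmb{\delta}_{k|k-1} \) with \( \mathbf{S}_k = \mathbf{H}_k \mathbf{B}_{k|k-1}\mathbf{H}_k^\top + \mathbf{R}_k \). Since this is purely algebraic, it can be checked with arbitrary matrices; the sketch below uses hypothetical random ones and a single Kalman analysis step.

```python
import numpy as np

rng = np.random.default_rng(42)
n_x, n_y = 4, 3

# Hypothetical forecast covariance, observation operator and R.
A = rng.standard_normal((n_x, n_x))
B_f = A @ A.T + np.eye(n_x)                 # symmetric positive definite
H = rng.standard_normal((n_y, n_x))
C = rng.standard_normal((n_y, n_y))
R = C @ C.T + np.eye(n_y)                   # symmetric positive definite

S = H @ B_f @ H.T + R                       # innovation covariance
K = B_f @ H.T @ np.linalg.inv(S)            # Kalman gain

# One analysis step from an arbitrary forecast mean and observation.
x_f = rng.standard_normal(n_x)
y = rng.standard_normal(n_y)
innovation = y - H @ x_f                    # delta_{k|k-1}
residual = y - H @ (x_f + K @ innovation)   # delta_{k|k}

# residual = R S^{-1} innovation, hence Cov(residual) = R S^{-1} R, not R.
print(np.allclose(residual, R @ np.linalg.solve(S, innovation)))  # True
cov_residual = R @ np.linalg.solve(S, R)
print(np.allclose(cov_residual, R))  # False
```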

The importance of the last properties is that they give a **criterion** for the **accurate specification of the error statistics in the algorithm**.

Particularly, if we suppose that \( \mathbf{R}_k \) is **time-invariant, or slowly varying**, we can **use the innovation statistics to estimate** \( \mathbf{R}_k \).

For simplicity, suppose \( \mathbf{R}_k\equiv \mathbf{R} \) is constant;

- then with an unbiased initial prior, supposing that the model, the model error covariance and \( \mathbf{R} \) are all specified correctly,

\[ \begin{align} \hat{\mathbf{R}} := \frac{1}{L} \sum_{k=1}^L \left[\pmb{y}_k - \mathbf{H}_k \hat{\pmb{x}}_{k|k} \right]\left[\pmb{y}_k - \mathbf{H}_k \hat{\pmb{x}}_{k|k-1} \right]^\top \end{align} \] is an unbiased estimator for \( \mathbf{R} \), by the residual-innovation cross-covariance above, though it will be reduced rank when the number of lagged residuals \( L < N_y \).
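The following Monte Carlo sketch simulates a correctly specified system, runs the Kalman filter, and accumulates the residual-innovation cross product (innovation in the second factor) to estimate \( \mathbf{R} \). The scalar parameter values, the seed and the function name `estimate_R` are assumptions for illustration; \( \mathbf{Q} \), \( \mathbf{R} \) and \( \mathbf{B}_0 \) must be positive definite for the Cholesky factors used to draw noise.

```python
import numpy as np


def estimate_R(M, H, Q, R, x0_mean, B0, n_steps, R_filter=None, seed=0):
    """Residual/innovation cross-product estimate of R from a simulated run.

    The truth uses (Q, R); the filter uses Q and R_filter (default R, i.e. a
    correctly specified filter). The first prior N(x0_mean, B0) is unbiased.
    """
    rng = np.random.default_rng(seed)
    R_filter = R if R_filter is None else R_filter
    n_x, n_y = M.shape[0], H.shape[0]
    chol_Q, chol_R = np.linalg.cholesky(Q), np.linalg.cholesky(R)
    x_true = x0_mean + np.linalg.cholesky(B0) @ rng.standard_normal(n_x)
    x_a, B_a = x0_mean.copy(), B0.copy()
    R_hat = np.zeros((n_y, n_y))
    for _ in range(n_steps):
        # Truth and observation from the Gauss-Markov model.
        x_true = M @ x_true + chol_Q @ rng.standard_normal(n_x)
        y = H @ x_true + chol_R @ rng.standard_normal(n_y)
        # Forecast step.
        x_f = M @ x_a
        B_f = M @ B_a @ M.T + Q
        # Analysis step.
        S = H @ B_f @ H.T + R_filter
        K = B_f @ H.T @ np.linalg.inv(S)
        innovation = y - H @ x_f            # delta_{k|k-1}
        x_a = x_f + K @ innovation
        B_a = (np.eye(n_x) - K @ H) @ B_f
        residual = y - H @ x_a              # delta_{k|k}
        R_hat += np.outer(residual, innovation)
    R_hat /= n_steps
    return 0.5 * (R_hat + R_hat.T)          # symmetrize the finite-sample estimate


R_hat = estimate_R(np.array([[0.9]]), np.array([[1.0]]), np.array([[0.5]]),
                   np.array([[1.0]]), np.zeros(1), np.eye(1), n_steps=20000)
print(R_hat)  # close to [[1.0]] up to Monte Carlo error
```

Passing an `R_filter` different from `R` gives the filter misspecified statistics, and the resulting estimate will generally no longer match the true \( \mathbf{R} \).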

A **mismatch between this estimate and the specified** \( \mathbf{R} \) used in the Kalman filter equations **evidences incorrectly specified error statistics**, such as an incorrectly specified \( \mathbf{R} \).
- This can thus be used to tune \( \mathbf{R} \) to find a “correct” observation error covariance.

Alternatively, various techniques can then be used to **specify the observation error covariance adaptively**, such as **expectation maximization** using the above relationship.

As with the observation error covariance, we can similarly estimate the **model error covariance** in the case in which \( \mathbf{Q}_k \) is **time-invariant or slowly varies in time**.

For simplicity, suppose that \( \mathbf{Q}_k \equiv \mathbf{Q} \) is fixed in time.

- Similarly, with an unbiased initial prior, supposing that the model, the model error covariance and \( \mathbf{R} \) are all specified correctly, and writing \( \pmb{q}_k := \hat{\pmb{x}}_{k|k} - \mathbf{M}_k \hat{\pmb{x}}_{k-1|k-1} \) for the analysis increment,

\[ \begin{align} \hat{\mathbf{Q} } := \frac{1}{L} \sum_{k=1}^L \left[\pmb{q}_k \pmb{q}_k^\top - \mathbf{M}_k \mathbf{B}_{k-1|k-1} \mathbf{M}_k^\top + \mathbf{B}_{k|k} \right] \end{align} \] can be shown to be an unbiased estimator for \( \mathbf{Q} \). The correction terms are needed because \( \mathbb{E}\left[\pmb{q}_k \pmb{q}_k^\top\right] = \mathbf{B}_{k|k-1} - \mathbf{B}_{k|k} = \mathbf{M}_k \mathbf{B}_{k-1|k-1} \mathbf{M}_k^\top + \mathbf{Q} - \mathbf{B}_{k|k} \), rather than \( \mathbf{Q} \) itself.

As with the last estimator, the sample outer-product term in \( \hat{\mathbf{Q}} \) has rank at most \( L \), so it cannot resolve all directions of the state space when the number of lagged states \( L < N_x \).
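A Monte Carlo sketch of the bias-corrected estimator for \( \mathbf{Q} \), in the same setting as for \( \mathbf{R} \): a correctly specified filter with an unbiased first prior. The scalar parameter values, the seed and the name `estimate_Q` are hypothetical; the Cholesky factors again require positive definite \( \mathbf{Q} \), \( \mathbf{R} \) and \( \mathbf{B}_0 \).

```python
import numpy as np


def estimate_Q(M, H, Q, R, x0_mean, B0, n_steps, seed=0):
    """Bias-corrected estimate of Q from the analysis increments of a
    correctly specified Kalman filter run on simulated data."""
    rng = np.random.default_rng(seed)
    n_x, n_y = M.shape[0], H.shape[0]
    chol_Q, chol_R = np.linalg.cholesky(Q), np.linalg.cholesky(R)
    x_true = x0_mean + np.linalg.cholesky(B0) @ rng.standard_normal(n_x)
    x_a, B_a = x0_mean.copy(), B0.copy()    # unbiased first prior
    Q_hat = np.zeros((n_x, n_x))
    for _ in range(n_steps):
        x_true = M @ x_true + chol_Q @ rng.standard_normal(n_x)
        y = H @ x_true + chol_R @ rng.standard_normal(n_y)
        x_f = M @ x_a
        B_f = M @ B_a @ M.T + Q
        S = H @ B_f @ H.T + R
        K = B_f @ H.T @ np.linalg.inv(S)
        x_a_new = x_f + K @ (y - H @ x_f)
        B_a_new = (np.eye(n_x) - K @ H) @ B_f
        q = x_a_new - M @ x_a               # q_k = xhat_{k|k} - M xhat_{k-1|k-1}
        # Outer product minus its known offset M B_{k-1|k-1} M^T - B_{k|k}.
        Q_hat += np.outer(q, q) - M @ B_a @ M.T + B_a_new
        x_a, B_a = x_a_new, B_a_new
    return Q_hat / n_steps


Q_hat = estimate_Q(np.array([[0.9]]), np.array([[1.0]]), np.array([[0.5]]),
                   np.array([[1.0]]), np.zeros(1), np.eye(1), n_steps=20000)
print(Q_hat)  # close to [[0.5]] up to Monte Carlo error
```

Without the correction terms, the average of `np.outer(q, q)` would instead approximate \( \mathbf{B}_{k|k-1} - \mathbf{B}_{k|k} \).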

This similarly gives a **criterion** to check **if the model error is specified correctly in the simulations**;
- alternatively, **adaptive error estimation** is a rich area and has likewise been performed in classical settings with **expectation maximization**.

You may note that the results we have given have **relied on a critical assumption that the prior** \[ \begin{align} N(\overline{\pmb{x}}_0 ,\mathbf{B}_0) \end{align} \] is **actually unbiased**, i.e., \( \mathbb{E}\left[\pmb{x}_0\right] = \overline{\pmb{x}}_0 \).

This is a **non-trivial criterion to satisfy**, and it **isn't easily dealt with in practice**.

In principle, if we gather enough data, we may be able to find an unbiased estimate for the initialization of a simulation.

- However, the reality of this is actually quite challenging, and we may in general initialize with a biased prior.

Unlike the general convergence of background covariances, **biased priors aren't generally guaranteed to lose their initial bias**, and may have **long-term effects in the prediction cycle**.

Various techniques are used in practice, including estimating the biases of predictions;

- we may also consider that the effect of a biased prior is reduced by having a larger background uncertainty.

If we **inflate our background uncertainty** (increase the variances), we put **less importance on our prior knowledge and the algorithm is more receptive to the data**.

Particularly, this reflects the trade-off in the optimal weights between the relative uncertainties of the observations and the background state.
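To see the effect of inflation on a biased first prior, one can propagate the expected analysis error of a scalar filter: because the noises have zero mean, it evolves as \( \mathbb{E}[\pmb{\epsilon}_{k|k}] = (1 - K_k H) M \, \mathbb{E}[\pmb{\epsilon}_{k-1|k-1}] \). This sketch assumes multiplicative inflation of the forecast variance by a factor `alpha` (one common choice, used here as an assumption); the function name and all parameter values are hypothetical.

```python
def expected_bias(bias0, m, h, q, r, b0, alpha, n_steps):
    """Mean analysis error E[x_k - xhat_{k|k}] of a scalar Kalman filter whose
    first prior has mean error `bias0`, with forecast variance inflated by `alpha`."""
    bias, b = bias0, b0
    history = []
    for _ in range(n_steps):
        b_f = alpha * (m * b * m + q)       # inflated forecast variance
        k = b_f * h / (h * b_f * h + r)     # Kalman gain
        bias = (1.0 - k * h) * m * bias     # zero-mean noises drop out of the mean
        b = (1.0 - k * h) * b_f             # analysis variance
        history.append(bias)
    return history


no_inflation = expected_bias(1.0, 1.0, 1.0, 0.01, 1.0, 0.01, alpha=1.0, n_steps=20)
inflated = expected_bias(1.0, 1.0, 1.0, 0.01, 1.0, 0.01, alpha=1.5, n_steps=20)
print(no_inflation[-1], inflated[-1])  # the inflated run retains less bias
```

In this stable toy case the bias decays in both runs; the point is only that the larger gain produced by inflation removes it faster.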

As a general rule, it is **better to overestimate our background uncertainty than to underestimate it**; the latter can often lead to what is known as **filter divergence** in real problems.