Hidden Markov Models and the Bootstrap Particle Filter

11/25/2020

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website to advance teaching, and copyright-protected materials are used only to the extent necessary to make this class function in a distance-learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

Outline

  • The following topics will be covered in this lecture:
    • What is “data assimilation”?
      • Observation-Analysis-Forecast cycles
    • The Bayesian framework for data assimilation
      • Hidden Markov models
    • A simple computational method
      • The bootstrap particle filter

What is “data assimilation”?

Image of global atmospheric circulation.

Courtesy of: Kaidor via Wikimedia Commons (CC 3.0)

  • Data assimilation is a science at the intersection of statistics, physics and applied mathematics.
  • Data assimilation is differentiated by the use of a physics-based process model for the time-evolution of the variables we wish to estimate.
  • We should think of this model as an imperfect representation of the evolution, with uncertainty.
  • On the other hand, suppose we have limited and inaccurate real-world observations of related quantities.
  • The goal of data assimilation:
    • we want to combine the data from:
      1. real-world observations; and
      2. a physics-based model
    • to produce an “optimal” estimate of the physical quantity, its related parameters and the uncertainty therein.

What is “data assimilation”? (continued)

Image of global atmospheric circulation.

Courtesy of: Kaidor via Wikimedia Commons (CC 3.0)

  • Let’s suppose that we have a physics-based model for some process – prototypically we can think of the weather.
  • This model encapsulates our understanding, from first physical principles, of the equations of motion for various time-varying states.
  • This model can represent, e.g.,
    • the atmosphere in terms of:
      1. temperature,
      2. pressure and
      3. humidity
    • over Reno and the western USA.
    • The evolution of these states is described by a system of PDEs, discretized spatially over three dimensions of the atmosphere and surface topography.

What is “data assimilation”? (continued)

Image of global atmospheric circulation.

Courtesy of: Kaidor via Wikimedia Commons (CC 3.0)

  • Let’s suppose that these dynamical (time-varying) states can be written in a vector, \( \mathbf{x}_k \in \mathbb{R}^n \), where \( k \) corresponds to some time \( t_k \).
  • Abstractly, we will represent the time-evolution of these states with the nonlinear map \( \mathcal{M} \), \[ \mathbf{x}_k = \mathcal{M}_{k:k-1} \left(\mathbf{x}_{k-1}, \boldsymbol{\lambda}\right) + \boldsymbol{\eta}_k \] where
    • \( \mathbf{x}_{k-1} \) is the vector of states at an earlier time \( t_{k-1} \);
    • \( \boldsymbol{\lambda} \) is a vector of uncertain physical parameters on which the evolution depends, e.g., energy loss due to friction of wind on the Sierra.
    • \( \boldsymbol{\eta}_k \) is an additive, stochastic noise term, representing errors in our model for the physical process.
  • The states \( \mathbf{x}_k \) are the values we wish to estimate, having a prior distribution on \( \left(\mathbf{x}_{k-1}, \boldsymbol{\lambda}\right) \) and knowledge of \( \mathcal{M}_{k:k-1} \) and of how \( \boldsymbol{\eta}_k \) are distributed.
  • At time \( t_{k-1} \), we will make a forecast for the distribution of \( \mathbf{x}_k \) with our prior knowledge, including the physics-based model.
  • For the rest of this lecture, we will restrict our consideration to the case that \( \boldsymbol{\lambda} \) is a known constant for simplicity.
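The forecast step above can be sketched numerically. The following is a minimal sketch, assuming a hypothetical, stand-in nonlinear map for \( \mathcal{M}_{k:k-1} \) (not any particular atmospheric model) and Gaussian model noise \( \boldsymbol{\eta}_k \):

```python
import numpy as np

rng = np.random.default_rng(0)

def M(x):
    # Hypothetical stand-in for the physics-based model M_{k:k-1}:
    # a simple nonlinear map, chosen only for illustration.
    return x + 0.1 * np.sin(x)

def step(x_prev, model_err_std=0.05):
    # One forecast step: x_k = M(x_{k-1}) + eta_k, with eta_k ~ N(0, sigma^2 I).
    eta = model_err_std * rng.standard_normal(x_prev.shape)
    return M(x_prev) + eta

x = np.zeros(3)        # state vector x_0 in R^n, here n = 3
for k in range(10):    # evolve the state ten steps forward in time
    x = step(x)
```

Iterating `step` propagates the uncertain initial state forward, exactly as in the forecast phase of the cycle.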
What is “data assimilation”? (continued)

    Image of global atmospheric circulation.

    Courtesy of: Kaidor via Wikimedia Commons (CC 3.0)

    • We suppose that we are also given real-world observations \( \mathbf{y}_k\in\mathbb{R}^d \) related to the physical states by, \[ \mathbf{y}_k = \mathcal{H}_k \left(\mathbf{x}_k\right) + \boldsymbol{\epsilon}_k \] where
      • \( \mathcal{H}_k:\mathbb{R}^n \rightarrow \mathbb{R}^d \) is a nonlinear map relating the states we wish to estimate \( \mathbf{x}_k \) to the values that are observed \( \mathbf{y}_k \);
        • e.g., we may wish to estimate humidity, but we only observe the back-scatter of light from a satellite’s laser as it hits particulate in the atmosphere.
        • Typically \( d \ll n \), so this information is sparse and observations are not \( 1:1 \) with the unobserved states.
      • \( \boldsymbol{\epsilon}_k \) is an additive, stochastic noise term representing errors in the measurements.
    • Therefore, at time \( t_k \) we will have a forecast distribution for the states \( \mathbf{x}_k \) generated by our prior on \( \mathbf{x}_{k-1} \) and our physics-based model \( \mathcal{M} \);
    • We will also have an observation \( \mathbf{y}_k \) with uncertainty.
    • We wish to find a posterior distribution for \( \mathbf{x}_k \) conditioned on \( \mathbf{y}_k \).
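The observation model can be sketched in the same style. This is a minimal sketch, assuming a hypothetical nonlinear observation operator \( \mathcal{H}_k \) that observes only a scalar function of the first two state components, so that \( d \ll n \):

```python
import numpy as np

rng = np.random.default_rng(1)

def H(x):
    # Hypothetical nonlinear observation operator H_k: R^n -> R^d with d << n,
    # here mapping the n = 3 hidden states to a single observed quantity.
    return np.array([x[0] ** 2 + x[1]])

def observe(x, obs_err_std=0.1):
    # y_k = H(x_k) + eps_k, with Gaussian measurement noise eps_k.
    eps = obs_err_std * rng.standard_normal(1)
    return H(x) + eps

x_true = np.array([1.0, 2.0, 3.0])  # hidden state x_k
y = observe(x_true)                 # sparse, noisy observation y_k
```

Here a three-dimensional hidden state yields only a one-dimensional, noise-corrupted observation, mirroring the sparsity discussed above.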

    Observation-analysis-forecast cycle

    Diagram of the filter observation-analysis-forecast cycle.

    From: Carrassi, A. et al. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change 9.5 (2018): e535.

    • Recursive estimation of the distribution for \( \mathbf{x}_k \) conditional on \( \mathbf{y}_k \) can be described as an:
      1. observation
      2. analysis
      3. forecast
    • cycle.
    • We assume that we have an initial forecast-prior for the physics-based numerical model state.
    • Suppose that the current time is \( t_{k-1} \) and we will make a prediction of the atmosphere at time \( t_k \).
    • The prior is generated from the simulation of the geophysical fluid dynamics.
    • Let’s suppose this gives an estimate of the atmosphere \( t_{k} - t_{k-1} =6 \) hours in the future.

    Observation

    Diagram of the filter observation-analysis-forecast cycle.

    From: Carrassi, A. et al. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change 9.5 (2018): e535.

    • The “true” physical process evolves in time;
      • eventually, the physical state reaches the time \( t_k \) of the first forecast.
      • If we make our first forecast at time \( t_{k-1} \), this corresponds to our real atmosphere reaching the time of the forecast state, \( t_{k} \).
    • At this time, we receive an observation of the atmosphere’s state variables.

    Analysis

    Diagram of the filter observation-analysis-forecast cycle.

    From: Carrassi, A. et al. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change 9.5 (2018): e535.

    • The observation comes with a likelihood function;
      • we compute the likelihood of the observation given the model forecast.
    • Using the forecast-prior and the likelihood,
      • we estimate the Bayesian update of the prior to the posterior
      • conditioned on the observation.
    • The posterior distribution is denoted the analysis.

    Forecast

    Diagram of the filter observation-analysis-forecast cycle.

    From: Carrassi, A. et al. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change 9.5 (2018): e535.

    • The analysis is then used as the initialization of the next forecast.
    • Simulating the model forward in time, we produce a new forecast-prior.
    • Chaotic evolution (the butterfly effect) causes the model forecast to drift from the true state.
      • This is in addition to the errors in:
        1. accurately representing the true physical process with the imperfect numerical model; and
        2. estimating the “true” initial condition from incomplete and noisy observations.

    Observation

    Diagram of the filter observation-analysis-forecast cycle.

    From: Carrassi, A. et al. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change 9.5 (2018): e535.

    • When a new observation becomes available…

    Analysis

    Diagram of the filter observation-analysis-forecast cycle.

    From: Carrassi, A. et al. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change 9.5 (2018): e535.

    • We produce a new analysis

    Forecast

    Diagram of the filter observation-analysis-forecast cycle.

    From: Carrassi, A. et al. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change 9.5 (2018): e535.

    • And the cycle continues ad-infinitum…

    Hidden Markov models

    • Recall our physical process model and observation model, \[ \begin{align} \mathbf{x}_k &= \mathcal{M}_{k:k-1} \left(\mathbf{x}_{k-1}\right) + \boldsymbol{\eta}_k\\ \mathbf{y}_k &= \mathcal{H}_k \left(\mathbf{x}_k\right) + \boldsymbol{\epsilon}_k \end{align} \]
    • Let us denote the sequence of the process model states and observation model states as, \[ \begin{align} \mathbf{x}_{K:0} = \{\mathbf{x}_k &: k=0,\cdots, K\} \\ \mathbf{y}_{K:1} = \{\mathbf{y}_k &: k=1,\cdots, K\} \end{align} \]
    • We assume that the model error and observation error sequences \[ \begin{align} \{\boldsymbol{\eta}_k : k=1,\cdots, K\} & & \{\boldsymbol{\epsilon}_k : k=1,\cdots, K\} \end{align} \] will be:
      1. mutually independent;
      2. independent in time; and
      3. distributed according to \( P_\boldsymbol{\eta} \) and \( P_\boldsymbol{\epsilon} \) respectively.
    • In this case, we can write the following conditional probability distributions as, \[ \begin{align} P\left(\mathbf{x}_k \vert \mathbf{x}_{k-1} \right) = P_\boldsymbol{\eta}\big(\mathbf{x}_k - \mathcal{M}\left(\mathbf{x}_{k-1}\right)\big) & & P\left(\mathbf{y}_k \vert \mathbf{x}_{k} \right) = P_\boldsymbol{\epsilon}\big(\mathbf{y}_k - \mathcal{H}\left(\mathbf{x}_{k}\right)\big). \end{align} \]
    • The above formulation is known in some literature as a hidden Markov model;
      • the dynamic state variables \( \mathbf{x}_k \) are known as the hidden variables, because they are not directly observed;
      • the Markov property on the conditional probabilities is a direct consequence of the above assumptions.

    Hidden Markov models continued

    • Recall our physical process model and observation model, \[ \begin{align} \mathbf{x}_k &= \mathcal{M}_{k:k-1} \left(\mathbf{x}_{k-1}\right) + \boldsymbol{\eta}_k\\ \mathbf{y}_k &= \mathcal{H}_k \left(\mathbf{x}_k\right) + \boldsymbol{\epsilon}_k \end{align} \] and our conditional probability distributions \[ \begin{align} P\left(\mathbf{x}_k \vert \mathbf{x}_{k-1} \right) = P_\boldsymbol{\eta}\big(\mathbf{x}_k - \mathcal{M}\left(\mathbf{x}_{k-1}\right)\big) & & P\left(\mathbf{y}_k \vert \mathbf{x}_{k} \right) = P_\boldsymbol{\epsilon}\big(\mathbf{y}_k - \mathcal{H}\left(\mathbf{x}_{k}\right)\big). \end{align} \]
    • The independence in time assumption on the model stochasticity gives Markovian probability distributions for the hidden variables, i.e., \[ \begin{align} P\left(\mathbf{x}_{K:0}\right) &= P\left(\mathbf{x}_{K} \vert \mathbf{x}_{K-1: 0}\right) P\left(\mathbf{x}_{K-1:0}\right)\\ &= P\left(\mathbf{x}_{K} \vert \mathbf{x}_{K-1}\right) P\left(\mathbf{x}_{K-1:0} \right). \end{align} \]
    • Applying the Markovian property recursively, we have that, \[ P\left(\mathbf{x}_{K:0}\right) = P(\mathbf{x}_0) \prod_{k=1}^{K} P_\boldsymbol{\eta}\big(\mathbf{x}_k - \mathcal{M}\left(\mathbf{x}_{k-1}\right)\big) \]
    • Similarly, we can show that, \[ P\left(\mathbf{y}_{k}\vert \mathbf{x}_k, \mathbf{y}_{k-1:1}\right) = P\left(\mathbf{y}_k \vert \mathbf{x}_k\right), \] by the independence assumptions on the model and observation errors.

    Hidden Markov models continued

    • Given our physical process model and observation model, \[ \begin{align} \mathbf{x}_k &= \mathcal{M}_{k:k-1} \left(\mathbf{x}_{k-1}\right) + \boldsymbol{\eta}_k\\ \mathbf{y}_k &= \mathcal{H}_k \left(\mathbf{x}_k\right) + \boldsymbol{\epsilon}_k \end{align} \]
    • The goal of the observation-analysis-forecast cycle is thus to estimate the distribution \( P\left(\mathbf{x}_{K} \vert \mathbf{y}_{K:1}\right) \).
    • Using the definition of conditional probability, we have \[ \begin{align} P\left(\mathbf{x}_{K} \vert \mathbf{y}_{K:1}\right) &= \frac{P\left(\mathbf{y}_{K:1},\mathbf{x}_K \right)}{ P\left(\mathbf{y}_{K:1}\right)}. \end{align} \]
    • Using the independence assumptions, we can thus write \[ \begin{align} P\left(\mathbf{x}_{K} \vert \mathbf{y}_{K:1}\right) &=\frac{P\big(\mathbf{y}_K, (\mathbf{x}_K, \mathbf{y}_{K-1:1})\big)}{P\left(\mathbf{y}_{K:1}\right)}\\ &=\frac{P\left(\mathbf{y}_{K}\vert \mathbf{x}_K, \mathbf{y}_{K-1:1}\right) P\left(\mathbf{x}_K, \mathbf{y}_{K-1:1}\right)}{P\left(\mathbf{y}_{K:1}\right)} =\frac{P\left(\mathbf{y}_{K}\vert \mathbf{x}_K\right) P\left(\mathbf{x}_K, \mathbf{y}_{K-1:1}\right)}{P\left(\mathbf{y}_{K:1}\right)} \end{align} \]
    • Finally, writing the joint distributions in terms of conditional distributions we have \[ \begin{align} P\left(\mathbf{x}_{K} \vert \mathbf{y}_{K:1}\right) &=\frac{P\left(\mathbf{y}_K \vert \mathbf{x}_K\right) P\left(\mathbf{x}_K\vert \mathbf{y}_{K-1:1}\right) P\left(\mathbf{y}_{K-1:1}\right)}{P\left(\mathbf{y}_K\vert \mathbf{y}_{K-1:1}\right) P\left(\mathbf{y}_{K-1:1}\right)} \\ \\ &=\frac{P\left(\mathbf{y}_K \vert \mathbf{x}_K\right) P\left(\mathbf{x}_K\vert \mathbf{y}_{K-1:1}\right)}{P\left(\mathbf{y}_K\vert \mathbf{y}_{K-1:1}\right)} \end{align} \]

    Hidden Markov models continued

    • From the last slide, the observation-analysis-forecast cycle was written as \[ \begin{align} {\color{red} {P\left(\mathbf{x}_{K} \vert \mathbf{y}_{K:1}\right)} } &=\frac{ {\color{blue} { P\left(\mathbf{y}_K \vert \mathbf{x}_K\right) } } {\color{green} { P\left(\mathbf{x}_K\vert \mathbf{y}_{K-1:1}\right) } } }{P\left(\mathbf{y}_K\vert \mathbf{y}_{K-1:1}\right)}. \end{align} \]
    • Then notice that, in terms of Bayes' law, this is given by the following terms:
      • \( {\color{red} {P\left(\mathbf{x}_{K} \vert \mathbf{y}_{K:1}\right)}} \) – this is the posterior estimate for the hidden states (at the current time) given all observations in a time series \( \mathbf{y}_{K:1} \);
      • \( {\color{green} {P\left(\mathbf{x}_K\vert \mathbf{y}_{K-1:1}\right)}} \) – this is the model forecast-prior from our last best estimate of the state.
        • That is, suppose that we had computed the posterior probability density \( p(\mathbf{x}_{K-1} \vert \mathbf{y}_{K-1:1}) \) at the last observation time \( t_{K-1} \);
        • then the model forecast probability density is given by, \[ {\color{green} {p(\mathbf{x}_K\vert \mathbf{y}_{K-1:1})} } = \int p(\mathbf{x}_K \vert \mathbf{x}_{K-1}) {\color{red} {p(\mathbf{x}_{K-1} \vert \mathbf{y}_{K-1:1})} }\mathrm{d}\mathbf{x}_{K-1}. \]
        • In theory, the above can be solved by computing a system of PDEs known as the Fokker-Planck equations or the forward Kolmogorov equations, with the initial data given by the last Bayesian posterior density \( {\color{red} {p(\mathbf{x}_{K-1} \vert \mathbf{y}_{K-1:1})} } \).
        • This corresponds “physically” to evolving the posterior density in the hidden variables \( \mathbf{x} \) with respect to the nonlinear, stochastic differential equations \( \mathcal{M} + \eta \), which defines the Markov transition kernel above.
      • \( {\color{blue} {P\left(\mathbf{y}_{K}\vert \mathbf{x}_{K}\right)}} \) – this is the likelihood of the observed data given our model forecast;
      • \( P\left(\mathbf{y}_K \vert \mathbf{y}_{K-1:1}\right) \) is the marginal of joint distribution \( P( \mathbf{y}_K, \mathbf{x}_K \vert \mathbf{y}_{K-1:1}) \) integrating out the hidden variable, with density \[ p\left(\mathbf{y}_K \vert \mathbf{y}_{K-1:1}\right) = \int p(\mathbf{y}_K \vert \mathbf{x}_K) p(\mathbf{x}_K \vert \mathbf{y}_{K-1:1})\mathrm{d}\mathbf{x}_{K}. \]
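The forecast-density integral above can be approximated directly on a grid in low dimensions. The following is a minimal sketch, assuming a hypothetical one-dimensional example with a linear model \( \mathcal{M}(x) = 0.9x \) and Gaussian model error, so the Markov transition density is Gaussian:

```python
import numpy as np

# 1-D grid approximation of the forecast integral
#   p(x_K | y_{K-1:1}) = ∫ p(x_K | x_{K-1}) p(x_{K-1} | y_{K-1:1}) dx_{K-1},
# with a hypothetical linear model M(x) = 0.9 x and Gaussian errors.
grid = np.linspace(-5.0, 5.0, 401)
dx = grid[1] - grid[0]

def gauss(x, mean, std):
    # Gaussian probability density function.
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Last posterior density p(x_{K-1} | y_{K-1:1}), assumed N(1, 0.5^2).
posterior_prev = gauss(grid, 1.0, 0.5)

# Transition density p(x_K | x_{K-1}) evaluated on the grid (rows: x_K).
transition = gauss(grid[:, None], 0.9 * grid[None, :], 0.3)

# Riemann-sum approximation of the integral: the forecast-prior density.
forecast = transition @ posterior_prev * dx
```

The resulting `forecast` array approximates the forecast-prior density, and its grid-sum integrates to one up to discretization error.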

    Hidden Markov models continued

    Spaghetti plot of jetstream.

    Courtesy of: Environmental Modeling Center / Public domain

    • Typically, the probability density for the denominator \[ p\left(\mathbf{y}_K \vert \mathbf{y}_{K-1:1}\right) = \int p(\mathbf{y}_K \vert \mathbf{x}_K) p(\mathbf{x}_K \vert \mathbf{y}_{K-1:1})\mathrm{d}\mathbf{x}_{K} \] is mathematically intractable;
      • However, the denominator is independent of the hidden variable \( \mathbf{x}_{K} \);
      • its purpose is only to normalize the integral of the posterior density to \( 1 \) over all \( \mathbf{x}_{K} \).
    • Instead, we will usually be satisfied with computing empirical, sample-based estimates from a sample drawn from the posterior.
    • This says that the posterior of the observation-analysis-forecast cycle \[ \begin{align} {\color{red} {P\left(\mathbf{x}_{K} \vert \mathbf{y}_{K:1}\right)}} &\propto {\color{blue} {P\left(\mathbf{y}_K \vert \mathbf{x}_K\right)}} {\color{green} {P\left(\mathbf{x}_K\vert \mathbf{y}_{K-1:1}\right)} } \end{align} \] can be computed recursively from the last posterior, up to proportionality.
    • Therefore, in a Bayesian framework, data assimilation seeks to sample the posterior distribution above to compute empirical statistics from the ensemble-members.
    • This was not the original motivation, but it has become one of the bases for computing ensemble forecasts and spaghetti plots of the weather, as above.

    Sampling the posterior

    • Suppose that somehow we had access to the posterior \( P(\mathbf{x}_K\vert \mathbf{y}_{K:1}) \).
    • If we draw \( N \) independent, identically distributed (iid) ensemble members from this distribution, \( \{\mathbf{x}_K^i\}_{i=1}^N \), an empirical representation of the distribution is given by \[ \begin{align} P_N(\mathbf{x}_K\vert \mathbf{y}_{K:1}) = \frac{1}{N} \sum_{i=1}^N \boldsymbol{\delta}_{\mathbf{x}^i_K}\left(\mathrm{d}\mathbf{x}_K\right), \end{align} \] where
      • \( \boldsymbol{\delta}_{\mathbf{x}^i_K} \) – this is the Dirac delta.
        • Heuristically, this is known as the “function” which has the property \[ \delta_{\mathbf{x}_K^i}(\mathbf{x}) = \begin{cases} +\infty & \mathbf{x} = \mathbf{x}_K^i \\ 0 & \text{else}\end{cases}; \]
        • More formally this is defined by the property, \[ \int f(x) \boldsymbol{\delta}_{\mathbf{x}_K^i}\left(\mathrm{d}\mathbf{x}_K\right) = f\left(\mathbf{x}_K^i\right); \] the Dirac delta is a singular measure, not a traditional function but it can be understood by the integral equation.
      • In the above, the factor \( \frac{1}{N} \) represents that all point volumes have equal mass or weight, so that the total density integrates to one.
    • Then for any statistic \( f \) of the posterior, we can recover an estimate of its expected value directly as \[ \begin{align} \mathbb{E}_{P(\mathbf{x}_K\vert \mathbf{y}_{K:1})} \left[f \right] \triangleq\int f(\mathbf{x}_K) p(\mathbf{x}_K\vert \mathbf{y}_{K:1} ) \mathrm{d}\mathbf{x}_K \approx \int f(\mathbf{x}_K) P_N(\mathbf{x}_K\vert \mathbf{y}_{K:1}) &= \frac{1}{N}\sum_{i=1}^N \int f(\mathbf{x}_K)\delta_{\mathbf{x}^i_K}(\mathrm{d}\mathbf{x}_K) = \frac{1}{N} \sum_{i=1}^N f\left(\mathbf{x}_K^i\right). \end{align} \]
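The empirical estimate above is easy to demonstrate numerically. A minimal sketch, assuming (hypothetically) that the posterior is a standard-form Gaussian \( N(2, 1) \), so that \( \mathbb{E}[x^2] = \mu^2 + \sigma^2 = 5 \) exactly:

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw N iid ensemble members from the (hypothetical) posterior N(2, 1).
N = 100_000
samples = rng.normal(loc=2.0, scale=1.0, size=N)

# Statistic of interest: f(x) = x^2, with exact expectation 2^2 + 1 = 5.
f = lambda x: x ** 2

# Empirical estimate: (1/N) sum_i f(x^i).
estimate = np.mean(f(samples))
```

With this sample size the empirical mean lies close to the exact value 5, and by the variance result on the next slide the error shrinks like \( 1/\sqrt{N} \).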

    Empirical estimates

    • The empirical estimate, \[ \mathbb{E}_{P(\mathbf{x}_K\vert \mathbf{y}_{K:1})} \left[f \right] \approx \frac{1}{N} \sum_{i=1}^N f\left(\mathbf{x}_K^i\right), \] is also an unbiased estimator of \( \mathbb{E}_{P(\mathbf{x}_K\vert \mathbf{y}_{K:1})}\left[f\right] \).
    • If the posterior variance of \( f(\mathbf{x}) \) satisfies, \[ \begin{align} \sigma^2_f = \mathbb{E}_{P(\mathbf{x}_K \vert \mathbf{y}_{K:1})}\left[f^2(\mathbf{x})\right] - \left(\mathbb{E}_{P(\mathbf{x}_K\vert \mathbf{y}_{K:1})}\left[f\right]\right)^2 < \infty; \end{align} \]
    • then the variance of the empirical estimate is given by \[ \begin{align} \mathrm{var}\left(\mathbb{E}_{P_N(\mathbf{x}\vert \mathbf{y})}\left[f\right]\right) = \frac{\sigma_f^2}{N}, \end{align} \] where the variance is understood as taken over the possible sample outcomes, \( \mathbf{x}_K^i \sim P(\mathbf{x}_K\vert \mathbf{y}_{K:1}) \).
    • For \( \sigma_f^2 < \infty \) as above, the Central Limit Theorem tells us \[ \begin{align} \sqrt{N}\left\{ \mathbb{E}_{P_N(\mathbf{x}_K\vert \mathbf{y}_{K:1})}\left[f\right] - \mathbb{E}_{P(\mathbf{x}_K\vert \mathbf{y}_{K:1})}\left[f\right]\right\} \xrightarrow{d} N(0, \sigma_f^2) \end{align} \] as \( N\rightarrow +\infty \),
    • i.e., the empirical distribution converges to the true distribution in the weak sense as \( N \) gets sufficiently large.

    Importance sampling

    • In practice, we often cannot sample the posterior directly but we may need to sample some other distribution that shares its support.
    • This idea of sampling another distribution with shared support is known as importance sampling.
    • We will suppose that we have access, perhaps not to \( P(\mathbf{x}_K\vert \mathbf{y}_{K:1}) \) but instead \( Q(\mathbf{x}_{K}\vert \mathbf{y}_{K:1}) \) such that \( P \ll Q \),
      • i.e., \( P(A\vert \mathbf{y}_{K:1})>0 \Rightarrow Q(A\vert \mathbf{y}_{K:1})>0 \).
    • This above assumption allows us to take the Radon-Nikodym derivative of the true posterior \( P(\mathbf{x}_K\vert \mathbf{y}_{K:1}) \) with respect to the proposal distribution \( Q(\mathbf{x}_K \vert \mathbf{y}_{K:1}) \).
    • The key innovation relative to the last formulation is that this allows us to evaluate a statistic of the posterior by point volumes, but with non-equal importance weights.
    • Let us define the importance weight function \( w(\mathbf{x}_K \vert \mathbf{y}_{K:1}) \triangleq \frac{p(\mathbf{x}_K\vert \mathbf{y}_{K:1})}{q(\mathbf{x}_K\vert \mathbf{y}_{K:1})} \).
      • We will suppose that the weights can be computed even if \( P(\mathbf{x}_K\vert \mathbf{y}_{K:1}) \) is not available – this will be explained shortly.
    • Then we can re-write the expected value of some statistic \( f \) of the posterior density as, \[ \begin{align} \mathbb{E}_{P(\mathbf{x}_K\vert \mathbf{y}_{K:1})}[f]= \int f(\mathbf{x}_K)p(\mathbf{x}_K\vert \mathbf{y}_{K:1})\mathrm{d}\mathbf{x}_K & = \frac{\int f(\mathbf{x}_K)p(\mathbf{x}_K\vert \mathbf{y}_{K:1})\mathrm{d}\mathbf{x}_K}{\int p(\mathbf{x}_K\vert \mathbf{y}_{K:1})\mathrm{d}\mathbf{x}_K}\\ &=\frac{\int f(\mathbf{x}_{K})w(\mathbf{x}_K\vert \mathbf{y}_{K:1})q(\mathbf{x}_K\vert \mathbf{y}_{K:1})\mathrm{d}\mathbf{x}_K}{\int w(\mathbf{x}_K\vert \mathbf{y}_{K:1})q(\mathbf{x}_K\vert \mathbf{y}_{K:1})\mathrm{d}\mathbf{x}_K} , \end{align} \]
    • The above is shown by the definition of the importance weights above and because for any density, \[ \int p(\mathbf{x}_K\vert \mathbf{y}_{K:1})\mathrm{d}\mathbf{x}_K =1. \]
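The self-normalized importance-sampling identity above can be checked numerically. A minimal sketch, assuming a hypothetical target \( P = N(1, 0.5^2) \) and a broader proposal \( Q = N(0, 2^2) \), which shares its support:

```python
import numpy as np

rng = np.random.default_rng(3)

def gauss_pdf(x, mean, std):
    # Gaussian probability density function.
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Draw iid samples from the proposal Q, not the target P.
N = 200_000
x = rng.normal(0.0, 2.0, size=N)

# Importance weights w(x) = p(x) / q(x), then normalized to sum to one.
w = gauss_pdf(x, 1.0, 0.5) / gauss_pdf(x, 0.0, 2.0)
w_tilde = w / w.sum()

# Weighted empirical estimate of E_P[x]; the exact target mean is 1.
estimate = np.sum(w_tilde * x)
```

Even though every sample is drawn from \( Q \), the weighted average recovers the mean of \( P \), illustrating the re-weighted point-volume view of the posterior.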

    Importance sampling continued

    • The benefit for sampling techniques is therefore that we can draw iid \( \mathbf{x}^i_K \sim Q(\mathbf{x}_K\vert \mathbf{y}_{K:1}) \) and write the empirically derived expected value of \( f \) as, \[ \begin{align} \mathbb{E}_{P_N(\mathbf{x}_K \vert \mathbf{y}_{K:1})}[f] = \frac{ \frac{1}{N} \sum_{i=1}^N f(\mathbf{x}^i_K)w(\mathbf{x}^i_K\vert \mathbf{y}_{K:1})}{\frac{1}{N} \sum_{i=1}^N w(\mathbf{x}^i_K \vert \mathbf{y}_{K:1})} = \sum_{i=1}^N f(\mathbf{x}^i_K) \tilde{w}^i_K. \end{align} \]
    • Here, the \( \tilde{w}^i_K \triangleq \frac{w(\mathbf{x}^i_K \vert \mathbf{y}_{K:1})}{\sum_{i=1}^N w(\mathbf{x}^i_K \vert \mathbf{y}_{K:1})} \) are defined as the normalized importance weights.
    • We will write our empirical estimate of the posterior as, \[ \begin{align} P_N(\mathbf{x}_K \vert \mathbf{y}_{K:1}) \triangleq \sum_{i=1}^N \tilde{w}^i_K\delta_{\mathbf{x}^i_K}(\mathrm{d}\mathbf{x}_K ). \end{align} \]
    • Notice, if we take \( f(\mathbf{x}) = 1 \), the expected value is one; this means that the weighted point volumes give a probability measure.
    • Using this formulation, we have an extremely flexible view of the posterior as a combination of positions and weights.
      • In a hidden Markov model, positions correspond to initial conditions for the nonlinear, stochastic differential equations \( \mathcal{M} + \eta \).
      • “Physically” we can evolve all the point volumes, keeping their weights, and construct a forward-in-time density.
    • Therefore for data assimilation, we have a natural choice of how to find the next prior from the last posterior;
      • we evolve the points and possibly find new weights, or resample positions when we condition on new observed information.
      • This is like a “weighted-spaghetti plot”.

    Sequential importance sampling

    • Using Bayes' Law, we can derive up to proportionality, \[ \begin{align} \tilde{w}^i_K \propto \tilde{w}^i_{K-1} \frac{ {\color{blue} {p\left(\mathbf{y}_K \vert \mathbf{x}^i_{K}\right) } } {\color{green} {p\left(\mathbf{x}^i_K \vert \mathbf{y}_{K-1:1}\right)}} }{q\left(\mathbf{x}_K^i \vert \mathbf{y}_{K-1:1} \right)}. \end{align} \]
    • As one special choice, we can choose a proposal of \( Q \) as the forecast-prior distribution \( {\color{green} {P(\mathbf{x}_{K}\vert \mathbf{y}_{K-1:1})} } \);
    • in this case, we have the weights given recursively by \[ \tilde{w}^i_k \propto \tilde{w}^i_{k-1} {\color{blue} {p(\mathbf{y}_k \vert \mathbf{x}_k^i)} }. \]
    • The proportionality statement says that:
      • Suppose we have knowledge of the normalized weights \( \tilde{w}^i_{K-1} \) at time \( t_{K-1} \);
      • we can generate a forecast for each position \( \mathbf{x}_{K-1}^i \) at time \( t_K \) via the model, \[ \mathbf{x}_K^i = {\color{green} {\mathcal{M}_{K:K-1}(}} \mathbf{x}_{K-1}^i {\color{green} {) + \boldsymbol{\eta}_K} }. \]
      • Prior to obtaining the new observation, the forecast weight will remain as \( \tilde{w}_{K-1}^i \);
      • to condition the weights on the new observation, we apply Bayes' Law, \[ \tilde{w}^i_k \propto \tilde{w}^i_{k-1} {\color{blue} {p(\mathbf{y}_k \vert \mathbf{x}_k^i)} }, \] such that we need only compute \( \tilde{w}^i_{k-1} {\color{blue} {p(\mathbf{y}_k \vert \mathbf{x}_k^i)} } \) for each \( i \) and then re-normalize the weights so they sum to \( 1 \).

    • Sequential importance sampling for estimating the posterior is extremely flexible and makes few assumptions on the form of the problem whatsoever;
    • however, the primary issue is that the importance weights become extremely skewed very quickly, leading to all the probability mass landing on a single point after only a few iterations.
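One cycle of sequential importance sampling with the forecast-prior proposal can be sketched as follows. This is a minimal sketch, assuming a hypothetical scalar model \( \mathcal{M}(x) = 0.9x \), an identity observation operator, and Gaussian errors:

```python
import numpy as np

rng = np.random.default_rng(4)

# Ensemble at time t_{K-1}: particle positions and normalized weights.
N = 500
particles = rng.normal(0.0, 1.0, size=N)   # x_{K-1}^i
weights = np.full(N, 1.0 / N)              # tilde-w_{K-1}^i

def M(x):
    # Hypothetical stand-in model for illustration.
    return 0.9 * x

# Forecast step: propagate each particle, x_K^i = M(x_{K-1}^i) + eta_K.
particles = M(particles) + 0.1 * rng.standard_normal(N)

# New scalar observation with identity observation operator H.
y = 0.8
obs_std = 0.5

# Analysis step: reweight by the observation likelihood (Gaussian, up to
# proportionality), then re-normalize so the weights sum to one.
likelihood = np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
weights = weights * likelihood
weights /= weights.sum()
```

Repeating these two steps at each observation time gives the full sequential importance sampling recursion; the weight degeneracy noted above appears as the weight vector concentrating on few particles over many cycles.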

    The bootstrap filter

    • Finding a method for handling the degeneracy of the weights is explicitly the motivation for the bootstrap particle filter, and implicitly one of the motivations for the ensemble Kalman filter.
    • The method of the bootstrap filter proposes to eliminate the degeneracy of the weights by discarding ensemble members with weights close to zero and resampling.
    • At the point of the Bayesian update and re-weighting:
      1. eliminate all ensemble members with weights \( \tilde{w}^i < W \) where \( W\ll 1 \) will be some threshold for the weights;
      2. then, make replicates of the higher weighted ensemble members and reset the importance weights all equal to \( \frac{1}{N} \);
      3. the new empirical posterior is then given by, \[ \begin{align} P_N(\mathbf{x}_{K}\vert \mathbf{y}_{K:1}) = \frac{1}{N} \sum_{i=1}^N N^i \delta_{\mathbf{x}^i_K}(\mathrm{d}\mathbf{x}_K), \end{align} \] where \( N^i \) is the number of replicates \( \left(N^i\in[0, N]\right) \) of sample \( \mathbf{x}^i_K \) such that \( \sum_{i=1}^N N^i =N \)
      4. If the first prior sample is drawn iid, the first weights \( w_0^i \equiv \frac{1}{N} \); this gives a complete data assimilation algorithm for a hidden Markov model.
    • Note: how the number of replicates \( N^i \) is chosen is the basis of several different approaches to particle filters.
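The resampling step can be sketched with one simple choice of the replicate counts \( N^i \), multinomial resampling, where replicates are drawn in proportion to the weights. This is a minimal sketch with a hypothetical, deliberately degenerate toy ensemble:

```python
import numpy as np

rng = np.random.default_rng(5)

def resample(particles, weights):
    # Multinomial resampling: draw N^i replicates of each particle in
    # proportion to its weight, then reset all weights to 1/N. This is one
    # simple choice of the replicate counts N^i among several approaches.
    N = len(particles)
    idx = rng.choice(N, size=N, p=weights)
    return particles[idx], np.full(N, 1.0 / N)

# Degenerate toy weights: nearly all probability mass on one particle.
particles = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([0.97, 0.01, 0.01, 0.01])

particles, weights = resample(particles, weights)
```

After resampling, low-weight members are mostly eliminated, the dominant member is replicated, and the uniform weights \( \frac{1}{N} \) restore a non-degenerate ensemble for the next forecast step.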

    Further reading

    • This is only meant to be a quick introduction to the subject of data assimilation and there are many aspects we haven’t discussed whatsoever.
    • More details on the work presented can be found in the following works consulted:
      1. Doucet, Arnaud, Nando De Freitas, and Neil Gordon. Sequential Monte Carlo methods in practice. Springer, New York, NY, 2001.
      2. Arulampalam, M. Sanjeev, Simon Maskell, Neil Gordon, and Tim Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on signal processing 50, no. 2 (2002): 174-188.
      3. Carrassi, Alberto, Marc Bocquet, Laurent Bertino, and Geir Evensen. Data assimilation in the geosciences: An overview of methods, issues, and perspectives. Wiley Interdisciplinary Reviews: Climate Change 9, no. 5 (2018): e535.
      4. Bocquet, Marc. Introduction to the principles and methods of data assimilation in the geosciences. Lecture notes. Version 0.32. 2019.
      5. Asch, Mark, Marc Bocquet, and Maƫlle Nodet. Data assimilation: methods, algorithms, and applications. Vol. 11. SIAM, 2016.
    • Many of these topics will appear in greater detail, with an emphasis on establishing the theoretical foundations linking dynamical systems, probability and machine learning, in our upcoming book:
      • Carrassi, Alberto, Colin Grudzien, Marc Bocquet, Alban Farchi, and Patrick Raanes. Data assimilation for dynamical systems and their discovery through machine learning. In preparation for Springer. 2022.