# A Bayesian approach to data assimilation

### Colin Grudzien

#### Assistant Professor of Statistics

Presented at the Joint ICTP-IUGG Workshop on Data Assimilation and Inverse Problems in Geophysical Sciences

19 October, 2021

## Outline

• The following topics will be covered in this lecture:
• Observation-analysis-forecast cycles
• Hidden Markov models and Bayesian analysis
• Bayesian maximum-a-posteriori (MAP) analysis in Gaussian models
• The ensemble Kalman filter and smoother (EnKF / EnKS)
• The 4D cost function and VAR approaches
• Sequential EnVAR estimators in the EnKF analysis:
• the iterative ensemble Kalman smoother (IEnKS)
• the single-iteration ensemble Kalman smoother (SIEnKS)

## Motivation

• In applications such as short-range weather prediction, data assimilation provides a means to sequentially and recursively update forecasts with newly incoming information.
• The Bayesian approach to DA has become widely adopted because it provides a unified treatment of tools from statistical estimation, nonlinear optimization, and machine learning for handling such problems.
• Let’s suppose that the dynamical physical states can be written in a vector, $$\pmb{x}_k \in \mathbb{R}^{N_x}$$, where $$k$$ corresponds to some time $$t_k$$.
• Abstractly, we will represent the time evolution of these states with the nonlinear map $$\mathcal{M}$$, \begin{align} \pmb{x}_k = \mathcal{M}_{k} \left(\pmb{x}_{k-1}, \boldsymbol{\lambda}\right) + \boldsymbol{\eta}_k \end{align} where
• $$\pmb{x}_{k-1}$$ is the vector of physical states at an earlier time $$t_{k-1}$$;
• $$\boldsymbol{\lambda}$$ is a vector of uncertain physical parameters on which the time evolution depends;
• $$\boldsymbol{\eta}_k$$ is an additive, stochastic noise term, representing errors in our model for the physical process.
• We will estimate the random vector $${\color{#d95f02} {\pmb{x}_k } }$$ with a prior distribution on $$\left(\pmb{x}_{k-1}, \boldsymbol{\lambda}\right)$$, knowledge of $$\mathcal{M}_{k}$$, and knowledge of how $$\boldsymbol{\eta}_k$$ is distributed.
• At time $$t_{k-1}$$, we will make a forecast for the distribution of $$\pmb{x}_k$$ with our prior knowledge, including the physics-based model.
• For the rest of this lecture, we will restrict to the case that $$\boldsymbol{\lambda}$$ is a known constant, and the forecast model is perfect, \begin{align} \pmb{x}_k = \mathcal{M}_{k} \left(\pmb{x}_{k-1}\right), \end{align} for simplicity.
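The forecast step above can be sketched numerically by pushing a sample of the prior through the model. The following is a minimal sketch assuming a hypothetical toy map for $$\mathcal{M}_k$$ and Gaussian model error; the particular dynamics, dimensions, and noise levels are illustrative choices, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    """Hypothetical nonlinear map M_k (a simple toy dynamics, for illustration)."""
    return x + 0.1 * (x - x**2 / 3.0)

N_x, N_e = 3, 500  # state dimension and ensemble (sample) size, chosen arbitrarily

# Sample of the prior distribution on x_{k-1}
prior_ens = rng.normal(loc=1.0, scale=0.2, size=(N_e, N_x))

# Perfect-model forecast: x_k = M_k(x_{k-1}), applied to each prior sample
forecast_ens = model(prior_ens)

# Imperfect-model variant: x_k = M_k(x_{k-1}) + eta_k, with eta_k ~ N(0, 0.01 I)
forecast_ens_noisy = model(prior_ens) + rng.normal(scale=0.1, size=(N_e, N_x))
```

The forecast ensemble is then an empirical representation of the forecast distribution for $$\pmb{x}_k$$.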

## Motivation

• Suppose that we are also given real-world observations $${\color{#7570b3} {\pmb{y}_k\in\mathbb{R}^{N_y}} }$$ related to the physical states by $${\color{#7570b3} { \pmb{y}_k = \mathcal{H}_k \left(\pmb{x}_k\right) + \boldsymbol{\epsilon}_k } }$$ where
• $${\color{#7570b3} {\mathcal{H}_k:\mathbb{R}^{N_x} \rightarrow \mathbb{R}^{N_y} } }$$ is a nonlinear map relating the states we wish to estimate $$\pmb{x}_k$$ to the values that are observed $${\color{#7570b3} { \pmb{y}_k} }$$.
• Typically $$N_y \ll N_x$$, so this information is sparse and the observations are not $$1:1$$ with the unobserved states.
• $$\boldsymbol{\epsilon}_k$$ is an additive, stochastic noise term representing errors in the measurements.
• Therefore, at time $$t_k$$ we will have a forecast distribution for the states $$\pmb{x}_k$$ generated by our prior on $$\pmb{x}_{k-1}$$ and our physics-based model $$\mathcal{M}$$.
• We will also have an observation $$\pmb{y}_k$$ with uncertainty.
• We wish to find a posterior distribution for $$\pmb{x}_k$$ conditioned on $$\pmb{y}_k$$.
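In the special case of a linear observation operator and Gaussian forecast and observation errors, the posterior conditioning described above has a closed form: the Kalman analysis update. The following is a minimal sketch under those assumptions; the numbers (a 3-dimensional state, a single observed component, and the specific covariances) are hypothetical.

```python
import numpy as np

N_x, N_y = 3, 1  # N_y << N_x: only the first state component is observed
H = np.array([[1.0, 0.0, 0.0]])  # linear observation operator H_k (illustrative)
R = np.array([[0.25]])           # observation error covariance (eps_k ~ N(0, R))
x_f = np.array([1.0, 2.0, 3.0])  # forecast mean for x_k
P_f = np.eye(N_x)                # forecast error covariance
y = np.array([1.5])              # observed value y_k

# Kalman gain: K = P_f H^T (H P_f H^T + R)^{-1}
K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)

# Posterior (analysis) mean and covariance for x_k conditioned on y_k
x_a = x_f + K @ (y - H @ x_f)
P_a = (np.eye(N_x) - K @ H) @ P_f
```

Note that the observed component is pulled toward the observation and its variance shrinks, while the unobserved components are unchanged here because $$P_f$$ was taken diagonal; correlations in $$P_f$$ are what spread observational information to unobserved states.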