We have now seen a variety of techniques for linear estimation, and some that bridge to nonlinear estimation such as Gauss-Newton.
The statistical and optimization approaches to DA also share common ground in nonlinear estimation;
after we have developed the fundamentals of their nonlinear application, we will consider how the two can be hybridized in practice.
To begin building a bridge to the nonlinear estimation problem, we will consider the case in which we want to estimate a process model parameter simultaneously with the unknown state \( \pmb{x} \).
In particular, we will suppose that we have a hidden Markov model of the form,
\[ \begin{align} \pmb{x}_{k} &= \mathbf{M}_{\pmb{\lambda},k} \pmb{x}_{k-1} + \pmb{w}_k\\ \pmb{y}_{k} &= \mathbf{H}_k\pmb{x}_{k} + \pmb{v}_k \end{align} \]
where \( \pmb{\lambda} \in \mathbb{R}^{N_p} \) is a vector of parameters that the model evolution depends on;
However, if \( \pmb{\lambda} \) represents a vector of free variables of the system,
the question arises of whether, and how, these parameters can be estimated simultaneously with the evolving state \( \pmb{x}_k \).
In particular, if we suppose that there is some true value \( \pmb{\lambda}^\ast \) on which the process evolution depends, where
\[ \begin{align} \pmb{x}^\mathrm{true}_k &= \mathbf{M}_{\pmb{\lambda}^\ast, k} \pmb{x}_{k-1}^\mathrm{true} + \pmb{w}_k, \\ \pmb{y}_k &= \mathbf{H}_k \pmb{x}_{k}^\mathrm{true} + \pmb{v}_k, \end{align} \]
a simulation with a different choice of parameters \( \pmb{\lambda} \neq \pmb{\lambda}^\ast \) will generically produce a biased forecast.
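As a minimal illustration of this bias, consider a hypothetical scalar model in which \( \mathbf{M}_{\pmb{\lambda},k} = \lambda \), so that \( x_k = \lambda x_{k-1} + w_k \); the parameter values, noise level and horizon in the sketch below are illustrative assumptions rather than values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

lam_true = 0.9      # "true" process parameter lambda*
lam_wrong = 0.7     # mis-specified parameter used in the forecast
q, n_steps = 0.01, 50

# simulate the "true" scalar process x_k = lambda* x_{k-1} + w_k
x_true = np.zeros(n_steps + 1)
x_true[0] = 1.0
for k in range(1, n_steps + 1):
    x_true[k] = lam_true * x_true[k - 1] + rng.normal(0.0, np.sqrt(q))

# forecast from the same initial condition with the wrong parameter
x_fore = np.zeros(n_steps + 1)
x_fore[0] = x_true[0]
for k in range(1, n_steps + 1):
    x_fore[k] = lam_wrong * x_fore[k - 1]

# the systematic sign of the error reflects the forecast bias induced by lambda != lambda*
print("mean forecast error over the horizon:", np.mean(x_fore - x_true))
```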
In this case, we can frame the joint estimation problem in a manner similar to our discussion of nonlinear least-squares.
It is a classical trick, in both dynamical systems and Bayesian analysis, to recast the joint estimation problem in terms of an “extended state”.
To define the extended state, suppose we have the parameter-dependent hidden Markov model,
\[ \begin{align} \pmb{x}_{k} &= \mathbf{M}_{\pmb{\lambda},k} \pmb{x}_{k-1} + \pmb{w}_k;\\ \pmb{y}_{k} &= \mathbf{H}_k\pmb{x}_{k} + \pmb{v}_k; \end{align} \]
then let us write the extended state vector and the extended noise vector as
\[ \begin{align} \tilde{\pmb{x}}:= \begin{pmatrix} \pmb{x} \\ \pmb{\lambda} \end{pmatrix}, & & \tilde{\pmb{w}}_k := \begin{pmatrix} \pmb{w}_k \\ \pmb{0} \end{pmatrix}, & & \tilde{\pmb{w}}_k \sim N\left(\pmb{0},\begin{pmatrix}\mathbf{Q}_k & \pmb{0} \\ \pmb{0} & \pmb{0} \end{pmatrix} \right). \end{align} \]
Similarly, let us define the extended mechanistic model and observation operator,
\[ \begin{align} \widetilde{\mathcal{M}}_k &:= \begin{pmatrix}\mathbf{M}_{\pmb{\lambda},k} & \pmb{0} \\ \pmb{0} & \mathbf{I}_{N_p} \end{pmatrix} & & \widetilde{\mathbf{H}}_k := \begin{pmatrix} \mathbf{H}_k & \pmb{0} \end{pmatrix}, \end{align} \] so that \( \widetilde{\mathbf{H}}_k \) maps the extended state back into the original observation space.
Then an extended state model is defined by
\[ \begin{align} \tilde{\pmb{x}}_{k} &= \widetilde{\mathcal{M}}_{k} \left(\tilde{\pmb{x}}_{k-1}\right) + \tilde{\pmb{w}}_k;\\ \pmb{y}_{k} &= \widetilde{\mathbf{H}}_k\tilde{\pmb{x}}_{k} + \pmb{v}_k; \end{align} \] where \( \widetilde{\mathcal{M}}_k \) is actually a nonlinear function of the extended state.
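As a minimal sketch of this construction, assume a hypothetical two-dimensional state whose linear propagator is a rotation damped by a scalar parameter \( \lambda \); the function names and numerical values below are illustrative, not part of the lecture.

```python
import numpy as np

def model_matrix(lam):
    """Hypothetical parameter-dependent linear propagator M_{lambda,k}:
    a rotation damped by the scalar parameter lambda."""
    theta = 0.1
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return lam * rot

def extended_model(x_tilde):
    """Extended-state map: propagate x with M_{lambda}, carry lambda forward unchanged.
    Although M_{lambda,k} is linear in x, the extended map is nonlinear in
    (x, lambda) because of the product lambda * x."""
    x, lam = x_tilde[:2], x_tilde[2]
    return np.concatenate([model_matrix(lam) @ x, [lam]])

# one extended-state forecast step, with additive model noise on x and none on lambda
rng = np.random.default_rng(1)
Q = 0.01 * np.eye(2)
x_tilde = np.array([1.0, 0.0, 0.9])                                   # (x_1, x_2, lambda)
w_tilde = np.concatenate([rng.multivariate_normal(np.zeros(2), Q), [0.0]])
print(extended_model(x_tilde) + w_tilde)
```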
From this extended state model, we now have a more general, nonlinear form of the hidden Markov model. Suppose that we furnish the extended state with a Gaussian prior,
\[ \begin{align} \tilde{\pmb{x}}_0 \sim N\left(\begin{pmatrix}\overline{\pmb{x}}_0 \\ \overline{\pmb{\lambda}}_0 \end{pmatrix},\begin{pmatrix}\mathbf{B}_{\pmb{x}} & \mathbf{B}_{\pmb{x},\pmb{\lambda}}\\ \mathbf{B}_{\pmb{\lambda},\pmb{x}} & \mathbf{B}_{\pmb{\lambda}} \end{pmatrix} \right) \equiv N\left( \overline{\tilde{\pmb{x}}}, \widetilde{\mathbf{B}}_0\right); \end{align} \]
however, even with a Gaussian prior, the subsequent evolution will generically be non-Gaussian, as Gaussian distributions are not closed under a nonlinear evolution.
Let's suppose for the moment that we have prior cross covariances between the state and the parameter values, \( \mathbf{B}_{\pmb{x},\pmb{\lambda}} \).
In this case, an iterative approach, in terms of the state one step back in time, can be formulated relatively simply as an optimization problem,
\[ \begin{align} \mathcal{J}(\tilde{\pmb{x}}) := \frac{1}{2} \parallel \overline{\tilde{\pmb{x}}}_0 - \tilde{\pmb{x}} \parallel^2_{\widetilde{\mathbf{B}}_0} + \frac{1}{2} \parallel \pmb{y}_1 - \widetilde{\mathbf{H}}_1 \widetilde{\mathcal{M}}_{1}\left( \tilde{\pmb{x}}\right) \parallel^2_{\mathbf{R}_1} . \end{align} \]
That is to say, we wish to minimize the discrepancy between the observed data and the model output generated from the extended state,
where this is now a nonlinear least-squares optimization, which is not necessarily convex.
As was discussed with nonlinear least-squares, this type of estimation is often approached via Gauss-Newton, due to the efficient approximation of the Hessian.
In particular, this requires the calculation of the derivative \( \nabla_{\pmb{\lambda}} \mathbf{M}_{\pmb{\lambda},k} \), i.e., the sensitivity of the model to the parameters;
A Gauss-Newton estimation, therefore, iteratively minimizes the cost with the approximate Newton procedure
\[ \begin{align} \tilde{\pmb{x}}_{0}^i := \tilde{\pmb{x}}_0^{i-1} - \widetilde{\mathbf{H}}_{\mathcal{J}}^{-1} \nabla_{\tilde{\pmb{x}}} \mathcal{J}|_{\tilde{\pmb{x}}^{i-1}}, \end{align} \] where \( \widetilde{\mathbf{H}}_{\mathcal{J}} \) denotes the Gauss-Newton approximation of the Hessian of the cost function \( \mathcal{J} \).
Such an optimization can be run until the discrepancy between the output and the data reaches a tolerated error level.
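Below is a minimal sketch of such a Gauss-Newton iteration for a toy extended-state cost, assuming a scalar state, a scalar parameter and a single observation; the whitened-residual formulation and the finite-difference Jacobian are implementation choices made for illustration, and all numerical values are assumptions rather than quantities from the lecture.

```python
import numpy as np

# Illustrative extended-state model and data (all values are assumptions):
# a scalar state with propagator M_lambda = lambda, observed directly.
def ext_model(x_tilde):
    x, lam = x_tilde
    return np.array([lam * x, lam])

H_tilde = np.array([[1.0, 0.0]])   # select the state component of the extended state
B0 = np.diag([0.5, 0.25])          # prior covariance of (x_0, lambda)
R1 = np.array([[0.1]])             # observation error covariance
x_bar = np.array([1.0, 0.8])       # prior mean of the extended state
y1 = np.array([0.95])              # a single observation at time 1

def residual(x_tilde):
    """Stacked, whitened residuals so that the cost is J(x) = 0.5 * ||r(x)||^2."""
    r_b = np.linalg.solve(np.linalg.cholesky(B0), x_bar - x_tilde)
    r_o = np.linalg.solve(np.linalg.cholesky(R1), y1 - H_tilde @ ext_model(x_tilde))
    return np.concatenate([r_b, r_o])

def jacobian_fd(f, x, eps=1e-7):
    """Finite-difference Jacobian of the residual map (a stand-in for the
    analytic derivative, which requires grad_lambda of M)."""
    f0 = f(x)
    J = np.zeros((f0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - f0) / eps
    return J

# Gauss-Newton: the normal-equations matrix J^T J approximates the Hessian of the cost.
x_it = x_bar.copy()
for _ in range(20):
    r = residual(x_it)
    J = jacobian_fd(residual, x_it)
    step = np.linalg.solve(J.T @ J, J.T @ r)
    x_it = x_it - step
    if np.linalg.norm(step) < 1e-10:
        break

print("Gauss-Newton estimate of (x_0, lambda):", x_it)
```

In practice, the finite-difference Jacobian would be replaced by the analytic tangent linear model, including the parameter sensitivity \( \nabla_{\pmb{\lambda}} \mathbf{M}_{\pmb{\lambda},k} \) noted above.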
To guarantee descent, line-search methods modify this technique to search efficiently for a suitable step size along the approximate Newton direction.
This has a parallel development in what is known as “trust region” optimization, from which the closely related classical scheme of Levenberg-Marquardt can be derived.
One issue with the direct optimization approach for the extended state is its susceptibility to local minima in the parameter value \( \pmb{\lambda} \).
In particular, nonlinear cost functions for joint state-parameter estimation tend to be highly sensitive, because many different parameter values can produce qualitatively different behaviors in the time evolution of the process.
One commonly used remedy for the local minima is to instead view \( \pmb{\lambda} \) as a time-evolving random state, governed by, e.g., a random walk in parameter space,
\[ \begin{align} \tilde{\pmb{x}}:= \begin{pmatrix} \pmb{x} \\ \pmb{\lambda} \end{pmatrix}, & & \tilde{\pmb{w}}_k := \begin{pmatrix} \pmb{w}_k \\ \pmb{p}_k \end{pmatrix}, & & \tilde{\pmb{w}}_k \sim N\left(\pmb{0},\begin{pmatrix}\mathbf{Q}_k & \pmb{0} \\ \pmb{0} & \mathbf{Q}_{\pmb{p}} \end{pmatrix} \right), \\ \tilde{\pmb{x}}_{k} = \widetilde{\mathcal{M}}_{k} \left(\tilde{\pmb{x}}_{k-1}\right) + \tilde{\pmb{w}}_k, & & \pmb{y}_{k} = \widetilde{\mathbf{H}}_k\tilde{\pmb{x}}_{k} + \pmb{v}_k. \end{align} \]
This simple adjustment then allows one to explore the parameter space with a noise model;
however, the jittering may also knock the estimate away from the global minimum;
the rule of thumb is to run such an estimator over multiple settings of the random walk, as sketched below, and check for which noise levels one obtains consistent results across replications.
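As a minimal sketch of the random walk itself, the snippet below (with illustrative walk variances \( \mathbf{Q}_{\pmb{p}} \)) shows how different noise levels spread an ensemble of parameter values over time; in practice one would rerun the full estimator for each setting, rather than the bare walk shown here, and compare the results across replications.

```python
import numpy as np

rng = np.random.default_rng(2)

# Random walk on the parameter: lambda_k = lambda_{k-1} + p_k, with p_k ~ N(0, Q_p).
# Larger Q_p explores the parameter space more aggressively, at the cost of
# jittering the estimate further from any minimum it has found.
n_ens, n_steps = 50, 100
for q_p in (1e-4, 1e-3, 1e-2):
    lam = np.full(n_ens, 0.8)                             # initial parameter ensemble
    for _ in range(n_steps):
        lam = lam + rng.normal(0.0, np.sqrt(q_p), n_ens)  # parameter-walk noise p_k
    print(f"Q_p={q_p:.0e}: parameter spread after {n_steps} steps = {lam.std():.3f}")
```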
When estimating process parameters as above, the quality of the estimates also depends strongly on the level of correlation between the parameter values and the model state evolution.
If the time-evolution of the states is strongly connected to the choice of parameter settings, the parameters can easily be identified from the cross correlation between the observed data and the simulation outputs.
However, these correlations are poorly known in most cases, and specifying \( \mathbf{B}_{\pmb{x},\pmb{\lambda}} \) can rarely be done accurately.
Suppose we do not have any knowledge about the cross covariances, but we do have prior knowledge of the respective backgrounds for the state and the parameters.
If we applied a Kalman-like estimator directly with no specified cross correlation, i.e., \( \mathbf{B}_{\pmb{x},\pmb{\lambda}} = \pmb{0} \), the observations of the state would provide no update to the parameter estimate.
However, these correlations can often be constructed empirically with a sample-based approach from Bayesian analysis…
Instead, we can take random draws from the prior on the extended state,
\[ \begin{align} \tilde{\pmb{x}}_0^j \sim N\left(\begin{pmatrix}\overline{\pmb{x}}_0 \\ \overline{\pmb{\lambda}}_0 \end{pmatrix},\begin{pmatrix}\mathbf{B}_{\pmb{x}} &\pmb{0}\\ \pmb{0} & \mathbf{B}_{\pmb{\lambda}} \end{pmatrix} \right) & & j=1,\cdots, N_e. \end{align} \]
Although the sample initially lacks correlation between the two components, the evolution through the process model builds correlations between the states and parameters; i.e., let
\[ \begin{align} \tilde{\pmb{x}}^j_1:= \begin{pmatrix} \mathbf{M}_{\pmb{\lambda}^j_0,1} \pmb{x}_0^j + \pmb{w}_1^j \\ \pmb{\lambda}^j_0 + \pmb{p}^j_1\end{pmatrix} \end{align} \] where \( \pmb{w}_1^j \sim N(\pmb{0}, \mathbf{Q}_1) \) and \( \pmb{p}_1^j \sim N(\pmb{0}, \mathbf{Q}_{\pmb{p}}) \) are independent draws of the model noise and the parameter-walk noise respectively.
Let us denote the ensemble matrix once again, with replicates of the extended state forming the columns
\[ \begin{align} \mathbf{E}_1 := \begin{pmatrix} \tilde{\pmb{x}}_1^1 & \cdots & \tilde{\pmb{x}}_1^{N_e}\end{pmatrix}, \end{align} \] and respectively for the anomaly matrix,
\[ \begin{align} \mathbf{X}_1 := \begin{pmatrix} \tilde{\pmb{x}}_1^1 - \hat{\tilde{\pmb{x}}}_1 & \cdots & \tilde{\pmb{x}}_1^{N_e} -\hat{\tilde{\pmb{x}}}_1 \end{pmatrix}/\sqrt{N_e -1} \end{align} \]
Generically, the ensemble covariance computed from the replicates,
\[ \begin{align} \mathbf{P}_1 := \mathbf{X}_1 \mathbf{X}^\top_1 = \begin{pmatrix} \mathbf{P}_{\pmb{x}} & \mathbf{P}_{\pmb{x},\pmb{\lambda}} \\ \mathbf{P}_{\pmb{\lambda},\pmb{x}} & \mathbf{P}_{\pmb{\lambda}}\end{pmatrix}, \end{align} \] will not have zero cross covariances.
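A minimal sketch of this mechanism, assuming a scalar state and a scalar parameter with the illustrative model \( x_1 = \lambda x_0 + w_1 \); the prior means, variances and noise levels are assumptions, but the example shows how a single forecast step produces a nonzero sample cross covariance even though the prior draws are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(3)

# Prior draws for a scalar state and scalar parameter with zero cross covariance
# (the means and variances are illustrative assumptions).
n_ens = 500
x0 = rng.normal(1.0, np.sqrt(0.5), n_ens)       # x_0^j ~ N(x_bar, B_x)
lam0 = rng.normal(0.8, np.sqrt(0.05), n_ens)    # lambda_0^j ~ N(lambda_bar, B_lambda)

# Forecast each replicate through the parameter-dependent model x_1 = lambda * x_0 + w_1,
# with a random walk on the parameter.
w1 = rng.normal(0.0, np.sqrt(0.01), n_ens)      # model noise w_1^j
p1 = rng.normal(0.0, np.sqrt(1e-3), n_ens)      # parameter-walk noise p_1^j
x1 = lam0 * x0 + w1
lam1 = lam0 + p1

# Ensemble matrix with replicates as columns, the anomaly matrix, and the
# ensemble covariance of the extended state.
E1 = np.vstack([x1, lam1])                                    # shape (2, N_e)
X1 = (E1 - E1.mean(axis=1, keepdims=True)) / np.sqrt(n_ens - 1)
P1 = X1 @ X1.T

print("sample cross covariance P_{x,lambda}:", P1[0, 1])
```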
However, we have yet to discuss how we use future information like this to condition estimates of past states.
This is strongly related to the extended-in-time cost function in the optimization approach;
Using future information to update the estimate of a past state is what is broadly known in stochastic filtering as the smoothing problem.
The differences between the filtering, smoothing and sequential smoothing problems will be discussed in the next lecture.