Continuous-time models and stochastic calculus Part II

Outline

The following topics will be covered in this lecture:
- Ordinary differential equations
- Stochastic differential equations
- Additive noise
- Modes of convergence
- The Fokker-Planck Equations

Ordinary differential equations

In our last session, we introduced the notion of the stochastic integral, with two standard forms, the Itô and Stratonovich forms.
Particularly, we discussed some of the ways that the stochastic integral extends, and is different from the standard deterministic integral.
Despite the differences, we are able to formally manipulate these equations with tools such as Itô's lemma.
In particular, these concepts allow us to derive what is known as a stochastic differential equation as an extension of the ordinary differential equation.
Giving an intuition on this extension, and how we will use this formalism to sample a target density, will be the focus of this lecture.

Ordinary differential equations

A general ordinary differential equation (ODE) is written as

\[ \begin{align} & \frac{\mathrm{d}}{\mathrm{d}t} \pmb{x} := \pmb{f}(t, \pmb{x}) \\ \Leftrightarrow & \mathrm{d}\pmb{x} := \pmb{f}(t,\pmb{x})\mathrm{d}t. \end{align} \]
When \( \pmb{f} \) satisfies a regularity condition, this equation will have a unique solution given some initial data.

Lipschitz Continuity
The function \( \pmb{f}:\mathbb{R}^{N_x} \rightarrow \mathbb{R}^{N_x} \) is said to be Lipschitz continuous at a point \( \pmb{x}_0 \) if for all \( \pmb{x}_1 \) in a sufficiently small neighborhood of \( \pmb{x}_0 \), \[ \begin{align} \parallel \pmb{f}(\pmb{x}_0) - \pmb{f}(\pmb{x}_1) \parallel \leq K \parallel \pmb{x}_0 - \pmb{x}_1\parallel \end{align} \] for a fixed constant \( K \geq 0 \).

Lipschitz continuity above is stronger than regular continuity, but weaker than continuous differentiability.
- In particular, if \( \pmb{f}\in \mathcal{C}^1(\mathbb{R}^{N_x}) \), \( \pmb{f} \) satisfies Lipschitz continuity at every point.
- More generally, a function that is Lipschitz continuous can be shown to be differentiable except on a set of measure zero,
- i.e., with probability one, a point selected uniformly at random in a bounded interval is one at which \( \pmb{f} \) is differentiable.
For this reason, we consider Lipschitz functions to be differentiable “almost everywhere”: the set of points where the derivative fails to exist has measure zero.
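
As a brief illustration of these distinctions (the two scalar functions below are standard examples chosen here for illustration, not taken from the lecture), consider
\[ \begin{align} f_1(x) = |x|, \qquad f_2(x) = \sqrt{|x|}. \end{align} \]
The function \( f_1 \) satisfies the condition at every point with \( K=1 \), since \( \big| |x_0| - |x_1| \big| \leq |x_0 - x_1| \) by the reverse triangle inequality, yet it is not differentiable at \( x=0 \).
The function \( f_2 \) is continuous, but the condition fails at \( x_0 = 0 \), because
\[ \begin{align} \frac{|f_2(x_1) - f_2(0)|}{|x_1 - 0|} = \frac{1}{\sqrt{|x_1|}} \rightarrow \infty \quad \text{as } x_1 \rightarrow 0, \end{align} \]
so that no fixed constant \( K \) bounds the ratio.
Correspondingly, the equation \( \frac{\mathrm{d}}{\mathrm{d}t} x = \sqrt{|x|} \) with \( x(0) = 0 \) has more than one solution for \( t \geq 0 \), e.g., \( x(t) \equiv 0 \) and \( x(t) = t^2 / 4 \).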

Ordinary differential equations

Recall the differential equation on the last slide

\[ \begin{align} \mathrm{d}\pmb{x} := \pmb{f}(t,\pmb{x})\mathrm{d}t. \end{align} \]
Provided that \( \pmb{f} \) is Lipschitz in the state variable and time in a neighborhood of some initial condition, it can be shown that there is a unique solution defined for this initial data.

Picard-Lindelöf theorem
Suppose that \( \pmb{f} \) satisfies the Lipschitz condition at a point \( (0,\pmb{x}_0) \) as previously discussed. Then there is a unique solution \( \pmb{x}(t) \) defined on some time interval \( [ -\epsilon, \epsilon] \) for which:

\( \pmb{x}(0)= \pmb{x}_0 \), and

\( \frac{\mathrm{d}}{\mathrm{d}t}|_{t=t_0}\pmb{x} = \pmb{f}(t_0, \pmb{x}(t_0)) \) for all \( t_0 \in [ -\epsilon, \epsilon] \),

where we formally write \[ \begin{align} \pmb{x}(t) = \int_0^t \pmb{f}(s, \pmb{x}(s))\mathrm{d}s + \pmb{x}_0. \end{align} \]

Notice that this only defines a solution within a local neighborhood, depending on a range of time around the initial condition.
This is known as an initial value problem, as previously discussed in the context of Markov models.
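
The integral form of the solution also suggests a constructive view. Below is a minimal sketch of Picard iteration, assuming the scalar test problem \( \frac{\mathrm{d}}{\mathrm{d}t} x = x \) with \( x(0) = 1 \) (an illustrative choice, not from the lecture) and the hypothetical helper names `picard_step` and `evaluate`. Each iterate is obtained by substituting the previous one into the integral, \( x_{k+1}(t) = x_0 + \int_0^t f(s, x_k(s))\mathrm{d}s \). For this particular \( f \) the iterates are polynomials, so they can be stored as coefficient lists and integrated exactly.

```python
import math

def picard_step(coeffs, x0):
    """One Picard iterate for dx/dt = x: returns x0 + integral_0^t x_k(s) ds.

    coeffs[j] is the coefficient of t**j in the current iterate x_k(t).
    """
    # Integrate term by term: c * s**j  ->  c * t**(j+1) / (j+1).
    integrated = [c / (j + 1) for j, c in enumerate(coeffs)]
    # The constant term of the new iterate is the initial condition x0.
    return [x0] + integrated

def evaluate(coeffs, t):
    """Evaluate the polynomial with the given coefficients at time t."""
    return sum(c * t**j for j, c in enumerate(coeffs))

x0 = 1.0
iterate = [x0]  # x_0(t) = x0, the constant initial guess
for _ in range(10):
    iterate = picard_step(iterate, x0)

# After k steps the iterate is the degree-k Taylor polynomial of exp(t).
picard_value = evaluate(iterate, 0.5)
exact_value = math.exp(0.5)
print(picard_value, exact_value)
```

The iterates are only guaranteed to converge on a neighborhood of the initial time, which mirrors the local nature of the theorem.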

Ordinary differential equations

Particularly, suppose that we have an initial prior on the state vector \( \pmb{x}_0 \) and \( \frac{\mathrm{d}}{\mathrm{d}t} \pmb{x}=\pmb{f} \) is known to satisfy the Lipschitz condition in the support of \( p(\pmb{x}_0) \).
This actually defines a deterministic Markov model, but where there is uncertainty in the initial value.
We can define the discrete mapping under the continuous time model by the flow map discussed before, where

\[ \begin{align} \boldsymbol{\Phi}(t, \pmb{x}_0) = \pmb{x}(t) = \int_0^t \pmb{f}(s, \pmb{x}(s))\mathrm{d}s + \pmb{x}_0. \end{align} \]
In the case that \( \pmb{f} \) is a linear transformation, \( \boldsymbol{\Phi}\equiv \mathbf{M}_t \) for some matrix, as with the previously defined Gauss-Markov model.
In this case, we once again generate a transition kernel as

\[ \begin{align} \mathcal{P}\left(\pmb{x}_t | \pmb{x}_0\right) = \delta_{\mathbf{M}_t \pmb{x}_0} \end{align} \] with \( \delta_{\pmb{v}} \) referring to the Dirac measure at \( \pmb{v} \in \mathbb{R}^{N_x} \).
The Dirac probability measure is defined by the property,

\[ \begin{align} \int f(\pmb{x}) \delta_{\pmb{v}}\left(\mathrm{d}\pmb{x}\right) = f\left(\pmb{v}\right); \end{align} \] particularly, the Dirac delta is a singular measure, understood through this integral equation.

Ordinary differential equations

Similarly, we will say that the transition “density” is given in terms of \[ \begin{align} p(\pmb{x}_t \vert \pmb{x}_{0} ) \equiv \delta \left\{\pmb{x}_t - \mathbf{M}_t\left(\pmb{x}_{0}\right)\right\} \end{align} \] where \( \delta \) represents the Dirac distribution.
Heuristically, this is known as the “function” which has the property \[ \delta(\pmb{x}) = \begin{cases} +\infty & \pmb{x} = \pmb{0} \\ 0 & \text{else}\end{cases}. \]
This is just a convenient abuse of notation, as the Dirac measure does not have a density with respect to the standard Lebesgue measure.
Rather, the Dirac distribution is understood through the generalized function theory of distributions as a type of kernel that gives the property, \[ \begin{align} \int f(\pmb{x}_{t}) \delta\left\{\pmb{x}_t- \mathbf{M}_t\left(\pmb{x}_{0}\right)\right\}\mathrm{d}\pmb{x}_{t} = f\left(\mathbf{M}_t\left(\pmb{x}_{0}\right)\right). \end{align} \]
This equation is to be interpreted as follows: given a realization of the initial condition \( \pmb{x}_0 \sim P \), the subsequent state is equal to \( \mathbf{M}_t \pmb{x}_0 \) with probability one at all times \( t \).
Therefore, such a model is known as a “perfect” model, as it totally determines the subsequent evolution of the random process in time.
However, our classic Gauss-Markov model was defined in terms of a perfect model that is perturbed by random shocks, \[ \begin{align} \pmb{x}_{k}:= \mathbf{M}_k \pmb{x}_{k-1} + \pmb{w}_k. \end{align} \]
The extension of such a Gauss-Markov model to a system generated with continuous time is derived with the notion of a stochastic differential equation.
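
To make the contrast between the two discrete-time models concrete, here is a minimal sketch of a scalar version, assuming a constant hypothetical coefficient `M = 0.9` and Gaussian shocks with a hypothetical standard deviation `shock_std` (neither value is from the lecture). With `shock_std = 0` the model is perfect: the state at every step is fully determined by \( x_0 \), matching the Dirac transition kernel. With `shock_std > 0`, repeated runs from the same \( x_0 \) differ.

```python
import random

def propagate(x0, n_steps, M=0.9, shock_std=0.0, rng=None):
    """Iterate x_k = M * x_{k-1} + w_k with w_k ~ N(0, shock_std**2)."""
    rng = rng or random.Random()
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        # With shock_std == 0 the update is purely deterministic.
        w = rng.gauss(0.0, shock_std) if shock_std > 0.0 else 0.0
        x = M * x + w
        trajectory.append(x)
    return trajectory

x0 = 1.0
# Perfect model: two runs from the same initial condition coincide exactly.
perfect_a = propagate(x0, 20)
perfect_b = propagate(x0, 20)

# Model with random shocks: same initial condition, different realizations.
noisy_a = propagate(x0, 20, shock_std=0.1, rng=random.Random(1))
noisy_b = propagate(x0, 20, shock_std=0.1, rng=random.Random(2))

print(perfect_a[-1] == perfect_b[-1], noisy_a[-1] == noisy_b[-1])
```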

Stochastic differential equations

A general, scalar stochastic differential equation (SDE) is written as

\[ \begin{align} \mathrm{d}X_t := a(t, X_t)\mathrm{d}t + b(t, X_t)\mathrm{d}W_t \end{align} \] where \( a \) is known as the drift function and \( b \) is known as the diffusion function.
The above SDE is written in the Itô form, while there exists an equivalent Stratonovich form given as,

\[ \begin{align} \mathrm{d}X_t := \left[a(t, X_t) -\frac{1}{2} b(t, X_t) \partial_x b(t, X_t)\right] \mathrm{d}t + b(t, X_t) \circ \mathrm{d}W_t. \end{align} \]
The Itô SDE has a formal solution given by

\[ \begin{align} X(T) - X(0) = \int_0^T a(t, X_t)\mathrm{d}t + \int_0^T b(t, X_t)\mathrm{d}W_t. \end{align} \]
An immediate difference from the ODE initial value problem is that the evolution of the state is given by a random variable, with a distribution that depends in time on the realization of the Wiener process.
- Particularly, the transition probability is no longer given by a Dirac measure, and instead includes uncertainty in the evolution.
In this case, the drift terms represent the mechanistic laws governing the process, while the diffusion terms represent the random shocks to the system.
If the drift \( a \) is a linear function of \( X_t \) and the diffusion \( b \) does not depend on \( X_t \), this furthermore defines a Gauss-Markov model for a Gaussian initial condition.
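
One way to build intuition for the roles of drift and diffusion is to discretize the Itô SDE in time. The following is a minimal sketch, assuming the Euler-Maruyama scheme \( X_{k+1} = X_k + a(t_k, X_k)\Delta t + b(t_k, X_k)\Delta W_k \) with \( \Delta W_k \sim N(0, \Delta t) \). The scheme is a standard choice that is not derived in this section, and the function name `euler_maruyama` together with the example drift and diffusion are hypothetical choices for illustration.

```python
import math
import random

def euler_maruyama(a, b, x0, T, n_steps, rng):
    """Approximate one sample path of dX = a(t, X) dt + b(t, X) dW on [0, T]."""
    dt = T / n_steps
    t, x = 0.0, x0
    path = [x]
    for _ in range(n_steps):
        # Wiener increment over a step of length dt: Gaussian with variance dt.
        dW = rng.gauss(0.0, math.sqrt(dt))
        # Drift (mechanistic law) plus diffusion (random shock).
        x = x + a(t, x) * dt + b(t, x) * dW
        t += dt
        path.append(x)
    return path

# Illustrative linear drift with constant diffusion.
drift = lambda t, x: -x
diffusion = lambda t, x: 0.3

sample_path = euler_maruyama(drift, diffusion, 1.0, 1.0, 100, random.Random(0))

# With zero diffusion the scheme reduces to the forward Euler method for the ODE.
ode_path = euler_maruyama(drift, lambda t, x: 0.0, 1.0, 1.0, 100, random.Random(0))
print(sample_path[-1], ode_path[-1])
```

Setting the diffusion to zero recovers a deterministic path, while any nonzero diffusion makes the endpoint depend on the sampled increments.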

Additive noise

We will note here a particular scenario that is of special relevance to our discussions.
When the diffusion term \( b \) has no dependence on the model state \( X_t \), then the model is said to be one of additive noise.
Recall then the Stratonovich SDE

\[ \begin{align} \mathrm{d}X_t := \left[a(t, X_t) -\frac{1}{2} b(t) \partial_x b(t)\right] \mathrm{d}t + b(t ) \circ \mathrm{d}W_t. \end{align} \]

In particular, \( \partial_x b \equiv 0 \) when \( b \) is only a function of time, so the correction term in the drift vanishes, i.e.,

\[ \begin{align} \mathrm{d}X_t := a(t, X_t) \mathrm{d}t + b(t) \circ \mathrm{d}W_t. \end{align} \]
- Therefore, it can be shown that the Stratonovich SDE and the Itô SDE are the same for additive noise.
This is a scenario that is frequently studied in the data assimilation literature, because of the simplification of the SDE above, and because additive noise represents unbiased shocks to the governing process laws.
We will return to such systems when we look at numerical solutions shortly.
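
As a worked contrast (the specific diffusion functions here are illustrative assumptions, not taken from the lecture), consider first the state-dependent case \( b(t, x) = \sigma x \) for a constant \( \sigma \).
Then \( \partial_x b = \sigma \), and the Itô SDE \( \mathrm{d}X_t = a(t, X_t)\mathrm{d}t + \sigma X_t \mathrm{d}W_t \) has the Stratonovich form
\[ \begin{align} \mathrm{d}X_t = \left[a(t, X_t) - \frac{1}{2} \sigma^2 X_t\right] \mathrm{d}t + \sigma X_t \circ \mathrm{d}W_t, \end{align} \]
so the two forms carry different drift terms.
In the additive case \( b(t, x) = \sigma(t) \), the correction \( \frac{1}{2} b \partial_x b \) is identically zero and both forms share the drift \( a(t, X_t) \).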

Modes of convergence

As noted with stochastic processes, and stochastic calculus, there are multiple ways we might consider the existence and uniqueness of a solution to an SDE.

Strong convergence
A strong solution \( X_t \) of an Itô SDE (or equivalent Stratonovich SDE) has the following properties:

\( X_T \) satisfies \[ \begin{align} X(T) - X(0) = \int_0^T a(t, X_t)\mathrm{d}t + \int_0^T b(t, X_t)\mathrm{d}W_t, \end{align} \] and for all times \( T \), \( X_T \) is a function of \( a,b \) and the realization of \( W_t \) for all times \( t<T \); and

the integrals in the above are well-defined in terms of the proper modes of convergence.

The important notion here is that if we change the realization of the Wiener process \( W_t \), then also the strong solution \( X_t \) changes, but the functional relation between \( X_t \) and \( W_t \) remains the same.
This is in analogy to how we looked at the realization of the function \( A(\omega)\sin(t) \), and how the evolution in time depends on the outcome for \( A \).
Different realizations of the Wiener process \( W_t \) can thus be thought of as generating different sequences of shocks to the governing laws, and a strong solution depends on the specific sequence of shocks.
However, if we look at the collection of all possible sequences of shocks that can be generated by \( W_t \), this gives a (non-singular) probability distribution for \( X_t \) at all times.
Particularly, each realization of a strong solution gives a particular sample (path) of the probability distribution for \( X_t \).
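
The functional dependence of a strong solution on the Wiener path can be sketched with a case where the solution is known in closed form. Assuming the additive-noise SDE \( \mathrm{d}X_t = \mu \mathrm{d}t + \sigma \mathrm{d}W_t \) with constants \( \mu, \sigma \) (an illustrative choice, with hypothetical helper names), the strong solution is \( X_t = X_0 + \mu t + \sigma W_t \). The code below samples \( W_t \) on a time grid by accumulating independent Gaussian increments and applies the same map to two different realizations.

```python
import math
import random

def wiener_path(T, n_steps, rng):
    """Sample W at times k*T/n_steps by summing independent N(0, dt) increments."""
    dt = T / n_steps
    W = [0.0]
    for _ in range(n_steps):
        W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return W

def strong_solution(W, T, x0, mu, sigma):
    """Map a Wiener path to X_t = x0 + mu*t + sigma*W_t on the same time grid."""
    n_steps = len(W) - 1
    dt = T / n_steps
    return [x0 + mu * (k * dt) + sigma * w for k, w in enumerate(W)]

T, n_steps, x0, mu, sigma = 1.0, 200, 0.0, 0.5, 0.2

# Two different realizations of the Wiener process (different seeds).
W_a = wiener_path(T, n_steps, random.Random(10))
W_b = wiener_path(T, n_steps, random.Random(11))

# The same functional relation applied to each realization.
X_a = strong_solution(W_a, T, x0, mu, sigma)
X_b = strong_solution(W_b, T, x0, mu, sigma)

# Re-using the realization behind W_a reproduces the path X_a exactly.
X_a_again = strong_solution(wiener_path(T, n_steps, random.Random(10)), T, x0, mu, sigma)
print(X_a[-1], X_b[-1], X_a == X_a_again)
```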

Modes of convergence

As with ODEs, Lipschitz continuity of the drift and diffusion functions gives the existence and uniqueness of strong solutions to the SDE.
However, not all SDEs admit strong solutions, and more generally we may be concerned just with the probabilistic aspects of such a simulation.
This follows the analogy of almost sure convergence (similar to strong convergence) versus convergence in probability alone.
We may formally define a solution in which we guarantee only that the forward probability distribution matches that generated by the SDE;
- however, we may not actually guarantee a (point-wise) solution that matches a particular sample path given some realization of \( W_t \).
This is loosely what is known as weak convergence, which we will consider more in depth when we study the numerical solutions to these equations.
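
The distinction can be illustrated numerically. The sketch below assumes an Euler-type discretization of the illustrative SDE \( \mathrm{d}X_t = -X_t \mathrm{d}t + \sigma \mathrm{d}W_t \) (both the scheme and the example are assumptions, not from this section) and drives it in two ways: with Gaussian increments that approximate a realization of \( W_t \), and with coin-flip increments \( \pm\sqrt{\Delta t} \) that only match the mean and variance of the Wiener increments. The second ensemble does not follow any particular Wiener path, yet its empirical statistics at the final time should be close to those of the first.

```python
import math
import random

def final_state(x0, sigma, T, n_steps, draw_increment):
    """Euler-type discretization of dX = -X dt + sigma dW, returning X at time T."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        x = x - x * dt + sigma * draw_increment(dt)
    return x

def mean_and_variance(samples):
    """Sample mean and unbiased sample variance of a list of numbers."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / (len(samples) - 1)
    return m, v

rng = random.Random(42)
# Gaussian increments: N(0, dt), as for a sampled Wiener path.
gaussian = lambda dt: rng.gauss(0.0, math.sqrt(dt))
# Two-point increments: +/- sqrt(dt) with equal probability (mean 0, variance dt).
coin_flip = lambda dt: math.sqrt(dt) if rng.random() < 0.5 else -math.sqrt(dt)

x0, sigma, T, n_steps, n_paths = 1.0, 0.5, 1.0, 50, 5000
ens_gauss = [final_state(x0, sigma, T, n_steps, gaussian) for _ in range(n_paths)]
ens_coin = [final_state(x0, sigma, T, n_steps, coin_flip) for _ in range(n_paths)]

mean_g, var_g = mean_and_variance(ens_gauss)
mean_c, var_c = mean_and_variance(ens_coin)
print(mean_g, var_g)
print(mean_c, var_c)
```

For this linear example the exact mean and variance at time \( T \) are \( x_0 e^{-T} \) and \( \frac{\sigma^2}{2}\left(1 - e^{-2T}\right) \), which both ensembles approximate up to discretization and sampling error.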

Fokker-Planck equations

While strong solutions of the SDE give sample realizations of the probability distribution for \( X_t \), we may also consider solving for this probability distribution directly.
Suppose that \( X_0 \) has a density \( p(x_0) \); then, given the SDE, we can also study how this initial prior evolves in time.
Particularly, the SDE defines a Markov model, and we will denote the transition density as \( p(t,x) \).

Fokker-Planck equations
For a random process \( X_t \) with an SDE governing the time evolution, an initial prior and transition density \( p(t,x) \) as above, the Fokker-Planck equations are defined as \[ \begin{align} \partial_t p + \partial_x (a p) - \frac{1}{2} \partial_x^2 \left(pb^2\right) = 0. \end{align} \] In particular, the above partial differential equation defines the probability density for \( X_t \) at all times \( t \) given the initial prior \( p(x_0) \).
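
As a worked example (an illustrative choice of SDE, not taken from the lecture), let \( a(t,x) = -\theta x \) and \( b(t,x) = \sigma \) for constants \( \theta, \sigma > 0 \).
The equation above becomes
\[ \begin{align} \partial_t p - \theta \partial_x (x p) - \frac{\sigma^2}{2} \partial_x^2 p = 0. \end{align} \]
Setting \( \partial_t p = 0 \) and integrating once in \( x \), with \( p \) and \( \partial_x p \) vanishing as \( |x| \rightarrow \infty \), gives \( \theta x p + \frac{\sigma^2}{2} \partial_x p = 0 \), whose normalized solution is
\[ \begin{align} p_\infty(x) = \sqrt{\frac{\theta}{\pi \sigma^2}} \exp\left(-\frac{\theta x^2}{\sigma^2}\right), \end{align} \]
i.e., a mean-zero Gaussian density with variance \( \frac{\sigma^2}{2\theta} \).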

The Fokker-Planck equations completely characterize the statistics of the sample paths \( X_t \), as they provide the entire probability density.
- Realizations of sample paths are thus drawn from this joint density in time.
However, solving the Fokker-Planck equations becomes computationally unfeasible for any dimension \( N_x> 3 \) in practice, so that this full solution is only theoretical.
Rather, we will typically consider an ensemble of sample path solutions to uncertain ODEs or SDEs to generate empirical statistics from this theoretical density in practice.
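
For a single state dimension, the Fokker-Planck equation can still be solved directly on a grid, which also makes the cost in higher dimensions visible: a grid with \( n \) points per dimension needs \( n^{N_x} \) values. The sketch below is a minimal illustration under stated assumptions, none of which come from the lecture: the SDE \( \mathrm{d}X_t = -X_t \mathrm{d}t + \mathrm{d}W_t \), a Gaussian prior with mean 1 and variance 0.25, and an explicit central finite-difference scheme with hypothetical grid parameters. For this linear SDE the density stays Gaussian with mean \( e^{-t} \) and variance \( 0.25 e^{-2t} + 0.5\left(1 - e^{-2t}\right) \), which gives a reference for the grid solution.

```python
import math

def fokker_planck_step(p, x, dx, dt, drift, diff_sq):
    """One explicit step of dp/dt = -d(a p)/dx + 0.5 * d^2(b^2 p)/dx^2."""
    flux = [drift(xi) * pi for xi, pi in zip(x, p)]      # a * p
    spread = [diff_sq(xi) * pi for xi, pi in zip(x, p)]  # b^2 * p
    new_p = [0.0] * len(p)  # density is held at zero on the grid boundary
    for i in range(1, len(p) - 1):
        advection = (flux[i + 1] - flux[i - 1]) / (2.0 * dx)
        diffusion = (spread[i + 1] - 2.0 * spread[i] + spread[i - 1]) / dx**2
        new_p[i] = p[i] + dt * (-advection + 0.5 * diffusion)
    return new_p

def moments(p, x, dx):
    """Mass, mean and variance of a gridded density."""
    mass = sum(p) * dx
    mean = sum(xi * pi for xi, pi in zip(x, p)) * dx / mass
    var = sum((xi - mean) ** 2 * pi for xi, pi in zip(x, p)) * dx / mass
    return mass, mean, var

# Grid on [-5, 5]; dt is chosen small enough for the explicit scheme to be stable.
dx, dt, T = 0.1, 0.002, 2.0
x = [-5.0 + i * dx for i in range(101)]

# Gaussian prior p(x_0) with mean 1 and variance 0.25, normalized on the grid.
p = [math.exp(-((xi - 1.0) ** 2) / (2.0 * 0.25)) for xi in x]
total = sum(p) * dx
p = [pi / total for pi in p]

for _ in range(int(round(T / dt))):
    p = fokker_planck_step(p, x, dx, dt, lambda xi: -xi, lambda xi: 1.0)

mass, mean, var = moments(p, x, dx)
exact_mean = math.exp(-T)
exact_var = 0.25 * math.exp(-2.0 * T) + 0.5 * (1.0 - math.exp(-2.0 * T))
print(mass, mean, var, exact_mean, exact_var)
```

The same grid resolution in \( N_x \) dimensions would require \( 101^{N_x} \) values per time step, which is one way to see why ensembles of sample paths are preferred in practice.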