In our last session, we introduced the notion of the stochastic integral, with two standard forms, the Itô and Stratonovich forms.
Particularly, we discussed some of the ways that the stochastic integral extends, and is different from the standard deterministic integral.
Despite the differences, we are able to formally manipulate these equations with, e.g., Itô's lemma.
In particular, these concepts allow us to derive what is known as a stochastic differential equation as an extension of the ordinary differential equation.
Giving an intuition on this extension, and how we will use this formalism to sample a target density, will be the focus of this lecture.
A general ordinary differential equation (ODE) is written as
\[ \begin{align} & \frac{\mathrm{d}}{\mathrm{d}t} \pmb{x} := \pmb{f}(t, \pmb{x}) \\ \Leftrightarrow & \mathrm{d}\pmb{x} := \pmb{f}(t,\pmb{x})\mathrm{d}t. \end{align} \]
When \( \pmb{f} \) satisfies a regularity condition, this equation will have a unique solution given some initial data.
Lipschitz Continuity
The function \( \pmb{f}:\mathbb{R}^{N_x} \rightarrow \mathbb{R}^{N_x} \) is said to be Lipschitz continuous at a point \( \pmb{x}_0 \) if for all \( \pmb{x}_1 \) in a sufficiently small neighborhood of \( \pmb{x}_0 \), \[ \begin{align} \parallel \pmb{f}(\pmb{x}_0) - \pmb{f}(\pmb{x}_1) \parallel \leq K \parallel \pmb{x}_0 - \pmb{x}_1\parallel \end{align} \] for a fixed constant \( K \geq 0 \).
Lipschitz continuity above is stronger than regular continuity, but weaker than differentiability.
Indeed, Lipschitz functions are differentiable “almost everywhere”, in the sense that the set of non-differentiable points (e.g., corners) has measure zero.
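As a small numerical illustration of the definition (a sketch, not a proof), we can compare difference quotients at \( x_0 = 0 \) for two functions. The function \( |x| \) is Lipschitz at zero with \( K = 1 \) even though it is not differentiable there, while \( \sqrt{|x|} \) is continuous at zero but not Lipschitz, since its difference quotients grow without bound as the neighborhood shrinks. The helper name `max_difference_quotient` is hypothetical.

```python
import numpy as np

def max_difference_quotient(f, x0, radius, n=10_000):
    """Largest |f(x0) - f(x1)| / |x0 - x1| over sampled x1 within `radius` of x0."""
    # Sample points on both sides of x0, excluding x0 itself.
    offsets = np.linspace(radius / n, radius, n)
    x1 = np.concatenate([x0 - offsets, x0 + offsets])
    return np.max(np.abs(f(x0) - f(x1)) / np.abs(x0 - x1))

sqrt_abs = lambda x: np.sqrt(np.abs(x))

for radius in [1e-1, 1e-3, 1e-5]:
    k_abs = max_difference_quotient(np.abs, 0.0, radius)
    k_sqrt = max_difference_quotient(sqrt_abs, 0.0, radius)
    print(f"radius={radius:g}: |x| -> {k_abs:.3f}, sqrt|x| -> {k_sqrt:.1f}")
```

The quotient for \( |x| \) stays bounded by one at every radius, whereas the quotient for \( \sqrt{|x|} \) increases as the radius decreases, so no fixed \( K \) can work.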
Recall the differential equation on the last slide
\[ \begin{align} \mathrm{d}\pmb{x} := \pmb{f}(t,\pmb{x})\mathrm{d}t. \end{align} \]
Provided that \( \pmb{f} \) is Lipschitz in both the state variable and time at some initial condition, it can be shown that there is a unique solution defined for this initial data.
Picard-Lindelöf theorem
Suppose that \( \pmb{f} \) satisfies the Lipschitz condition at a point \( (0,\pmb{x}_0) \) as previously discussed. Then there is a unique solution \( \pmb{x}(t) \) defined on some time interval \( [ -\epsilon, \epsilon] \) for which:
- \( \pmb{x}(0)= \pmb{x}_0 \),
- \( \frac{\mathrm{d}}{\mathrm{d}t}|_{t=t_0}\pmb{x} = \pmb{f}(t_0, \pmb{x}(t_0)) \) for all \( t_0 \in [-\epsilon, \epsilon] \), and
- where we formally write \[ \begin{align} \pmb{x}(t) = \int_0^t \pmb{f}(s, \pmb{x}(s))\mathrm{d}s + \pmb{x}_0. \end{align} \]
Notice that this only defines a solution within a local neighborhood, depending on a range of time around the initial condition.
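The integral form of the solution suggests a fixed-point construction, known as Picard iteration, in which a guess \( \pmb{x}_k \) is substituted into the right-hand side to produce \( \pmb{x}_{k+1} \). The following is a minimal sketch of this idea, assuming the scalar test problem \( \frac{\mathrm{d}}{\mathrm{d}t}x = x \) with \( x(0)=1 \) and a trapezoidal approximation of the integral on a grid; the function name `picard_iterates` and the grid are hypothetical choices for illustration.

```python
import numpy as np

def picard_iterates(f, x0, t, n_iter):
    """Approximate Picard iterates x_{k+1}(t) = x0 + int_0^t f(s, x_k(s)) ds on the grid t."""
    x = np.full_like(t, x0, dtype=float)  # x_0(t) = x0, the constant initial guess
    dt = np.diff(t)
    for _ in range(n_iter):
        integrand = f(t, x)
        # Cumulative trapezoidal rule for the integral from t[0] to each grid point.
        increments = 0.5 * (integrand[1:] + integrand[:-1]) * dt
        x = x0 + np.concatenate([[0.0], np.cumsum(increments)])
    return x

t = np.linspace(0.0, 0.5, 501)
f = lambda s, x: x  # dx/dt = x, with exact solution x(t) = exp(t)

for k in [1, 2, 4, 8]:
    err = np.max(np.abs(picard_iterates(f, 1.0, t, k) - np.exp(t)))
    print(f"{k} iterations: max error {err:.2e}")
```

The error shrinks with each iteration on this short interval, until it is limited by the accuracy of the quadrature rule.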
This is known as an initial value problem, as previously discussed in the context of Markov models.
Particularly, suppose that we have an initial prior \( p(\pmb{x}_0) \) on the state vector \( \pmb{x}_0 \) and \( \frac{\mathrm{d}}{\mathrm{d}t} \pmb{x}=\pmb{f} \) is known to satisfy the Lipschitz condition in the support of \( p(\pmb{x}_0) \).
This actually defines a deterministic Markov model, but where there is uncertainty in the initial value.
We can define the discrete mapping under the continuous time model by the flow map discussed before, where
\[ \begin{align} \boldsymbol{\Phi}(t, \pmb{x}_0) = \pmb{x}(t) = \int_0^t \pmb{f}(s, \pmb{x}(s))\mathrm{d}s + \pmb{x}_0. \end{align} \]
In the case that \( \pmb{f} \) is a linear transformation, \( \boldsymbol{\Phi}\equiv \mathbf{M}_t \) for some matrix, as with the previously defined Gauss-Markov model.
In this case, we once again generate a transition kernel as
\[ \begin{align} \mathcal{P}\left(\pmb{x}_t | \pmb{x}_0\right) = \delta_{\mathbf{M}_t \pmb{x}_0} \end{align} \] with \( \delta_{\pmb{v}} \) referring to the Dirac measure at \( \pmb{v} \in \mathbb{R}^{N_x} \).
The Dirac probability measure is defined by the property,
\[ \begin{align} \int f(\pmb{x}) \delta_{\pmb{v}}\left(\mathrm{d}\pmb{x}\right) = f\left(\pmb{v}\right); \end{align} \] particularly, the Dirac delta is a singular measure, understood through this integral equation.
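To illustrate the deterministic Markov model with an uncertain initial value, the sketch below pushes an ensemble drawn from a prior through a linear flow map. It assumes the hypothetical system matrix \( \mathbf{A} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \), for which \( \mathbf{M}_t = \exp(t\mathbf{A}) \) is a rotation with a closed form, and an assumed Gaussian prior; neither choice comes from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_map(t):
    """M_t = exp(t A) for A = [[0, 1], [-1, 0]], written in closed form."""
    return np.array([[np.cos(t), np.sin(t)],
                     [-np.sin(t), np.cos(t)]])

# Ensemble drawn from a hypothetical Gaussian prior p(x_0).
mean_0 = np.array([1.0, 0.0])
cov_0 = 0.1 * np.eye(2)
ensemble_0 = rng.multivariate_normal(mean_0, cov_0, size=1000)  # shape (1000, 2)

# Each member is moved deterministically: x_t = M_t x_0 (no noise is added).
M = flow_map(0.75)
ensemble_t = ensemble_0 @ M.T

# The spread at time t comes only from the prior, pushed forward by M_t.
print("sample mean at t:", ensemble_t.mean(axis=0))
print("M_t @ prior mean:", M @ mean_0)
```

Given a fixed \( \pmb{x}_0 \), the state at time \( t \) is the single point \( \mathbf{M}_t \pmb{x}_0 \), which is what the Dirac transition kernel expresses.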
A general, scalar stochastic differential equation (SDE) is written as
\[ \begin{align} \mathrm{d}X_t := a(t, X_t)\mathrm{d}t + b(t, X_t)\mathrm{d}W_t \end{align} \] where \( a \) is known as the drift function and \( b \) is known as the diffusion function.
The above SDE is written in the Itô form, while there exists an equivalent Stratonovich form given as,
\[ \begin{align} \mathrm{d}X_t := \left[a(t, X_t) -\frac{1}{2} b(t, X_t) \partial_x b(t, X_t)\right] \mathrm{d}t + b(t, X_t) \circ \mathrm{d}W_t. \end{align} \]
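As a worked example of this conversion (an illustration under assumed coefficients, not taken from the lecture), consider geometric Brownian motion with hypothetical constants \( \mu, \sigma \), i.e., \( a(t,x) = \mu x \) and \( b(t,x) = \sigma x \). Here \( b \, \partial_x b = \sigma^2 x \), so that the Itô SDE
\[ \begin{align} \mathrm{d}X_t = \mu X_t \mathrm{d}t + \sigma X_t \mathrm{d}W_t \end{align} \]
has the equivalent Stratonovich form
\[ \begin{align} \mathrm{d}X_t = \left[\mu - \frac{1}{2}\sigma^2\right] X_t \mathrm{d}t + \sigma X_t \circ \mathrm{d}W_t. \end{align} \]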
The Itô SDE has a formal solution given by
\[ \begin{align} X(T) - X(0) = \int_0^T a(t, X_t)\mathrm{d}t + \int_0^T b(t, X_t)\mathrm{d}W_t. \end{align} \]
An immediate difference from the ODE initial value problem is that the state at each time is a random variable, whose value depends on the realization of the Wiener process up to that time.
In this case, the drift terms represent the mechanistic laws governing the process, while the diffusion terms represent the random shocks to the system.
If the drift \( a \) is linear in \( X_t \) and the diffusion \( b \) does not depend on \( X_t \), this furthermore defines a Gauss-Markov model.
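To make the roles of drift and diffusion concrete, the following is a minimal simulation sketch for the Itô SDE above. It assumes the Euler-Maruyama discretization, a standard scheme that is swapped in here for illustration, and a hypothetical example with linear drift \( a(t,x) = -\theta x \) and constant diffusion \( b = \sigma \); the function name and parameter values are assumptions.

```python
import numpy as np

def euler_maruyama(a, b, x0, t_end, n_steps, rng):
    """Simulate one path of dX = a(t, X) dt + b(t, X) dW on [0, t_end]."""
    dt = t_end / n_steps
    t = np.linspace(0.0, t_end, n_steps + 1)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))  # Wiener increment, distributed N(0, dt)
        x[k + 1] = x[k] + a(t[k], x[k]) * dt + b(t[k], x[k]) * dW
    return t, x

# Hypothetical example: drift pulling the state toward zero, constant diffusion.
theta, sigma = 1.0, 0.5
drift = lambda t, x: -theta * x
diffusion = lambda t, x: sigma

rng = np.random.default_rng(42)
t, x = euler_maruyama(drift, diffusion, x0=1.0, t_end=5.0, n_steps=5000, rng=rng)
print("X_T for this realization:", x[-1])
```

Setting the diffusion to zero recovers the forward Euler scheme for the ODE defined by the drift alone, which is one way to see the SDE as an extension of the ODE.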
We will note here a particular scenario that is of special relevance to our discussions.
When the diffusion term \( b \) has no dependence on the model state \( X_t \), then the model is said to be one of additive noise.
Recall then the Stratonovich SDE
\[ \begin{align} \mathrm{d}X_t := \left[a(t, X_t) -\frac{1}{2} b(t) \partial_x b(t)\right] \mathrm{d}t + b(t ) \circ \mathrm{d}W_t. \end{align} \]
In particular, \( \partial_x b \equiv 0 \) when \( b \) is only a function of time, so that the correction term vanishes and the Stratonovich form coincides with the Itô form, i.e.,
\[ \begin{align} \mathrm{d}X_t := a(t, X_t) \mathrm{d}t + b(t) \circ \mathrm{d}W_t. \end{align} \]
This scenario is frequently studied in the data assimilation literature, both because of the simplification of the SDE above and because it represents precisely unbiased shocks to the governing process laws.
We will return to such systems when we look at numerical solutions shortly.
Strong solutions
A strong solution \( X_t \) of an Itô SDE (or equivalent Stratonovich SDE) has the following properties:
- \( X_T \) satisfies \[ \begin{align} X(T) - X(0) = \int_0^T a(t, X_t)\mathrm{d}t + \int_0^T b(t, X_t)\mathrm{d}W_t, \end{align} \] and for all times \( T \), \( X_T \) is a function of \( a,b \) and the realization of \( W_t \) for all times \( t \leq T \); and
- the integrals in the above are well-defined in terms of the proper modes of convergence.
The important notion here is that if we change the realization of the Wiener process \( W_t \), then also the strong solution \( X_t \) changes, but the functional relation between \( X_t \) and \( W_t \) remains the same.
This is in analogy to how we looked at the realization of the function \( A(\omega)\sin(t) \), and how the evolution in time depends on the outcome for \( A \).
Different realizations of the Wiener process \( W_t \) can thus be thought of as generating different sequences of shocks to the governing laws, and a strong solution is thought to depend on the specific sequence of shocks.
However, if we look at the collection of all possible sequences of shocks that can be generated by \( W_t \), this gives a (non-singular) probability distribution for \( X_t \) at all times.
Particularly, each realization of a strong solution gives a particular sample (path) of the probability distribution for \( X_t \).
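The idea that the functional relation between \( X_t \) and \( W_t \) is fixed, while the realization varies, can be sketched numerically. The example below assumes an Euler-Maruyama discretization of the hypothetical additive-noise SDE \( \mathrm{d}X_t = -\theta X_t \mathrm{d}t + \sigma \mathrm{d}W_t \), and treats the discrete solution as a deterministic map from a sequence of Wiener increments to a path; the function name and constants are assumptions for illustration.

```python
import numpy as np

def solve_given_increments(x0, dW, dt, theta=1.0, sigma=0.5):
    """Euler-Maruyama map from Wiener increments dW to a path of dX = -theta X dt + sigma dW."""
    x = np.empty(len(dW) + 1)
    x[0] = x0
    for k, dw in enumerate(dW):
        x[k + 1] = x[k] - theta * x[k] * dt + sigma * dw
    return x

dt, n_steps = 0.01, 500
shocks_1 = np.random.default_rng(1).normal(0.0, np.sqrt(dt), n_steps)
shocks_2 = np.random.default_rng(2).normal(0.0, np.sqrt(dt), n_steps)

path_1 = solve_given_increments(1.0, shocks_1, dt)
path_1_again = solve_given_increments(1.0, shocks_1, dt)  # same shocks as path_1
path_2 = solve_given_increments(1.0, shocks_2, dt)        # a different sequence of shocks

print("same shocks give same path:", np.array_equal(path_1, path_1_again))
print("different shocks give same path:", np.allclose(path_1, path_2))
```

Each call with a new sequence of shocks yields one sample path, while the map itself never changes.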
As with ODEs, Lipschitz continuity of the drift and diffusion functions gives the existence and uniqueness of strong solutions to the SDE.
However, not all SDEs admit strong solutions, and more generally we may be concerned just with the probabilistic aspects of such a simulation.
This follows the analogy of almost sure convergence (similar to strong convergence) versus convergence in distribution alone.
We may formally define a solution in which we guarantee only that the forward probability distribution matches that generated by the SDE.
This is loosely the idea behind weak solutions and weak convergence, which we will consider in more depth when we study the numerical solutions to these equations.
While strong solutions of the SDE give sample realizations of the probability distribution for \( X_t \), we may also consider solving for this probability distribution directly.
Suppose that \( X_0 \) has a density defined as \( p(x_0) \); then, given the SDE, we can also study how this initial prior evolves in time.
Particularly, the SDE defines a Markov model, and we will denote the transition density as \( p(t,x) \).
Fokker-Planck equations
For a random process \( X_t \) with an SDE governing the time evolution, an initial prior and transition density \( p(t,x) \) as above, the Fokker-Planck equations are defined as \[ \begin{align} \partial_t p + \partial_x (a p) - \frac{1}{2} \partial_x^2 \left(pb^2\right) = 0. \end{align} \] In particular, the above partial differential equation defines the probability density for \( X_t \) at all times \( t \) given the initial prior \( p(x_0) \).
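As a worked example (under assumed coefficients, not taken from the lecture), take the additive-noise SDE with \( a(t,x) = -\theta x \) and \( b = \sigma \) for hypothetical constants \( \theta, \sigma > 0 \). The Fokker-Planck equation then reads
\[ \begin{align} \partial_t p - \theta \partial_x (x p) - \frac{\sigma^2}{2} \partial_x^2 p = 0. \end{align} \]
One can check that the Gaussian density \( p_\infty(x) \propto \exp\left(-\theta x^2 / \sigma^2\right) \), with mean zero and variance \( \sigma^2 / (2\theta) \), is a stationary solution. Since \( \partial_x p_\infty = -\frac{2\theta x}{\sigma^2} p_\infty \), we have
\[ \begin{align} \theta x p_\infty + \frac{\sigma^2}{2} \partial_x p_\infty = 0, \end{align} \]
and differentiating once more in \( x \) shows that the spatial terms of the equation cancel, so that \( \partial_t p_\infty = 0 \).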
The Fokker-Planck equations describe the statistics of all sample paths \( X_t \) at once, as their solution provides the entire probability density at each time.
However, solving the Fokker-Planck equations becomes computationally infeasible for any dimension \( N_x> 3 \) in practice, so that this full solution is largely of theoretical interest.
Rather, we will typically consider an ensemble of sample path solutions to uncertain ODEs or SDEs to generate empirical statistics from this theoretical density in practice.
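The ensemble approach can be sketched as follows. This example assumes the hypothetical additive-noise SDE \( \mathrm{d}X_t = -\theta X_t \mathrm{d}t + \sigma \mathrm{d}W_t \), an assumed Gaussian prior on \( X_0 \), and an Euler-Maruyama discretization; for this linear case the mean and variance of \( X_T \) are known in closed form, which gives a reference for the empirical statistics. All parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 1.0, 0.5  # hypothetical drift rate and noise amplitude
t_end, n_steps, n_ens = 2.0, 1000, 20_000
dt = t_end / n_steps

# Draw the ensemble from an assumed Gaussian prior p(x_0) = N(1, 0.2^2).
x = rng.normal(1.0, 0.2, size=n_ens)

# Advance every member with its own independent sequence of shocks.
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_ens)
    x = x - theta * x * dt + sigma * dW

# Exact mean and variance of X_T for this linear, additive-noise SDE.
decay = np.exp(-theta * t_end)
exact_mean = 1.0 * decay
exact_var = 0.2**2 * decay**2 + sigma**2 / (2 * theta) * (1 - decay**2)

print(f"mean: ensemble {x.mean():.4f}, exact {exact_mean:.4f}")
print(f"var:  ensemble {x.var():.4f}, exact {exact_var:.4f}")
```

The empirical statistics carry both sampling error, which decreases with the ensemble size, and discretization error, which decreases with the time step.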