

The following topics will be covered in this lecture:

- Ordinary differential equations
- Stochastic differential equations
- Additive noise
- Modes of convergence
- The Fokker-Planck equations

In our last session, we introduced the notion of the **stochastic integral**, with two standard forms, the **Itô** and **Stratonovich** forms.

Particularly, we discussed some of the ways that the **stochastic integral extends, and is different from,** the **standard deterministic integral**.

Despite the differences, we are able to formally manipulate these equations with, e.g., Itô's lemma.

In particular, these concepts allow us to derive what is known as a **stochastic differential equation** as an **extension of the ordinary differential equation**.

Giving an intuition on this extension, and how we will use this formalism to sample a target density, will be the focus of this lecture.

A general **ordinary differential equation (ODE)** is written as \[ \begin{align} & \frac{\mathrm{d}}{\mathrm{d}t} \pmb{x} := \pmb{f}(t, \pmb{x}) \\ \Leftrightarrow & \mathrm{d}\pmb{x} := \pmb{f}(t,\pmb{x})\mathrm{d}t. \end{align} \]

When \( \pmb{f} \) **satisfies a regularity condition**, this equation will have a **unique solution** given some initial data.

Lipschitz Continuity

The function \( \pmb{f}:\mathbb{R}^{N_x} \rightarrow \mathbb{R}^{N_x} \) is said to be Lipschitz continuous at a point \( \pmb{x}_0 \) if for all \( \pmb{x}_1 \) in a sufficiently small neighborhood of \( \pmb{x}_0 \), \[ \begin{align} \parallel \pmb{f}(\pmb{x}_0) - \pmb{f}(\pmb{x}_1) \parallel \leq K \parallel \pmb{x}_0 - \pmb{x}_1\parallel \end{align} \] for a fixed constant \( K\in \mathbb{R} \).

Lipschitz continuity above is **stronger than regular continuity**, but **weaker than continuous differentiability**.

- In particular, if \( \pmb{f}\in \mathcal{C}^1(\mathbb{R}^{N_x}) \), \( \pmb{f} \) satisfies Lipschitz continuity.
- More generally, a function that is Lipschitz continuous can be shown to be differentiable except on a set of measure zero,
  - i.e., with **probability one**, a point selected uniformly at random from a bounded interval is one at which **\( \pmb{f} \) is differentiable**.

For this reason, we consider **Lipschitz functions** to be **differentiable “almost everywhere”**, where the set of points at which the function fails to be differentiable has measure zero.
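As a rough numerical illustration of the definition above, the sketch below compares difference quotients near \( x_0 = 0 \) for two scalar functions. The helper name `max_difference_quotient`, the sampling grid, and the two example functions are assumptions chosen for illustration and are not part of the lecture.

```python
import math


def max_difference_quotient(f, x0, radius, n=10_000):
    """Largest |f(x0) - f(x1)| / |x0 - x1| over a grid of points x1 near x0."""
    worst = 0.0
    for i in range(1, n + 1):
        h = radius * i / n
        for x1 in (x0 - h, x0 + h):
            worst = max(worst, abs(f(x0) - f(x1)) / abs(x0 - x1))
    return worst


def sqrt_abs(x):
    """Continuous at 0, but its slope blows up there."""
    return math.sqrt(abs(x))


# |x| has a kink at 0, yet the quotient stays bounded (K = 1 works).
k_abs = max_difference_quotient(abs, 0.0, 1.0)

# sqrt(|x|) has no finite K: the quotient grows as the neighborhood shrinks.
k_sqrt_coarse = max_difference_quotient(sqrt_abs, 0.0, 1.0)
k_sqrt_fine = max_difference_quotient(sqrt_abs, 0.0, 1e-6)

print(k_abs, k_sqrt_coarse, k_sqrt_fine)
```

In this sketch, \( |x| \) is Lipschitz at zero without being differentiable there, while \( \sqrt{|x|} \) is continuous at zero but admits no finite constant \( K \).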

Recall the differential equation on the last slide

\[ \begin{align} \mathrm{d}\pmb{x} := \pmb{f}(t,\pmb{x})\mathrm{d}t. \end{align} \]

Provided that \( \pmb{f} \) is Lipschitz in the state variable and in time in a neighborhood of some initial condition, it can be shown that there is a unique solution defined for this initial data.

Picard-Lindelöf theorem

Suppose that \( \pmb{f} \) satisfies the Lipschitz condition at a point \( (0,\pmb{x}_0) \) as previously discussed. Then there is a unique solution \( \pmb{x}(t) \) defined on some time interval \( [ -\epsilon, \epsilon] \) for which:

- \( \pmb{x}(0)= \pmb{x}_0 \),
- \( \frac{\mathrm{d}}{\mathrm{d}t}|_{t=t_0}\pmb{x} = \pmb{f}(t_0, \pmb{x}(t_0)) \) for all \( t_0 \in [ -\epsilon, \epsilon] \), and
- where we formally write \[ \begin{align} \pmb{x}(t) = \int_0^t \pmb{f}(s, \pmb{x}(s))\mathrm{d}s + \pmb{x}_0. \end{align} \]
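One standard way to build intuition for the integral form above is Picard iteration, which repeatedly substitutes the current guess into the right-hand side of the integral equation. The sketch below is a hypothetical scalar example, \( \frac{\mathrm{d}}{\mathrm{d}t}x = x \) with \( x(0) = 1 \), and is not taken from the lecture; iterates are stored as polynomial coefficient lists, and the function name `picard_step` is an assumption.

```python
from fractions import Fraction


def picard_step(coeffs, x0):
    """One Picard iteration for dx/dt = x.

    `coeffs[k]` is the coefficient of t**k in the current iterate p(t).
    Returns the coefficients of x0 + integral_0^t p(s) ds.
    """
    integrated = [Fraction(c, k + 1) for k, c in enumerate(coeffs)]
    return [Fraction(x0)] + integrated


# Start from the constant guess x(t) = x0 = 1 and iterate five times.
iterate = [Fraction(1)]
for _ in range(5):
    iterate = picard_step(iterate, 1)

print([str(c) for c in iterate])
```

After five steps the coefficients are \( 1/k! \) for \( k = 0, \dots, 5 \), i.e., the truncated Taylor series of \( e^t \), which is the unique solution for this initial data.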

Notice that this only **defines a solution within a local neighborhood**, depending on a range of time around the initial condition.

This is known as an **initial value problem**, as previously discussed in the context of Markov models.

Particularly, suppose that we have an **initial prior on the state vector** \( \pmb{x}_0 \) and \( \frac{\mathrm{d}}{\mathrm{d}t} \pmb{x}=\pmb{f} \) is known to **satisfy the Lipschitz condition** in the support of \( p(\pmb{x}_0) \).

This actually **defines a deterministic Markov model**, but where there is **uncertainty in the initial value**.

We can define the **discrete mapping** under the continuous time model by the **flow map** discussed before, where \[ \begin{align} \boldsymbol{\Phi}(t, \pmb{x}_0) = \pmb{x}(t) = \int_0^t \pmb{f}(s, \pmb{x}(s))\mathrm{d}s + \pmb{x}_0. \end{align} \]
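As a minimal sketch of a flow map acting on an uncertain initial value, consider the hypothetical scalar ODE \( \frac{\mathrm{d}}{\mathrm{d}t}x = -\lambda x \), whose flow map has the closed form \( \Phi(t, x_0) = e^{-\lambda t} x_0 \). The decay rate `RATE`, the Gaussian prior, and the sample size are assumptions for illustration only.

```python
import math
import random
import statistics

RATE = 0.5  # hypothetical decay rate lambda


def flow_map(t, x0):
    """Closed-form flow map Phi(t, x0) for dx/dt = -RATE * x."""
    return math.exp(-RATE * t) * x0


# Draw samples of the uncertain initial value from an assumed Gaussian prior.
rng = random.Random(0)
prior_samples = [rng.gauss(1.0, 0.2) for _ in range(1000)]

# Each sample evolves deterministically: all uncertainty comes from x0.
forecast_samples = [flow_map(2.0, x0) for x0 in prior_samples]

prior_mean = statistics.fmean(prior_samples)
forecast_mean = statistics.fmean(forecast_samples)
print(prior_mean, forecast_mean)
```

Because the map is deterministic, two samples with the same initial value always land on the same forecast value, and composing the map over two time intervals agrees with a single application over their sum.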

In the case that \( \pmb{f} \) is a **linear transformation**, \( \boldsymbol{\Phi}\equiv \mathbf{M}_t \) for some matrix, as with the previously defined Gauss-Markov model.

In this case, we once again generate a transition kernel as \[ \begin{align} \mathcal{P}\left(\pmb{x}_t | \pmb{x}_0\right) = \delta_{\mathbf{M}_t \pmb{x}_0} \end{align} \] with \( \delta_{\pmb{v}} \) referring to the **Dirac measure** at \( \pmb{v} \in \mathbb{R}^{N_x} \).

The **Dirac probability measure** is **defined by the property**, \[ \begin{align} \int f(\pmb{x}) \delta_{\pmb{v}}\left(\mathrm{d}\pmb{x}\right) = f\left(\pmb{v}\right); \end{align} \] particularly, the **Dirac delta is a singular measure**, understood by the integral equation.

- Similarly, we will say that the **transition “density”** is given in terms of \[ \begin{align} p(\pmb{x}_t \vert \pmb{x}_{0} ) \equiv \delta \left\{\pmb{x}_t - \mathbf{M}_t\left(\pmb{x}_{0}\right)\right\} \end{align} \] where \( \delta \) represents the **Dirac distribution**.
- Heuristically, this is known as the “function” which has the property \[ \delta(\pmb{x}) = \begin{cases} +\infty & \pmb{x} = \pmb{0} \\ 0 & \text{else}\end{cases}; \]
- This is just a convenient abuse of notation, as the **Dirac measure does not have a density with respect to the standard Lebesgue measure**.
- Rather, the **Dirac distribution** is understood through the generalized function theory of distributions as a type of **kernel that gives the property**, \[ \begin{align} \int f(\pmb{x}_{t}) \delta\left\{\pmb{x}_t- \mathbf{M}_t\left(\pmb{x}_{0}\right)\right\}\mathrm{d}\pmb{x}_{t} = f\left(\mathbf{M}_t\left(\pmb{x}_{0}\right)\right). \end{align} \]
- This equation is to be interpreted as saying that, **given a realization of the initial condition** \( \pmb{x}_0 \sim P \), the **subsequent realization \( \mathbf{M}_t \pmb{x}_0 \) is determined with probability one** at all times \( t \).
- Therefore, such a model is known as a **“perfect” model**, as it **totally determines the subsequent evolution** of the random process in time.
- However, our **classic Gauss-Markov model** was defined in terms of a perfect model that is perturbed by random shocks, \[ \begin{align} \pmb{x}_{k}:= \mathbf{M}_k \pmb{x}_{k-1} + \pmb{w}_k. \end{align} \]
- The extension of such a Gauss-Markov model to a system generated with continuous time is derived with the notion of a stochastic differential equation.
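The contrast between the perfect model and the discrete Gauss-Markov model above can be sketched for a scalar state. The function name `simulate_gauss_markov`, the constant multiplier `m`, and the parameter values are hypothetical choices for illustration; setting the noise standard deviation to zero recovers the perfect model.

```python
import random


def simulate_gauss_markov(x0, m, noise_std, n_steps, rng):
    """Iterate x_k = m * x_{k-1} + w_k with w_k ~ N(0, noise_std**2)."""
    states = [x0]
    for _ in range(n_steps):
        shock = rng.gauss(0.0, noise_std)
        states.append(m * states[-1] + shock)
    return states


# Perfect model: no shocks, so the trajectory is fixed by x0 alone.
perfect = simulate_gauss_markov(2.0, 0.9, 0.0, 10, random.Random(1))

# Perturbed model: same x0 and dynamics, but with random shocks.
noisy = simulate_gauss_markov(2.0, 0.9, 0.1, 10, random.Random(1))

print(perfect[-1], noisy[-1])
```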

A general, scalar **stochastic differential equation (SDE)** is written as \[ \begin{align} \mathrm{d}X_t := a(t, X_t)\mathrm{d}t + b(t, X_t)\mathrm{d}W_t \end{align} \] where **\( a \)** is known as the **drift function** and **\( b \)** is known as the **diffusion function**.

The above SDE is written in the Itô form, while there exists an equivalent Stratonovich form given as,

\[ \begin{align} \mathrm{d}X_t := \left[a(t, X_t) -\frac{1}{2} b(t, X_t) \partial_x b(t, X_t)\right] \mathrm{d}t + b(t, X_t) \circ \mathrm{d}W_t. \end{align} \]

The Itô SDE has a formal solution given by

\[ \begin{align} X(T) - X(0) = \int_0^T a(t, X_t)\mathrm{d}t + \int_0^T b(t, X_t)\mathrm{d}W_t. \end{align} \]
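The formal solution above is rarely available in closed form. One common discretization of the two integrals is the Euler-Maruyama scheme, named here as an illustration rather than as the method of this lecture. The sketch below assumes a uniform time step and scalar callables `a(t, x)` and `b(t, x)`; all names and parameter values are hypothetical.

```python
import math
import random


def euler_maruyama(a, b, x0, t_end, n_steps, rng):
    """Approximate one sample path of dX = a(t, X) dt + b(t, X) dW (Ito form)."""
    dt = t_end / n_steps
    t, x = 0.0, x0
    path = [x0]
    for _ in range(n_steps):
        # Wiener increments are independent N(0, dt) random variables.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + a(t, x) * dt + b(t, x) * dw
        t += dt
        path.append(x)
    return path


def decay(t, x):
    return -x


def no_noise(t, x):
    return 0.0


def constant_noise(t, x):
    return 0.3


# With zero diffusion the scheme reduces to forward Euler for dx/dt = -x.
deterministic_path = euler_maruyama(decay, no_noise, 1.0, 1.0, 1000, random.Random(0))

# Different Wiener realizations (seeds) give different sample paths.
path_a = euler_maruyama(decay, constant_noise, 1.0, 1.0, 1000, random.Random(42))
path_b = euler_maruyama(decay, constant_noise, 1.0, 1.0, 1000, random.Random(43))

print(deterministic_path[-1], path_a[-1], path_b[-1])
```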

An immediate difference from the ODE initial value problem is that the **evolution of the state is given by a random variable**, with a distribution that **depends in time on the realization of the Wiener process**.

- Particularly, the transition probability is no longer given by a Dirac measure, and instead includes uncertainty in the evolution.

In this case, the **drift terms** represent the **mechanistic laws** governing the process, while the **diffusion terms** represent the **random shocks to the system**.

If the **drift \( a \) is a linear function of the state and the diffusion \( b \) does not depend on the state**, this furthermore defines a **Gauss-Markov model**.

We will note here a particular scenario that is of special relevance to our discussions.

When the **diffusion term \( b \)** has **no dependence on the model state \( X_t \)**, then the model is said to be one of **additive noise**.

Recall then the Stratonovich SDE

\[ \begin{align} \mathrm{d}X_t := \left[a(t, X_t) -\frac{1}{2} b(t) \partial_x b(t)\right] \mathrm{d}t + b(t ) \circ \mathrm{d}W_t. \end{align} \]

In particular, \( \partial_x b \equiv 0 \) when \( b \) is only a function of time, i.e.,

\[ \begin{align} \mathrm{d}X_t := a(t, X_t) \mathrm{d}t + b(t) \circ \mathrm{d}W_t. \end{align} \]

- Therefore, it can be shown that the **Stratonovich SDE and the Itô SDE** are the **same for additive noise**.

This is a scenario that is frequently studied in the data assimilation literature, because of the simplification of the SDE above, and because it represents precisely unbiased shocks to the governing process laws.

We will return to such systems when we look at numerical solutions shortly.
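The vanishing of the correction term \( \frac{1}{2} b \, \partial_x b \) under additive noise can be checked numerically with a central finite difference. This is a minimal sketch; the function name `drift_correction` and the two example diffusion functions are assumptions for illustration.

```python
def drift_correction(b, t, x, h=1e-6):
    """Approximate 0.5 * b(t, x) * d/dx b(t, x) with a central difference."""
    db_dx = (b(t, x + h) - b(t, x - h)) / (2.0 * h)
    return 0.5 * b(t, x) * db_dx


def additive_diffusion(t, x):
    """Depends on time only, so the x-derivative is zero."""
    return 0.3 * (1.0 + t)


def multiplicative_diffusion(t, x):
    """Depends on the state, so the correction does not vanish."""
    return 0.3 * x


additive_correction = drift_correction(additive_diffusion, 1.0, 2.0)
multiplicative_correction = drift_correction(multiplicative_diffusion, 1.0, 2.0)

print(additive_correction, multiplicative_correction)
```

For the state-dependent example, the exact correction is \( \frac{1}{2} (0.3 x)(0.3) = 0.045 x \), so the Itô and Stratonovich drifts differ whenever \( x \neq 0 \).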

- As noted with stochastic processes, and stochastic calculus, there are multiple ways we might consider the existence and uniqueness of a solution to an SDE.

Strong solution

A strong solution \( X_t \) of an Itô SDE (or equivalent Stratonovich SDE) has the following properties:

- \( X_T \) satisfies \[ \begin{align} X(T) - X(0) = \int_0^T a(t, X_t)\mathrm{d}t + \int_0^T b(t, X_t)\mathrm{d}W_t, \end{align} \] and for all times \( T \), \( X_T \) is a function of \( a,b \) and the realization of \( W_t \) for all times \( t<T \); and
- the integrals in the above are well-defined in terms of the proper modes of convergence.

The important notion here is that **if we change the realization of the Wiener process \( W_t \)**, then also the **strong solution \( X_t \) changes**, but the functional relation between \( X_t \) and \( W_t \) remains the same.

This is in analogy to how we looked at the realization of the function \( A(\omega)\sin(t) \), and how the evolution in time depends on the outcome for \( A \).

Different realizations of the Wiener process \( W_t \) can thus be thought of as generating different sequences of shocks to the governing laws, and a strong solution is thought to depend on the specific sequence of shocks.

However, if we look at the **collection of all possible sequences** of shocks that can be generated by \( W_t \), this gives a (non-singular) **probability distribution for \( X_t \) at all times**.

Particularly, **each realization of a strong solution** gives a **particular sample (path) of the probability distribution** for \( X_t \).
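To make the fixed functional relation between \( X_t \) and \( W_t \) concrete, the sketch below uses a hypothetical example that is not from the lecture: the Itô SDE \( \mathrm{d}X_t = \mu X_t \mathrm{d}t + \sigma X_t \mathrm{d}W_t \), whose strong solution is known in closed form as \( X_t = X_0 \exp\left\{ (\mu - \sigma^2/2) t + \sigma W_t \right\} \). The function names and parameter values are assumptions.

```python
import math
import random


def wiener_path(t_end, n_steps, rng):
    """Sample W at n_steps + 1 evenly spaced times, starting from W_0 = 0."""
    dt = t_end / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def strong_solution(x0, mu, sigma, t_end, w):
    """Apply the same functional of the Wiener path to any realization w."""
    dt = t_end / (len(w) - 1)
    return [
        x0 * math.exp((mu - 0.5 * sigma**2) * k * dt + sigma * w_k)
        for k, w_k in enumerate(w)
    ]


# Two different realizations of the Wiener process give two sample paths.
w_a = wiener_path(1.0, 500, random.Random(7))
w_b = wiener_path(1.0, 500, random.Random(8))
x_a = strong_solution(1.0, 0.1, 0.4, 1.0, w_a)
x_b = strong_solution(1.0, 0.1, 0.4, 1.0, w_b)

# Re-using the same realization reproduces the same path exactly.
x_a_again = strong_solution(1.0, 0.1, 0.4, 1.0, w_a)

print(x_a[-1], x_b[-1])
```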

As with ODEs, **Lipschitz continuity** of the drift and diffusion functions **gives the existence and uniqueness of strong solutions** to the SDE.

However, **not all SDEs admit strong solutions**, and more generally we may be concerned just with the probabilistic aspects of such a simulation.

This follows the analogy of almost sure convergence (similar to strong convergence) versus convergence in probability alone.

We may formally define a solution in which **we guarantee only that the forward probability distribution matches that generated by the SDE**;

- however, we **may not actually guarantee a (point-wise) solution** that matches a particular sample path given some realization of \( W_t \).

This is loosely what is known as a weak solution; we will consider the related notion of weak convergence in more depth when we study the numerical solutions to these equations.

While strong solutions of the SDE give sample realizations of the probability distribution for \( X_t \), we may also **consider solving for this probability distribution directly**.

Suppose that \( X_0 \) has a density defined as \( p(x_0) \); then, given the SDE, we can also study how this initial prior evolves in time.

Particularly, the SDE defines a Markov model, and we will denote the transition density as \( p(t,x) \).

Fokker-Planck equations

For a random process \( X_t \) with an SDE governing the time evolution, an initial prior and transition density \( p(t,x) \) as above, the Fokker-Planck equations are defined as \[ \begin{align} \partial_t p + \partial_x (a p) - \frac{1}{2} \partial_x^2 \left(pb^2\right) = 0. \end{align} \] In particular, the above partial differential equation defines the probability density for \( X_t \) at all times \( t \) given the initial prior \( p(x_0) \).
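As a sanity check of the equation above, the sketch below takes a hypothetical example with drift \( a(x) = -\theta x \) and constant diffusion \( b = \sigma \), for which the Gaussian density with mean zero and variance \( \sigma^2 / (2\theta) \) is stationary, i.e., \( \partial_t p = 0 \). The remaining terms are approximated with central finite differences; the parameter values and function names are assumptions.

```python
import math

THETA, SIGMA = 1.5, 0.8  # hypothetical drift rate and noise amplitude
STATIONARY_VAR = SIGMA**2 / (2.0 * THETA)


def drift(x):
    return -THETA * x


def gaussian_density(x, variance):
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def fokker_planck_residual(x, variance, h=1e-3):
    """d/dx(a p) - 0.5 * d2/dx2(b^2 p) for a time-independent Gaussian p."""
    def p(y):
        return gaussian_density(y, variance)

    d_flux = (drift(x + h) * p(x + h) - drift(x - h) * p(x - h)) / (2.0 * h)
    d2_diffusion = SIGMA**2 * (p(x + h) - 2.0 * p(x) + p(x - h)) / h**2
    return d_flux - 0.5 * d2_diffusion


points = [-1.0, -0.3, 0.0, 0.4, 1.2]

# Near zero for the stationary variance, clearly nonzero for a wrong variance.
good = [fokker_planck_residual(x, STATIONARY_VAR) for x in points]
bad = fokker_planck_residual(0.0, 2.0 * STATIONARY_VAR)

print(max(abs(r) for r in good), bad)
```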

The Fokker-Planck equations completely determine the distribution of the sample paths \( X_t \), as their solution provides the entire probability density.

**Realizations of sample paths** are thus **drawn from this joint density in time**.

However, **solving the Fokker-Planck equations becomes computationally infeasible** for any dimension \( N_x> 3 \) in practice, so that **this full solution is only theoretical**.

Rather, we will **typically consider an ensemble of sample path solutions** to **uncertain ODEs / SDEs to generate empirical statistics** from this theoretical density in practice.
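A minimal sketch of the ensemble approach, under stated assumptions: the example SDE is the hypothetical additive-noise model \( \mathrm{d}X_t = -\theta X_t \mathrm{d}t + \sigma \mathrm{d}W_t \) with a fixed initial value, each member is advanced with a simple Euler-Maruyama step, and the empirical statistics are compared with the known mean \( x_0 e^{-\theta t} \) and variance \( \frac{\sigma^2}{2\theta}\left(1 - e^{-2\theta t}\right) \) of this process. All names and parameter values are illustrative.

```python
import math
import random
import statistics

THETA, SIGMA = 1.0, 0.5        # hypothetical drift rate and noise amplitude
X0, T_END = 1.0, 1.0           # fixed initial value and final time
N_STEPS, N_ENSEMBLE = 100, 2000


def sample_endpoint(rng):
    """Advance one ensemble member to T_END with Euler-Maruyama steps."""
    dt = T_END / N_STEPS
    x = X0
    for _ in range(N_STEPS):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += -THETA * x * dt + SIGMA * dw
    return x


rng = random.Random(2024)
ensemble = [sample_endpoint(rng) for _ in range(N_ENSEMBLE)]

# Empirical statistics from the ensemble.
sample_mean = statistics.fmean(ensemble)
sample_var = statistics.variance(ensemble)

# Known mean and variance of this linear, additive-noise process.
exact_mean = X0 * math.exp(-THETA * T_END)
exact_var = SIGMA**2 / (2.0 * THETA) * (1.0 - math.exp(-2.0 * THETA * T_END))

print(sample_mean, exact_mean)
print(sample_var, exact_var)
```

The empirical values carry both sampling error, which shrinks as the ensemble grows, and discretization error from the time step.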