In the first part, we discussed the development of each of the pieces above.
In this section, we discuss how these pieces are put together into the Metropolis-Hastings algorithm.
Likewise, we will discuss how this can be used as a data assimilation technique to sample the posterior.
In particular, the invariant distribution we would like to sample with this scheme is precisely
\[ \begin{align} \pmb{\pi}(\pmb{x}_{L:0}) = p(\pmb{x}_{L:0}|\pmb{y}_{L:1}) \end{align} \] for an arbitrary time series of model and observation states.
Once we have given a form of the general Metropolis-Hastings algorithm, the application to this specific case will be illustrated.
As in the acceptance-rejection method, suppose we have a density that can generate candidates.
Since we are dealing with Markov chains, however, we permit that density to depend on the current state of the process.
Accordingly, the candidate-generating density is denoted \( q(\pmb{x}, \pmb{y}) \), where
\[ \begin{align} \int q(\pmb{x},\pmb{y}) \mathrm{d} \pmb{y} = 1. \end{align} \]
This density is to be interpreted as follows: when the process is at the point \( \pmb{x} \), it generates a candidate value \( \pmb{y} \) from \( q(\pmb{x}, \cdot) \).
If it happens that \( q(\pmb{x}, \pmb{y}) \) itself satisfies the reversibility condition
\[ \begin{align} \pmb{\pi}(\pmb{x}) q(\pmb{x},\pmb{y}) = \pmb{\pi}(\pmb{y}) q(\pmb{y},\pmb{x}) \end{align} \] for all \( \pmb{x}, \pmb{y} \), our search for the Markov chain is over.
However, the reversibility condition is not satisfied in general; for instance, for some \( \pmb{x}, \pmb{y} \) we may find that
\[ \begin{align} \pmb{\pi}(\pmb{x}) q(\pmb{x},\pmb{y}) > \pmb{\pi}(\pmb{y}) q(\pmb{y},\pmb{x}). \end{align} \]
In this case, somewhat loosely, the process moves from \( \pmb{x} \) to \( \pmb{y} \) too often and from \( \pmb{y} \) to \( \pmb{x} \) too infrequently.
One way to correct this is to reduce the number of moves from \( \pmb{x} \) to \( \pmb{y} \) by introducing a probability
$$\begin{align} \alpha(\pmb{x}, \pmb{y}) < 1 \end{align}$$
where we refer to \( \alpha(\pmb{x},\pmb{y}) \) as the probability of move.
If the move is not made, the process again returns \( \pmb{x} \) as a value from the target distribution.
This is in contrast with the acceptance-rejection method, where if a \( \pmb{y} \) is rejected, a new pair \( (\pmb{y}, u) \) is drawn independently of the previous value of \( \pmb{y} \).
Thus transitions from \( \pmb{x} \) to \( \pmb{y} \), where \( \pmb{y} \neq \pmb{x} \), are made according to Metropolis-Hastings by
\[ \begin{align} p_{\mathrm{MH}}(\pmb{x},\pmb{y}):= q(\pmb{x},\pmb{y}) \alpha(\pmb{x},\pmb{y}) \end{align} \] when \( \pmb{x}\neq \pmb{y} \) for some \( \alpha \) yet-to-be-determined.
Consider again the inequality,
\[ \begin{align} \pmb{\pi}(\pmb{x}) q(\pmb{x},\pmb{y}) > \pmb{\pi}(\pmb{y}) q(\pmb{y},\pmb{x}). \end{align} \]
This tells us that a move from \( \pmb{y} \) to \( \pmb{x} \) is not made often enough, and we should therefore define \( \alpha(\pmb{y}, \pmb{x}) \) to be as large as possible, namely \( \alpha(\pmb{y},\pmb{x}) := 1 \).
The probability of move \( \alpha(\pmb{x}, \pmb{y}) \) is determined by requiring that
\[ \begin{align} p_{\mathrm{MH}}(\pmb{x}, \pmb{y}) \pmb{\pi}(\pmb{x}) = p_{\mathrm{MH}}(\pmb{y}, \pmb{x}) \pmb{\pi}(\pmb{y}) \end{align} \] i.e., such that it satisfies the reversibility condition.
Notice, by substitution, we recover
\[ \begin{align} \pmb{\pi}(\pmb{x}) q(\pmb{x},\pmb{y})\alpha(\pmb{x},\pmb{y}) = \pmb{\pi}(\pmb{y}) q(\pmb{y},\pmb{x})\alpha(\pmb{y},\pmb{x}), \end{align} \]
Therefore, with \( \alpha(\pmb{y},\pmb{x}) = 1 \), we define the appropriate probability of move \( \alpha(\pmb{x},\pmb{y}) \) as
\[ \begin{align} \alpha(\pmb{x},\pmb{y}) := \frac{\pmb{\pi}(\pmb{y}) q(\pmb{y},\pmb{x})}{\pmb{\pi}(\pmb{x}) q(\pmb{x},\pmb{y})}. \end{align} \]
If the above inequality is reversed, we may simply reverse the argument.
The probabilities of move, constructed as before,
\[ \begin{align} \alpha(\pmb{x},\pmb{y}) := \frac{\pmb{\pi}(\pmb{y}) q(\pmb{y},\pmb{x})}{\pmb{\pi}(\pmb{x}) q(\pmb{x},\pmb{y})} & & \alpha(\pmb{y},\pmb{x}) := 1 \end{align} \] are defined to ensure that \( p_{\mathrm{MH}} \) is reversible with respect to the invariant distribution.
This again will ensure that the Metropolis-Hastings chain will converge to the appropriate invariant distribution \( \pmb{\pi}^\ast \) given sufficiently many iterates of the process.
Thus, the criterion to ensure reversibility becomes
\[ \begin{align} \alpha(\pmb{x},\pmb{y}) := \begin{cases} \mathrm{min}\left[\frac{\pmb{\pi}(\pmb{y})q(\pmb{y},\pmb{x})}{\pmb{\pi}(\pmb{x})q(\pmb{x},\pmb{y})}, 1 \right] & \text{if}\quad\pmb{\pi}(\pmb{x})q(\pmb{x},\pmb{y}) > 0\\ 1 & \text{else } \end{cases} \end{align} \]
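As a concrete illustration, the probability of move above can be sketched in Python; the function names `pi` and `q` here are hypothetical placeholders for the target and candidate-generating densities, not part of any library.

```python
def acceptance_probability(pi, q, x, y):
    """Sketch of the Metropolis-Hastings probability of move alpha(x, y).

    pi : callable, the (possibly unnormalized) target density
    q  : callable, q(x, y) is the candidate-generating density
    """
    denominator = pi(x) * q(x, y)
    if denominator > 0.0:
        # min[ pi(y) q(y, x) / (pi(x) q(x, y)), 1 ]
        return min(pi(y) * q(y, x) / denominator, 1.0)
    return 1.0  # the "else" branch of the definition above
```

Note that only ratios of \( \pmb{\pi} \) appear, so any normalizing constant cancels.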
To complete the definition of the transition kernel for the Metropolis-Hastings chain, we must account for the probability that the chain remains at \( \pmb{x} \).
As defined previously, we wrote this as
\[ \begin{align} r(\pmb{x})=1 - \int p(\pmb{x},\pmb{y}) \mathrm{d}\pmb{y} \end{align} \]
Consequently, the transition kernel of the Metropolis-Hastings chain is given as
\[ \begin{align} \mathcal{P}_{\mathrm{MH}}(\mathrm{d}\pmb{y}| \pmb{x}):= q(\pmb{x},\pmb{y})\alpha(\pmb{x},\pmb{y})\mathrm{d}\pmb{y} + \left[1 - \int q(\pmb{x},\pmb{y})\alpha(\pmb{x},\pmb{y})\mathrm{d}\pmb{y} \right] \pmb{\delta}_{\pmb{x}}(\mathrm{d}\pmb{y}) \end{align} \] by applying our construction to the previous ansatz.
By constructing the chain in this way, we guarantee that this converges to the invariant distribution \( \pmb{\pi}^\ast \) after sufficiently many iterates of the chain.
Courtesy of Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4), 327-335.
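One draw from the transition kernel above might be sketched as follows; this is an illustrative sketch, with `log_pi`, `propose`, and `log_q` as assumed user-supplied functions, and the ratio computed in log space for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(x, log_pi, propose, log_q):
    """One draw from the Metropolis-Hastings transition kernel (a sketch).

    x       : current state of the chain
    log_pi  : log of the target density, known up to an additive constant
    propose : draws a candidate y from q(x, .)
    log_q   : evaluates log q(x, y)
    """
    y = propose(x)
    # log of [pi(y) q(y, x)] / [pi(x) q(x, y)]
    log_ratio = (log_pi(y) + log_q(y, x)) - (log_pi(x) + log_q(x, y))
    # accept the move with probability alpha(x, y); otherwise remain at x,
    # which realizes the point mass delta_x in the kernel
    if np.log(rng.uniform()) < min(log_ratio, 0.0):
        return y
    return x
```

Iterating `mh_step` produces a chain whose iterates, after sufficiently many steps, are distributed approximately according to \( \pmb{\pi} \).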
One simple choice of \( q(\pmb{x},\pmb{y}) \) that is symmetric in \( \pmb{x},\pmb{y} \) is as follows.
Suppose that \( \phi \) is the multivariate Gaussian density with mean zero and some selected covariance.
We take \( q(\pmb{x},\pmb{y}) = \phi(\pmb{y} - \pmb{x}) \);
in particular, the candidate is drawn as \( \pmb{y} = \pmb{x} + \pmb{z} \) with \( \pmb{z}\sim \phi \), so that the proposal simply becomes a random walk with Gaussian noise, used to explore the state space.
Therefore, with the symmetric choice above, the probability of move is given by
\[ \begin{align} \alpha(\pmb{x},\pmb{y}) = \mathrm{min}\left[\frac{\pmb{\pi}(\pmb{y})}{\pmb{\pi}(\pmb{x})},1 \right]. \end{align} \]
This fully specifies the Metropolis-Hastings algorithm if we have knowledge of \( \pmb{\pi} \) up to proportionality.
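Putting the pieces together, a minimal random-walk Metropolis sampler with this symmetric Gaussian proposal might look as below; this is a sketch in NumPy, and the standard Gaussian target (known only up to proportionality) is an assumed example, not part of the lecture's data assimilation problem.

```python
import numpy as np

def random_walk_metropolis(log_pi, x0, n_steps, step_std, seed=0):
    """Random-walk Metropolis sketch: the proposal q(x, y) = phi(y - x) is
    symmetric, so the probability of move reduces to min[pi(y)/pi(x), 1]."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    chain = np.empty((n_steps, x.size))
    for t in range(n_steps):
        # candidate y = x + z with z ~ phi, a zero-mean Gaussian
        y = x + step_std * rng.standard_normal(x.size)
        # q cancels by symmetry; accept when log u < log pi(y) - log pi(x)
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y
        chain[t] = x
    return chain

# Example: a standard Gaussian target, known only up to proportionality
log_pi = lambda v: -0.5 * float(v @ v)
chain = random_walk_metropolis(log_pi, [3.0], 6000, 1.0)
```

After discarding an initial burn-in, the retained iterates approximate draws from \( \pmb{\pi} \).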
The last step is now to identify how this is related to the Bayesian smoothing posterior…