Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm is a Markov chain method for drawing a random sample from a probability distribution that is proportional to a given density function. This is important in Bayesian statistics, where the posterior density (likelihood times prior) may be difficult to integrate, so the normalizing constant, and hence the actual (scaled) posterior probability distribution, may be unknown. With this algorithm, the (unscaled) posterior density function is enough.
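In symbols (writing \(\theta\) for the parameter and \(x\) for the data, notation chosen here just for illustration), the proportionality is \[\pi(\theta \mid x) \;\propto\; p(x \mid \theta)\,\pi(\theta),\] and the constant that would rescale the right-hand side into a proper probability distribution is exactly the integral that may be intractable.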

Before you can implement the algorithm, you need a density function. Here I’m using the discrete function: \[f(n) = \begin{cases} \binom{n}{15} (0.5)^n & \text{if } n \ge 15, \\ 0 & \text{otherwise}. \end{cases}\]

f = function(n) {
  # unscaled target density: positive for n >= 15, zero elsewhere
  if (n >= 15) {
    return(choose(n, 15) / 2^n)
  } else {
    return(0)
  }
}
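A few quick evaluations (the values follow directly from the definition above) confirm the function behaves as intended:

f(14)  # 0, since 14 is outside the support
f(15)  # choose(15,15)/2^15 = 1/32768
f(30)  # the density peaks at n = 29 and 30

Note that f really is unscaled: a short computation shows its values sum to 2 over all n, not to 1, which is exactly the situation Metropolis-Hastings is designed for.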

Now, the algorithm will run a Markov chain Monte Carlo (MCMC) simulation. We need to decide how long we want the simulation to run and what state it will start in. Below, N is the number of iterations, currentState is set to an initial state of 15, and results is a vector that will record every state the chain visits.

N = 10^5                   # number of iterations
currentState = 15          # initial state of the chain
results = numeric(N + 1)   # preallocated record of every state visited
results[1] = currentState

A key part of the Metropolis-Hastings algorithm is the method for choosing candidate states from the current state. It is important to choose a proposal that lets the algorithm mix well, so that it quickly explores all possible states. Here, I’ve chosen a transition function that randomly adds -1, 0, or 1 to the current state. Its output will be our candidate state.

transition = function(state) {
  # propose a candidate by adding -1, 0, or 1, each with probability 1/3
  return(state + sample(-1:1, 1))
}
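Since each of the three steps is equally likely, the probability of proposing state j from state i equals the probability of proposing i from j, which is the symmetry the acceptance rule below relies on. A quick empirical check (a sanity test, not part of the algorithm):

table(replicate(10^4, transition(20)))
# counts at 19, 20, and 21 should each be near 3333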

Once we find a candidate state, the next step of Metropolis-Hastings is to either accept or reject that state. If we reject the candidate, then we stay at our current state for another round. The accept function below decides whether to accept a new state. Note that this acceptance rule only works if the transition probabilities are symmetric, i.e., the probability of proposing state j from state i is the same as the probability of proposing state i from state j, for all states i and j.
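Concretely, with a symmetric proposal the candidate is accepted with probability \[\alpha \;=\; \min\!\left(1,\; \frac{f(\text{newState})}{f(\text{oldState})}\right),\] which the function implements by accepting automatically when the ratio is at least 1 and otherwise accepting with probability equal to the ratio.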

accept = function(newState, oldState) {
  ratio = f(newState) / f(oldState)
  # accept automatically if the ratio is at least 1;
  # otherwise accept with probability equal to the ratio
  return((ratio >= 1) || (runif(1, 0, 1) < ratio))
}

We are finally ready to implement the algorithm. The for-loop below runs the Markov chain for N steps, recording the state after each step.

for (i in 1:N) {
  newState = transition(currentState)
  if (accept(newState, currentState)) {
    currentState = newState
  }
  # record the state whether or not the candidate was accepted
  results[i + 1] = currentState
}
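Before summarizing the output, it can help to glance at a trace plot of the chain to confirm it mixes well. This one-liner (base R graphics, optional but cheap) shows the state wandering over its support:

plot(results, type = 'l')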

Looking at a histogram of the results, we can see that the sampled values are approximately normally distributed. The vector of results also makes it easy to approximate Highest Posterior Density (HPD) credible intervals and Bayes factors, even though we only ever evaluated the unscaled posterior density \(f(n)\).

hist(results,col='gray')
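Because the draws approximate the normalized posterior, ordinary sample summaries stand in for posterior quantities. As a minimal sketch, here is the posterior mean along with an equal-tailed 95% interval (a quick stand-in for a true HPD interval; the HPDinterval function in the coda package computes the real thing):

mean(results)                       # approximate posterior mean
quantile(results, c(0.025, 0.975))  # equal-tailed 95% credible interval
# library(coda)
# HPDinterval(as.mcmc(results))     # genuine HPD interval, if coda is installed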