The Hamiltonian Monte Carlo algorithm (originally known as hybrid Monte Carlo) is a Markov chain Monte Carlo method for obtaining a sequence of random samples which converge to being distributed according to a target probability distribution for which direct sampling is difficult. This sequence can be used to estimate integrals with respect to the target distribution (expected values).
Hamiltonian Monte Carlo corresponds to an instance of the Metropolis–Hastings algorithm, with a Hamiltonian dynamics evolution simulated using a time-reversible and volume-preserving numerical integrator (typically the leapfrog integrator) to propose a move to a new point in the state space. Compared to using a Gaussian random-walk proposal distribution in the Metropolis–Hastings algorithm, Hamiltonian Monte Carlo reduces the correlation between successive sampled states by proposing moves to distant states which maintain a high probability of acceptance due to the approximate energy-conserving properties of the simulated Hamiltonian dynamics when using a symplectic integrator. The reduced correlation means fewer Markov chain samples are needed to approximate integrals with respect to the target probability distribution for a given Monte Carlo error.
The algorithm was originally proposed by Simon Duane, Anthony Kennedy, Brian Pendleton and Duncan Roweth in 1987 for calculations in lattice quantum chromodynamics.[1] In 1996, Radford M. Neal showed how the method could be used for a broader class of statistical problems, in particular artificial neural networks.[2] However, the burden of having to provide gradients of the Bayesian network delayed the wider adoption of the algorithm in statistics and other quantitative disciplines, until in the mid-2010s the developers of Stan implemented HMC in combination with automatic differentiation.[3]
Suppose the target distribution to sample is $f(x)$ for $x \in \mathbb{R}^d$ ($d \ge 1$), and that a chain of samples $X_0, X_1, X_2, \ldots$ is required.
Hamilton's equations are

$$\frac{dx_i}{dt} = \frac{\partial H}{\partial p_i} \qquad\text{and}\qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial x_i},$$

where $x_i$ and $p_i$ are the $i$th components of the position and momentum vectors respectively and $H$ is the Hamiltonian. Let $M$ be a symmetric, positive-definite mass matrix; then the Hamiltonian is

$$H(x, p) = U(x) + \frac{1}{2} p^{\mathsf{T}} M^{-1} p,$$

where $U(x)$ is the potential energy, which for a target density $f(x)$ is taken to be

$$U(x) = -\ln f(x),$$

in analogy with the Boltzmann factor. Note that the Hamiltonian $H$ is dimensionless in this formulation because the probability weight $\exp(-H)$ must be well defined: for a simulation at temperature $T$, the factor $k_B T$ (with $k_B$ the Boltzmann constant) is absorbed into $U$ and $M$.
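As a concrete illustration, the potential and Hamiltonian can be written down directly. The following is a minimal sketch in Python for a hypothetical two-dimensional Gaussian target; the names `log_target`, `potential` and `hamiltonian`, and the covariance `Sigma`, are illustrative choices rather than part of any standard library.

```python
import numpy as np

# Illustrative target: a zero-mean Gaussian with a fixed covariance Sigma.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def log_target(x):
    """Unnormalized log density ln f(x); the normalizing constant cancels in HMC."""
    return -0.5 * x @ Sigma_inv @ x

def potential(x):
    """Potential energy U(x) = -ln f(x)."""
    return -log_target(x)

def hamiltonian(x, p, M_inv):
    """Total energy H(x, p) = U(x) + (1/2) p^T M^{-1} p."""
    return potential(x) + 0.5 * p @ M_inv @ p
```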
The algorithm requires a positive integer $L$ for the number of leapfrog steps and a positive step size $\Delta t$. Suppose the chain is at $X_n = x_n$. Set $x_n(0) = x_n$ and draw a Gaussian momentum $p_n(0)$ from $N(0, M)$. The pair is then evolved under Hamiltonian dynamics for a total time $L \Delta t$ by solving Hamilton's equations numerically with the leapfrog integrator. One leapfrog step of size $\Delta t$ updates the momentum and position as

$$p_n\!\left(t + \frac{\Delta t}{2}\right) = p_n(t) - \frac{\Delta t}{2}\, \nabla U(x)\Big|_{x = x_n(t)},$$

$$x_n(t + \Delta t) = x_n(t) + \Delta t \, M^{-1} p_n\!\left(t + \frac{\Delta t}{2}\right),$$

$$p_n(t + \Delta t) = p_n\!\left(t + \frac{\Delta t}{2}\right) - \frac{\Delta t}{2}\, \nabla U(x)\Big|_{x = x_n(t + \Delta t)}.$$

Applying these updates $L$ times, starting from $x_n(0)$ and $p_n(0)$, yields $x_n(L \Delta t)$ and $p_n(L \Delta t)$.
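A minimal sketch of the leapfrog integration, continuing the illustrative Gaussian example above; the gradient $\nabla U(x) = \Sigma^{-1} x$ is written in closed form here, whereas in practice it would come from the model or from automatic differentiation.

```python
def grad_potential(x):
    """Gradient of U(x) for the Gaussian example: Sigma^{-1} x."""
    return Sigma_inv @ x

def leapfrog(x, p, grad_U, step_size, n_steps, M_inv):
    """Apply n_steps leapfrog updates of size step_size; returns the final (x, p)."""
    x, p = x.copy(), p.copy()
    p = p - 0.5 * step_size * grad_U(x)        # initial momentum half-step
    for step in range(n_steps):
        x = x + step_size * (M_inv @ p)        # full position step
        if step < n_steps - 1:                 # two consecutive half-steps merged into one
            p = p - step_size * grad_U(x)
    p = p - 0.5 * step_size * grad_U(x)        # final momentum half-step
    return x, p
```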
The leapfrog algorithm is an approximate solution to the motion of non-interacting classical particles. If exact, the solution will never change the initial randomly-generated energy distribution, as energy is conserved for each particle in the presence of a classical potential energy field. In order to reach a thermodynamic equilibrium distribution, particles must have some sort of interaction with, for example, a surrounding heat bath, so that the entire system can take on different energies with probabilities according to the Boltzmann distribution.
One way to move the system towards a thermodynamic equilibrium distribution is to change the state of the particles using the Metropolis–Hastings algorithm: one first applies the leapfrog steps and then a Metropolis–Hastings accept/reject step.
The transition from $X_n = x_n$ to $X_{n+1}$ is

$$X_{n+1} \mid X_n = x_n = \begin{cases} x_n(L\Delta t) & \text{with probability } \alpha\left(x_n(0), x_n(L\Delta t)\right) \\ x_n(0) & \text{otherwise,} \end{cases}$$

where

$$\alpha\left(x_n(0), x_n(L\Delta t)\right) = \min\left(1, \frac{\exp\left[-H\left(x_n(L\Delta t), p_n(L\Delta t)\right)\right]}{\exp\left[-H\left(x_n(0), p_n(0)\right)\right]}\right).$$
A full update thus consists of first randomly sampling the momenta $p$ (independently of previous iterations), then simulating the Hamiltonian dynamics with $L$ leapfrog steps, and finally applying the Metropolis–Hastings accept/reject step above; repeating this procedure produces the chain $X_{n+1}, X_{n+2}, X_{n+3}, \ldots$
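Putting these pieces together, a single HMC update can be sketched as follows, reusing the illustrative helpers defined above and, for simplicity, taking the mass matrix $M$ to be the identity.

```python
def hmc_step(x_current, step_size, n_steps, rng):
    """One HMC transition: sample a momentum, integrate, then accept or reject."""
    d = x_current.shape[0]
    M_inv = np.eye(d)                           # identity mass matrix for simplicity
    p_current = rng.standard_normal(d)          # p ~ N(0, M) with M = I

    x_new, p_new = leapfrog(x_current, p_current, grad_potential,
                            step_size, n_steps, M_inv)

    # Metropolis acceptance probability min(1, exp(H_old - H_new)).
    h_old = hamiltonian(x_current, p_current, M_inv)
    h_new = hamiltonian(x_new, p_new, M_inv)
    if rng.uniform() < np.exp(h_old - h_new):
        return x_new                            # accept the proposal
    return x_current                            # reject: stay at the current state

# Usage: a short chain started at the origin.
rng = np.random.default_rng(0)
samples = [np.zeros(2)]
for _ in range(1000):
    samples.append(hmc_step(samples[-1], step_size=0.1, n_steps=20, rng=rng))
```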
The No U-Turn Sampler (NUTS)[5] is an extension of Hamiltonian Monte Carlo that controls the number of leapfrog steps $L$ automatically. Tuning $L$ is critical: for example, in the one-dimensional $N(0, 1/\sqrt{k})$ case the potential is $U(x) = kx^2/2$, which corresponds to a simple harmonic oscillator. If $L$ is too large, the particle oscillates back and forth and wastes computation; if $L$ is too small, the algorithm behaves like a random walk.
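This can be made concrete with a short derivation for the harmonic potential, assuming for simplicity a unit mass matrix $M = 1$ (an assumption not stated above):

$$\frac{d^2 x}{dt^2} = -\frac{\partial U}{\partial x} = -kx \quad\Longrightarrow\quad x(t) = x(0)\cos\!\left(\sqrt{k}\,t\right) + \frac{p(0)}{\sqrt{k}}\sin\!\left(\sqrt{k}\,t\right),$$

so the trajectory is periodic with period $2\pi/\sqrt{k}$. An integration time $L\Delta t$ near half a period carries the particle to a distant state, while a much larger $L\Delta t$ merely retraces the same orbit and a much smaller one yields short, random-walk-like moves.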
Loosely, NUTS runs the Hamiltonian dynamics both forwards and backwards in time randomly until a U-Turn condition is satisfied. When that happens, a random point from the path is chosen for the MCMC sample and the process is repeated from that new point.
In detail, a binary tree is constructed to trace the path of the leapfrog steps. To produce an MCMC sample, an iterative procedure is conducted in which a slice variable

$$U_n \sim \text{Uniform}\left(0, \exp\left(-H\left[x_n(0), p_n(0)\right]\right)\right)$$

is first sampled. Let $x_n^+$ and $p_n^+$ denote the position and momentum of the forward-moving particle, and $x_n^-$ and $p_n^-$ those of the backward-moving particle. In each iteration, the binary tree selects uniformly at random whether to move the forward particle forwards in time or the backward particle backwards in time, and the number of leapfrog steps taken doubles with each iteration.
The iterative procedure continues until a U-turn condition is met, that is,

$$\left(x_n^+ - x_n^-\right) \cdot p_n^- < 0 \qquad\text{or}\qquad \left(x_n^+ - x_n^-\right) \cdot p_n^+ < 0,$$

or until the simulated Hamiltonian becomes too inaccurate,

$$\exp\left[-H\left(x_n^+, p_n^+\right) + \delta\right] < U_n \qquad\text{or}\qquad \exp\left[-H\left(x_n^-, p_n^-\right) + \delta\right] < U_n,$$

where, for example, $\delta = 1000$.
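As an illustration of the two stopping checks only (the tree-building and sampling machinery of full NUTS is not shown; the function names are illustrative and the positions and momenta are assumed to be NumPy vectors):

```python
def u_turn(x_plus, x_minus, p_plus, p_minus):
    """U-turn check: the trajectory endpoints have started moving back toward each other."""
    dx = x_plus - x_minus
    return (dx @ p_minus) < 0 or (dx @ p_plus) < 0

def energy_too_inaccurate(x, p, u_slice, M_inv, delta=1000.0):
    """Divergence check: the energy error is so large the point falls far outside the slice."""
    return np.exp(-hamiltonian(x, p, M_inv) + delta) < u_slice
```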
Once the U-turn condition is met, the next MCMC sample $x_{n+1}$ is obtained by sampling uniformly from the leapfrog path traced out by the binary tree, $\left\{x_n^-, \ldots, x_n(-\Delta t), x_n(0), x_n(\Delta t), \ldots, x_n^+\right\}$, restricted to the points that satisfy

$$U_n < \exp\left[-H\left(x_{n+1}, p_{n+1}\right)\right].$$