Leimkuhler–Matthews method

In mathematics, the Leimkuhler-Matthews method (called the LM method in its original paper [1]) is an algorithm for finding discretized solutions to the Brownian dynamics

$$dX = -\nabla V(X)\,dt + \sigma\,dW,$$

where $\sigma > 0$ is a constant, $V(X)$ is an energy function and $W(t)$ is a Wiener process. In the limit of large time, solutions of this stochastic differential equation (denoted $X(t) \in \mathbb{R}^N$ at time $t$) are distributed according to $\pi(X) \propto \exp(-V(X))$ when $\sigma = \sqrt{2}$ (for general $\sigma$ the exponent is $-2V(X)/\sigma^2$). This makes solving these dynamics relevant in sampling-focused applications such as classical molecular dynamics and machine learning.

Given a time step $\Delta t > 0$, the Leimkuhler-Matthews update scheme is compactly written as

$$X_{t+\Delta t} = X_t - \nabla V(X_t)\,\Delta t + \sigma\,\frac{\sqrt{\Delta t}}{2}\,(R_t + R_{t+\Delta t}),$$

with initial condition $X_0 := X(0)$, and where $X_t \approx X(t)$. The vector $R_t$ is a vector of $N$ independent standard normal random numbers redrawn at each step, so that $E[R_t \cdot R_s] = N\,\delta_{t,s}$ (where $E[\bullet]$ denotes expectation). Despite being of equal cost to the Euler-Maruyama scheme (in terms of the number of evaluations of the function $\nabla V(X)$ per update), given some assumptions on $\Delta t$, $V(X)$ and the function $f(X)$ whose expectation is computed, solutions have been shown [2] to have a superconvergence property

$$E[|f(X_t) - f(X(t))|] \leq C_1 e^{-\lambda t}\,\Delta t + C_2\,\Delta t^2$$

for constants $C_k \geq 0$, $\lambda > 0$ not depending on $t$. This means that as $t$ gets large the first term decays, and the error in computed expectations is effectively second order, $O(\Delta t^2)$. For small time step $\Delta t$ this can give significant improvements over the Euler-Maruyama scheme, at no extra cost.
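The update rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `lm_trajectory`, its arguments, and the use of the standard `random` module as the noise source are choices made here and do not come from the original paper. The point it shows is that the noise vector drawn at the end of one step is reused at the start of the next, so each step needs only one new random vector and one gradient evaluation.

```python
import math
import random


def lm_trajectory(grad_v, x0, dt, n_steps, sigma, rng):
    """Return the list [X_0, X_1, ..., X_n] produced by the LM update.

    grad_v : function taking a list of N floats and returning grad V (N floats)
    x0     : initial state, a list of N floats
    rng    : a random.Random instance (hypothetical choice of noise source)
    """
    n = len(x0)
    scale = 0.5 * sigma * math.sqrt(dt)                  # sigma * sqrt(dt) / 2
    x = list(x0)
    r_old = [rng.gauss(0.0, 1.0) for _ in range(n)]      # R_t for the first step
    path = [list(x)]
    for _ in range(n_steps):
        g = grad_v(x)                                    # one gradient evaluation
        r_new = [rng.gauss(0.0, 1.0) for _ in range(n)]  # R_{t + dt}
        x = [x[i] - g[i] * dt + scale * (r_old[i] + r_new[i]) for i in range(n)]
        r_old = r_new                                    # reused by the next step
        path.append(list(x))
    return path


# Example: V(X) = |X|^2 / 2 in two dimensions, so grad V(X) = X.
path = lm_trajectory(lambda x: list(x), [1.0, -1.0], dt=0.1, n_steps=100,
                     sigma=math.sqrt(2.0), rng=random.Random(0))
```

Setting `sigma=0.0` reduces the loop to plain gradient descent with step `dt`, which is a convenient sanity check for the drift term.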

Discussion

Comparison to other schemes

The obvious method for comparison is the Euler-Maruyama scheme, as it has the same cost, requiring one evaluation of $\nabla V(X)$ per step. Its update is of the form

$$\hat{X}_{t+\Delta t} = \hat{X}_t - \nabla V(\hat{X}_t)\,\Delta t + \sigma\sqrt{\Delta t}\,R_t,$$

with error (given some assumptions [3]) bounded as

$$E[|f(\hat{X}_t) - f(X(t))|] \leq C\,\Delta t$$

with constant $C > 0$ independent of $t$. Compared with the Leimkuhler-Matthews update, the only difference is that the noise term is averaged over two consecutive steps, which makes the LM scheme simple to implement.
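The two updates can be compared on a simple test problem. The sketch below assumes the one-dimensional potential $V(x) = x^2/2$ with $\sigma = \sqrt{2}$, for which the exact long-time variance is 1; the function name `stationary_variance` and all parameter values are illustrative choices, not taken from the cited papers. For this linear problem a short calculation with the two recursions gives a long-time variance of $1/(1 - \Delta t/2)$ for Euler-Maruyama and exactly 1 for the LM update, so the difference should be visible even at a large time step.

```python
import math
import random


def stationary_variance(scheme, dt, n_chains=20000, n_steps=30, seed=0):
    """Estimate E[X^2] at long time for V(x) = x^2 / 2 and sigma = sqrt(2).

    scheme is "em" (Euler-Maruyama) or "lm" (Leimkuhler-Matthews).
    Runs n_chains independent chains from x = 0 and averages the final x^2.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0)
    total = 0.0
    for _ in range(n_chains):
        x = 0.0
        r_old = rng.gauss(0.0, 1.0)
        for _ in range(n_steps):
            r_new = rng.gauss(0.0, 1.0)
            if scheme == "em":
                noise = sigma * math.sqrt(dt) * r_old
            else:
                noise = 0.5 * sigma * math.sqrt(dt) * (r_old + r_new)
            x = x - x * dt + noise          # grad V(x) = x
            r_old = r_new
        total += x * x
    return total / n_chains


dt = 0.5
var_em = stationary_variance("em", dt)
var_lm = stationary_variance("lm", dt)
print(f"exact 1.0, Euler-Maruyama {var_em:.3f}, Leimkuhler-Matthews {var_lm:.3f}")
```

The only line that differs between the two branches is the noise term, and both branches use one gradient evaluation and one new random number per step.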

For sufficiently small time step $\Delta t$ and large enough time $t$, the error bound for the LM scheme is smaller than that for Euler-Maruyama. While there are many algorithms that can give reduced error compared to the Euler scheme (see e.g. the Milstein, Runge-Kutta or Heun methods), these almost always come at an efficiency cost, requiring more computation in exchange for reducing the error. The Leimkuhler-Matthews scheme, however, can give significantly reduced error with minimal change to the standard Euler scheme. The trade-off is the (relatively) limited scope of the stochastic differential equation it solves: $\sigma$ must be a scalar constant and the drift function must be of the form $-\nabla V(X)$. The LM scheme is also not Markovian, as an update requires more than just the state at time $t$: it also needs the noise vector $R_t$ drawn in the previous step. However, the scheme can be recast as a Markov process by extending the state space.

Markovian Form

We can rewrite the algorithm in a Markovian form by extending the state space with a momentum vector $P_t \in \mathbb{R}^N$, so that the overall state is $(X_t, P_t)$ at time $t$. Initializing the momentum to be a vector of $N$ standard normal random numbers, we have

$$X'_{t+\Delta t} = X_t - \nabla V(X_t)\,\Delta t + \sigma\,\frac{\sqrt{\Delta t}}{2}\,P_t,$$

$$P_{t+\Delta t} \sim \mathrm{Normal}(0, I),$$

$$X_{t+\Delta t} = X'_{t+\Delta t} + \sigma\,\frac{\sqrt{\Delta t}}{2}\,P_{t+\Delta t},$$

where the middle step completely redraws the momentum so that each component is an independent standard normal random number. This scheme is Markovian, and has the same properties as the original LM scheme.
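The three sub-steps translate directly into a function that maps one extended state to the next. The following is a hedged sketch: the name `lm_markov_step` and its signature are hypothetical, and states are plain Python lists for simplicity.

```python
import math
import random


def lm_markov_step(x, p, grad_v, dt, sigma, rng):
    """One step of the Markovian form: (X_t, P_t) -> (X_{t+dt}, P_{t+dt})."""
    n = len(x)
    scale = 0.5 * sigma * math.sqrt(dt)                            # sigma * sqrt(dt) / 2
    g = grad_v(x)
    x_half = [x[i] - g[i] * dt + scale * p[i] for i in range(n)]   # X'_{t+dt}
    p_new = [rng.gauss(0.0, 1.0) for _ in range(n)]                # redraw the momentum
    x_new = [x_half[i] + scale * p_new[i] for i in range(n)]       # X_{t+dt}
    return x_new, p_new


rng = random.Random(0)
x = [0.5, -0.3]
p = [rng.gauss(0.0, 1.0) for _ in x]        # initial momentum ~ Normal(0, I)
dt, sigma = 0.1, math.sqrt(2.0)
x_next, p_next = lm_markov_step(x, p, lambda y: list(y), dt, sigma, rng)
```

Because the function depends only on its arguments `(x, p)` and fresh randomness, the pair can be stored and restarted like any other Markov chain state. Substituting the first sub-step into the third recovers the two-noise form of the update, with `p` playing the role of $R_t$ and `p_new` the role of $R_{t+\Delta t}$.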

Applications

The algorithm has application in any area where the weak (i.e. average) properties of solutions to Brownian dynamics are required. This includes molecular simulation problems (such as classical molecular dynamics), and also statistical sampling problems, because of the properties of solutions at large times. In the limit $t \to \infty$, solutions become distributed according to the probability distribution $\pi(X) \propto \exp(-V(X))$ (taking $\sigma = \sqrt{2}$). Thus we can generate independent samples from a required distribution $\pi$ by setting $V(X) = -\log(\pi(X))$ and running the LM algorithm until large $t$. Such strategies can be efficient in (for instance) Bayesian inference problems.
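As an illustration of the sampling use case, the sketch below applies the LM update to a toy Bayesian problem. Everything specific here is an assumption made for the example: the data values are invented, the model is $y_i \sim \mathrm{Normal}(\mu, 1)$ with prior $\mu \sim \mathrm{Normal}(0, 1)$, and $\sigma = \sqrt{2}$ so that the target is $\exp(-V)$. The model is chosen because its posterior is known in closed form, which gives something to compare against.

```python
import math
import random

# Hypothetical toy data: y_i ~ Normal(mu, 1), with prior mu ~ Normal(0, 1).
data = [1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.05, 0.95]
n_obs = len(data)
data_sum = sum(data)


def grad_v(mu):
    """Gradient of V(mu) = -log posterior = sum((y_i - mu)^2) / 2 + mu^2 / 2 + const."""
    return (n_obs + 1) * mu - data_sum


def lm_sample(dt, n_steps, rng):
    """Run one LM chain with sigma = sqrt(2) and return its final state."""
    scale = 0.5 * math.sqrt(2.0) * math.sqrt(dt)
    mu = 0.0
    r_old = rng.gauss(0.0, 1.0)
    for _ in range(n_steps):
        r_new = rng.gauss(0.0, 1.0)
        mu = mu - grad_v(mu) * dt + scale * (r_old + r_new)
        r_old = r_new
    return mu


rng = random.Random(0)
# One sample per independent chain, so the samples are independent of each other.
samples = [lm_sample(dt=0.05, n_steps=60, rng=rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

# Closed-form posterior for this conjugate model, for comparison.
exact_mean = data_sum / (n_obs + 1)
exact_var = 1.0 / (n_obs + 1)
print(f"mean {mean:.3f} (exact {exact_mean:.3f}), var {var:.4f} (exact {exact_var:.4f})")
```

The time step must be small enough for the update to be stable; here `dt * (n_obs + 1) = 0.5`, well inside the stable range for this quadratic potential.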

Notes and References

  1. Leimkuhler, Benedict; Matthews, Charles (1 January 2013). "Rational Construction of Stochastic Numerical Methods for Molecular Sampling". Applied Mathematics Research eXpress. 2013 (1): 34–56. arXiv:1203.5428. doi:10.1093/amrx/abs010. ISSN 1687-1200.
  2. Leimkuhler, B.; Matthews, C.; Tretyakov, M. V. (8 October 2014). "On the long-time integration of stochastic gradient systems". Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 470 (2170): 20140120. arXiv:1402.2797. Bibcode:2014RSPSA.47040120L. doi:10.1098/rspa.2014.0120. S2CID 15596798.
  3. Kloeden, P.E.; Platen, E. (1992). Numerical Solution of Stochastic Differential Equations. Springer, Berlin. ISBN 3-540-54062-8.