In statistics and physics, multicanonical ensemble (also called multicanonical sampling or flat histogram) is a Markov chain Monte Carlo sampling technique that uses the Metropolis–Hastings algorithm to compute integrals where the integrand has a rough landscape with multiple local minima. It samples states according to the inverse of the density of states, which has to be known a priori or be computed using other techniques like the Wang and Landau algorithm. Multicanonical sampling is an important technique for spin systems like the Ising model or spin glasses.
In systems with a large number of degrees of freedom, like spin systems, Monte Carlo integration is required. In this integration, importance sampling, and in particular the Metropolis algorithm, is a very important technique. However, the Metropolis algorithm samples states according to $\exp(-\beta E)$, where $\beta$ is the inverse temperature. This means that an energy barrier of $\Delta E$ on the energy spectrum is exponentially difficult to overcome, so the simulation can remain trapped in local energy minima.
The multicanonical ensemble instead uses the Metropolis–Hastings algorithm with a sampling distribution given by the inverse of the density of states of the system, contrary to the sampling distribution $\exp(-\beta E)$ of the Metropolis algorithm, so that all energies are sampled with equal probability (a flat histogram).
The biggest problem in performing a multicanonical ensemble is that the density of states has to be known a priori. One important contribution to multicanonical sampling was the Wang and Landau algorithm, which asymptotically converges to a multicanonical ensemble while calculating the density of states during the convergence.
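As a concrete illustration, the following is a minimal sketch of a Wang and Landau run for a system with a discrete energy spectrum. The function names, the flatness criterion of 0.95, and the stopping threshold are illustrative choices, and `propose` stands for a model-specific move (e.g. a single spin flip in the Ising model).

```python
import math
import random

def wang_landau(energies, propose, initial, flat=0.95, ln_f_final=1e-8):
    """Estimate ln g(E), the log density of states, on a discrete energy set.

    energies: iterable of all possible energy levels
    propose:  model-specific move, state -> (new_state, new_energy)
    initial:  tuple (state, energy) to start from
    """
    ln_g = {e: 0.0 for e in energies}   # running estimate of ln g(E)
    hist = {e: 0 for e in energies}     # visit histogram
    ln_f = 1.0                          # modification factor, ln f
    state, e = initial
    while ln_f > ln_f_final:
        new_state, e_new = propose(state)
        # Accept with probability min(1, g(E_old) / g(E_new)).
        if random.random() < math.exp(min(0.0, ln_g[e] - ln_g[e_new])):
            state, e = new_state, e_new
        ln_g[e] += ln_f                 # penalize the level just visited
        hist[e] += 1
        if min(hist.values()) > flat * (sum(hist.values()) / len(hist)):
            hist = {k: 0 for k in hist}  # histogram flat enough: refine f
            ln_f /= 2.0                  # f -> sqrt(f)
    return ln_g
```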
The multicanonical ensemble is not restricted to physical systems. It can be employed on abstract systems which have a cost function F. By using the density of states with respect to F, the method becomes general for computing higher-dimensional integrals or finding local minima.
Consider a system and its phase-space $\Omega$, characterized by a configuration $\boldsymbol{r} \in \Omega$ and a "cost" function $F$ that maps the phase-space to the one-dimensional spectrum of costs $\Gamma$: $F(\Omega) = \Gamma = [\Gamma_{\min}, \Gamma_{\max}]$.
The computation of an average quantity $\langle Q\rangle$ over the phase-space requires the evaluation of the integral

$$\langle Q\rangle = \int_\Omega Q(\boldsymbol{r})\, P(\boldsymbol{r})\, d\boldsymbol{r}$$

where $P(\boldsymbol{r})$ is the weight of each state (for example, $P(\boldsymbol{r}) = 1/V$ corresponds to uniformly distributed states over a phase-space of volume $V$).
When $Q$ does not depend on the particular state but only on the value of $F$ at that state, $F(\boldsymbol{r}) = F_{\boldsymbol{r}}$, the expression for $\langle Q\rangle$ can be integrated over $f$ by inserting a Dirac delta function:
$$\begin{align} \langle Q\rangle &= \int_{\Gamma_{\min}}^{\Gamma_{\max}} \int_\Omega Q(F_{\boldsymbol{r}})\, P(\boldsymbol{r})\, \delta(f - F_{\boldsymbol{r}})\, d\boldsymbol{r}\, df \\ &= \int_{\Gamma_{\min}}^{\Gamma_{\max}} Q(f) \int_\Omega \delta(f - F_{\boldsymbol{r}})\, P(\boldsymbol{r})\, d\boldsymbol{r}\, df \\ &= \int_{\Gamma_{\min}}^{\Gamma_{\max}} Q(f)\, P(f)\, df \end{align}$$
where

$$P(f) = \int_\Omega P(\boldsymbol{r})\, \delta(f - F(\boldsymbol{r}))\, d\boldsymbol{r}$$

is the marginal distribution of $F$.
When the system has a large number of degrees of freedom, an analytical expression for $\langle Q\rangle$ is often hard to obtain, and Monte Carlo integration is typically employed to compute $\langle Q\rangle$. In its simplest formulation, the method draws $N$ uniformly distributed states $\boldsymbol{r}_i \in \Omega$ and uses the estimator

$$\overline{Q}_N = \frac{1}{N} \sum_{i=1}^{N} Q(\boldsymbol{r}_i)\, V P(\boldsymbol{r}_i)$$

for computing $\langle Q\rangle$, since $\overline{Q}_N$ converges to $\langle Q\rangle$ by the law of large numbers:

$$\lim_{N \to \infty} \overline{Q}_N = \langle Q\rangle.$$
One typical problem of this convergence is that the variance of Q can be very high, which leads to a high computational effort to achieve reasonable results.
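For concreteness, here is a minimal sketch of this plain estimator on a toy one-dimensional phase-space $\Omega = [0, V]$; the names and the example integrand are illustrative.

```python
import random

def plain_mc(Q, P, V, N=100_000):
    """Estimate <Q> = integral of Q(r) P(r) dr over [0, V]
    from N uniformly drawn states."""
    total = 0.0
    for _ in range(N):
        r = random.uniform(0.0, V)
        total += Q(r) * V * P(r)   # summand Q(r_i) V P(r_i)
    return total / N

# Uniform weight P(r) = 1/V on [0, 2]: <r^2> should come out near 4/3.
print(plain_mc(lambda r: r * r, lambda r: 0.5, V=2.0))
```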
To improve this convergence, the Metropolis–Hastings algorithm was proposed. Generally, the idea behind Monte Carlo methods is to use importance sampling to improve the convergence of the estimator $\overline{Q}_N$: states are sampled from an arbitrary distribution $\pi(\boldsymbol{r})$, and the appropriate estimator is

$$\overline{Q}_N = \frac{1}{N} \sum_{i=1}^{N} Q(\boldsymbol{r}_i)\, \pi^{-1}(\boldsymbol{r}_i)\, P(\boldsymbol{r}_i).$$
This estimator generalizes the estimator of the mean for samples drawn from an arbitrary distribution; in particular, when $\pi(\boldsymbol{r})$ is the uniform distribution, it reduces to the estimator for uniformly distributed states given above.
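A minimal sketch of this reweighted estimator follows, assuming we can both draw samples from $\pi$ and evaluate its density; the example weight and sampling distribution are illustrative.

```python
import math
import random

def importance_mc(Q, P, draw_pi, pi, N=100_000):
    """Estimate <Q> = integral of Q(r) P(r) dr with samples r_i ~ pi."""
    total = 0.0
    for _ in range(N):
        r = draw_pi()
        total += Q(r) * P(r) / pi(r)   # summand Q(r_i) pi^{-1}(r_i) P(r_i)
    return total / N

# Weight P(r) = e^{-r} on [0, inf); sampling from the same exponential
# density cancels the reweighting, so <r> should come out near 1.
print(importance_mc(lambda r: r,
                    lambda r: math.exp(-r),
                    lambda: random.expovariate(1.0),
                    lambda r: math.exp(-r)))
```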
When the system is a physical system in contact with a heat bath, each state $\boldsymbol{r}$ is weighted according to the Boltzmann factor, $P(\boldsymbol{r}) \propto \exp(-\beta F_{\boldsymbol{r}})$. In Monte Carlo, the canonical ensemble is defined by choosing $\pi(\boldsymbol{r})$ proportional to $P(\boldsymbol{r})$, in which case the estimator reduces to a simple arithmetic average:

$$\overline{Q}_N = \frac{1}{N} \sum_{i=1}^{N} Q(\boldsymbol{r}_i)$$
Historically, this setup arose because the original idea was to use the Metropolis–Hastings algorithm to compute averages on a system in contact with a heat bath, where the weight is given by the Boltzmann factor, $P(\boldsymbol{r}) \propto \exp(-\beta E(\boldsymbol{r}))$.
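A minimal sketch of this canonical sampling is given below, with `energy` and `propose` standing in for model-specific functions; the acceptance rule is the standard Metropolis criterion $\min(1, \exp(-\beta\, \Delta E))$.

```python
import math
import random

def canonical_average(Q, energy, propose, r0, beta, N=100_000):
    """Sample states with weight proportional to exp(-beta * E(r));
    the estimator of <Q> is then the plain arithmetic average."""
    r, e = r0, energy(r0)
    total = 0.0
    for _ in range(N):
        r_new = propose(r)
        e_new = energy(r_new)
        # Metropolis acceptance: min(1, exp(-beta * (E_new - E_old))).
        if random.random() < math.exp(min(0.0, -beta * (e_new - e))):
            r, e = r_new, e_new
        total += Q(r)
    return total / N
```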
While the sampling distribution $\pi$ is often chosen to be the weight distribution $P$, this does not need to be the case. One situation where the canonical choice is inefficient is when the landscape of $F$ has multiple local minima: the time the algorithm needs to escape a local minimum grows exponentially with the depth of the minimum, so the simulation can remain trapped for arbitrarily long times. One way to avoid becoming stuck in the local minima of the cost function is to make the sampling technique "invisible" to them. This is the basis of the multicanonical ensemble.
The multicanonical ensemble is defined by choosing the sampling distribution to be

$$\pi(\boldsymbol{r}) \propto \frac{1}{P(F_{\boldsymbol{r}})}$$
where $P(f)$ is the marginal distribution of $F$ defined above. The consequence of this choice is that the average number of samples with cost $f$, $m(f)$, satisfies

$$m(f) = \int_\Omega \delta(f - F_{\boldsymbol{r}})\, \pi(\boldsymbol{r})\, P(\boldsymbol{r})\, d\boldsymbol{r} \propto \int_\Omega \delta(f - F_{\boldsymbol{r}})\, P(\boldsymbol{r})\, \frac{1}{P(F_{\boldsymbol{r}})}\, d\boldsymbol{r} = \frac{1}{P(f)} \int_\Omega \delta(f - F_{\boldsymbol{r}})\, P(\boldsymbol{r})\, d\boldsymbol{r} = 1$$
that is, the average number of samples does not depend on $f$: all costs $f$ are sampled equally, regardless of how probable they are under the original weight. This motivates the name "flat histogram". For systems in contact with a heat bath, the sampling is independent of the temperature, and a single simulation allows the study of all temperatures.
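As a sketch, the multicanonical acceptance rule only needs an estimate of $\ln P(f)$, for instance from a preliminary Wang and Landau run as above; the names below are illustrative, and a discrete cost spectrum is assumed.

```python
import math
import random

def multicanonical_walk(F, propose, r0, ln_P, N=100_000):
    """Random walk over states with weight proportional to 1/P(F(r)).

    ln_P: mapping from each cost value f to ln P(f) (assumed known).
    Returns the histogram of visited costs, which should come out flat."""
    r, f = r0, F(r0)
    hist = {k: 0 for k in ln_P}
    for _ in range(N):
        r_new = propose(r)
        f_new = F(r_new)
        # Accept with probability min(1, P(f_old) / P(f_new)).
        if random.random() < math.exp(min(0.0, ln_P[f] - ln_P[f_new])):
            r, f = r_new, f_new
        hist[f] += 1
    return hist
```

Canonical averages at any inverse temperature $\beta$ can then be recovered by reweighting each visited cost $f$ with $P(f)\exp(-\beta f)$, which is what makes a single flat-histogram run sufficient for all temperatures.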
As in any other Monte Carlo method, the samples drawn by the process are correlated. A typical measure of these correlations is the tunneling time: the number of Markov steps the chain needs to perform a round trip between the minimum and maximum of the spectrum of $F$.
Because the histogram is flat in the variable $F$, a multicanonical ensemble can be seen as a diffusion process (i.e. a random walk) on the one-dimensional line of $F$ values. Detailed balance of the process dictates that the process has no drift. This implies that the tunneling time, in local dynamics, should scale as in a diffusion process, and thus quadratically with the size of the spectrum, $N$:

$$\tau_{\text{tt}} \propto N^2$$
However, in some systems (the Ising model being the most paradigmatic), the scaling suffers from critical slowing down: the tunneling time grows as $N^{2+z}$, where $z > 0$ depends on the particular system.
Non-local dynamics were developed to restore the quadratic scaling (see the Wolff algorithm), beating the critical slowing down. However, it remains an open question whether there is a local dynamics that does not suffer from critical slowing down in spin systems like the Ising model.
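To make the scaling discussion concrete, the tunneling time can be measured directly from the sequence of costs visited by a run such as the sketch above (names illustrative):

```python
def tunneling_time(f_trace, f_min, f_max):
    """Average number of Markov steps per round trip
    f_min -> f_max -> f_min in a sequence of sampled costs."""
    round_trips = 0
    heading_up = True            # currently travelling towards f_max
    for f in f_trace:
        if heading_up and f == f_max:
            heading_up = False
        elif not heading_up and f == f_min:
            heading_up = True
            round_trips += 1
    return len(f_trace) / round_trips if round_trips else float("inf")
```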