Laplace's approximation provides an analytical expression for a posterior probability distribution by fitting a Gaussian distribution with a mean equal to the MAP solution and precision equal to the observed Fisher information.[1] [2] The approximation is justified by the Bernstein–von Mises theorem, which states that, under regularity conditions, the error of the approximation tends to 0 as the number of data points tends to infinity.[3] [4]
For example, consider a regression or classification model with data set $\{x_n, y_n\}_{n=1,\ldots,N}$ comprising inputs $x$ and outputs $y$, with (unknown) parameter vector $\theta$ of length $D$. The likelihood is denoted $p(\mathbf{y}|\mathbf{x},\theta)$ and the parameter prior $p(\theta)$. Suppose one wants to approximate the joint density of outputs and parameters $p(\mathbf{y},\theta|\mathbf{x})$. Bayes' formula reads
p(\mathbf{y},\theta|\mathbf{x}) = p(\mathbf{y}|\mathbf{x},\theta)\,p(\theta|\mathbf{x}) = p(\mathbf{y}|\mathbf{x})\,p(\theta|\mathbf{y},\mathbf{x}) \simeq \tilde{q}(\theta) = Z q(\theta).
The joint is equal to the product of the likelihood and the prior and, by Bayes' rule, also equal to the product of the marginal likelihood $p(\mathbf{y}|\mathbf{x})$ and the posterior $p(\theta|\mathbf{y},\mathbf{x})$. Seen as a function of $\theta$, the joint is an un-normalised density.
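As a concrete illustration, the sketch below builds such an un-normalised log joint for a small, hypothetical Bayesian logistic-regression model with a Gaussian prior; the data, the prior variance `sigma2`, and the function name `log_joint` are illustrative assumptions rather than part of the method itself.

```python
import numpy as np

# Hypothetical Bayesian logistic-regression model (illustrative data and prior):
#   likelihood  p(y_n | x_n, theta) = Bernoulli(sigmoid(x_n . theta))
#   prior       p(theta)            = N(0, sigma2 * I)
rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))                                               # inputs x_n
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ np.array([1.0, -2.0, 0.5]))))  # outputs y_n
sigma2 = 10.0                                                             # assumed prior variance

def log_joint(theta):
    """Un-normalised log target: log p(y, theta | x) = log-likelihood + log-prior."""
    logits = X @ theta
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))   # Bernoulli log-likelihood
    log_prior = -0.5 * (theta @ theta) / sigma2 - 0.5 * D * np.log(2 * np.pi * sigma2)
    return log_lik + log_prior
```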
In Laplace's approximation, we approximate the joint by an un-normalised Gaussian $\tilde{q}(\theta) = Z q(\theta)$, where we use $q$ to denote the approximate (normalised) density, $\tilde{q}$ the un-normalised density, and $Z$ the normalisation constant of $\tilde{q}$ (independent of $\theta$). Since the marginal likelihood $p(\mathbf{y}|\mathbf{x})$ does not depend on the parameter $\theta$ and the posterior $p(\theta|\mathbf{y},\mathbf{x})$ normalises over $\theta$, we can identify them with $Z$ and $q(\theta)$ of our approximation, respectively.
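Explicitly, integrating both sides over $\theta$ makes this identification plain (a one-line check, written out here for completeness):
\int p(\mathbf{y},\theta|\mathbf{x})\,d\theta = p(\mathbf{y}|\mathbf{x}) \int p(\theta|\mathbf{y},\mathbf{x})\,d\theta = p(\mathbf{y}|\mathbf{x}), \qquad \int \tilde{q}(\theta)\,d\theta = Z \int q(\theta)\,d\theta = Z,
so the normalisation constant of the approximation plays the role of the marginal likelihood, and $q(\theta)$ that of the posterior.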
Laplace's approximation is
p(\mathbf{y},\theta|\mathbf{x}) \simeq p(\mathbf{y},\hat\theta|\mathbf{x}) \exp\left(-\tfrac{1}{2}(\theta-\hat\theta)^\top S^{-1} (\theta-\hat\theta)\right) = \tilde{q}(\theta),
where we have defined
\begin{align}
\hat\theta &= \operatorname{argmax}_\theta \log p(\mathbf{y},\theta|\mathbf{x}), \\
S^{-1} &= -\left.\nabla_\theta \nabla_\theta \log p(\mathbf{y},\theta|\mathbf{x})\right|_{\theta=\hat\theta},
\end{align}
where $\hat\theta$ is the location of a mode of the joint target density, also known as the maximum a posteriori (MAP) point, and $S^{-1}$ is the $D \times D$ positive definite matrix of second derivatives of the negative log joint target density at the mode $\theta = \hat\theta$. Thus, the Gaussian approximation matches the value and the log-curvature of the un-normalised target density at the mode. The value of $\hat\theta$ is usually found using a gradient-based method.
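A minimal sketch of these two quantities, continuing the hypothetical logistic-regression example above: $\hat\theta$ is found with a generic gradient-based optimiser, and $S^{-1}$ is formed from the model's second derivatives, which for logistic regression with a Gaussian prior are $X^\top \operatorname{diag}(p(1-p)) X + I/\sigma^2$. The names `neg_log_joint`, `theta_hat`, and `S_inv` are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Same hypothetical logistic-regression setup as in the previous sketch.
rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ np.array([1.0, -2.0, 0.5]))))
sigma2 = 10.0

def neg_log_joint(theta):
    logits = X @ theta
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * (theta @ theta) / sigma2 - 0.5 * D * np.log(2 * np.pi * sigma2)
    return -(log_lik + log_prior)

# theta_hat = argmax_theta log p(y, theta | x), found with a gradient-based optimiser.
theta_hat = minimize(neg_log_joint, x0=np.zeros(D), method="BFGS").x

# S^{-1} = -(second derivatives of the log joint) at theta_hat; for this model it is
# X^T diag(p(1-p)) X + I / sigma2, which is positive definite.
p = 1.0 / (1.0 + np.exp(-X @ theta_hat))
S_inv = (X * (p * (1 - p))[:, None]).T @ X + np.eye(D) / sigma2
S = np.linalg.inv(S_inv)   # covariance of the Gaussian approximation q(theta)
```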
In summary, we have
\begin{align}
q(\theta) &= \mathcal{N}(\theta \,|\, \mu = \hat\theta, \Sigma = S), \\
\log Z &= \log p(\mathbf{y},\hat\theta|\mathbf{x}) + \tfrac{1}{2}\log|S| + \tfrac{D}{2}\log(2\pi),
\end{align}
for the approximate posterior over $\theta$ and the approximate log marginal likelihood, respectively.
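The sketch below illustrates these summary formulas on a toy conjugate model chosen purely so that the answer can be checked: $y_n \sim \mathcal{N}(\theta, 1)$ with prior $\theta \sim \mathcal{N}(0, 1)$ and $D = 1$. Because the joint is exactly Gaussian in $\theta$ here, the Laplace value of $\log Z$ coincides with the exact log marginal likelihood; the data and variable names are illustrative.

```python
import numpy as np

# Toy conjugate model (illustrative): y_n ~ N(theta, 1), prior theta ~ N(0, 1), D = 1.
rng = np.random.default_rng(0)
N = 20
y = rng.normal(loc=1.5, scale=1.0, size=N)

# Closed-form MAP point and curvature for this model.
S = 1.0 / (N + 1.0)          # S = (S^{-1})^{-1}, the posterior variance
theta_hat = S * y.sum()      # posterior mean = MAP point

def log_joint(theta):
    """log p(y, theta) = sum_n log N(y_n | theta, 1) + log N(theta | 0, 1)."""
    return (-0.5 * np.sum((y - theta) ** 2) - 0.5 * N * np.log(2 * np.pi)
            - 0.5 * theta ** 2 - 0.5 * np.log(2 * np.pi))

# log Z = log p(y, theta_hat | x) + (1/2) log|S| + (D/2) log(2 pi), with D = 1.
log_Z = log_joint(theta_hat) + 0.5 * np.log(S) + 0.5 * np.log(2 * np.pi)

# Exact log marginal likelihood for comparison: y ~ N(0, I + 1 1^T) under this model.
C = np.eye(N) + np.ones((N, N))
exact = (-0.5 * y @ np.linalg.solve(C, y)
         - 0.5 * np.linalg.slogdet(C)[1]
         - 0.5 * N * np.log(2 * np.pi))

print(log_Z, exact)   # agree up to floating point, since the joint is Gaussian in theta
```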
The main weaknesses of Laplace's approximation are that it is symmetric around the mode and that it is very local: the entire approximation is derived from properties at a single point of the target density. Laplace's method is widely used and was pioneered in the context of neural networks by David MacKay,[5] and for Gaussian processes by Williams and Barber.[6]
Hartigan, J. A. (1983). "Asymptotic Normality of Posterior Distributions". Bayes Theory. Springer Series in Statistics. New York: Springer. pp. 107–118. doi:10.1007/978-1-4613-8242-3_11. ISBN 978-1-4613-8244-7.