Integrated nested Laplace approximations explained

Integrated nested Laplace approximations (INLA) is a method for approximate Bayesian inference based on Laplace's method.[1] It is designed for a class of models called latent Gaussian models (LGMs), for which it can be a fast and accurate alternative to Markov chain Monte Carlo methods for computing posterior marginal distributions.[2][3][4] Because it is relatively fast even with large data sets for certain problems and models, INLA has been a popular inference method in applied statistics, in particular in spatial statistics, ecology, and epidemiology.[5][6][7] It is also possible to combine INLA with a finite element method solution of a stochastic partial differential equation to study, for example, spatial point processes and species distribution models.[8][9] The INLA method is implemented in the R-INLA R package.[10]

Latent Gaussian models

Let $\boldsymbol{y} = (y_1, \dots, y_n)$ denote the response variable (that is, the observations) which belongs to an exponential family, with the mean $\mu_i$ (of $y_i$) being linked to a linear predictor $\eta_i$ via an appropriate link function. The linear predictor can take the form of a (Bayesian) additive model. All latent effects (the linear predictor, the intercept, coefficients of possible covariates, and so on) are collectively denoted by the vector $\boldsymbol{x}$. The hyperparameters of the model are denoted by $\boldsymbol{\theta}$. As per Bayesian statistics, $\boldsymbol{x}$ and $\boldsymbol{\theta}$ are random variables with prior distributions.

The observations are assumed to be conditionally independent given $\boldsymbol{x}$ and $\boldsymbol{\theta}$:

$$\pi(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\theta}) = \prod_{i \in \mathcal{I}} \pi(y_i | \eta_i, \boldsymbol{\theta}),$$

where $\mathcal{I}$ is the set of indices for observed elements of $\boldsymbol{y}$ (some elements may be unobserved, and for these INLA computes a posterior predictive distribution). Note that the linear predictor $\boldsymbol{\eta}$ is part of $\boldsymbol{x}$.

For the model to be a latent Gaussian model, it is assumed that $\boldsymbol{x} | \boldsymbol{\theta}$ is a Gaussian Markov random field (GMRF) (that is, a multivariate Gaussian with additional conditional independence properties) with probability density

$$\pi(\boldsymbol{x} | \boldsymbol{\theta}) \propto \left| \boldsymbol{Q}_{\boldsymbol{\theta}} \right|^{1/2} \exp \left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q}_{\boldsymbol{\theta}} \boldsymbol{x} \right),$$

where $\boldsymbol{Q}_{\boldsymbol{\theta}}$ is a $\boldsymbol{\theta}$-dependent sparse precision matrix and $\left| \boldsymbol{Q}_{\boldsymbol{\theta}} \right|$ is its determinant. The precision matrix is sparse due to the GMRF assumption. The prior distribution $\pi(\boldsymbol{\theta})$ for the hyperparameters need not be Gaussian. However, the number of hyperparameters, $m = \dim(\boldsymbol{\theta})$, is assumed to be small (say, less than 15).
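The GMRF density above can be illustrated with a small sketch. The example below assumes a stationary first-order autoregressive (AR(1)) latent field, chosen here only because its precision matrix is tridiagonal and easy to write down; the function names `ar1_precision` and `gmrf_log_density` and the parameter values are hypothetical and are not part of INLA or R-INLA. Dense NumPy arrays are used for brevity, whereas a practical implementation would rely on sparse matrix routines.

```python
import numpy as np

def ar1_precision(n, rho, tau=1.0):
    """Precision matrix Q_theta of a stationary AR(1) field of length n.

    Here theta = (rho, tau): rho is the lag-one correlation (|rho| < 1)
    and tau is the precision of the innovations (illustrative choice).
    """
    main = np.full(n, 1.0 + rho**2)
    main[0] = main[-1] = 1.0
    off = np.full(n - 1, -rho)
    return tau * (np.diag(main) + np.diag(off, 1) + np.diag(off, -1))

def gmrf_log_density(x, Q):
    """log pi(x | theta) for a zero-mean GMRF with precision matrix Q."""
    n = len(x)
    _, logdet = np.linalg.slogdet(Q)  # log |Q_theta|
    return 0.5 * logdet - 0.5 * x @ Q @ x - 0.5 * n * np.log(2.0 * np.pi)

Q = ar1_precision(n=6, rho=0.8, tau=2.0)
# Only the diagonal and first off-diagonals are nonzero (3n - 2 entries).
nonzero_fraction = np.count_nonzero(Q) / Q.size
```

The zero entries of `Q` encode the conditional independence structure: in this assumed AR(1) model each element of the field depends only on its immediate neighbours given the rest, which is what keeps the matrix sparse as `n` grows.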

Approximate Bayesian inference with INLA

In Bayesian inference, one wants to solve for the posterior distribution of the latent variables $\boldsymbol{x}$ and $\boldsymbol{\theta}$. Applying Bayes' theorem

$$\pi(\boldsymbol{x}, \boldsymbol{\theta} | \boldsymbol{y}) = \frac{\pi(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\theta}) \pi(\boldsymbol{x} | \boldsymbol{\theta}) \pi(\boldsymbol{\theta})}{\pi(\boldsymbol{y})},$$

the joint posterior distribution of $\boldsymbol{x}$ and $\boldsymbol{\theta}$ is given by

$$\begin{aligned} \pi(\boldsymbol{x}, \boldsymbol{\theta} | \boldsymbol{y}) & \propto \pi(\boldsymbol{\theta}) \pi(\boldsymbol{x} | \boldsymbol{\theta}) \prod_i \pi(y_i | \eta_i, \boldsymbol{\theta}) \\ & \propto \pi(\boldsymbol{\theta}) \left| \boldsymbol{Q}_{\boldsymbol{\theta}} \right|^{1/2} \exp \left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q}_{\boldsymbol{\theta}} \boldsymbol{x} + \sum_i \log \left[ \pi(y_i | \eta_i, \boldsymbol{\theta}) \right] \right). \end{aligned}$$

Obtaining the exact posterior is generally a very difficult problem. In INLA, the main aim is to approximate the posterior marginals

$$\begin{aligned} \pi(x_i | \boldsymbol{y}) &= \int \pi(x_i | \boldsymbol{\theta}, \boldsymbol{y}) \pi(\boldsymbol{\theta} | \boldsymbol{y}) \, d\boldsymbol{\theta} \\ \pi(\theta_j | \boldsymbol{y}) &= \int \pi(\boldsymbol{\theta} | \boldsymbol{y}) \, d\boldsymbol{\theta}_{-j}, \end{aligned}$$

where $\boldsymbol{\theta}_{-j} = \left( \theta_1, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_m \right)$.

A key idea of INLA is to construct nested approximations given by

$$\begin{aligned} \widetilde{\pi}(x_i | \boldsymbol{y}) &= \int \widetilde{\pi}(x_i | \boldsymbol{\theta}, \boldsymbol{y}) \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}) \, d\boldsymbol{\theta} \\ \widetilde{\pi}(\theta_j | \boldsymbol{y}) &= \int \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}) \, d\boldsymbol{\theta}_{-j}, \end{aligned}$$

where $\widetilde{\pi}(\cdot | \cdot)$ is an approximated posterior density. The approximation to the marginal density $\pi(x_i | \boldsymbol{y})$ is obtained in a nested fashion by first approximating $\pi(\boldsymbol{\theta} | \boldsymbol{y})$ and $\pi(x_i | \boldsymbol{\theta}, \boldsymbol{y})$, and then numerically integrating out $\boldsymbol{\theta}$ as

$$\widetilde{\pi}(x_i | \boldsymbol{y}) = \sum_k \widetilde{\pi}\left( x_i | \boldsymbol{\theta}_k, \boldsymbol{y} \right) \times \widetilde{\pi}(\boldsymbol{\theta}_k | \boldsymbol{y}) \times \Delta_k,$$

where the summation is over the values of $\boldsymbol{\theta}$, with integration weights given by $\Delta_k$. The approximation of $\pi(\theta_j | \boldsymbol{y})$ is computed by numerically integrating $\boldsymbol{\theta}_{-j}$ out from $\widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y})$.
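The weighted sum over hyperparameter values can be sketched in a few lines. The example below is a toy illustration rather than INLA itself: it assumes a single scalar hyperparameter on an equally spaced grid, an unnormalized Gamma-shaped posterior for it, and a zero-mean Gaussian conditional for $x_i$ with precision equal to that hyperparameter. These choices, and the name `marginal_by_quadrature`, are hypothetical; they were picked because the resulting marginal has a closed form that the sum can be checked against.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def marginal_by_quadrature(x, thetas, deltas, log_post_theta, cond_pdf):
    """Approximate pi(x_i | y) as sum_k pi(x_i | theta_k, y) pi(theta_k | y) Delta_k.

    log_post_theta may be unnormalized; the weights are rescaled so that
    sum_k pi(theta_k | y) * Delta_k equals one.
    """
    log_w = [log_post_theta(t) for t in thetas]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) * d for lw, d in zip(log_w, deltas)]
    total = sum(w)
    return sum(cond_pdf(x, t) * wk for t, wk in zip(thetas, w)) / total

# Toy setting (assumed): theta | y has a Gamma(a, b) shape and
# x_i | theta, y is Gaussian with mean 0 and precision theta.
a, b = 3.0, 2.0
step = 0.005
thetas = [step * k for k in range(1, 3001)]
deltas = [step] * len(thetas)
x0 = 0.7
approx = marginal_by_quadrature(
    x0, thetas, deltas,
    lambda t: (a - 1.0) * math.log(t) - b * t,
    lambda x, t: normal_pdf(x, 0.0, 1.0 / math.sqrt(t)),
)
# Closed-form marginal for this toy setting, used only as a check.
exact = (math.gamma(a + 0.5) / (math.gamma(a) * math.sqrt(2.0 * math.pi * b))
         * (1.0 + x0**2 / (2.0 * b)) ** -(a + 0.5))
```

With several hyperparameters the grid points and weights would come from a multidimensional integration scheme, but the structure of the sum stays the same.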

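To get the approximate distribution $\widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y})$, one can use the relation

$$\pi(\boldsymbol{\theta} | \boldsymbol{y}) = \frac{\pi(\boldsymbol{x}, \boldsymbol{\theta} | \boldsymbol{y})}{\pi(\boldsymbol{x} | \boldsymbol{\theta}, \boldsymbol{y})},$$

as the starting point. Then $\widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y})$ is obtained at a specific value of the hyperparameters $\boldsymbol{\theta} = \boldsymbol{\theta}_k$ with Laplace's approximation

$$\begin{aligned} \widetilde{\pi}(\boldsymbol{\theta}_k | \boldsymbol{y}) &\propto \left. \frac{\pi\left( \boldsymbol{x}, \boldsymbol{\theta}_k | \boldsymbol{y} \right)}{\widetilde{\pi}_G\left( \boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right)} \right\vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)}, \\ &\propto \left. \frac{\pi(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\theta}_k) \pi(\boldsymbol{x} | \boldsymbol{\theta}_k) \pi(\boldsymbol{\theta}_k)}{\widetilde{\pi}_G\left( \boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right)} \right\vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)}, \end{aligned}$$

where $\widetilde{\pi}_G\left( \boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right)$ is the Gaussian approximation to $\pi\left( \boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right)$ whose mode at a given $\boldsymbol{\theta}_k$ is $\boldsymbol{x}^{*}(\boldsymbol{\theta}_k)$. The mode can be found numerically, for example with the Newton-Raphson method.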
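The Laplace step can be sketched for a small toy model. The example below assumes Poisson counts with a log link, a latent AR(1) field that is itself the linear predictor, a single hyperparameter `tau` (the innovation precision, with the correlation held fixed), and an exponential prior on `tau`. All of these modelling choices and the function names are hypothetical, the linear algebra is dense for brevity, and the Newton-Raphson loop has no step-size safeguards, so this is an illustration of the formula rather than the R-INLA implementation.

```python
import numpy as np

def ar1_precision(n, rho, tau):
    """Tridiagonal precision matrix of a stationary AR(1) field (assumed prior)."""
    main = np.full(n, 1.0 + rho**2)
    main[0] = main[-1] = 1.0
    off = np.full(n - 1, -rho)
    return tau * (np.diag(main) + np.diag(off, 1) + np.diag(off, -1))

def find_mode(y, Q, max_iter=100, tol=1e-10):
    """Newton-Raphson for the mode x*(theta) of pi(x | theta, y), Poisson log link."""
    x = np.zeros(len(y))
    for _ in range(max_iter):
        mu = np.exp(x)
        grad = y - mu - Q @ x        # gradient of log pi(x | theta, y)
        neg_hess = Q + np.diag(mu)   # negative Hessian, same sparsity as Q
        step = np.linalg.solve(neg_hess, grad)
        x = x + step
        if np.max(np.abs(step)) < tol:
            break
    return x

def log_post_theta(tau, y, rho=0.5, prior_rate=0.1):
    """Unnormalized log of the Laplace approximation to pi(theta | y)."""
    Q = ar1_precision(len(y), rho, tau)
    x = find_mode(y, Q)
    log_prior = -prior_rate * tau                        # exponential prior (assumed)
    log_gmrf = 0.5 * np.linalg.slogdet(Q)[1] - 0.5 * x @ Q @ x
    log_lik = np.sum(y * x - np.exp(x))                  # Poisson, dropping -log(y!)
    # Gaussian approximation evaluated at its own mode: only the determinant
    # of its precision matrix remains (the 2*pi factors cancel with the GMRF).
    log_gauss = 0.5 * np.linalg.slogdet(Q + np.diag(np.exp(x)))[1]
    return log_prior + log_gmrf + log_lik - log_gauss

y_obs = np.array([0.0, 1.0, 3.0, 2.0, 4.0, 1.0])
tau_grid = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
log_post = np.array([log_post_theta(t, y_obs) for t in tau_grid])
```

The values in `log_post` are only defined up to an additive constant; they would be exponentiated and normalized over the grid before being used as weights in the numerical integration.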

The trick in the Laplace approximation above is that the Gaussian approximation is applied to the full conditional of $\boldsymbol{x}$ in the denominator, which is usually close to a Gaussian due to the GMRF property of $\boldsymbol{x}$. Applying the approximation here improves the accuracy of the method, since the posterior $\pi(\boldsymbol{\theta} | \boldsymbol{y})$ itself need not be close to a Gaussian, and so the Gaussian approximation is not applied directly to $\pi(\boldsymbol{\theta} | \boldsymbol{y})$. The second important property of a GMRF, the sparsity of the precision matrix $\boldsymbol{Q}_{\boldsymbol{\theta}_k}$, is required for efficient computation of $\widetilde{\pi}(\boldsymbol{\theta}_k | \boldsymbol{y})$ for each value $\boldsymbol{\theta}_k$.
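Why sparsity carries over to the Gaussian approximation can be seen from a second-order Taylor expansion of the log-likelihood terms. The sketch below assumes, for simplicity, that the $i$-th observation depends on the latent field only through its $i$-th element; the symbols $g_i$, $b_i$, $c_i$ and the expansion point $\mu_i$ are notation introduced here for illustration.

```latex
% Log-likelihood contribution of observation i, expanded around \mu_i
g_i(x_i) = \log \pi(y_i | x_i, \boldsymbol{\theta}_k)
  \approx g_i(\mu_i) + b_i x_i - \tfrac{1}{2} c_i x_i^2,
\qquad c_i = -g_i''(\mu_i), \quad b_i = g_i'(\mu_i) + c_i \mu_i .

% Combining with the GMRF prior gives a Gaussian in x
\widetilde{\pi}_G(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y})
  \propto \exp\left( -\tfrac{1}{2} \boldsymbol{x}^T
    \left( \boldsymbol{Q}_{\boldsymbol{\theta}_k} + \operatorname{diag}(\boldsymbol{c}) \right)
    \boldsymbol{x} + \boldsymbol{b}^T \boldsymbol{x} \right).
```

Under this assumption the likelihood only modifies the diagonal of the precision matrix, so $\boldsymbol{Q}_{\boldsymbol{\theta}_k} + \operatorname{diag}(\boldsymbol{c})$ has the same sparsity pattern as $\boldsymbol{Q}_{\boldsymbol{\theta}_k}$. Repeating the expansion around the current mean until it stops changing amounts to a Newton-Raphson search for the mode.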

Obtaining the approximate distribution $\widetilde{\pi}\left( x_i | \boldsymbol{\theta}_k, \boldsymbol{y} \right)$ is more involved, and the INLA method provides three options for this: Gaussian approximation, Laplace approximation, or the simplified Laplace approximation.[1] For the numerical integration to obtain $\widetilde{\pi}(x_i | \boldsymbol{y})$, three options are also available: grid search, central composite design, or empirical Bayes.[1]

Notes and References

  1. Rue, Håvard; Martino, Sara; Chopin, Nicolas (2009). "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations". J. R. Statist. Soc. B. 71 (2): 319–392. doi:10.1111/j.1467-9868.2008.00700.x. S2CID 1657669.
  2. Taylor, Benjamin M.; Diggle, Peter J. (2014). "INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes". Journal of Statistical Computation and Simulation. 84 (10): 2266–2284. doi:10.1080/00949655.2013.788653. arXiv:1202.1738. S2CID 88511801.
  3. Teng, M.; Nathoo, F.; Johnson, T. D. (2017). "Bayesian computation for Log-Gaussian Cox processes: a comparative analysis of methods". Journal of Statistical Computation and Simulation. 87 (11): 2227–2252. doi:10.1080/00949655.2017.1326117. PMID 29200537. PMC 5708893.
  4. Wang, Xiaofeng; Yue, Yu Ryan; Faraway, Julian J. (2018). Bayesian Regression Modeling with INLA. Chapman and Hall/CRC. ISBN 9781498727259.
  5. Blangiardo, Marta; Cameletti, Michela (2015). Spatial and Spatio-temporal Bayesian Models with R-INLA. John Wiley & Sons, Ltd. ISBN 9781118326558.
  6. Opitz, T. (2017). "Latent Gaussian modeling and INLA: A review with focus on space-time applications". Journal de la Société Française de Statistique. 158: 62–85. arXiv:1708.02723.
  7. Moraga, Paula (2019). Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapman and Hall/CRC. ISBN 9780367357955.
  8. Lindgren, Finn; Rue, Håvard; Lindström, Johan (2011). "An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach". J. R. Statist. Soc. B. 73 (4): 423–498. doi:10.1111/j.1467-9868.2011.00777.x. S2CID 120949984. hdl:20.500.11820/1084d335-e5b4-4867-9245-ec9c4f6f4645 (free access).
  9. Lezama-Ochoa, N.; Grazia Pennino, M.; Hall, M. A.; Lopez, J.; Murua, H. (2020). "Using a Bayesian modelling approach (INLA-SPDE) to predict the occurrence of the Spinetail Devil Ray (Mobular mobular)". Scientific Reports. 10 (1): 18822. doi:10.1038/s41598-020-73879-3. PMID 33139744. PMC 7606447. Bibcode:2020NatSR..1018822L.
  10. R-INLA Project (web site). 21 April 2022.