Normal-inverse-Wishart distribution

In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]

Definition

Suppose

\boldsymbol\mu \mid \boldsymbol\mu_0, \lambda, \boldsymbol\Sigma \sim \mathcal{N}\left(\boldsymbol\mu \,\Big|\, \boldsymbol\mu_0, \tfrac{1}{\lambda}\boldsymbol\Sigma\right)

has a multivariate normal distribution with mean

\boldsymbol\mu_0

and covariance matrix

\tfrac{1}{\lambda}\boldsymbol\Sigma

, where

\boldsymbol\Sigma \mid \boldsymbol\Psi, \nu \sim \mathcal{W}^{-1}(\boldsymbol\Sigma \mid \boldsymbol\Psi, \nu)

has an inverse Wishart distribution. Then

(\boldsymbol\mu,\boldsymbol\Sigma)

has a normal-inverse-Wishart distribution, denoted as

(\boldsymbol\mu, \boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0, \lambda, \boldsymbol\Psi, \nu).

Characterization

Probability density function

f(\boldsymbol\mu, \boldsymbol\Sigma \mid \boldsymbol\mu_0, \lambda, \boldsymbol\Psi, \nu) = \mathcal{N}\left(\boldsymbol\mu \,\Big|\, \boldsymbol\mu_0, \tfrac{1}{\lambda}\boldsymbol\Sigma\right) \, \mathcal{W}^{-1}(\boldsymbol\Sigma \mid \boldsymbol\Psi, \nu)

The full version of the PDF is as follows:[2]

f(\boldsymbol{\mu}, \boldsymbol{\Sigma} \mid \boldsymbol{\mu}_0, \lambda, \boldsymbol{\Psi}, \nu) = \frac{\lambda^{D/2} |\boldsymbol{\Psi}|^{\nu/2} |\boldsymbol{\Sigma}|^{-\frac{\nu+D+2}{2}}}{(2\pi)^{D/2}\, 2^{\frac{\nu D}{2}}\, \Gamma_D\!\left(\frac{\nu}{2}\right)} \exp\left\{ -\frac{1}{2}\operatorname{Tr}(\boldsymbol{\Psi}\boldsymbol{\Sigma}^{-1}) - \frac{\lambda}{2} (\boldsymbol{\mu} - \boldsymbol{\mu}_0)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}_0) \right\}

Here

\Gamma_D[\cdot]

is the multivariate gamma function and

\operatorname{Tr}(\boldsymbol{\Psi})

is the trace of the given matrix.
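Because the density factorizes into a multivariate normal term and an inverse-Wishart term, it can be evaluated numerically with standard routines. Below is a minimal sketch using SciPy; the function name niw_logpdf and the example parameter values are illustrative assumptions, not taken from the sources cited above.

```python
import numpy as np
from scipy.stats import multivariate_normal, invwishart

def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
    """Log-density of NIW(mu0, lam, Psi, nu) at (mu, Sigma), using the
    factorization N(mu | mu0, Sigma/lam) * W^{-1}(Sigma | Psi, nu)."""
    normal_term = multivariate_normal.logpdf(mu, mean=mu0, cov=Sigma / lam)
    iw_term = invwishart.logpdf(Sigma, df=nu, scale=Psi)
    return normal_term + iw_term

# Arbitrary example in D = 2 dimensions
mu0, lam, Psi, nu = np.zeros(2), 1.0, np.eye(2), 4.0
print(niw_logpdf(np.array([0.5, -0.2]), np.eye(2), mu0, lam, Psi, nu))
```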

Properties

Marginal distributions

By construction, the marginal distribution over

\boldsymbol\Sigma

is an inverse Wishart distribution, and the conditional distribution over

\boldsymbol\mu

given

\boldsymbol\Sigma

is a multivariate normal distribution. The marginal distribution over

\boldsymbol\mu

is a multivariate t-distribution.
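For a p-dimensional \boldsymbol\mu, a standard explicit form of this marginal (stated here for completeness rather than taken verbatim from the cited sources) is

\boldsymbol\mu \sim t_{\nu - p + 1}\!\left(\boldsymbol\mu_0, \frac{1}{\lambda(\nu - p + 1)}\boldsymbol\Psi\right).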

Posterior distribution of the parameters

Suppose the sampling density is a multivariate normal distribution

\boldsymbol{y}_i \mid \boldsymbol\mu, \boldsymbol\Sigma \sim \mathcal{N}_p(\boldsymbol\mu, \boldsymbol\Sigma)

where

\boldsymbol{y}

is an

n \times p

matrix and

\boldsymbol{y}_i

(of length

p

) is row

i

of the matrix.

With the mean and covariance matrix of the sampling distribution unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly:

(\boldsymbol\mu, \boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0, \lambda, \boldsymbol\Psi, \nu).

The resulting posterior distribution for the mean and covariance matrix will also be normal-inverse-Wishart:

(\boldsymbol\mu, \boldsymbol\Sigma \mid \boldsymbol{y}) \sim \mathrm{NIW}(\boldsymbol\mu_n, \lambda_n, \boldsymbol\Psi_n, \nu_n),

where

\boldsymbol\mu_n = \frac{\lambda\boldsymbol\mu_0 + n\bar{\boldsymbol{y}}}{\lambda + n}

\lambda_n = \lambda + n

\nu_n = \nu + n

\boldsymbol\Psi_n = \boldsymbol\Psi + \boldsymbol{S} + \frac{\lambda n}{\lambda + n}(\bar{\boldsymbol{y}} - \boldsymbol\mu_0)(\bar{\boldsymbol{y}} - \boldsymbol\mu_0)^T \quad \text{with} \quad \boldsymbol{S} = \sum_{i=1}^{n} (\boldsymbol{y}_i - \bar{\boldsymbol{y}})(\boldsymbol{y}_i - \bar{\boldsymbol{y}})^T.
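As an illustration of these update equations, the following is a minimal NumPy sketch; the function name niw_posterior and the variable names are ad-hoc choices, not drawn from the cited sources.

```python
import numpy as np

def niw_posterior(y, mu0, lam, Psi, nu):
    """Conjugate NIW update for n i.i.d. rows y[i] ~ N_p(mu, Sigma).
    Returns the posterior hyperparameters (mu_n, lam_n, Psi_n, nu_n)."""
    y = np.asarray(y, dtype=float)
    n, p = y.shape
    ybar = y.mean(axis=0)
    centered = y - ybar
    S = centered.T @ centered            # sum_i (y_i - ybar)(y_i - ybar)^T

    lam_n = lam + n
    nu_n = nu + n
    mu_n = (lam * mu0 + n * ybar) / lam_n
    d = (ybar - mu0).reshape(-1, 1)
    Psi_n = Psi + S + (lam * n / lam_n) * (d @ d.T)
    return mu_n, lam_n, Psi_n, nu_n
```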

To sample from the joint posterior of

(\boldsymbol\mu,\boldsymbol\Sigma)

, one simply draws samples from

\boldsymbol\Sigma \mid \boldsymbol{y} \sim \mathcal{W}^{-1}(\boldsymbol\Psi_n, \nu_n)

, then draws

\boldsymbol\mu \mid \boldsymbol\Sigma, \boldsymbol{y} \sim \mathcal{N}_p(\boldsymbol\mu_n, \boldsymbol\Sigma/\lambda_n)

. To draw from the posterior predictive of a new observation, draw

\tilde{\boldsymbol{y}} \mid \boldsymbol\mu, \boldsymbol\Sigma, \boldsymbol{y} \sim \mathcal{N}_p(\boldsymbol\mu, \boldsymbol\Sigma)

, given the already drawn values of

\boldsymbol\mu

and

\boldsymbol\Sigma

.[3]
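A sketch of this sampling scheme, assuming the posterior hyperparameters mu_n, lam_n, Psi_n and nu_n have already been computed (for example with a routine like the niw_posterior sketch above):

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

def sample_posterior_and_predictive(mu_n, lam_n, Psi_n, nu_n):
    """Draw (mu, Sigma) from the NIW posterior, then one posterior-predictive observation."""
    Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)   # Sigma | y
    mu = rng.multivariate_normal(mu_n, Sigma / lam_n)                # mu | Sigma, y
    y_tilde = rng.multivariate_normal(mu, Sigma)                     # new observation
    return mu, Sigma, y_tilde
```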

Generating normal-inverse-Wishart random variates

Generation of random variates is straightforward; a short code sketch follows the two steps below:

  1. Sample

\boldsymbol\Sigma

from an inverse Wishart distribution with parameters

\boldsymbol\Psi

and

\nu

  2. Sample

\boldsymbol\mu

from a multivariate normal distribution with mean

\boldsymbol\mu_0

and covariance matrix

\tfrac{1}{\lambda}\boldsymbol\Sigma
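A minimal sketch of these two steps (the function name sample_niw and the example parameter values are illustrative assumptions):

```python
import numpy as np
from scipy.stats import invwishart

def sample_niw(mu0, lam, Psi, nu, rng=None):
    """Draw (mu, Sigma) ~ NIW(mu0, lam, Psi, nu) via the two steps above."""
    rng = np.random.default_rng() if rng is None else rng
    Sigma = invwishart.rvs(df=nu, scale=Psi, random_state=rng)   # step 1
    mu = rng.multivariate_normal(mu0, Sigma / lam)               # step 2
    return mu, Sigma

mu, Sigma = sample_niw(np.zeros(2), lam=1.0, Psi=np.eye(2), nu=4.0)
```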

Related distributions

If

(\boldsymbol\mu, \boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0, \lambda, \boldsymbol\Psi, \nu)

then

(\boldsymbol\mu, \boldsymbol\Sigma^{-1}) \sim \mathrm{NW}(\boldsymbol\mu_0, \lambda, \boldsymbol\Psi^{-1}, \nu)

(the normal-Wishart distribution).

References


  1. Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution." http://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf
  2. Prince, Simon J. D. (June 2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. Section 3.8: "Normal inverse Wishart distribution".
  3. Gelman, Andrew, et al. (2014). Bayesian Data Analysis. Vol. 2, p. 73. Boca Raton, FL: Chapman & Hall/CRC.