In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]
Suppose
\boldsymbol\mu \mid \boldsymbol\mu_0, \lambda, \boldsymbol\Sigma \sim \mathcal{N}\left(\boldsymbol\mu \,\Big|\, \boldsymbol\mu_0, \tfrac{1}{\lambda}\boldsymbol\Sigma\right)
has a multivariate normal distribution with mean $\boldsymbol\mu_0$ and covariance matrix $\tfrac{1}{\lambda}\boldsymbol\Sigma$, where
\boldsymbol\Sigma \mid \boldsymbol\Psi, \nu \sim \mathcal{W}^{-1}(\boldsymbol\Sigma \mid \boldsymbol\Psi, \nu)
has an inverse Wishart distribution. Then $(\boldsymbol\mu, \boldsymbol\Sigma)$ has a normal-inverse-Wishart distribution, denoted as
(\boldsymbol\mu, \boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0, \lambda, \boldsymbol\Psi, \nu).
f(\boldsymbol\mu, \boldsymbol\Sigma \mid \boldsymbol\mu_0, \lambda, \boldsymbol\Psi, \nu) = \mathcal{N}\left(\boldsymbol\mu \,\Big|\, \boldsymbol\mu_0, \tfrac{1}{\lambda}\boldsymbol\Sigma\right) \mathcal{W}^{-1}(\boldsymbol\Sigma \mid \boldsymbol\Psi, \nu)
The full version of the PDF is as follows:[2]
f(\boldsymbol{\mu}, \boldsymbol{\Sigma} \mid \boldsymbol{\mu}_0, \lambda, \boldsymbol{\Psi}, \nu) = \frac{\lambda^{D/2} |\boldsymbol{\Psi}|^{\nu/2}}{(2\pi)^{D/2}\, 2^{\nu D/2}\, \Gamma_D(\tfrac{\nu}{2})}\, |\boldsymbol{\Sigma}|^{-\frac{\nu+D+2}{2}} \exp\left(-\frac{1}{2}\operatorname{Tr}(\boldsymbol{\Psi}\boldsymbol{\Sigma}^{-1}) - \frac{\lambda}{2}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}-\boldsymbol{\mu}_0)\right)
Here $\Gamma_D[\cdot]$ is the multivariate gamma function, $D$ is the dimension of $\boldsymbol\mu$, and $\operatorname{Tr}(\boldsymbol{\Psi})$ is the trace of the given matrix.
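For concreteness, the density can be evaluated numerically through the factorized form given above, rather than the expanded formula. The following Python sketch is illustrative only (the helper name niw_logpdf and the example values are not from the source); it relies on SciPy's multivariate_normal and invwishart log-densities:

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
    """Log-density of NIW(mu0, lam, Psi, nu) at the point (mu, Sigma).

    Uses the factorization f(mu, Sigma) = N(mu | mu0, Sigma/lam) * W^{-1}(Sigma | Psi, nu).
    """
    log_normal = multivariate_normal.logpdf(mu, mean=mu0, cov=Sigma / lam)
    log_iw = invwishart.logpdf(Sigma, df=nu, scale=Psi)
    return log_normal + log_iw

# Evaluate at an arbitrary point (hyperparameter values are illustrative).
mu0 = np.zeros(2)
lam, nu = 1.0, 4.0
Psi = np.eye(2)
print(niw_logpdf(np.array([0.1, -0.2]), np.eye(2), mu0, lam, Psi, nu))
```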
By construction, the marginal distribution over $\boldsymbol\Sigma$ is an inverse Wishart distribution, and the conditional distribution over $\boldsymbol\mu$ given $\boldsymbol\Sigma$ is a multivariate normal distribution. The marginal distribution over $\boldsymbol\mu$ is a multivariate t-distribution.
Suppose the sampling density is a multivariate normal distribution
\boldsymbol{y}_i \mid \boldsymbol\mu, \boldsymbol\Sigma \sim \mathcal{N}_p(\boldsymbol\mu, \boldsymbol\Sigma),
where $\boldsymbol{y}$ is an $n \times p$ matrix and $\boldsymbol{y}_i$ (of length $p$) is row $i$ of the matrix.
Since the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly:
(\boldsymbol\mu, \boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0, \lambda, \boldsymbol\Psi, \nu).
The resulting posterior distribution for the mean and covariance matrix will also be a normal-inverse-Wishart distribution:
(\boldsymbol\mu, \boldsymbol\Sigma \mid \boldsymbol{y}) \sim \mathrm{NIW}(\boldsymbol\mu_n, \lambda_n, \boldsymbol\Psi_n, \nu_n),
where
\boldsymbol\mu_n = \frac{\lambda\boldsymbol\mu_0 + n\bar{\boldsymbol{y}}}{\lambda + n}
\lambda_n = \lambda + n
\nu_n = \nu + n
\boldsymbol\Psi_n = \boldsymbol\Psi + \boldsymbol{S} + \frac{\lambda n}{\lambda + n}(\bar{\boldsymbol{y}} - \boldsymbol\mu_0)(\bar{\boldsymbol{y}} - \boldsymbol\mu_0)^T \quad \text{with} \quad \boldsymbol{S} = \sum_{i=1}^{n}(\boldsymbol{y}_i - \bar{\boldsymbol{y}})(\boldsymbol{y}_i - \bar{\boldsymbol{y}})^T .
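The conjugate update above translates directly into code. The sketch below is illustrative only (the helper name niw_posterior and its argument names are not from the source); it computes the posterior hyperparameters from an $n \times p$ data matrix using NumPy:

```python
import numpy as np

def niw_posterior(y, mu0, lam, Psi, nu):
    """Conjugate NIW posterior update for i.i.d. multivariate normal rows of y.

    y   : (n, p) data matrix, one observation per row
    mu0 : (p,) prior mean
    lam : prior precision scalar on the mean
    Psi : (p, p) prior scale matrix
    nu  : prior degrees of freedom
    """
    n = y.shape[0]
    ybar = y.mean(axis=0)
    resid = y - ybar
    S = resid.T @ resid                      # scatter matrix about the sample mean

    mu_n = (lam * mu0 + n * ybar) / (lam + n)
    lam_n = lam + n
    nu_n = nu + n
    d = (ybar - mu0)[:, None]
    Psi_n = Psi + S + (lam * n / (lam + n)) * (d @ d.T)
    return mu_n, lam_n, Psi_n, nu_n
```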
To sample from the joint posterior of $(\boldsymbol\mu, \boldsymbol\Sigma)$, one simply draws samples from
\boldsymbol\Sigma \mid \boldsymbol{y} \sim \mathcal{W}^{-1}(\boldsymbol\Psi_n, \nu_n),
then draws
\boldsymbol\mu \mid \boldsymbol\Sigma, \boldsymbol{y} \sim \mathcal{N}_p(\boldsymbol\mu_n, \boldsymbol\Sigma/\lambda_n).
To draw from the posterior predictive distribution of a new observation, draw
\tilde{\boldsymbol{y}} \mid \boldsymbol\mu, \boldsymbol\Sigma, \boldsymbol{y} \sim \mathcal{N}_p(\boldsymbol\mu, \boldsymbol\Sigma),
using the already drawn values of $\boldsymbol\mu$ and $\boldsymbol\Sigma$.
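A minimal sketch of this two-step posterior sampler, plus a posterior predictive draw, is shown below. It is illustrative only: it assumes the posterior hyperparameters mu_n, lam_n, Psi_n, nu_n have already been computed (for instance with the niw_posterior helper sketched above) and uses SciPy's invwishart and multivariate_normal samplers:

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def sample_posterior(mu_n, lam_n, Psi_n, nu_n, rng=None):
    """One joint draw (mu, Sigma) from the NIW posterior, plus a predictive draw."""
    # Step 1: Sigma | y ~ W^{-1}(Psi_n, nu_n)
    Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)
    # Step 2: mu | Sigma, y ~ N_p(mu_n, Sigma / lam_n)
    mu = multivariate_normal.rvs(mean=mu_n, cov=Sigma / lam_n, random_state=rng)
    # Posterior predictive: y_tilde | mu, Sigma, y ~ N_p(mu, Sigma)
    y_tilde = multivariate_normal.rvs(mean=mu, cov=Sigma, random_state=rng)
    return mu, Sigma, y_tilde
```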
Generation of random variates is straightforward:
1. Sample $\boldsymbol\Sigma$ from an inverse Wishart distribution with parameters $\boldsymbol\Psi$ and $\nu$.
2. Sample $\boldsymbol\mu$ from a multivariate normal distribution with mean $\boldsymbol\mu_0$ and covariance matrix $\tfrac{1}{\lambda}\boldsymbol\Sigma$.
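These two steps map directly onto SciPy's invwishart and multivariate_normal samplers. The sketch below is illustrative only (the helper name sample_niw and the example hyperparameters are not from the source):

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def sample_niw(mu0, lam, Psi, nu, rng=None):
    """Draw one pair (mu, Sigma) ~ NIW(mu0, lam, Psi, nu)."""
    # Step 1: Sigma ~ W^{-1}(Psi, nu)
    Sigma = invwishart.rvs(df=nu, scale=Psi, random_state=rng)
    # Step 2: mu | Sigma ~ N(mu0, Sigma / lam)
    mu = multivariate_normal.rvs(mean=mu0, cov=Sigma / lam, random_state=rng)
    return mu, Sigma

# Illustrative hyperparameters for a 2-dimensional example.
rng = np.random.default_rng(0)
mu, Sigma = sample_niw(np.zeros(2), lam=1.0, Psi=np.eye(2), nu=4.0, rng=rng)
```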
If
(\boldsymbol\mu, \boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0, \lambda, \boldsymbol\Psi, \nu),
then
(\boldsymbol\mu, \boldsymbol\Sigma^{-1}) \sim \mathrm{NW}(\boldsymbol\mu_0, \lambda, \boldsymbol\Psi^{-1}, \nu),
where $\mathrm{NW}$ denotes the normal-Wishart distribution, i.e. the same distribution parameterized in terms of the precision matrix rather than the covariance matrix.
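To make the correspondence concrete, the sketch below (illustrative only, reusing SciPy samplers as above) draws the precision matrix from a Wishart distribution with scale $\boldsymbol\Psi^{-1}$ and inverts it; under the stated relation, the resulting pair (mu, Sigma) has the same joint distribution as a draw from the NIW sampler sketched earlier:

```python
import numpy as np
from scipy.stats import wishart, multivariate_normal

def sample_nw_as_niw(mu0, lam, Psi, nu, rng=None):
    """Draw (mu, Sigma) via the normal-Wishart parameterization NW(mu0, lam, Psi^{-1}, nu)."""
    # Precision matrix: Sigma^{-1} ~ W(Psi^{-1}, nu)
    Lambda = wishart.rvs(df=nu, scale=np.linalg.inv(Psi), random_state=rng)
    Sigma = np.linalg.inv(Lambda)
    # mu | Sigma ~ N(mu0, Sigma / lam), identical to the NIW conditional above
    mu = multivariate_normal.rvs(mean=mu0, cov=Sigma / lam, random_state=rng)
    return mu, Sigma
```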