In probability theory and statistics, the normal-gamma distribution (or Gaussian-gamma distribution) is a bivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a normal distribution with unknown mean and precision.[1]
For a pair of random variables, (X,T), suppose that the conditional distribution of X given T is given by
X \mid T \sim N(\mu, 1/(\lambda T)),

meaning that, conditional on T, X is normally distributed with mean \mu, precision \lambda T, and hence variance 1/(\lambda T).
Suppose also that the marginal distribution of T is given by
T\mid\alpha,\beta\sim\operatorname{Gamma}(\alpha,\beta),
where this means that T has a gamma distribution. Here λ, α and β are parameters of the joint distribution.
Then (X,T) has a normal-gamma distribution, and this is denoted by
(X,T) \sim \operatorname{NormalGamma}(\mu,\lambda,\alpha,\beta).

The joint probability density function of (X,T) is

f(x,\tau \mid \mu,\lambda,\alpha,\beta) = \frac{\beta^\alpha\sqrt{\lambda}}{\Gamma(\alpha)\sqrt{2\pi}}\,\tau^{\alpha-\frac{1}{2}}\,e^{-\beta\tau}\exp\left[-\frac{\lambda\tau(x-\mu)^2}{2}\right].
By construction, the marginal distribution of \tau is a gamma distribution, and the conditional distribution of x given \tau is a Gaussian distribution. The marginal distribution of x is a three-parameter non-standardized Student's t-distribution with parameters (\nu,\mu,\sigma^2) = (2\alpha,\mu,\beta/(\lambda\alpha)).

The normal-gamma distribution is a four-parameter exponential family with natural parameters \alpha-1/2, -\beta-\lambda\mu^2/2, \lambda\mu, -\lambda/2 and natural statistics \ln\tau, \tau, \tau x, \tau x^2.
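The marginal claim can be spot-checked by simulation: with (\nu,\mu,\sigma^2)=(2\alpha,\mu,\beta/(\lambda\alpha)), the marginal variance of x is \sigma^2\nu/(\nu-2) = \beta/(\lambda(\alpha-1)) for \alpha > 1. A minimal sketch using NumPy, with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, lam, alpha, beta = 0.0, 2.0, 5.0, 3.0
n = 1_000_000

# Sample tau ~ Gamma(alpha, rate=beta); NumPy's gamma takes shape and scale = 1/rate.
tau = rng.gamma(alpha, 1.0 / beta, size=n)
# Sample x | tau ~ N(mu, variance 1/(lam * tau)).
x = rng.normal(mu, 1.0 / np.sqrt(lam * tau))

# The empirical marginal variance of x should be close to beta / (lam * (alpha - 1)).
print(x.var(), beta / (lam * (alpha - 1)))
```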
The following moments can be easily computed using the moment generating function of the sufficient statistic:
\operatorname{E}(\ln T) = \psi(\alpha) - \ln\beta,

where \psi(\alpha) is the digamma function,

\begin{align}
\operatorname{E}(T) &= \frac{\alpha}{\beta},\\[5pt]
\operatorname{E}(TX) &= \mu\frac{\alpha}{\beta},\\[5pt]
\operatorname{E}(TX^2) &= \frac{1}{\lambda} + \mu^2\frac{\alpha}{\beta}.
\end{align}
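These moment identities can be verified by Monte Carlo simulation. A sketch using NumPy (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, lam, alpha, beta = 2.0, 3.0, 4.0, 5.0
n = 1_000_000

# Draw (X, T) from the normal-gamma distribution.
T = rng.gamma(alpha, 1.0 / beta, size=n)      # T ~ Gamma(alpha, rate=beta)
X = rng.normal(mu, 1.0 / np.sqrt(lam * T))    # X | T ~ N(mu, 1/(lam*T))

print(T.mean(), alpha / beta)                             # E(T)
print((T * X).mean(), mu * alpha / beta)                  # E(TX)
print((T * X**2).mean(), 1 / lam + mu**2 * alpha / beta)  # E(TX^2)
```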
If

(X,T) \sim \operatorname{NormalGamma}(\mu,\lambda,\alpha,\beta),

then for any b > 0, (bX,bT) is distributed as

\operatorname{NormalGamma}(b\mu, \lambda/b^3, \alpha, \beta/b).
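The scaling property can be checked empirically by comparing draws of (bX, bT) against direct draws from \operatorname{NormalGamma}(b\mu,\lambda/b^3,\alpha,\beta/b). A sketch using NumPy; the helper name sample_normal_gamma is a hypothetical choice:

```python
import numpy as np

def sample_normal_gamma(mu, lam, alpha, beta, size, rng):
    # T ~ Gamma(alpha, rate=beta), then X | T ~ N(mu, 1/(lam*T)).
    t = rng.gamma(alpha, 1.0 / beta, size=size)
    x = rng.normal(mu, 1.0 / np.sqrt(lam * t))
    return x, t

rng = np.random.default_rng(1)
mu, lam, alpha, beta, b = 1.5, 2.0, 3.0, 4.0, 2.0
n = 500_000

x, t = sample_normal_gamma(mu, lam, alpha, beta, n, rng)
x2, t2 = sample_normal_gamma(b * mu, lam / b**3, alpha, beta / b, n, rng)

# The scaled pair (b*x, b*t) and the direct draws (x2, t2) should have matching moments.
print((b * t).mean(), t2.mean())
print((b * x).mean(), x2.mean())
```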
Assume that x is distributed according to a normal distribution with unknown mean \mu and precision \tau,

x \sim \mathcal{N}(\mu, \tau^{-1}),

and that the prior distribution on \mu and \tau, (\mu,\tau), has a normal-gamma distribution

(\mu,\tau) \sim \operatorname{NormalGamma}(\mu_0,\lambda_0,\alpha_0,\beta_0),

for which the density \pi satisfies

\pi(\mu,\tau) \propto \tau^{\alpha_0-\frac{1}{2}}\exp[-\beta_0\tau]\exp\left[-\frac{\lambda_0\tau(\mu-\mu_0)^2}{2}\right].
Suppose

x_1,\ldots,x_n \mid \mu,\tau \overset{\text{i.i.d.}}{\sim} \operatorname{N}\left(\mu, \tau^{-1}\right),

i.e. the components of X = (x_1,\ldots,x_n) are conditionally independent given \mu,\tau, and the conditional distribution of each of them given \mu,\tau is normal with expected value \mu and variance 1/\tau. The posterior distribution of \mu and \tau given this dataset X can be analytically determined by Bayes' theorem. Explicitly,

P(\tau,\mu \mid X) \propto L(X \mid \tau,\mu)\,\pi(\tau,\mu),

where L is the likelihood of the parameters given the data.
Since the data are i.i.d., the likelihood of the entire dataset is equal to the product of the likelihoods of the individual data samples:

L(X \mid \tau,\mu) = \prod_{i=1}^n L(x_i \mid \tau,\mu).
This expression can be simplified as follows:
\begin{align}
L(X \mid \tau,\mu) &\propto \prod_{i=1}^n \tau^{1/2}\exp\left[\frac{-\tau}{2}(x_i-\mu)^2\right] \\[5pt]
&\propto \tau^{n/2}\exp\left[\frac{-\tau}{2}\sum_{i=1}^n (x_i-\mu)^2\right] \\[5pt]
&\propto \tau^{n/2}\exp\left[\frac{-\tau}{2}\sum_{i=1}^n (x_i-\bar{x}+\bar{x}-\mu)^2\right] \\[5pt]
&\propto \tau^{n/2}\exp\left[\frac{-\tau}{2}\sum_{i=1}^n \left((x_i-\bar{x})^2 + (\bar{x}-\mu)^2\right)\right] \\[5pt]
&\propto \tau^{n/2}\exp\left[\frac{-\tau}{2}\left(ns + n(\bar{x}-\mu)^2\right)\right],
\end{align}
where \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i is the mean of the data samples and s = \frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2 is their (biased) sample variance; the cross terms 2(x_i-\bar{x})(\bar{x}-\mu) drop out of the expansion because \sum_{i=1}^n (x_i-\bar{x}) = 0.
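The decomposition \sum_{i=1}^n (x_i-\mu)^2 = ns + n(\bar{x}-\mu)^2 used in the last step can be verified numerically; the data below are arbitrary:

```python
import numpy as np

x = np.array([1.2, 0.7, 3.4, 2.1, 1.9])
mu = 0.5
n = len(x)

xbar = x.mean()
s = ((x - xbar) ** 2).mean()  # biased sample variance

lhs = ((x - mu) ** 2).sum()
rhs = n * s + n * (xbar - mu) ** 2
print(lhs, rhs)  # equal up to floating-point rounding
```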
The posterior distribution of the parameters is proportional to the prior times the likelihood.
\begin{align}
P(\tau,\mu \mid X) &\propto L(X \mid \tau,\mu)\,\pi(\tau,\mu) \\
&\propto \tau^{n/2}\exp\left[\frac{-\tau}{2}\left(ns + n(\bar{x}-\mu)^2\right)\right]\tau^{\alpha_0-\frac{1}{2}}\exp[-\beta_0\tau]\exp\left[-\frac{\lambda_0\tau(\mu-\mu_0)^2}{2}\right] \\
&\propto \tau^{\frac{n}{2}+\alpha_0-\frac{1}{2}}\exp\left[-\tau\left(\frac{1}{2}ns + \beta_0\right)\right]\exp\left[-\frac{\tau}{2}\left(\lambda_0(\mu-\mu_0)^2 + n(\bar{x}-\mu)^2\right)\right]
\end{align}
The final exponential term is simplified by completing the square.
\begin{align}
\lambda_0(\mu-\mu_0)^2 + n(\bar{x}-\mu)^2 &= \lambda_0\mu^2 - 2\lambda_0\mu\mu_0 + \lambda_0\mu_0^2 + n\mu^2 - 2n\bar{x}\mu + n\bar{x}^2 \\
&= (\lambda_0+n)\mu^2 - 2(\lambda_0\mu_0+n\bar{x})\mu + \lambda_0\mu_0^2 + n\bar{x}^2 \\
&= (\lambda_0+n)\left(\mu^2 - 2\frac{\lambda_0\mu_0+n\bar{x}}{\lambda_0+n}\mu\right) + \lambda_0\mu_0^2 + n\bar{x}^2 \\
&= (\lambda_0+n)\left(\mu - \frac{\lambda_0\mu_0+n\bar{x}}{\lambda_0+n}\right)^2 + \lambda_0\mu_0^2 + n\bar{x}^2 - \frac{(\lambda_0\mu_0+n\bar{x})^2}{\lambda_0+n} \\
&= (\lambda_0+n)\left(\mu - \frac{\lambda_0\mu_0+n\bar{x}}{\lambda_0+n}\right)^2 + \frac{\lambda_0 n(\bar{x}-\mu_0)^2}{\lambda_0+n}
\end{align}
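The completed square can be checked numerically for arbitrary values of the quantities involved (a plain-Python sketch):

```python
# Arbitrary values for the prior parameters and data summaries.
lam0, mu0, n, xbar = 2.0, 1.0, 7, 3.5
mu = 0.8  # an arbitrary point at which to evaluate both sides

lhs = lam0 * (mu - mu0) ** 2 + n * (xbar - mu) ** 2
mu_post = (lam0 * mu0 + n * xbar) / (lam0 + n)
rhs = (lam0 + n) * (mu - mu_post) ** 2 + lam0 * n * (xbar - mu0) ** 2 / (lam0 + n)
print(lhs, rhs)  # equal up to floating-point rounding
```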
On inserting this back into the expression above,
\begin{align}
P(\tau,\mu \mid X) &\propto \tau^{\frac{n}{2}+\alpha_0-\frac{1}{2}}\exp\left[-\tau\left(\frac{1}{2}ns + \beta_0\right)\right]\exp\left[-\frac{\tau}{2}\left(\left(\lambda_0+n\right)\left(\mu - \frac{\lambda_0\mu_0+n\bar{x}}{\lambda_0+n}\right)^2 + \frac{\lambda_0 n(\bar{x}-\mu_0)^2}{\lambda_0+n}\right)\right]
\end{align}
This final expression is in exactly the same form as a Normal-Gamma distribution, i.e.,
P(\tau,\mu \mid X) = \operatorname{NormalGamma}\left(\frac{\lambda_0\mu_0+n\bar{x}}{\lambda_0+n},\ \lambda_0+n,\ \alpha_0+\frac{n}{2},\ \beta_0 + \frac{1}{2}\left(ns + \frac{\lambda_0 n(\bar{x}-\mu_0)^2}{\lambda_0+n}\right)\right).
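The update rule can be packaged as a short function; posterior_params is a hypothetical name, and the biased sample variance s = \frac{1}{n}\sum_i (x_i-\bar{x})^2 is used, matching the derivation above:

```python
import numpy as np

def posterior_params(x, mu0, lam0, alpha0, beta0):
    """Normal-gamma posterior parameters after observing data x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    s = ((x - xbar) ** 2).mean()  # biased sample variance
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    lam_n = lam0 + n
    alpha_n = alpha0 + n / 2
    beta_n = beta0 + 0.5 * (n * s + lam0 * n * (xbar - mu0) ** 2 / (lam0 + n))
    return mu_n, lam_n, alpha_n, beta_n

# Two observations with mean 3 and biased variance 1, starting from
# NormalGamma(0, 1, 1, 1):
mu_n, lam_n, alpha_n, beta_n = posterior_params([2.0, 4.0], 0.0, 1.0, 1.0, 1.0)
print(mu_n, lam_n, alpha_n, beta_n)  # 2.0 3.0 2.0 5.0
```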
The interpretation of parameters in terms of pseudo-observations is as follows:
* The new mean takes a weighted average of the old pseudo-mean and the observed mean, weighted by the number of associated (pseudo-)observations.
* The precision was estimated from 2\alpha pseudo-observations (i.e. possibly a different number of pseudo-observations, to allow the variance of the mean and precision to be controlled separately) with sample mean \mu and sample variance \frac{\beta}{\alpha} (i.e. with sum of squared deviations 2\beta).
* The posterior updates the number of pseudo-observations (\lambda_0) simply by adding the corresponding number of new observations (n).
* The new sum of squared deviations is computed by adding the previous sum of squared deviations to that of the new data; however, an extra interaction term is needed because the two sets of squared deviations were computed with respect to different means, and the sum of the two alone underestimates the total squared deviation.
As a consequence, if one has a prior mean of \mu_0 from n_\mu samples and a prior precision of \tau_0 from n_\tau samples, the prior distribution over \mu and \tau is

P(\tau,\mu) = \operatorname{NormalGamma}\left(\mu_0,\ n_\mu,\ \frac{n_\tau}{2},\ \frac{n_\tau}{2\tau_0}\right)
and after observing n samples with mean \mu and variance s, the posterior probability is

P(\tau,\mu \mid X) = \operatorname{NormalGamma}\left(\frac{n_\mu\mu_0 + n\mu}{n_\mu + n},\ n_\mu + n,\ \frac{1}{2}(n_\tau + n),\ \frac{1}{2}\left(\frac{n_\tau}{\tau_0} + ns + \frac{n_\mu n(\mu - \mu_0)^2}{n_\mu + n}\right)\right)
Note that in some programming languages, such as Matlab, the gamma distribution is implemented with the inverse definition of \beta, so the fourth argument of the normal-gamma distribution is 2\tau_0/n_\tau.
Generation of random variates is straightforward:
1. Sample \tau from a gamma distribution with parameters \alpha and \beta.
2. Sample x from a normal distribution with mean \mu and variance 1/(\lambda\tau).
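In NumPy, for example, this two-step scheme could look as follows; note that numpy.random parameterizes the gamma distribution by shape and scale, so the rate \beta enters as 1/\beta (the function name is a hypothetical choice):

```python
import numpy as np

def sample_normal_gamma(mu, lam, alpha, beta, size, rng=None):
    """Draw (x, tau) pairs from NormalGamma(mu, lam, alpha, beta)."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: sample tau from Gamma(shape=alpha, rate=beta) = Gamma(alpha, scale=1/beta).
    tau = rng.gamma(alpha, 1.0 / beta, size=size)
    # Step 2: sample x from N(mu, variance 1/(lam * tau)).
    x = rng.normal(mu, 1.0 / np.sqrt(lam * tau))
    return x, tau

x, tau = sample_normal_gamma(0.0, 2.0, 3.0, 4.0, size=100_000,
                             rng=np.random.default_rng(42))
print(x.mean(), tau.mean())  # approx 0.0 and alpha/beta = 0.75
```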