In probability theory, a logit-normal distribution is a probability distribution of a random variable whose logit has a normal distribution. If Y is a random variable with a normal distribution, and t is the standard logistic function, then X = t(Y) has a logit-normal distribution; likewise, if X is logit-normally distributed, then Y = logit(X) = log(X/(1-X)) is normally distributed. It is also known as the logistic normal distribution,[1] a name that often refers to a multinomial logit version (e.g. [2] [3] [4]).
A variable might be modeled as logit-normal if it is a proportion, which is bounded by zero and one, and where values of zero and one never occur.
The probability density function (PDF) of a logit-normal distribution, for 0 < x < 1, is:
$$f_X(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,\frac{1}{x(1-x)}\,e^{-\frac{(\operatorname{logit}(x)-\mu)^2}{2\sigma^2}}$$
where μ and σ are the mean and standard deviation of the variable’s logit (by definition, the variable’s logit is normally distributed).
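The density can be evaluated directly from the change of variables Y = logit(X): the normal density of the logit is multiplied by the Jacobian factor 1/(x(1-x)). The following is a minimal sketch using only the Python standard library; the function names `logit` and `logit_normal_pdf` are illustrative and do not come from any particular package.

```python
from math import exp, log, pi, sqrt


def logit(x):
    """Log-odds of x, defined for 0 < x < 1."""
    return log(x / (1.0 - x))


def logit_normal_pdf(x, mu, sigma):
    """Density of a logit-normal variable at x.

    mu and sigma are the mean and standard deviation of logit(X).
    Returns 0 outside the open interval (0, 1).
    """
    if not 0.0 < x < 1.0:
        return 0.0
    z = (logit(x) - mu) / sigma
    # Normal density of the logit times the Jacobian 1 / (x (1 - x)).
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi) * x * (1.0 - x))


# Midpoint-rule check that the density integrates to roughly one.
n = 10_000
area = sum(logit_normal_pdf((i + 0.5) / n, 0.0, 1.0) for i in range(n)) / n
```

The midpoint rule is used for the check because it never evaluates the density at the end points 0 and 1, where the logit is undefined.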
Changing the sign of μ reflects the density about 0.5: f(x;μ,σ) = f(1-x;-μ,σ), so the mode moves to the other side of 0.5 (the midpoint of the (0,1) interval).
The moments of the logit-normal distribution have no analytic solution. They can be estimated by numerical integration; however, numerical integration can be prohibitive when the values of μ and σ² are such that the density function diverges to infinity at the end points zero and one. An alternative is to use the observation that the logit-normal is a transformation of a normal random variable. This allows us to approximate the n-th moment by the following estimate:

$$E[X^n] \approx \frac{1}{K-1}\sum_{i=1}^{K-1}\left(P\left(\Phi^{-1}_{\mu,\sigma^2}(i/K)\right)\right)^n,$$

where P is the standard logistic function, and $\Phi^{-1}_{\mu,\sigma^2}$ is the inverse cumulative distribution function of a normal distribution with mean μ and variance σ².
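The estimate averages the n-th power of the logistic function over K-1 equally spaced normal quantiles. A minimal sketch follows, assuming Python's standard-library `statistics.NormalDist` for the inverse normal CDF; the function name `logit_normal_moment` and the default K = 10,000 are illustrative choices, not part of the source.

```python
from math import exp
from statistics import NormalDist


def logistic(y):
    """Standard logistic function P(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + exp(-y))


def logit_normal_moment(n, mu, sigma, K=10_000):
    """Approximate E[X^n] for logit(X) ~ Normal(mu, sigma^2).

    Averages logistic(q_i)^n over the normal quantiles q_i at
    probabilities i/K for i = 1, ..., K-1. Requires sigma > 0.
    """
    normal = NormalDist(mu, sigma)
    total = 0.0
    for i in range(1, K):
        total += logistic(normal.inv_cdf(i / K)) ** n
    return total / (K - 1)


mean = logit_normal_moment(1, 0.0, 1.0)
second = logit_normal_moment(2, 0.0, 1.0)
variance = second - mean ** 2
```

With n = 1 the estimate gives the mean, and the variance follows from the first two moments as in the last line.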
Setting the derivative of the density to zero shows that the location x of the mode satisfies the following equation:

$$\operatorname{logit}(x)=\sigma^2(2x-1)+\mu.$$
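The mode equation has no closed-form solution, but it can be solved numerically. The sketch below uses bisection on g(x) = logit(x) - σ²(2x-1) - μ and is restricted to σ² ≤ 2, an assumption made here because g is then non-decreasing on (0,1) (its derivative 1/(x(1-x)) - 2σ² is at least 4 - 2σ²), so the root is unique. For larger σ² the equation may have more than one solution and this sketch does not apply. The function name `logit_normal_mode` is illustrative.

```python
from math import log


def logit_normal_mode(mu, sigma, tol=1e-12):
    """Solve logit(x) = sigma^2 (2x - 1) + mu for x by bisection.

    Assumes sigma^2 <= 2 (so the solution is unique) and a moderate mu,
    so that the root lies inside the bracket [1e-15, 1 - 1e-15].
    """
    s2 = sigma * sigma
    if s2 > 2.0:
        raise ValueError("sketch only handles sigma^2 <= 2 (unique root)")

    def g(x):
        return log(x / (1.0 - x)) - s2 * (2.0 * x - 1.0) - mu

    lo, hi = 1e-15, 1.0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


mode = logit_normal_mode(1.0, 1.0)
```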
The logistic normal distribution is a generalization of the logit–normal distribution to D-dimensional probability vectors by taking a logistic transformation of a multivariate normal distribution.[5] [6] [7]
The probability density function is:
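$$f_X(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{|2\pi\boldsymbol{\Sigma}|^{1/2}}\,\frac{1}{\prod_{i=1}^{D}x_i}\,e^{-\frac{1}{2}\left\{\log\left(\frac{\mathbf{x}_{-D}}{x_D}\right)-\boldsymbol{\mu}\right\}^\top\boldsymbol{\Sigma}^{-1}\left\{\log\left(\frac{\mathbf{x}_{-D}}{x_D}\right)-\boldsymbol{\mu}\right\}}, \quad \mathbf{x}\in\mathcal{S}^D,$$

where $\mathbf{x}_{-D}$ denotes the vector of the first D-1 components of $\mathbf{x}$ and $\mathcal{S}^D$ denotes the simplex of D-dimensional probability vectors. This follows from applying the additive logistic transformation to map a multivariate normal random variable $\mathbf{y}\sim\mathcal{N}\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)$, $\mathbf{y}\in\mathbb{R}^{D-1}$, to the simplex:

$$\mathbf{x}=\left[\frac{e^{y_1}}{1+\sum_{i=1}^{D-1}e^{y_i}},\ldots,\frac{e^{y_{D-1}}}{1+\sum_{i=1}^{D-1}e^{y_i}},\frac{1}{1+\sum_{i=1}^{D-1}e^{y_i}}\right]^\top$$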
The unique inverse mapping is given by:
$$\mathbf{y}=\left[\log\left(\frac{x_1}{x_D}\right),\ldots,\log\left(\frac{x_{D-1}}{x_D}\right)\right]^\top$$
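The inverse mapping above and its forward counterpart can be sketched in a few lines. The forward map is taken here to be the additive logistic transformation $x_i = e^{y_i}/(1+\sum_j e^{y_j})$ for i < D, with $x_D = 1/(1+\sum_j e^{y_j})$, which is the map that the log-ratio formula inverts. The function names `additive_logistic` and `additive_log_ratio` are illustrative.

```python
from math import exp, log


def additive_logistic(y):
    """Map y in R^(D-1) to a probability vector x of length D."""
    exps = [exp(v) for v in y]
    denom = 1.0 + sum(exps)
    # The last component plays the role of the reference category x_D.
    return [e / denom for e in exps] + [1.0 / denom]


def additive_log_ratio(x):
    """Inverse map: y_i = log(x_i / x_D) for i = 1, ..., D-1."""
    return [log(v / x[-1]) for v in x[:-1]]


y = [0.3, -1.2, 2.0]
x = additive_logistic(y)          # four positive components summing to one
y_back = additive_log_ratio(x)    # recovers y up to rounding error
```

Sampling from the logistic normal distribution then amounts to drawing y from a multivariate normal distribution and applying `additive_logistic` to each draw.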
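This is the case of a vector $\mathbf{x}$ whose components sum to one. In the case of $\mathbf{x}$ with sigmoidal elements, that is, when

$$\mathbf{y}=\left[\log\left(\frac{x_1}{1-x_1}\right),\ldots,\log\left(\frac{x_D}{1-x_D}\right)\right]^\top,$$

we have

$$f_X(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{|2\pi\boldsymbol{\Sigma}|^{1/2}}\,\frac{1}{\prod_{i=1}^{D}x_i(1-x_i)}\,e^{-\frac{1}{2}\left\{\log\left(\frac{\mathbf{x}}{1-\mathbf{x}}\right)-\boldsymbol{\mu}\right\}^\top\boldsymbol{\Sigma}^{-1}\left\{\log\left(\frac{\mathbf{x}}{1-\mathbf{x}}\right)-\boldsymbol{\mu}\right\}},$$

where the logarithm and the division in the argument are taken element-wise. This is because the Jacobian matrix of the transformation is diagonal with elements $\frac{1}{x_i(1-x_i)}$.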
The logistic normal distribution is a more flexible alternative to the Dirichlet distribution in that it can capture correlations between components of probability vectors. It also has the potential to simplify statistical analyses of compositional data by allowing one to answer questions about log-ratios of the components of the data vectors. One is often interested in ratios rather than absolute component values.
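The probability simplex is a bounded space, making standard techniques that are typically applied to vectors in $\mathbb{R}^n$ less meaningful. Mapping compositional data in $\mathcal{S}^D$ through the inverse of the additive logistic transformation, however, yields real-valued data in $\mathbb{R}^{D-1}$, to which standard techniques can be applied.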
The Dirichlet and logistic normal distributions are never exactly equal for any choice of parameters. However, Aitchison described a method for approximating a Dirichlet with a logistic normal such that their Kullback–Leibler divergence (KL) is minimized:
$$K(p,q)=\int_{\mathcal{S}^D}p\left(\mathbf{x}\mid\boldsymbol{\alpha}\right)\log\left(\frac{p\left(\mathbf{x}\mid\boldsymbol{\alpha}\right)}{q\left(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma}\right)}\right)d\mathbf{x}$$
This is minimized by:
$$\boldsymbol{\mu}^*=\operatorname{E}_p\left[\log\left(\frac{\mathbf{x}_{-D}}{x_D}\right)\right], \quad \boldsymbol{\Sigma}^*=\operatorname{Var}_p\left[\log\left(\frac{\mathbf{x}_{-D}}{x_D}\right)\right]$$
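Using moment properties of the Dirichlet distribution, the solution can be written in terms of the digamma function $\psi$ and the trigamma function $\psi'$:

$$\mu^*_i=\psi\left(\alpha_i\right)-\psi\left(\alpha_D\right), \quad i=1,\ldots,D-1$$

$$\Sigma^*_{ii}=\psi'\left(\alpha_i\right)+\psi'\left(\alpha_D\right), \quad i=1,\ldots,D-1$$

$$\Sigma^*_{ij}=\psi'\left(\alpha_D\right), \quad i\neq j$$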
This approximation is particularly accurate for large $\boldsymbol{\alpha}$. In fact, as $\alpha_i\to\infty$ for $i=1,\ldots,D$, we have $p\left(\mathbf{x}\mid\boldsymbol{\alpha}\right)\to q\left(\mathbf{x}\mid\boldsymbol{\mu}^*,\boldsymbol{\Sigma}^*\right)$.
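The KL-minimizing parameters are simple to compute once the digamma and trigamma functions are available. The sketch below takes them to be $\mu^*_i = \psi(\alpha_i) - \psi(\alpha_D)$, $\Sigma^*_{ii} = \psi'(\alpha_i) + \psi'(\alpha_D)$ and $\Sigma^*_{ij} = \psi'(\alpha_D)$ for i ≠ j. To stay within the Python standard library, it approximates both functions with the usual recurrence plus a truncated asymptotic series, an implementation choice made here for illustration. A numerical library routine would normally be preferred. All function names are hypothetical.

```python
from math import log


def digamma(x):
    """Approximate psi(x) for x > 0 (recurrence, then asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return (result + log(x) - 0.5 * inv
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def trigamma(x):
    """Approximate psi'(x) for x > 0 (recurrence, then asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)    # psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7)
    return (result + inv + 0.5 * inv2
            + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42)))


def dirichlet_to_logistic_normal(alpha):
    """Return (mu, Sigma) of the approximating logistic normal.

    alpha is a sequence of D positive Dirichlet parameters; mu has
    length D-1 and Sigma is a (D-1) x (D-1) list of lists.
    """
    d = len(alpha)
    psi_last = digamma(alpha[-1])
    tri_last = trigamma(alpha[-1])
    mu = [digamma(a) - psi_last for a in alpha[:-1]]
    sigma = [[tri_last + (trigamma(alpha[i]) if i == j else 0.0)
              for j in range(d - 1)]
             for i in range(d - 1)]
    return mu, sigma


mu_star, sigma_star = dirichlet_to_logistic_normal([2.0, 2.0, 2.0])
```

Every off-diagonal entry of the resulting covariance matrix equals $\psi'(\alpha_D)$, which reflects the shared reference component $x_D$ in all log-ratios.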