In statistics, the multivariate t-distribution (or multivariate Student distribution) is a multivariate probability distribution. It is a generalization to random vectors of the Student's t-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix t-distribution is distinct and makes particular use of the matrix structure.
One common method of construction of a multivariate t-distribution, for the case of $p$ dimensions, is based on the observation that if $\mathbf y$ and $u$ are independent and distributed as $N(\mathbf 0,\boldsymbol\Sigma)$ and $\chi^2_\nu$ (i.e. the chi-squared distribution with $\nu$ degrees of freedom) respectively, the matrix $\boldsymbol\Sigma$ is a $p\times p$ matrix, and $\boldsymbol\mu$ is a constant vector, then the random vector $\mathbf x=\mathbf y\sqrt{\nu/u}+\boldsymbol\mu$ has the density

$$\frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,\nu^{p/2}\,\pi^{p/2}\left|\boldsymbol\Sigma\right|^{1/2}}\left[1+\frac{1}{\nu}(\mathbf x-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf x-\boldsymbol\mu)\right]^{-(\nu+p)/2}$$

and is said to be distributed as a multivariate t-distribution with parameters $\boldsymbol\Sigma,\boldsymbol\mu,\nu$. Note that $\boldsymbol\Sigma$ is not the covariance matrix, since the covariance is given by $\nu/(\nu-2)\,\boldsymbol\Sigma$ (for $\nu>2$).
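As a concrete check of this density formula, a minimal NumPy sketch is given below (the function name is illustrative, not from the source); it evaluates the density on the log scale for numerical stability.

```python
import numpy as np
from math import lgamma, pi, log, exp

def multivariate_t_pdf(x, mu, sigma, nu):
    """Density of the multivariate t-distribution t_p(mu, Sigma, nu)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    p = x.shape[0]
    dev = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = dev @ np.linalg.solve(sigma, dev)
    # Log of the normalizing constant Gamma((nu+p)/2) / (Gamma(nu/2) nu^{p/2} pi^{p/2} |Sigma|^{1/2})
    log_c = (lgamma((nu + p) / 2) - lgamma(nu / 2)
             - 0.5 * p * log(nu * pi)
             - 0.5 * log(np.linalg.det(sigma)))
    return exp(log_c - 0.5 * (nu + p) * np.log1p(quad / nu))
```

For $p=1$ this reduces to the familiar univariate Student t density.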
The constructive definition of a multivariate t-distribution simultaneously serves as a sampling algorithm: generate $u\sim\chi^2_\nu$ and $\mathbf y\sim N(\mathbf 0,\boldsymbol\Sigma)$ independently, then compute $\mathbf x\gets\sqrt{\nu/u}\,\mathbf y+\boldsymbol\mu$. This formulation gives rise to the hierarchical representation of the multivariate t-distribution as a scale mixture of normals: $u\sim\mathrm{Ga}(\nu/2,\nu/2)$, where $\mathrm{Ga}(a,b)$ indicates a gamma distribution with density proportional to $x^{a-1}e^{-bx}$, and $\mathbf x\mid u\sim N(\boldsymbol\mu,u^{-1}\boldsymbol\Sigma)$.
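The two-step sampling construction can be sketched in NumPy as follows (a minimal illustration; the function name and seed handling are mine, not from the source). For $\nu>2$ the sample covariance should approach $\nu/(\nu-2)\,\boldsymbol\Sigma$.

```python
import numpy as np

def sample_multivariate_t(mu, sigma, nu, size, seed=None):
    """Draw samples via the chi-squared mixture construction:
    u ~ chi^2_nu, y ~ N(0, Sigma), x = sqrt(nu/u) * y + mu."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    u = rng.chisquare(nu, size=size)                                # u ~ chi^2_nu
    y = rng.multivariate_normal(np.zeros(mu.shape[0]), sigma, size=size)
    return np.sqrt(nu / u)[:, None] * y + mu
```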
In the special case $\nu=1$, the distribution is a multivariate Cauchy distribution.
There are in fact many candidates for the multivariate generalization of Student's t-distribution. An extensive survey of the field has been given by Kotz and Nadarajah (2004). The essential issue is to define a probability density function of several variables that is the appropriate generalization of the formula for the univariate case. In one dimension ($p=1$), with $t=x-\mu$ and $\Sigma=1$, the probability density function is

$$f(t)=\frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma[\nu/2]}\,(1+t^2/\nu)^{-(\nu+1)/2}$$
and one approach is to use a corresponding function of $p$ variables $t_i$ that replaces $t^2$ by a quadratic function of all the $t_i$. It is clear that this only makes sense when all the marginal distributions have the same degrees of freedom $\nu$. With $\mathbf A=\boldsymbol\Sigma^{-1}$, one has a simple choice of multivariate density function

$$f(\mathbf t)=\frac{\Gamma((\nu+p)/2)\left|\mathbf A\right|^{1/2}}{\sqrt{\nu^p\pi^p}\,\Gamma(\nu/2)}\left(1+\sum_{i,j=1}^{p,p}A_{ij}t_it_j/\nu\right)^{-(\nu+p)/2}$$

which is the standard but not the only choice.
An important special case is the standard bivariate t-distribution, p = 2:
$$f(t_1,t_2)=\frac{\left|\mathbf A\right|^{1/2}}{2\pi}\left(1+\sum_{i,j=1}^{2,2}A_{ij}t_it_j/\nu\right)^{-(\nu+2)/2}$$
Note that

$$\frac{\left|\mathbf A\right|^{1/2}}{2\pi}=\frac{1}{2\pi\sqrt{\left|\boldsymbol\Sigma\right|}}.$$
Now, if $\mathbf A$ is the identity matrix, the density is

$$f(t_1,t_2)=\frac{1}{2\pi}\left(1+(t_1^2+t_2^2)/\nu\right)^{-(\nu+2)/2}.$$
The difficulty with the standard representation is revealed by this formula, which does not factorize into the product of the marginal one-dimensional distributions. When $\boldsymbol\Sigma$ is diagonal the standard representation can be shown to have zero correlation, but the marginal distributions are not statistically independent.
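The failure to factorize can be checked numerically. The sketch below (function names are mine) compares the standard bivariate density with $\mathbf A=\mathbf I$ against the product of the two univariate marginals; in the tails the joint density clearly exceeds the product, so zero correlation does not imply independence here.

```python
from math import gamma, sqrt, pi

def univariate_t_pdf(t, nu):
    """One-dimensional Student t density."""
    return gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2)) \
        * (1 + t * t / nu) ** (-(nu + 1) / 2)

def bivariate_t_pdf(t1, t2, nu):
    """Standard bivariate t density with A = I (zero correlation)."""
    return (1 / (2 * pi)) * (1 + (t1 * t1 + t2 * t2) / nu) ** (-(nu + 2) / 2)

nu = 5.0
joint = bivariate_t_pdf(2.0, 2.0, nu)
product = univariate_t_pdf(2.0, nu) * univariate_t_pdf(2.0, nu)
# joint noticeably exceeds product at (2, 2): the joint does not factorize
```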
A notable spontaneous occurrence of the elliptical multivariate distribution is its formal mathematical appearance when least squares methods are applied to multivariate normal data such as the classical Markowitz minimum variance econometric solution for asset portfolios.
The definition of the cumulative distribution function (cdf) in one dimension can be extended to multiple dimensions by defining the following probability (here $\mathbf x$ is a real vector):

$$F(\mathbf x)=\mathbb P(\mathbf X\leq\mathbf x),\quad\text{where }\mathbf X\sim t_\nu(\boldsymbol\mu,\boldsymbol\Sigma).$$

There is no simple formula for $F(\mathbf x)$, but it can be approximated numerically via Monte Carlo integration.
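A Monte Carlo approximation of the multivariate cdf follows directly from the chi-squared mixture sampler; the sketch below is an assumption of mine about one natural implementation, not a method named in the source. For a centered distribution with diagonal $\boldsymbol\Sigma$ and $p=2$, symmetry gives $F(\boldsymbol\mu)=1/4$, which provides a sanity check.

```python
import numpy as np

def mvt_cdf_mc(x, mu, sigma, nu, n=200000, seed=0):
    """Monte Carlo estimate of F(x) = P(X <= x componentwise), X ~ t_nu(mu, Sigma)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    u = rng.chisquare(nu, size=n)                      # u ~ chi^2_nu
    y = rng.multivariate_normal(np.zeros(mu.shape[0]), sigma, size=n)
    samples = np.sqrt(nu / u)[:, None] * y + mu        # X ~ t_nu(mu, Sigma)
    return float(np.mean(np.all(samples <= np.asarray(x, dtype=float), axis=1)))
```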
This was developed by Muirhead[5] and Cornish,[6] but was later derived using the simpler chi-squared ratio representation above by Roth and Ding.[7] Let vector $X$ follow a multivariate t distribution and partition it into two subvectors of $p_1$ and $p_2$ elements:

$$X_p=\begin{bmatrix} X_1\\ X_2\end{bmatrix}\sim t_p\left(\mu_p,\Sigma_p,\nu\right)$$

where $p_1+p_2=p$, and the parameters partition conformably as

$$\mu_p=\begin{bmatrix} \mu_1\\ \mu_2\end{bmatrix},\qquad \Sigma_p=\begin{bmatrix} \Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix}.$$

Roth and Ding find the conditional distribution $p(X_1|X_2)$ to be

$$X_1|X_2\sim t_{p_1}\left(\mu_{1|2},\ \frac{\nu+d_2}{\nu+p_2}\,\Sigma_{11|2},\ \nu+p_2\right).$$

An equivalent expression in Kotz et al. is somewhat less concise.
Thus the conditional distribution is most easily represented as a two-step procedure. Form first the intermediate distribution $X_1|X_2\sim t_{p_1}\left(\mu_{1|2},\Psi,\tilde\nu\right)$; the explicit density is then

$$f(X_1|X_2)=\frac{\Gamma\left[(\tilde\nu+p_1)/2\right]}{\Gamma(\tilde\nu/2)\,(\pi\tilde\nu)^{p_1/2}\left|\Psi\right|^{1/2}}\left[1+\frac{1}{\tilde\nu}(X_1-\mu_{1|2})^T\Psi^{-1}(X_1-\mu_{1|2})\right]^{-(\tilde\nu+p_1)/2}$$

where

$\tilde\nu=\nu+p_2$ is the effective degrees of freedom, $\nu$ augmented by the number of conditioning variables $p_2$;

$\mu_{1|2}=\mu_1+\Sigma_{12}\Sigma_{22}^{-1}\left(X_2-\mu_2\right)$ is the conditional mean of $x_1$;

$\Sigma_{11|2}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is the Schur complement of $\Sigma_{22}$ in $\Sigma$;

$d_2=(X_2-\mu_2)^T\Sigma_{22}^{-1}(X_2-\mu_2)$ is the squared Mahalanobis distance of $X_2$ from $\mu_2$ with scale matrix $\Sigma_{22}$;

$\Psi=\dfrac{\nu+d_2}{\nu+p_2}\,\Sigma_{11|2}$ is the conditional scale matrix.
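The conditional parameters above can be computed mechanically from the partitioned $\mu$ and $\Sigma$; a minimal sketch follows (the helper name and partitioning convention are mine):

```python
import numpy as np

def mvt_conditional_params(x2, mu, sigma, nu, p1):
    """Parameters of X1 | X2 = x2 per the Roth/Ding result:
    t_{p1}(mu_{1|2}, (nu + d2)/(nu + p2) * Sigma_{11|2}, nu + p2)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    p2 = mu.shape[0] - p1
    mu1, mu2 = mu[:p1], mu[p1:]
    s11, s12 = sigma[:p1, :p1], sigma[:p1, p1:]
    s21, s22 = sigma[p1:, :p1], sigma[p1:, p1:]
    dev2 = x2 - mu2
    s22_inv_dev = np.linalg.solve(s22, dev2)
    mu_cond = mu1 + s12 @ s22_inv_dev                 # conditional mean mu_{1|2}
    d2 = float(dev2 @ s22_inv_dev)                    # Mahalanobis distance of X2
    schur = s11 - s12 @ np.linalg.solve(s22, s21)     # Schur complement Sigma_{11|2}
    scale = (nu + d2) / (nu + p2) * schur             # conditional scale matrix Psi
    return mu_cond, scale, nu + p2
```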
The use of such distributions is enjoying renewed interest due to applications in mathematical finance, especially through the use of the Student's t copula.[8]
Constructed as an elliptical distribution,[9] take the simplest centralised case with spherical symmetry and no scaling, $\Sigma=\operatorname{I}$; then the multivariate t-PDF takes the form

$$f_X(X)=g(X^TX)=\frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}}\left(1+\nu^{-1}X^TX\right)^{-(\nu+p)/2}$$

where $X=(x_1,\dots,x_p)^T$ is a $p$-vector and $\nu$ is the number of degrees of freedom. The covariance of $X$ is

$$\operatorname{E}\left(XX^T\right)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_X(x_1,\dots,x_p)\,XX^T\,dx_1\dots dx_p=\frac{\nu}{\nu-2}\operatorname{I}.$$
The aim is to convert the Cartesian PDF to a radial one. Kibria and Joarder[10] define the radial measure

$$r^2=R^2=\frac{X^TX}{p}$$

and, noting that the density depends only on $r^2$, we get

$$\operatorname{E}[r^2]=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_X(x_1,\dots,x_p)\,\frac{X^TX}{p}\,dx_1\dots dx_p=\frac{\nu}{\nu-2}$$

which is equivalent to the variance of the $p$-element vector $X$ treated as a univariate heavy-tail zero-mean random sequence with uncorrelated, yet statistically dependent, elements. The radial measure $r^2=\frac{X^TX}{p}$ has the Fisher–Snedecor or $F$-distribution:

$$r^2\sim f_F(p,\nu)=B\left(\tfrac{p}{2},\tfrac{\nu}{2}\right)^{-1}\left(\frac{p}{\nu}\right)^{p/2}(r^2)^{p/2-1}\left(1+\frac{p}{\nu}r^2\right)^{-(p+\nu)/2}$$

having mean value $\operatorname{E}[r^2]=\frac{\nu}{\nu-2}$.
By a change of random variable to

$$y=\frac{p}{\nu}r^2=\frac{X^TX}{\nu}$$

we have

$$\operatorname{E}[y]=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_X(X)\,\frac{X^TX}{\nu}\,dx_1\dots dx_p=\frac{p}{\nu-2}.$$
The density of $y$ follows by the change of variables $r^2=\frac{\nu}{p}y$:

$$\begin{align}f_Y(y|p,\nu)&=\left|\frac{p}{\nu}\right|^{-1}B\left(\tfrac{p}{2},\tfrac{\nu}{2}\right)^{-1}\left(\frac{p}{\nu}\right)^{p/2}\left(\frac{\nu}{p}y\right)^{p/2-1}(1+y)^{-(p+\nu)/2}\\ &=B\left(\tfrac{p}{2},\tfrac{\nu}{2}\right)^{-1}y^{p/2-1}(1+y)^{-(\nu+p)/2}\end{align}$$

which is the density of the beta-prime distribution,

$$y\sim\beta'\left(y;\tfrac{p}{2},\tfrac{\nu}{2}\right),$$

with mean value

$$\operatorname{E}[y]=\frac{p}{\nu-2}.$$
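The beta-prime mean $\operatorname{E}[y]=p/(\nu-2)$ can be verified by simulation (a sketch of my own, not from the source), drawing $X\sim t_p(0,\operatorname I,\nu)$ via the chi-squared mixture and forming $y=X^TX/\nu$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, nu, n = 3, 10.0, 200000

# Draw X ~ t_p(0, I, nu) via the chi-squared mixture, then form y = X^T X / nu.
u = rng.chisquare(nu, size=n)
x = np.sqrt(nu / u)[:, None] * rng.standard_normal((n, p))
y = (x * x).sum(axis=1) / nu

# Beta-prime(p/2, nu/2) mean is (p/2)/(nu/2 - 1) = p/(nu - 2) = 0.375 here.
```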
Given the beta-prime distribution, the radial cumulative distribution function of $y$ is known:

$$F_Y(y)=I\left(\frac{y}{1+y};\tfrac{p}{2},\tfrac{\nu}{2}\right)B\left(\tfrac{p}{2},\tfrac{\nu}{2}\right)^{-1}$$

where $I$ is the incomplete beta function; this applies with a spherical $\Sigma$ assumption.
In the scalar case, $p=1$, the distribution is equivalent to Student-t with the equivalence $t^2=y^2\sigma^{-1}$, the variable $t$ having double-sided tails for cdf purposes, i.e. the "two-tail t-test".
The radial distribution can also be derived via a straightforward coordinate transformation from Cartesian to spherical. A constant-radius surface at $R=(X^TX)^{1/2}$ with PDF

$$p_X(X)\propto\left(1+\nu^{-1}R^2\right)^{-(\nu+p)/2}$$

is an iso-density surface. Given this constant value, the probability on a shell of surface area $A_R$ and thickness $\delta R$ at radius $R$ is $\delta P=p_X(R)\,A_R\,\delta R$.

The enclosed $p$-sphere of radius $R$ has surface area

$$A_R=\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}.$$

Substitution into $\delta P$ shows that the shell has element of probability

$$\delta P=p_X(R)\,\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}\,\delta R$$

which is equivalent to the radial density function

$$f_R(R)=\frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}}\,\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}\left(1+\frac{R^2}{\nu}\right)^{-(\nu+p)/2}$$

which further simplifies to

$$f_R(R)=\frac{2}{\nu^{1/2}\,B\left(\tfrac{p}{2},\tfrac{\nu}{2}\right)}\left(\frac{R^2}{\nu}\right)^{(p-1)/2}\left(1+\frac{R^2}{\nu}\right)^{-(\nu+p)/2}$$

where $B(*,*)$ is the beta function.
Changing the radial variable to $y=R^2/\nu$ recovers the beta-prime distribution obtained above:

$$f_Y(y)=\frac{1}{B\left(\tfrac{p}{2},\tfrac{\nu}{2}\right)}\,y^{p/2-1}(1+y)^{-(\nu+p)/2}.$$
To scale the radial variables without changing the radial shape function, define scale matrix $\Sigma=\alpha\operatorname{I}$, yielding a 3-parameter Cartesian density function: the probability $\Delta P$ in the volume element $dx_1\dots dx_p$ is

$$\Delta P\left(f_X(X|\alpha,p,\nu)\right)=\frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}\,\alpha^{p/2}}\left(1+\frac{X^TX}{\alpha\nu}\right)^{-(\nu+p)/2}dx_1\dots dx_p$$

or, in terms of the scalar radial variable $R$,

$$f_R(R|\alpha,p,\nu)=\frac{2}{(\alpha\nu)^{1/2}\,B\left(\tfrac{p}{2},\tfrac{\nu}{2}\right)}\left(\frac{R^2}{\alpha\nu}\right)^{(p-1)/2}\left(1+\frac{R^2}{\alpha\nu}\right)^{-(\nu+p)/2}.$$
The moments of all the radial variables, under the spherical distribution assumption, can be derived from the beta-prime distribution. If $Z\sim\beta'(a,b)$ then

$$\operatorname{E}(Z^m)=\frac{B(a+m,b-m)}{B(a,b)},$$

a known result. Thus, for the variable $y=\frac{p}{\nu}r^2=\frac{X^TX}{\nu}$ we have

$$\operatorname{E}(y^m)=\frac{B\left(\tfrac{p}{2}+m,\tfrac{\nu}{2}-m\right)}{B\left(\tfrac{p}{2},\tfrac{\nu}{2}\right)},\qquad\tfrac{\nu}{2}>m.$$

The moments of $R^2=\nu y=X^TX$ are

$$\operatorname{E}\left((R^2)^m\right)=\nu^m\operatorname{E}(y^m)$$

while introducing the scale matrix $\alpha\operatorname{I}$ yields

$$\operatorname{E}\left((R^2)^m\mid\alpha\right)=\alpha^m\nu^m\operatorname{E}(y^m).$$

Moments relating to the radial variable $R$ are found by setting $R=(\alpha\nu y)^{1/2}$ and $M=2m$, whereupon

$$\operatorname{E}(R^M)=(\alpha\nu)^{M/2}\operatorname{E}(y^{M/2})=(\alpha\nu)^{M/2}\frac{B\left(\tfrac{p}{2}+\tfrac{M}{2},\tfrac{\nu}{2}-\tfrac{M}{2}\right)}{B\left(\tfrac{p}{2},\tfrac{\nu}{2}\right)}.$$
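The radial moment formula is easy to evaluate with the gamma function; the sketch below (helper names are mine) can be checked for $M=2$, where $\operatorname{E}(R^2)$ must equal the trace of the covariance, $p\,\alpha\nu/(\nu-2)$.

```python
from math import gamma

def beta_fn(a, b):
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def radial_moment(M, p, nu, alpha=1.0):
    """E[R^M] for R = (X^T X)^{1/2}, X ~ t_p(0, alpha*I, nu); requires nu > M."""
    m = M / 2.0
    return (alpha * nu) ** m * beta_fn(p / 2 + m, nu / 2 - m) / beta_fn(p / 2, nu / 2)
```

With $p=3$, $\nu=6$, $\alpha=2$ the $M=2$ moment is $3\cdot 2\cdot 6/4=9$.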
This closely relates to the multivariate normal method and is described in Kotz and Nadarajah, Kibria and Joarder, Roth, and Cornish. Starting from a somewhat simplified version of the central MV-t pdf,

$$f_X(X)=\frac{K}{\left|\Sigma\right|^{1/2}}\left(1+\nu^{-1}X^T\Sigma^{-1}X\right)^{-(\nu+p)/2},$$

where $K$ is a normalizing constant and $\nu$ is arbitrary but fixed, let $\Theta\in\mathbb{R}^{p\times p}$ be a full-rank matrix and form the vector $Y=\Theta X$. Then, by change of variables,

$$f_Y(Y)=\frac{K}{\left|\Sigma\right|^{1/2}}\left(1+\nu^{-1}Y^T\Theta^{-T}\Sigma^{-1}\Theta^{-1}Y\right)^{-(\nu+p)/2}\left|\frac{\partial Y}{\partial X}\right|^{-1}.$$

The matrix of partial derivatives is $\frac{\partial Y_i}{\partial X_j}=\Theta_{i,j}$ and the Jacobian becomes $\left|\frac{\partial Y}{\partial X}\right|=\left|\Theta\right|$. Thus

$$f_Y(Y)=\frac{K}{\left|\Sigma\right|^{1/2}\left|\Theta\right|}\left(1+\nu^{-1}Y^T\Theta^{-T}\Sigma^{-1}\Theta^{-1}Y\right)^{-(\nu+p)/2}.$$

The denominator reduces to

$$\left|\Sigma\right|^{1/2}\left|\Theta\right|=\left|\Sigma\right|^{1/2}\left|\Theta\right|^{1/2}\left|\Theta^T\right|^{1/2}=\left|\Theta\Sigma\Theta^T\right|^{1/2}.$$

In full,

$$f_Y(Y)=\frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}\left|\Theta\Sigma\Theta^T\right|^{1/2}}\left(1+\nu^{-1}Y^T\left(\Theta\Sigma\Theta^T\right)^{-1}Y\right)^{-(\nu+p)/2}$$

which is a regular MV-t distribution.

In general, if $X\sim t_p(\mu,\Sigma,\nu)$ and $\Theta$ is a full-rank $p\times p$ matrix, then

$$\Theta X+c\sim t_p(\Theta\mu+c,\Theta\Sigma\Theta^T,\nu).$$
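The affine-transform property can be verified numerically at a single point: the change-of-variables density $f_X(\Theta^{-1}(y-c))\,|\det\Theta|^{-1}$ should equal the MV-t density with transformed parameters evaluated directly. This sketch (my own check, with arbitrary test values) works on the log scale:

```python
import numpy as np
from math import lgamma, log, pi

def mvt_logpdf(x, mu, sigma, nu):
    """Log-density of t_p(mu, Sigma, nu)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    p = x.shape[0]
    dev = x - mu
    quad = dev @ np.linalg.solve(sigma, dev)
    return (lgamma((nu + p) / 2) - lgamma(nu / 2) - 0.5 * p * log(nu * pi)
            - 0.5 * log(np.linalg.det(sigma)) - 0.5 * (nu + p) * np.log1p(quad / nu))

nu = 7.0
mu = np.array([0.5, -1.0])
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
theta = np.array([[1.0, 2.0], [0.0, 3.0]])   # full-rank transform
c = np.array([1.0, -2.0])

y = np.array([0.2, 0.7])                      # arbitrary test point
x = np.linalg.solve(theta, y - c)             # Theta^{-1}(y - c)
lhs = mvt_logpdf(x, mu, sigma, nu) - log(abs(np.linalg.det(theta)))
rhs = mvt_logpdf(y, theta @ mu + c, theta @ sigma @ theta.T, nu)
# lhs and rhs agree, confirming Theta X + c ~ t_p(Theta mu + c, Theta Sigma Theta^T, nu)
```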
This is a special case of the rank-reducing linear transform below. Kotz defines marginal distributions as follows. Partition $X\sim t(p,\mu,\Sigma,\nu)$ into two subvectors of $p_1$ and $p_2$ elements:

$$X_p=\begin{bmatrix} X_1\\ X_2\end{bmatrix}\sim t\left(p_1+p_2,\mu_p,\Sigma_p,\nu\right)$$

with $p_1+p_2=p$,

$$\mu_p=\begin{bmatrix} \mu_1\\ \mu_2\end{bmatrix},\qquad\Sigma_p=\begin{bmatrix} \Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix};$$

then $X_1\sim t\left(p_1,\mu_1,\Sigma_{11},\nu\right)$ and $X_2\sim t\left(p_2,\mu_2,\Sigma_{22},\nu\right)$, such that

$$f(X_1)=\frac{\Gamma\left[(\nu+p_1)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p_1/2}\left|\Sigma_{11}\right|^{1/2}}\left[1+\frac{1}{\nu}(X_1-\mu_1)^T\Sigma_{11}^{-1}(X_1-\mu_1)\right]^{-(\nu+p_1)/2}$$

$$f(X_2)=\frac{\Gamma\left[(\nu+p_2)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p_2/2}\left|\Sigma_{22}\right|^{1/2}}\left[1+\frac{1}{\nu}(X_2-\mu_2)^T\Sigma_{22}^{-1}(X_2-\mu_2)\right]^{-(\nu+p_2)/2}.$$
If a transformation is constructed in the form

$$\Theta_{p_1\times p}=\begin{bmatrix} 1&\cdots&0&\cdots&0\\ 0&\ddots&0&\cdots&0\\ 0&\cdots&1&\cdots&0\end{bmatrix}$$

then the vector $Y=\Theta X$, as discussed below, has the same distribution as the marginal distribution of $X_1$.
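Such a selection matrix simply picks out the first $p_1$ coordinates, so the affine-transform rule reproduces the marginal parameters $\mu_1$ and $\Sigma_{11}$ directly. A small demonstration (values are arbitrary, chosen only to make $\Sigma$ positive definite):

```python
import numpy as np

p, p1 = 4, 2
theta = np.eye(p)[:p1, :]          # selection matrix [[1,0,0,0],[0,1,0,0]]
mu = np.array([1.0, 2.0, 3.0, 4.0])
sigma = np.array([[4.0, 1.0, 0.5, 0.2],
                  [1.0, 3.0, 0.3, 0.1],
                  [0.5, 0.3, 2.0, 0.4],
                  [0.2, 0.1, 0.4, 1.0]])   # diagonally dominant, hence PD

mu_marg = theta @ mu                       # equals mu[:p1]
sigma_marg = theta @ sigma @ theta.T       # equals sigma[:p1, :p1]
```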
In the linear transform case, if $\Theta$ is a rectangular matrix $\Theta\in\mathbb{R}^{m\times p}$, $m<p$, of rank $m$, the result is dimensionality reduction. Here, the Jacobian $\left|\Theta\right|$ is seemingly rectangular, but the value $\left|\Theta\Sigma\Theta^T\right|^{1/2}$ in the denominator pdf is nevertheless correct. In general, if $X\sim t(p,\mu,\Sigma,\nu)$ and $\Theta$ is an $m\times p$ matrix of rank $m$, then $Y=\Theta X+c\sim t(m,\Theta\mu+c,\Theta\Sigma\Theta^T,\nu)$:

$$f_Y(Y)=\frac{\Gamma\left[(\nu+m)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{m/2}\left|\Theta\Sigma\Theta^T\right|^{1/2}}\left[1+\frac{1}{\nu}(Y-c_1)^T(\Theta\Sigma\Theta^T)^{-1}(Y-c_1)\right]^{-(\nu+m)/2},\qquad c_1=\Theta\mu+c.$$
In extremis, if $m=1$ and $\Theta$ becomes a row vector, then the scalar $Y$ follows the univariate double-sided Student t-distribution defined by $t^2=Y^2/\sigma^2$ with the same $\nu$ degrees of freedom. In the limit $\nu\uparrow\infty$, the distribution approaches that of the corresponding normal variable $Z$, as in the univariate case.