Multivariate t-distribution explained

In statistics, the multivariate t-distribution (or multivariate Student distribution) is a multivariate probability distribution. It is a generalization to random vectors of the Student's t-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix t-distribution is distinct and makes particular use of the matrix structure.

Definition

One common method of construction of a multivariate t-distribution, for the case of

p

dimensions, is based on the observation that if

y

and

u

are independent and distributed as

N({0},{\boldsymbol\Sigma})

and

\chi^2_\nu

(i.e. multivariate normal and chi-squared distributions) respectively, the matrix

\Sigma

is a p × p matrix, and

{\boldsymbol\mu}

is a constant vector, then the random variable

{\mathbf x}={\mathbf y}/\sqrt{u/\nu}+{\boldsymbol\mu}

has the density[1]

\frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\nu^{p/2}\pi^{p/2}\left|{\boldsymbol\Sigma}\right|^{1/2}}\left[1+\frac{1}{\nu}({\mathbf x}-{\boldsymbol\mu})^T{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu})\right]^{-(\nu+p)/2}

and is said to be distributed as a multivariate t-distribution with parameters

{\boldsymbol\Sigma},{\boldsymbol\mu},\nu

. Note that

\Sigma

is not the covariance matrix since the covariance is given by

\nu/(\nu-2)\Sigma

(for

\nu>2

).
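As an illustration of the density in this section, the following Python sketch evaluates its logarithm at a single point. The function name `mvt_logpdf` and its argument names are assumptions introduced here, not part of any cited source; working on the log scale is a design choice to avoid overflow in the gamma functions.

```python
import math
import numpy as np

def mvt_logpdf(x, mu, Sigma, nu):
    """Log-density of the multivariate t at one point x (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = mu.shape[0]
    # The Cholesky factor L (Sigma = L L^T) gives both log|Sigma| and the quadratic form.
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, x - mu)      # z @ z = (x - mu)^T Sigma^{-1} (x - mu)
    quad = float(z @ z)
    log_det = 2.0 * float(np.sum(np.log(np.diag(L))))
    log_norm = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
                - 0.5 * p * math.log(nu * math.pi) - 0.5 * log_det)
    return log_norm - 0.5 * (nu + p) * math.log1p(quad / nu)

# Hypothetical example: p = 2, identity scale matrix, evaluated at the centre.
print(math.exp(mvt_logpdf([0.0, 0.0], [0.0, 0.0], np.eye(2), nu=3.0)))
```

With p = 2, an identity scale matrix and x equal to the location vector, the normalizing constant reduces to 1/(2\pi), which gives a quick sanity check.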

The constructive definition of a multivariate t-distribution simultaneously serves as a sampling algorithm:

  1. Generate

u\sim\chi^2_\nu

and

y\sim N(0,\boldsymbol{\Sigma})

, independently.
  2. Compute

x\gets\sqrt{\nu/u}\,y+\boldsymbol{\mu}

.

This formulation gives rise to the hierarchical representation of a multivariate t-distribution as a scale-mixture of normals:

u\sim\mathrm{Ga}(\nu/2,\nu/2)

where

\mathrm{Ga}(a,b)

indicates a gamma distribution with density proportional to

x^{a-1}e^{-bx}

, and

x\mid u

conditionally follows

N(\boldsymbol{\mu},u^{-1}\boldsymbol{\Sigma})

.
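The two-step construction above can be sketched in Python with NumPy. This is a minimal illustration rather than a reference implementation; the function name `sample_multivariate_t`, its arguments and the example values are assumptions introduced here.

```python
import numpy as np

def sample_multivariate_t(mu, Sigma, nu, size, seed=None):
    """Draw `size` samples using x = sqrt(nu / u) * y + mu."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    p = mu.shape[0]
    # Step 1: independent chi-squared and zero-mean normal draws.
    u = rng.chisquare(nu, size=size)
    y = rng.multivariate_normal(np.zeros(p), Sigma, size=size)
    # Step 2: rescale each normal draw by its own sqrt(nu / u), then shift by mu.
    return mu + y * np.sqrt(nu / u)[:, None]

# Hypothetical example values.
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = sample_multivariate_t([1.0, -1.0], Sigma, nu=10.0, size=200_000, seed=0)
print(x.mean(axis=0))            # should be near mu
print(np.cov(x, rowvar=False))   # should be near nu / (nu - 2) * Sigma
```

The sample covariance approaches \nu/(\nu-2)\Sigma rather than \Sigma itself, in line with the remark in the Definition section.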

In the special case

\nu=1

, the distribution is a multivariate Cauchy distribution.

Derivation

There are in fact many candidates for the multivariate generalization of Student's t-distribution. An extensive survey of the field has been given by Kotz and Nadarajah (2004). The essential issue is to define a probability density function of several variables that is the appropriate generalization of the formula for the univariate case. In one dimension (

p=1

), with

t=x-\mu

and

\Sigma=1

, we have the probability density function

f(t)=\frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma[\nu/2]}(1+t^2/\nu)^{-(\nu+1)/2}

and one approach is to use a corresponding function of several variables. This is the basic idea of elliptical distribution theory, where one writes down a corresponding function of

p

variables

t_i

that replaces

t^2

by a quadratic function of all the

t_i

. It is clear that this only makes sense when all the marginal distributions have the same degrees of freedom

\nu

. With

A=\boldsymbol\Sigma^{-1}

, one has a simple choice of multivariate density function

f(\mathbf{t})=\frac{\Gamma((\nu+p)/2)\left|A\right|^{1/2}}{\sqrt{\nu^p\pi^p}\,\Gamma(\nu/2)}\left(1+\sum_{i,j=1}^{p,p}A_{ij}t_it_j/\nu\right)^{-(\nu+p)/2}

which is the standard but not the only choice.

An important special case is the standard bivariate t-distribution, p = 2:

f(t_1,t_2)=\frac{\left|A\right|^{1/2}}{2\pi}\left(1+\sum_{i,j=1}^{2,2}A_{ij}t_it_j/\nu\right)^{-(\nu+2)/2}

Note that

\frac{\Gamma\left(\frac{\nu+2}{2}\right)}{\pi\nu\,\Gamma\left(\frac{\nu}{2}\right)}=\frac{1}{2\pi}

.

Now, if

A

is the identity matrix, the density is

f(t_1,t_2)=\frac{1}{2\pi}\left(1+(t_1^2+t_2^2)/\nu\right)^{-(\nu+2)/2}.

The difficulty with the standard representation is revealed by this formula, which does not factorize into the product of the marginal one-dimensional distributions. When

\Sigma

is diagonal the standard representation can be shown to have zero correlation but the marginal distributions are not statistically independent.
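The failure to factorize can be checked numerically. The sketch below compares the standard bivariate density with identity A against the product of two univariate Student t densities with the same degrees of freedom; the function names and the evaluation point are assumptions chosen for illustration.

```python
import math

def bivariate_t_pdf(t1, t2, nu):
    """Standard bivariate t density with A equal to the identity matrix."""
    return (1.0 / (2.0 * math.pi)) * (1.0 + (t1 * t1 + t2 * t2) / nu) ** (-(nu + 2) / 2)

def univariate_t_pdf(t, nu):
    """Univariate Student t density with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1.0 + t * t / nu) ** (-(nu + 1) / 2)

nu = 3.0
joint = bivariate_t_pdf(1.0, 1.0, nu)
product = univariate_t_pdf(1.0, nu) * univariate_t_pdf(1.0, nu)
print(joint, product)  # the two values differ, so the joint density is not the product
```

The gap between the two values shrinks as \nu grows, since both expressions approach the corresponding normal densities.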

A notable spontaneous occurrence of the elliptical multivariate distribution is its formal mathematical appearance when least squares methods are applied to multivariate normal data such as the classical Markowitz minimum variance econometric solution for asset portfolios.

Cumulative distribution function

The definition of the cumulative distribution function (cdf) in one dimension can be extended to multiple dimensions by defining the following probability (here

x

is a real vector):

F(\mathbf{x})=\mathbb{P}(\mathbf{X}\leq\mathbf{x}),\quad\textrm{where}\;\;\mathbf{X}\sim t_\nu(\boldsymbol\mu,\boldsymbol\Sigma).

There is no simple formula for

F(x)

, but it can be approximated numerically via Monte Carlo integration.[2] [3] [4]
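A naive Monte Carlo estimate of F(x) can be sketched as follows. This is plain sampling through the chi-squared construction, not the more refined methods of the cited references; the function name `mvt_cdf_mc`, its default sample size and the seed handling are assumptions made for illustration.

```python
import numpy as np

def mvt_cdf_mc(x, mu, Sigma, nu, n_samples=200_000, seed=0):
    """Estimate P(X <= x componentwise) by simple Monte Carlo."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    p = mu.shape[0]
    # Samples from the multivariate t via the normal / chi-squared construction.
    y = rng.multivariate_normal(np.zeros(p), Sigma, size=n_samples)
    u = rng.chisquare(nu, size=n_samples)
    samples = mu + y * np.sqrt(nu / u)[:, None]
    # Fraction of samples lying in the lower-left orthant anchored at x.
    inside = np.all(samples <= np.asarray(x, dtype=float), axis=1)
    return float(inside.mean())

# Hypothetical example: spherical case evaluated at the centre.
print(mvt_cdf_mc([0.0, 0.0], [0.0, 0.0], np.eye(2), nu=4.0))
```

For a spherical bivariate distribution centred at the origin, symmetry gives F(0, 0) = 1/4, which serves as a simple check on the estimate.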

Conditional Distribution

This was developed by Muirhead[5] and Cornish,[6] but later derived using the simpler chi-squared ratio representation above by Roth and Ding.[7] Let vector

X

follow a multivariate t distribution and partition into two subvectors of

p_1,p_2

elements:

X_p=\begin{bmatrix} X_1\\ X_2\end{bmatrix}\sim t_p\left(\mu_p,\Sigma_p,\nu\right)

where

p_1+p_2=p

, the known mean vectors are

\mu_p=\begin{bmatrix} \mu_1\\ \mu_2\end{bmatrix}

and the scale matrix is

\Sigma_p=\begin{bmatrix} \Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix}

.

Roth and Ding find the conditional distribution

p(X_1|X_2)

to be a new t-distribution with modified parameters.

X_1|X_2\sim t_{p_1}\left(\mu_{1|2},\frac{\nu+d_2}{\nu+p_2}\Sigma_{11|2},\nu+p_2\right)

An equivalent expression in Kotz et al. is somewhat less concise.

Thus the conditional distribution is most easily represented as a two-step procedure. First form the intermediate distribution

X_1|X_2\sim t_{p_1}\left(\mu_{1|2},\Psi,\tilde{\nu}\right)

above; then, using the parameters below, the explicit conditional distribution becomes

f(X_1|X_2)=\frac{\Gamma\left[(\tilde\nu+p_1)/2\right]}{\Gamma(\tilde\nu/2)(\pi\tilde\nu)^{p_1/2}\left|{\boldsymbol\Psi}\right|^{1/2}}\left[1+\frac{1}{\tilde\nu}(X_1-\mu_{1|2})^T{\boldsymbol\Psi}^{-1}(X_1-\mu_{1|2})\right]^{-(\tilde\nu+p_1)/2}

where

\tilde\nu=\nu+p_2

Effective degrees of freedom:

\nu

is augmented by the number of disused variables

p_2

.

\mu_{1|2}=\mu_1+\Sigma_{12}\Sigma_{22}^{-1}\left(X_2-\mu_2\right)

is the conditional mean of

X_1

\Sigma_{11|2}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}

is the Schur complement of

\Sigma_{22}\text{ in }\Sigma

; the conditional covariance.

d_2=(X_2-\mu_2)^T\Sigma_{22}^{-1}(X_2-\mu_2)

is the squared Mahalanobis distance of

X_2

from

\mu_2

with scale matrix

\Sigma_{22}

\Psi=\frac{\nu+d_2}{\nu+p_2}\Sigma_{11|2}
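The conditional parameters listed above can be computed directly. The following Python sketch is an illustration under the assumption that the first p_1 coordinates form X_1 and the rest form X_2; the function name `conditional_t_params` and the example numbers are hypothetical.

```python
import numpy as np

def conditional_t_params(x2, mu, Sigma, nu, p1):
    """Return (mu_1|2, Psi, nu_tilde) for X_1 given X_2 = x2."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    mu1, mu2 = mu[:p1], mu[p1:]
    S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
    S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]
    p2 = mu2.shape[0]
    resid = x2 - mu2
    S22_inv_resid = np.linalg.solve(S22, resid)
    mu_cond = mu1 + S12 @ S22_inv_resid               # conditional mean
    schur = S11 - S12 @ np.linalg.solve(S22, S21)     # Schur complement Sigma_11|2
    d2 = float(resid @ S22_inv_resid)                 # squared Mahalanobis distance
    psi = (nu + d2) / (nu + p2) * schur               # rescaled scale matrix
    return mu_cond, psi, nu + p2

# Hypothetical example with p1 = 1 and p2 = 2.
Sigma = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 4.0]])
print(conditional_t_params([1.0, 4.0], [0.0, 0.0, 0.0], Sigma, nu=4.0, p1=1))
```

Unlike the multivariate normal case, the conditional scale matrix depends on the observed X_2 through d_2.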

Copulas based on the multivariate t

The use of such distributions is enjoying renewed interest due to applications in mathematical finance, especially through the use of the Student's t copula.[8]

Elliptical Representation

Constructed as an elliptical distribution,[9] take the simplest centralised case with spherical symmetry and no scaling,

\Sigma=\operatorname{I}

, then the multivariate t-PDF takes the form

f_X(X)=g(X^TX)=\frac{\Gamma\left(\frac{1}{2}(\nu+p)\right)}{(\nu\pi)^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)}\left(1+\nu^{-1}X^TX\right)^{-(\nu+p)/2}

where

X=(x_1,\cdots,x_p)^T\text{ is a }p\text{-vector}

and

\nu

= degrees of freedom as defined in Muirhead section 1.5. The covariance of

X

is

\operatorname{E}\left(XX^T\right)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_X(x_1,\dots,x_p)\,XX^T\,dx_1\dots dx_p=\frac{\nu}{\nu-2}\operatorname{I}

The aim is to convert the Cartesian PDF to a radial one. Kibria and Joarder[10] define the radial measure

r^2=R^2=\frac{X^TX}{p}

and, noting that the density is dependent only on r^2, we get

\operatorname{E}[r^2]=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_X(x_1,\dots,x_p)\frac{X^TX}{p}\,dx_1\dots dx_p=\frac{\nu}{\nu-2}

which is equivalent to the variance of

p

-element vector

X

treated as a univariate heavy-tail zero-mean random sequence with uncorrelated, yet statistically dependent, elements.
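Both claims in this section, the covariance \nu/(\nu-2) I and the dependence of the uncorrelated elements, can be probed by simulation. The sketch below is illustrative only; the helper names, the tail threshold c = 2 and the sample sizes are assumptions introduced here.

```python
import numpy as np

def spherical_t_samples(p, nu, n, seed=0):
    """Draw n samples from the spherical multivariate t (Sigma = I, mu = 0)."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n, p))
    u = rng.chisquare(nu, size=n)
    return y * np.sqrt(nu / u)[:, None]

def tail_dependence_gap(x, c=2.0):
    """P(|x1| > c, |x2| > c) minus P(|x1| > c) * P(|x2| > c); zero under independence."""
    a = np.abs(x[:, 0]) > c
    b = np.abs(x[:, 1]) > c
    return float(np.mean(a & b) - a.mean() * b.mean())

x = spherical_t_samples(p=3, nu=10.0, n=200_000)
print(np.cov(x, rowvar=False))              # should be near (nu / (nu - 2)) * I
print(np.mean(np.sum(x**2, axis=1) / 3))    # estimate of E[r^2]
print(tail_dependence_gap(x))               # positive: large values tend to occur together
```

The off-diagonal covariances are near zero, yet the joint tail probability exceeds the product of the marginal tail probabilities, because all elements share the same chi-squared divisor.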

Radial Distribution

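r^2=\frac{X^TX}{p}

follows the Fisher-Snedecor or

F

distribution:

r^2\sim f_F(p,\nu)=B\left(\frac{p}{2},\frac{\nu}{2}\right)^{-1}\left(\frac{p}{\nu}\right)^{p/2}\left(r^2\right)^{p/2-1}\left(1+\frac{p}{\nu}r^2\right)^{-(p+\nu)/2}

having mean value

\operatorname{E}[r^2]=\frac{\nu}{\nu-2}

.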

F

-distributions arise naturally in tests of sums of squares of sampled data after normalization by the sample standard deviation.

By a change of random variable to

y=\frac{p}{\nu}r^2=\frac{X^TX}{\nu}

in the equation above, retaining

p

-vector

X

, we have

\operatorname{E}[y]=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_X(X)\frac{X^TX}{\nu}\,dx_1\dots dx_p=\frac{p}{\nu-2}

and probability distribution

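\begin{align}f_Y(y|p,\nu)&=\left|\frac{p}{\nu}\right|^{-1}B\left(\frac{p}{2},\frac{\nu}{2}\right)^{-1}\left(\frac{p}{\nu}\right)^{p/2}\left(\frac{p}{\nu}\right)^{-p/2+1}y^{p/2-1}(1+y)^{-(p+\nu)/2}\\ &=B\left(\frac{p}{2},\frac{\nu}{2}\right)^{-1}y^{p/2-1}(1+y)^{-(\nu+p)/2}\end{align}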

y\sim\beta'\left(y;\frac{p}{2},\frac{\nu}{2}\right)

having mean value

\frac{\frac{1}{2}p}{\frac{1}{2}\nu-1}=\frac{p}{\nu-2}

.
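The change of variable from the F density of X^TX/p to the Beta-prime density of y = X^TX/\nu can be verified pointwise. The sketch below is a numerical check only; the function names and the test point are assumptions, and densities are evaluated on the log scale for stability.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def f_density(r2, p, nu):
    """F(p, nu) density evaluated at r2 > 0."""
    log_f = (-log_beta(p / 2, nu / 2) + (p / 2) * math.log(p / nu)
             + (p / 2 - 1) * math.log(r2)
             - ((p + nu) / 2) * math.log1p(p * r2 / nu))
    return math.exp(log_f)

def beta_prime_density(y, a, b):
    """Beta-prime density with shape parameters a and b, evaluated at y > 0."""
    log_f = -log_beta(a, b) + (a - 1) * math.log(y) - (a + b) * math.log1p(y)
    return math.exp(log_f)

# Hypothetical test point: y = (p / nu) * r2, so r2 = (nu / p) * y and dr2/dy = nu / p.
p, nu, y = 3, 7.0, 0.8
lhs = f_density((nu / p) * y, p, nu) * (nu / p)
rhs = beta_prime_density(y, p / 2, nu / 2)
print(lhs, rhs)  # the two values agree up to rounding
```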

Cumulative Radial Distribution

Given the Beta-prime distribution, the radial cumulative distribution function of

y

is known:

F_Y(y)\sim I\left(\frac{y}{1+y};\frac{p}{2},\frac{\nu}{2}\right)B\left(\frac{p}{2},\frac{\nu}{2}\right)^{-1}

where

I

is the incomplete Beta function and applies with a spherical

\Sigma

assumption.

In the scalar case,

p=1

, the distribution is equivalent to Student-t with the equivalence

t^2=y^2\sigma^{-1}

, the variable t having double-sided tails for CDF purposes, i.e. the "two-tail-t-test".
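The radial CDF of y can be evaluated with SciPy, whose `scipy.special.betainc` returns the regularized incomplete Beta function, that is, the incomplete Beta integral already divided by B(p/2, \nu/2). The wrapper name `radial_cdf` and the example values are assumptions made for illustration, and the spherical assumption on the scale matrix still applies.

```python
from scipy.special import betainc

def radial_cdf(y, p, nu):
    """P(X^T X / nu <= y) for the spherical multivariate t (illustrative sketch)."""
    # betainc(a, b, x) is the regularized incomplete Beta function I_x(a, b).
    return float(betainc(p / 2, nu / 2, y / (1.0 + y)))

# Hypothetical example values.
print(radial_cdf(0.7, p=2, nu=5.0))
```

For p = 2 the Beta-prime CDF has the closed form 1 - (1 + y)^{-\nu/2}, which can be used to check the wrapper.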

The radial distribution can also be derived via a straightforward coordinate transformation from Cartesian to spherical. A constant radius surface at

R=(X^TX)^{1/2}

with PDF

p_X(X)\propto\left(1+\nu^{-1}R^2\right)^{-(\nu+p)/2}

is an iso-density surface. Given this density value, the quantum of probability on a shell of surface area

A_R

and thickness

\delta R

at

R

is

\delta P=p_X(R)\,A_R\,\delta R

.

The enclosed

p

-sphere of radius

R

has surface area

A_R=\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}

. Substitution into

\delta P

shows that the shell has element of probability

\delta P=p_X(R)\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}\delta R

which is equivalent to radial density function

f_R(R)=\frac{\Gamma\left(\frac{1}{2}(\nu+p)\right)}{\nu^{p/2}\pi^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)}\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}\left(1+\frac{R^2}{\nu}\right)^{-(\nu+p)/2}

which further simplifies to

f_R(R)=\frac{2}{\nu^{1/2}B\left(\frac{1}{2}p,\frac{1}{2}\nu\right)}\left(\frac{R^2}{\nu}\right)^{(p-1)/2}\left(1+\frac{R^2}{\nu}\right)^{-(\nu+p)/2}

where

B(*,*)

is the Beta function.
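The simplification of the radial density rests on two elementary identities, written out here as a worked step in the notation of this section (with R the Euclidean radius):

```latex
\frac{\Gamma\left(\tfrac{1}{2}(\nu+p)\right)}
     {\Gamma\left(\tfrac{1}{2}\nu\right)\Gamma\left(\tfrac{1}{2}p\right)}
  = \frac{1}{B\left(\tfrac{1}{2}p,\tfrac{1}{2}\nu\right)},
\qquad
\frac{2R^{p-1}}{\nu^{p/2}}
  = \frac{2}{\nu^{1/2}}\left(\frac{R^2}{\nu}\right)^{(p-1)/2}
```

The factors \pi^{p/2} in the Cartesian normalizing constant and in the surface area cancel, leaving only the Beta function and powers of R^2/\nu.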

Changing the radial variable to

y=R^2/\nu

returns the previous Beta Prime distribution

f_Y(y)=\frac{1}{B\left(\frac{1}{2}p,\frac{1}{2}\nu\right)}y^{p/2-1}(1+y)^{-(\nu+p)/2}

To scale the radial variables without changing the radial shape function, define scale matrix

\Sigma=\alpha\operatorname{I}

, yielding a 3-parameter Cartesian density function, i.e. the probability

\Delta_P

in volume element

dx_1\dots dx_p

is

\Delta_P\left(f_X(X|\alpha,p,\nu)\right)=\frac{\Gamma\left(\frac{1}{2}(\nu+p)\right)}{(\nu\pi)^{p/2}\alpha^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)}\left(1+\frac{X^TX}{\alpha\nu}\right)^{-(\nu+p)/2}dx_1\dots dx_p

or, in terms of scalar radial variable

R

,

f_R(R|\alpha,p,\nu)=\frac{2}{\alpha^{1/2}\nu^{1/2}B\left(\frac{1}{2}p,\frac{1}{2}\nu\right)}\left(\frac{R^2}{\alpha\nu}\right)^{(p-1)/2}\left(1+\frac{R^2}{\alpha\nu}\right)^{-(\nu+p)/2}

Radial Moments

The moments of all the radial variables, with the spherical distribution assumption, can be derived from the Beta Prime distribution. If

Z\sim\beta'(a,b)

then

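\operatorname{E}(Z^m)=\frac{B(a+m,b-m)}{B(a,b)}

, a known result. Thus, for variable

y=\frac{p}{\nu}R^2

we have

\operatorname{E}(y^m)=\frac{B\left(\frac{1}{2}p+m,\frac{1}{2}\nu-m\right)}{B\left(\frac{1}{2}p,\frac{1}{2}\nu\right)}=\frac{\Gamma\left(\frac{1}{2}p+m\right)\Gamma\left(\frac{1}{2}\nu-m\right)}{\Gamma\left(\frac{1}{2}p\right)\Gamma\left(\frac{1}{2}\nu\right)},\;\;\nu/2>m

The moments of

r_2=\nu y

are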

\operatorname{E}(r_2^m)=\nu^m\operatorname{E}(y^m)

while introducing the scale matrix

\alpha\operatorname{I}

yields

\operatorname{E}(r_2^m|\alpha)=\alpha^m\nu^m\operatorname{E}(y^m)

Moments relating to radial variable

R

are found by setting

R=(\alpha\nu y)^{1/2}

and

M=2m

whereupon

\operatorname{E}(R^M)=\operatorname{E}\left(\left((\alpha\nu y)^{1/2}\right)^{2m}\right)=(\alpha\nu)^{M/2}\operatorname{E}(y^{M/2})=(\alpha\nu)^{M/2}\frac{B\left(\frac{1}{2}(p+M),\frac{1}{2}(\nu-M)\right)}{B\left(\frac{1}{2}p,\frac{1}{2}\nu\right)}
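The moment formulas in this section can be evaluated with log-gamma functions. The sketch below is illustrative; the function names `radial_moment_y` and `radial_moment_R` are assumptions, and nonnegative moment orders are assumed.

```python
import math

def radial_moment_y(m, p, nu):
    """E(y^m) = B(p/2 + m, nu/2 - m) / B(p/2, nu/2), valid for 0 <= m < nu/2."""
    if m < 0 or not nu / 2 > m:
        raise ValueError("the moment requires 0 <= m < nu / 2")
    # The Gamma(a + b) factors of the two Beta functions cancel.
    log_e = (math.lgamma(p / 2 + m) + math.lgamma(nu / 2 - m)
             - math.lgamma(p / 2) - math.lgamma(nu / 2))
    return math.exp(log_e)

def radial_moment_R(M, p, nu, alpha=1.0):
    """E(R^M) for R = (alpha * nu * y)^(1/2), using m = M / 2."""
    return (alpha * nu) ** (M / 2) * radial_moment_y(M / 2, p, nu)

# Hypothetical example values.
print(radial_moment_y(1, p=3, nu=10.0))               # first moment of y
print(radial_moment_R(2, p=3, nu=10.0, alpha=2.0))    # second moment of R
```

For m = 1 the first function reduces to p/(\nu-2), matching the mean of the Beta-prime distribution given earlier.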

Linear Combinations and Affine Transformation

Full Rank Transform

This closely relates to the multivariate normal method and is described in Kotz and Nadarajah, Kibria and Joarder, Roth, and Cornish. Starting from a somewhat simplified version of the central MV-t pdf:

f_X(X)=\frac{\Kappa}{\left|\Sigma\right|^{1/2}}\left(1+\nu^{-1}X^T\Sigma^{-1}X\right)^{-(\nu+p)/2}

, where

\Kappa

is a constant and

\nu

is arbitrary but fixed, let

\Theta\in\mathbb{R}^{p\times p}

be a full-rank matrix and form vector

Y=\Theta X

. Then, by straightforward change of variables

f_Y(Y)=\frac{\Kappa}{\left|\Sigma\right|^{1/2}}\left(1+\nu^{-1}Y^T\Theta^{-T}\Sigma^{-1}\Theta^{-1}Y\right)^{-(\nu+p)/2}\left|\frac{\partial Y}{\partial X}\right|^{-1}

The matrix of partial derivatives is

\frac{\partial Y_i}{\partial X_j}=\Theta_{i,j}

and the Jacobian becomes

\left|\frac{\partial Y}{\partial X}\right|=\left|\Theta\right|

. Thus

f_Y(Y)=\frac{\Kappa}{\left|\Sigma\right|^{1/2}\left|\Theta\right|}\left(1+\nu^{-1}Y^T\Theta^{-T}\Sigma^{-1}\Theta^{-1}Y\right)^{-(\nu+p)/2}

The denominator reduces to

\left|\Sigma\right|^{1/2}\left|\Theta\right|=\left|\Sigma\right|^{1/2}\left|\Theta\right|^{1/2}\left|\Theta^T\right|^{1/2}=\left|\Theta\Sigma\Theta^T\right|^{1/2}

In full:

f_Y(Y)=\frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)(\nu\pi)^{p/2}\left|\Theta\Sigma\Theta^T\right|^{1/2}}\left(1+\nu^{-1}Y^T\left(\Theta\Sigma\Theta^T\right)^{-1}Y\right)^{-(\nu+p)/2}

which is a regular MV-t distribution.

In general if

X\sim t_p(\mu,\Sigma,\nu)

and

\Theta^{p\times p}

has full rank

p

then

\Theta X+c\sim t_p(\Theta\mu+c,\Theta\Sigma\Theta^T,\nu)
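The two ingredients of the full-rank derivation, the determinant identity for the denominator and the invariance of the quadratic form, can be checked numerically. The sketch below uses arbitrary example matrices chosen for illustration; the function name `check_affine_identities` is an assumption.

```python
import numpy as np

def quad_form(v, S):
    """Quadratic form v^T S^{-1} v."""
    return float(v @ np.linalg.solve(S, v))

def check_affine_identities(Theta, Sigma, X):
    """Return both sides of the determinant and quadratic-form identities."""
    Sigma_Y = Theta @ Sigma @ Theta.T        # scale matrix of Y = Theta X
    det_lhs = np.sqrt(np.linalg.det(Sigma)) * abs(np.linalg.det(Theta))
    det_rhs = np.sqrt(np.linalg.det(Sigma_Y))
    q_x = quad_form(X, Sigma)
    q_y = quad_form(Theta @ X, Sigma_Y)
    return (det_lhs, det_rhs), (q_x, q_y)

# Hypothetical example: a positive-definite Sigma and a full-rank Theta (determinant 5).
Sigma = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
Theta = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, -1.0], [1.0, 0.0, 3.0]])
X = np.array([0.3, -1.2, 0.7])
print(check_affine_identities(Theta, Sigma, X))
```

Because the quadratic form is unchanged and only the determinant factor is rescaled, the transformed density keeps the multivariate t form with the same \nu.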

Marginal Distributions

This is a special case of the rank-reducing linear transform below. Kotz defines marginal distributions as follows. Partition

X\sim t(p,\mu,\Sigma,\nu)

into two subvectors of

p_1,p_2

elements:

X_p=\begin{bmatrix} X_1\\ X_2\end{bmatrix}\sim t\left(p_1+p_2,\mu_p,\Sigma_p,\nu\right)

with

p_1+p_2=p

, means

\mu_p=\begin{bmatrix} \mu_1\\ \mu_2\end{bmatrix}

, scale matrix

\Sigma_p=\begin{bmatrix} \Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix}

then

X_1\sim t\left(p_1,\mu_1,\Sigma_{11},\nu\right)

,

X_2\sim t\left(p_2,\mu_2,\Sigma_{22},\nu\right)

such that

f(X_1)=\frac{\Gamma\left[(\nu+p_1)/2\right]}{\Gamma(\nu/2)(\nu\pi)^{p_1/2}\left|{\boldsymbol\Sigma}_{11}\right|^{1/2}}\left[1+\frac{1}{\nu}({\mathbf X_1}-{\boldsymbol\mu_1})^T{\boldsymbol\Sigma}_{11}^{-1}({\mathbf X_1}-{\boldsymbol\mu_1})\right]^{-(\nu+p_1)/2}

f(X_2)=\frac{\Gamma\left[(\nu+p_2)/2\right]}{\Gamma(\nu/2)(\nu\pi)^{p_2/2}\left|{\boldsymbol\Sigma}_{22}\right|^{1/2}}\left[1+\frac{1}{\nu}({\mathbf X_2}-{\boldsymbol\mu_2})^T{\boldsymbol\Sigma}_{22}^{-1}({\mathbf X_2}-{\boldsymbol\mu_2})\right]^{-(\nu+p_2)/2}

If a transformation is constructed in the form

\Theta_{p_1\times p}=\begin{bmatrix} 1&\cdots&0&\cdots&0\\ 0&\ddots&0&\cdots&0\\ 0&\cdots&1&\cdots&0\end{bmatrix}

then vector

Y=\Theta X

, as discussed below, has the same distribution as the marginal distribution of

X_1

.
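The selection-matrix view of marginalization can be illustrated with a few lines of NumPy. The helper name `selection_matrix` and the example parameters are assumptions introduced here; the point is only that \Theta\mu and \Theta\Sigma\Theta^T pick out \mu_1 and \Sigma_{11}.

```python
import numpy as np

def selection_matrix(p1, p):
    """The p1 x p matrix [I 0] that keeps the first p1 coordinates."""
    return np.hstack([np.eye(p1), np.zeros((p1, p - p1))])

# Hypothetical example with p = 3 and p1 = 2.
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
Theta = selection_matrix(2, 3)
print(Theta @ mu)               # equals mu_1, the first two entries of mu
print(Theta @ Sigma @ Theta.T)  # equals Sigma_11, the leading 2 x 2 block
```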

Rank-Reducing Linear Transform

In the linear transform case, if

\Theta

is a rectangular matrix

\Theta\in\mathbb{R}^{m\times p},m<p

, of rank

m

the result is dimensionality reduction. Here, Jacobian

\left|\Theta\right|

is seemingly rectangular but the value

\left|\Theta\Sigma\Theta^T\right|^{1/2}

in the denominator pdf is nevertheless correct. There is a discussion of rectangular matrix product determinants in Aitken.[11] In general if

X\sim t(p,\mu,\Sigma,\nu)

and

\Theta^{m\times p}

has full rank

m

then

Y=\Theta X+c\sim t(m,\Theta\mu+c,\Theta\Sigma\Theta^T,\nu)

f_Y(Y)=\frac{\Gamma\left[(\nu+m)/2\right]}{\Gamma(\nu/2)(\nu\pi)^{m/2}\left|\Theta\Sigma\Theta^T\right|^{1/2}}\left[1+\frac{1}{\nu}(Y-c_1)^T(\Theta\Sigma\Theta^T)^{-1}(Y-c_1)\right]^{-(\nu+m)/2},\quad c_1=\Theta\mu+c

In extremis, if m = 1 and

\Theta

becomes a row vector, then scalar Y follows a univariate double-sided Student-t distribution defined by

t^2=Y^2/\sigma^2

with the same

\nu

degrees of freedom. Kibria et al. use the affine transformation to find the marginal distributions, which are also MV-t.
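The m = 1 case can be explored by simulation: project multivariate t samples onto a row vector and standardize by the location \Theta\mu and scale \sigma = (\Theta\Sigma\Theta^T)^{1/2}. The sketch below is illustrative, with c = 0; the function name `project_samples` and all example values are assumptions.

```python
import numpy as np

def project_samples(theta, mu, Sigma, nu, n, seed=0):
    """Standardized scalar projections (Theta X - Theta mu) / sigma."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = mu.shape[0]
    # Multivariate t samples via the normal / chi-squared construction.
    y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    u = rng.chisquare(nu, size=n)
    x = mu + y * np.sqrt(nu / u)[:, None]
    sigma = float(np.sqrt(theta @ Sigma @ theta))   # scale of the scalar Y
    return (x @ theta - float(theta @ mu)) / sigma

# Hypothetical example with nu = 2 degrees of freedom.
Sigma = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
t = project_samples([1.0, -2.0, 0.5], [1.0, 0.0, -1.0], Sigma, nu=2.0, n=200_000)
print(np.mean(np.abs(t) <= 1.0))  # compare with the two-sided Student t probability
```

For \nu = 2 the univariate Student t CDF is 1/2 + t/(2\sqrt{t^2+2}), so the two-sided probability P(|t| \le 1) equals 1/\sqrt{3}, which the empirical fraction should approximate.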

Z

whose elements remain 'entangled' and are not statistically independent.

\nu

values: /\sqrt, \; \; /\sqrt will not produce internally consistent distributions, though they will yield a Behrens-Fisher problem.[12]

Notes and References

  1. Roth, Michael (17 April 2013). "On the Multivariate t Distribution". Automatic Control group, Linköping University, Sweden. Archived 31 July 2022 at https://web.archive.org/web/20220731142649/http://users.isy.liu.se/en/rt/roth/student.pdf . Retrieved 1 June 2022.
  2. Botev, Z.; Chen, Y.-L. (2022). "Chapter 4: Truncated Multivariate Student Computations via Exponential Tilting". In Botev, Zdravko; Keller, Alexander; Lemieux, Christiane; Tuffin, Bruno (eds.). Advances in Modeling and Simulation: Festschrift for Pierre L'Ecuyer. Springer. pp. 65–87. https://doi.org/10.1007/978-3-031-10193-9_4 . ISBN 978-3-031-10192-2.
  3. Botev, Z. I.; L'Ecuyer, P. (6 December 2015). "Efficient probability estimation and simulation of the truncated multivariate student-t distribution". 2015 Winter Simulation Conference (WSC). Huntington Beach, CA, USA: IEEE. pp. 380–391. doi:10.1109/WSC.2015.7408180.
  4. Genz, Alan (2009). Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics. Vol. 195. Springer. doi:10.1007/978-3-642-01689-9. ISBN 978-3-642-01689-9. Archived 2022-08-27 at https://web.archive.org/web/20220827214814/https://link.springer.com/book/10.1007/978-3-642-01689-9 . Retrieved 2017-09-05.
  5. Muirhead, Robb (1982). Aspects of Multivariate Statistical Theory. USA: Wiley. pp. 32–36, Theorem 1.5.4. ISBN 978-0-471-76985-9.
  6. Cornish, E. A. (1954). "The Multivariate t-Distribution Associated with a Set of Normal Sample Deviates". Australian Journal of Physics. 7: 531–542. doi:10.1071/PH550193.
  7. Ding, Peng (2016). "On the Conditional Distribution of the Multivariate t Distribution". The American Statistician. 70 (3): 293–295. arXiv:1604.00561. doi:10.1080/00031305.2016.1164756. S2CID 55842994.
  8. Demarta, Stefano; McNeil, Alexander (2004). "The t Copula and Related Copulas". Risknet.
  9. Osiewalski, Jacek; Steele, Mark (1996). "Posterior Moments of Scale Parameters in Elliptical Sampling Models". Bayesian Analysis in Statistics and Econometrics. Wiley. pp. 323–335. ISBN 0-471-11856-7.
  10. Kibria, K. M. G.; Joarder, A. H. (Jan 2006). "A short review of multivariate t distribution". Journal of Statistical Research. 40 (1): 59–72. doi:10.1007/s42979-021-00503-0. S2CID 232163198.
  11. Aitken, A. C. (1948). Determinants and Matrices (5th ed.). Edinburgh: Oliver and Boyd. Chapter IV, section 36.
  12. Giron, Javier; del Castilo, Carmen (2010). "The multivariate Behrens–Fisher distribution". Journal of Multivariate Analysis. 101 (9): 2091–2102. doi:10.1016/j.jmva.2010.04.008.
  13. Okhrin, Y.; Schmid, W. (2006). "Distributional Properties of Portfolio Weights". Journal of Econometrics. 134: 235–256.
  14. Bodnar, T.; Dmytriv, S.; Parolya, N.; Schmid, W. (2019). "Tests for the Weights of the Global Minimum Variance Portfolio in a High-Dimensional Setting". IEEE Trans. on Signal Processing. 67 (17): 4479–4493.
  15. Bodnar, T.; Okhrin, Y. (2008). "Properties of the Singular, Inverse and Generalized inverse Partitioned Wishart Distribution". Journal of Multivariate Analysis. 99: 2389–2405, Eqn. 20.