In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables, each of which clusters around a mean value.
The multivariate normal distribution of a k-dimensional random vector X = (X_1, \ldots, X_k)^T can be written in the following notation:
X \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma),
or, to make it explicit that X is k-dimensional,
X \sim \mathcal{N}_k(\boldsymbol\mu, \boldsymbol\Sigma),
with k-dimensional mean vector
\boldsymbol\mu = \operatorname{E}[X] = (\operatorname{E}[X_1], \operatorname{E}[X_2], \ldots, \operatorname{E}[X_k])^T,
and k \times k covariance matrix
\Sigma_{i,j} = \operatorname{E}[(X_i - \mu_i)(X_j - \mu_j)] = \operatorname{Cov}[X_i, X_j]
for 1 \le i \le k and 1 \le j \le k. The inverse of the covariance matrix is called the precision matrix, denoted by \boldsymbol{Q} = \boldsymbol\Sigma^{-1}.
A real random vector X = (X_1, \ldots, X_k)^T is called a standard normal random vector if all of its components X_i are independent and each is a zero-mean unit-variance normally distributed random variable, i.e. if X_i \sim \mathcal{N}(0, 1) for all i = 1, \ldots, k.
A real random vector X = (X_1, \ldots, X_k)^T is called a centered normal random vector if there exists a k \times \ell matrix \boldsymbol{A} such that \boldsymbol{A}Z has the same distribution as X, where Z is a standard normal random vector with \ell components.
A real random vector X = (X_1, \ldots, X_k)^T is called a normal random vector if there exists a random \ell-vector Z, which is a standard normal random vector, a k-vector \boldsymbol\mu, and a k \times \ell matrix \boldsymbol{A}, such that X = \boldsymbol{A}Z + \boldsymbol\mu.
Formally:
X \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma) \iff \text{there exist } \boldsymbol\mu \in \mathbb{R}^k, \boldsymbol{A} \in \mathbb{R}^{k \times \ell} \text{ such that } X = \boldsymbol{A}Z + \boldsymbol\mu \text{ with } Z_n \sim \mathcal{N}(0, 1) \text{ i.i.d.}
Here the covariance matrix is \boldsymbol\Sigma = \boldsymbol{A}\boldsymbol{A}^T.
In the degenerate case where the covariance matrix is singular, the corresponding distribution has no density; see the section below for details. This case arises frequently in statistics; for example, in the distribution of the vector of residuals in the ordinary least squares regression. The X_i are in general not independent; they can be seen as the result of applying the matrix \boldsymbol{A} to a collection of independent Gaussian variables Z.
The following definitions are equivalent to the definition given above. A random vector X = (X_1, \ldots, X_k)^T has a multivariate normal distribution if it satisfies one of the following equivalent conditions.
Every linear combination Y = a_1 X_1 + \cdots + a_k X_k of its components is normally distributed. That is, for any constant vector a \in \mathbb{R}^k, the random variable Y = a^T X has a univariate normal distribution, where a univariate normal distribution with zero variance is a point mass on its mean.
There is a k-vector \boldsymbol\mu and a symmetric, positive semidefinite k \times k matrix \boldsymbol\Sigma, such that the characteristic function of X is
\varphi_X(u) = \exp\left( i u^T \boldsymbol\mu - \tfrac{1}{2} u^T \boldsymbol\Sigma u \right).
The spherical normal distribution can be characterised as the unique distribution where components are independent in any orthogonal coordinate system.[3] [4]
The multivariate normal distribution is said to be "non-degenerate" when the symmetric covariance matrix \boldsymbol\Sigma is positive definite. In this case the distribution has density
f_X(x_1, \ldots, x_k) = \frac{\exp\left( -\frac{1}{2} ({x} - {\boldsymbol\mu})^T \boldsymbol\Sigma^{-1} ({x} - {\boldsymbol\mu}) \right)}{\sqrt{(2\pi)^k |\boldsymbol\Sigma|}}
where {x} is a real k-dimensional column vector and |\boldsymbol\Sigma| \equiv \det\boldsymbol\Sigma is the determinant of \boldsymbol\Sigma, also known as the generalized variance. The equation above reduces to that of the univariate normal distribution if \boldsymbol\Sigma is a 1 \times 1 matrix (i.e. a single real number).
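The density above can be evaluated numerically. The following is a minimal sketch using NumPy and SciPy; the mean vector, covariance matrix, and test point are arbitrary example values, and the SciPy call serves only as a cross-check of the closed-form expression:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary example parameters
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 0.5]])
x = np.array([0.5, -1.5])

# Density from the closed-form expression above
k = len(mu)
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff            # squared Mahalanobis distance
pdf_manual = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))

# Same value from SciPy's implementation
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(pdf_manual, pdf_scipy)                         # agree up to floating-point error
```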
The circularly symmetric version of the complex normal distribution has a slightly different form.
Each iso-density locus - the locus of points in k-dimensional space each of which gives the same particular value of the density - is an ellipse or its higher-dimensional generalization; hence the multivariate normal is a special case of the elliptical distributions.
The quantity \sqrt{({x}-{\boldsymbol\mu})^T \boldsymbol\Sigma^{-1} ({x}-{\boldsymbol\mu})} is known as the Mahalanobis distance, which represents the distance of the test point {x} from the mean {\boldsymbol\mu}; the squared Mahalanobis distance ({x}-{\boldsymbol\mu})^T \boldsymbol\Sigma^{-1} ({x}-{\boldsymbol\mu}) is the quantity that appears in the exponent of the density. In the case when k = 1, the distribution reduces to a univariate normal distribution and the Mahalanobis distance reduces to the absolute value of the standard score.
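As an illustration, the Mahalanobis distance can be computed directly from its definition or with SciPy, which expects the inverse covariance matrix; the parameter values below are assumed example values:

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

mu = np.array([1.0, -2.0])                 # example mean
Sigma = np.array([[2.0, 0.3],
                  [0.3, 0.5]])             # example covariance
x = np.array([0.5, -1.5])                  # test point

Sigma_inv = np.linalg.inv(Sigma)
d_manual = np.sqrt((x - mu) @ Sigma_inv @ (x - mu))
d_scipy = mahalanobis(x, mu, Sigma_inv)    # same value via SciPy
print(d_manual, d_scipy)
```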
In the 2-dimensional nonsingular case (k = \operatorname{rank}(\Sigma) = 2), the probability density function of the vector [X\ Y]^\prime is
f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right] \right)
where \rho is the correlation between X and Y and where \sigma_X > 0 and \sigma_Y > 0. In this case,
\boldsymbol\mu = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \qquad \boldsymbol\Sigma = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}.
In the bivariate case, the first equivalent condition for multivariate reconstruction of normality can be made less restrictive, as it is sufficient to verify that countably many distinct linear combinations of X and Y are normal in order to conclude that the vector [X\ Y]^\prime is bivariate normal.
The bivariate iso-density loci plotted in the x,y-plane are ellipses, whose principal axes are defined by the eigenvectors of the covariance matrix \boldsymbol\Sigma (the major and minor semidiameters of the ellipse equal the square roots of the ordered eigenvalues).
As the absolute value of the correlation parameter \rho increases, these loci are squeezed toward the following line:
y(x) = \operatorname{sgn}(\rho) \frac{\sigma_Y}{\sigma_X} (x - \mu_X) + \mu_Y.
This is because this expression, with \operatorname{sgn}(\rho) replaced by \rho, is the best linear unbiased prediction of Y given a value of X.
If the covariance matrix \boldsymbol\Sigma is not full rank, then the multivariate normal distribution is degenerate and does not have a density. More precisely, it does not have a density with respect to k-dimensional Lebesgue measure (which is the usual measure assumed in calculus-level probability courses). Only random vectors whose distributions are absolutely continuous with respect to a measure are said to have densities (with respect to that measure). To talk about densities but avoid measure-theoretic complications, it can be simpler to restrict attention to a subset of \operatorname{rank}(\boldsymbol\Sigma) of the coordinates of x such that the covariance matrix for this subset is positive definite; the other coordinates can then be thought of as an affine function of these selected coordinates.
To talk about densities meaningfully in singular cases, then, we must select a different base measure. Using the disintegration theorem we can define a restriction of Lebesgue measure to the \operatorname{rank}(\boldsymbol\Sigma)-dimensional affine subspace of \mathbb{R}^k where the Gaussian distribution is supported, i.e. \left\{ \boldsymbol\mu + \boldsymbol{\Sigma}^{1/2} v : v \in \mathbb{R}^k \right\}. With respect to this measure the distribution has the density
f(x) = \frac{\exp\left( -\frac{1}{2} (x - \boldsymbol\mu)^T \boldsymbol\Sigma^{+} (x - \boldsymbol\mu) \right)}{\sqrt{\det\nolimits^{*}(2\pi\boldsymbol\Sigma)}}
where \boldsymbol\Sigma^{+} is the generalized inverse of \boldsymbol\Sigma and \det\nolimits^{*} is the pseudo-determinant.
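A sketch of how this degenerate density could be evaluated numerically: the rank-deficient covariance below is an assumed example, the generalized inverse is taken to be the Moore–Penrose pseudo-inverse, and the pseudo-determinant is computed from the nonzero eigenvalues:

```python
import numpy as np

mu = np.array([0.0, 0.0])
# Example singular covariance of rank 1: the distribution is supported on a line
Sigma = np.array([[1.0, 2.0],
                  [2.0, 4.0]])

Sigma_plus = np.linalg.pinv(Sigma)             # generalized (Moore-Penrose) inverse
eigvals = np.linalg.eigvalsh(2 * np.pi * Sigma)
pdet = np.prod(eigvals[eigvals > 1e-12])       # pseudo-determinant of 2*pi*Sigma

def degenerate_pdf(x):
    """Density with respect to Lebesgue measure restricted to the support."""
    d = x - mu
    return np.exp(-0.5 * d @ Sigma_plus @ d) / np.sqrt(pdet)

# Evaluate at a point on the support (multiples of the direction (1, 2))
print(degenerate_pdf(np.array([0.5, 1.0])))
```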
The notion of cumulative distribution function (cdf) in dimension 1 can be extended in two ways to the multidimensional case, based on rectangular and ellipsoidal regions.
The first way is to define the cdf F(x) of a random vector X as the probability that all components of X are less than or equal to the corresponding values in the vector x:
F(x) = P(X \le x), \quad \text{where } X \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma).
Though there is no closed form for F(x), there are a number of algorithms that estimate it numerically.
Another way is to define the cdf F(r) as the probability that a sample lies inside the ellipsoid determined by its Mahalanobis distance r from the Gaussian, a direct generalization of the standard deviation. In order to compute the values of this function, closed analytic formulas exist, as follows.
The interval for the multivariate normal distribution yields a region consisting of those vectors x satisfying
({x} - {\boldsymbol\mu})^T \boldsymbol\Sigma^{-1} ({x} - {\boldsymbol\mu}) \le \chi^2_k(p).
Here {x} is a k-dimensional vector, {\boldsymbol\mu} is the known k-dimensional mean vector, \boldsymbol\Sigma is the known covariance matrix and \chi^2_k(p) is the quantile function for probability p of the chi-squared distribution with k degrees of freedom. When k = 2, the expression defines the interior of an ellipse and the chi-squared distribution simplifies to an exponential distribution with mean equal to two (rate equal to half).
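For example, membership in the ellipsoidal region containing a fraction p of the probability mass can be checked by comparing the squared Mahalanobis distance against the chi-squared quantile; a sketch with assumed example values:

```python
import numpy as np
from scipy.stats import chi2

mu = np.array([1.0, -2.0])                 # example mean
Sigma = np.array([[2.0, 0.3],
                  [0.3, 0.5]])             # example covariance
p = 0.95                                   # desired probability content
k = len(mu)

threshold = chi2.ppf(p, df=k)              # chi-squared quantile with k degrees of freedom

def inside_region(x):
    d = x - mu
    return d @ np.linalg.inv(Sigma) @ d <= threshold

print(inside_region(np.array([1.5, -1.8])))   # True for a point near the mean
print(inside_region(np.array([8.0,  4.0])))   # False for a point far away
```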
The complementary cumulative distribution function (ccdf) or the tail distribution is defined as
\overline{F}(x) = 1 - P\left( X \le x \right).
When X \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma), the ccdf can be written as the probability that the maximum of dependent Gaussian variables exceeds zero:
\overline{F}(x) = P\left( \bigcup_i \{X_i \ge x_i\} \right) = P\left( \max_i Y_i \ge 0 \right), \quad \text{where } Y \sim \mathcal{N}\left( \boldsymbol\mu - x, \boldsymbol\Sigma \right).
The probability content of the multivariate normal in a quadratic domain defined by
q(\boldsymbol{x}) = \boldsymbol{x}' \boldsymbol{Q}_2 \boldsymbol{x} + \boldsymbol{q}_1' \boldsymbol{x} + q_0 > 0
(where \boldsymbol{Q}_2 is a matrix, \boldsymbol{q}_1 is a vector, and q_0 is a scalar), relevant for Bayesian classification/decision theory using Gaussian discriminant analysis, is given by the generalized chi-squared distribution. The probability content within any general domain defined by f(\boldsymbol{x}) > 0 (where f(\boldsymbol{x}) is a general scalar-valued function) can be computed using the numerical method of ray-tracing (Matlab code).
See main article: Isserlis' theorem. The kth-order moments of x are given by
\mu_{1,\ldots,N}(x) \;\stackrel{\mathrm{def}}{=}\; \mu_{r_1,\ldots,r_N}(x) \;\stackrel{\mathrm{def}}{=}\; \operatorname{E}\left[ \prod_{j=1}^{N} X_j^{r_j} \right]
where r_1 + r_2 + \cdots + r_N = k.
The kth-order central moments are as follows: if k is odd, \mu_{1,\ldots,N}(x - \boldsymbol\mu) = 0; if k is even with k = 2\lambda, then \mu_{1,\ldots,2\lambda}(x - \boldsymbol\mu) is a sum of products of covariances, where the sum is taken over all allocations of the set \left\{1, \ldots, 2\lambda\right\} into \lambda (unordered) pairs. For a sixth-order central moment (k = 2\lambda = 6), taking the expected value to be 0 for brevity, one sums the products of \lambda = 3 covariances:
\begin{align}
& \operatorname{E}[X_1 X_2 X_3 X_4 X_5 X_6] \\[8pt]
={} & \operatorname{E}[X_1 X_2]\operatorname{E}[X_3 X_4]\operatorname{E}[X_5 X_6] + \operatorname{E}[X_1 X_2]\operatorname{E}[X_3 X_5]\operatorname{E}[X_4 X_6] + \operatorname{E}[X_1 X_2]\operatorname{E}[X_3 X_6]\operatorname{E}[X_4 X_5] \\[4pt]
& {}+ \operatorname{E}[X_1 X_3]\operatorname{E}[X_2 X_4]\operatorname{E}[X_5 X_6] + \operatorname{E}[X_1 X_3]\operatorname{E}[X_2 X_5]\operatorname{E}[X_4 X_6] + \operatorname{E}[X_1 X_3]\operatorname{E}[X_2 X_6]\operatorname{E}[X_4 X_5] \\[4pt]
& {}+ \operatorname{E}[X_1 X_4]\operatorname{E}[X_2 X_3]\operatorname{E}[X_5 X_6] + \operatorname{E}[X_1 X_4]\operatorname{E}[X_2 X_5]\operatorname{E}[X_3 X_6] + \operatorname{E}[X_1 X_4]\operatorname{E}[X_2 X_6]\operatorname{E}[X_3 X_5] \\[4pt]
& {}+ \operatorname{E}[X_1 X_5]\operatorname{E}[X_2 X_3]\operatorname{E}[X_4 X_6] + \operatorname{E}[X_1 X_5]\operatorname{E}[X_2 X_4]\operatorname{E}[X_3 X_6] + \operatorname{E}[X_1 X_5]\operatorname{E}[X_2 X_6]\operatorname{E}[X_3 X_4] \\[4pt]
& {}+ \operatorname{E}[X_1 X_6]\operatorname{E}[X_2 X_3]\operatorname{E}[X_4 X_5] + \operatorname{E}[X_1 X_6]\operatorname{E}[X_2 X_4]\operatorname{E}[X_3 X_5] + \operatorname{E}[X_1 X_6]\operatorname{E}[X_2 X_5]\operatorname{E}[X_3 X_4].
\end{align}
This yields \tfrac{(2\lambda - 1)!}{2^{\lambda-1}(\lambda-1)!} terms in the sum (15 in the above case), each being the product of \lambda (in this case 3) covariances. The covariances are then determined by replacing the terms of the list [1, \ldots, 2\lambda] by the corresponding terms of the list consisting of r_1 ones, then r_2 twos, etc. To illustrate this, examine the following fourth-order central moment cases:
\begin{align}
\operatorname{E}\left[ X_i^4 \right] &= 3\sigma_{ii}^2 \\[4pt]
\operatorname{E}\left[ X_i^3 X_j \right] &= 3\sigma_{ii}\sigma_{ij} \\[4pt]
\operatorname{E}\left[ X_i^2 X_j^2 \right] &= \sigma_{ii}\sigma_{jj} + 2\sigma_{ij}^2 \\[4pt]
\operatorname{E}\left[ X_i^2 X_j X_k \right] &= \sigma_{ii}\sigma_{jk} + 2\sigma_{ij}\sigma_{ik} \\[4pt]
\operatorname{E}\left[ X_i X_j X_k X_n \right] &= \sigma_{ij}\sigma_{kn} + \sigma_{ik}\sigma_{jn} + \sigma_{in}\sigma_{jk}.
\end{align}
where \sigma_{ij} is the covariance of X_i and X_j. With this method, one first finds the general case \operatorname{E}\left[ X_i X_j X_k X_n \right] and then simplifies it as required; for example, for \operatorname{E}[X_i^2 X_k X_n] one sets X_i = X_j and uses \sigma_{ii} = \sigma_i^2.
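As an illustrative check of these identities (not part of the original derivation), the analytic fourth-order moment E[X_i X_j X_k X_n] can be compared with a Monte Carlo estimate; the covariance matrix below is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])        # example covariance, zero mean
samples = rng.multivariate_normal(np.zeros(3), Sigma, size=2_000_000)

i, j, k, n = 0, 1, 2, 0                    # indices may repeat
# Isserlis' theorem for the fourth-order central moment
analytic = (Sigma[i, j] * Sigma[k, n]
            + Sigma[i, k] * Sigma[j, n]
            + Sigma[i, n] * Sigma[j, k])
empirical = np.mean(samples[:, i] * samples[:, j] * samples[:, k] * samples[:, n])
print(analytic, empirical)                 # should agree to a few decimal places
```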
A quadratic form of a normal vector \boldsymbol{x},
q(\boldsymbol{x}) = \boldsymbol{x}' \boldsymbol{Q}_2 \boldsymbol{x} + \boldsymbol{q}_1' \boldsymbol{x} + q_0
(where \boldsymbol{Q}_2 is a matrix, \boldsymbol{q}_1 is a vector, and q_0 is a scalar), is a generalized chi-squared variable.
If f(\boldsymbol{x}) is a general scalar-valued function of a normal vector, its probability density function, cumulative distribution function, and inverse cumulative distribution function can be computed with the numerical method of ray-tracing (Matlab code).
If the mean and covariance matrix are known, the log likelihood of an observed vector \boldsymbol{x} is simply the log of the probability density function:
\ln L(\boldsymbol{x}) = -\frac{1}{2} \left[ \ln(|\boldsymbol\Sigma|) + (\boldsymbol{x} - \boldsymbol\mu)' \boldsymbol\Sigma^{-1} (\boldsymbol{x} - \boldsymbol\mu) + k\ln(2\pi) \right].
The circularly symmetric version of the noncentral complex case, where \boldsymbol{z} is a vector of complex numbers, would be
\ln L(\boldsymbol{z}) = -\ln(|\boldsymbol\Sigma|) - (\boldsymbol{z} - \boldsymbol\mu)^\dagger \boldsymbol\Sigma^{-1} (\boldsymbol{z} - \boldsymbol\mu) - k\ln(\pi)
i.e. with the conjugate transpose (indicated by \dagger) replacing the normal transpose (indicated by ').
A similar notation is used for multiple linear regression.[14]
Since the log likelihood of a normal vector is a quadratic form of the normal vector, it is distributed as a generalized chi-squared variable.
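A minimal sketch of evaluating this log likelihood for a real-valued observation; the parameter values are assumed examples and the SciPy call is used only as a cross-check:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])                  # known mean (example)
Sigma = np.array([[2.0, 0.3],
                  [0.3, 0.5]])              # known covariance (example)
x = np.array([0.5, -1.5])                   # observed vector

k = len(mu)
diff = x - mu
loglik = -0.5 * (np.log(np.linalg.det(Sigma))
                 + diff @ np.linalg.inv(Sigma) @ diff
                 + k * np.log(2 * np.pi))

print(loglik)
print(multivariate_normal(mean=mu, cov=Sigma).logpdf(x))   # same value
```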
The differential entropy of the multivariate normal distribution is[15]
\begin{align}
h(f) &= -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x) \ln f(x)\, dx \\
&= \frac{1}{2} \ln\left| 2\pi e \boldsymbol\Sigma \right| = \frac{k}{2} + \frac{k}{2}\ln 2\pi + \frac{1}{2}\ln\left| \boldsymbol\Sigma \right|
\end{align}
where the bars denote the matrix determinant, k is the dimensionality of the vector space, and the result has units of nats.
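For illustration, both forms of the entropy formula can be evaluated in a few lines; the covariance below is an assumed example:

```python
import numpy as np

Sigma = np.array([[2.0, 0.3],
                  [0.3, 0.5]])              # example covariance
k = Sigma.shape[0]

# Differential entropy of N(mu, Sigma) in nats (it does not depend on the mean)
h = 0.5 * np.log(np.linalg.det(2 * np.pi * np.e * Sigma))
h_alt = k / 2 + (k / 2) * np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma))
print(h, h_alt)                             # the two expressions agree
```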
The Kullback–Leibler divergence from \mathcal{N}_1(\boldsymbol\mu_1, \boldsymbol\Sigma_1) to \mathcal{N}_0(\boldsymbol\mu_0, \boldsymbol\Sigma_0), for non-singular matrices \boldsymbol\Sigma_1 and \boldsymbol\Sigma_0, is:
D_{KL}(\mathcal{N}_0 \parallel \mathcal{N}_1) = \frac{1}{2}\left\{ \operatorname{tr}\left( \boldsymbol\Sigma_1^{-1} \boldsymbol\Sigma_0 \right) + \left( \boldsymbol\mu_1 - \boldsymbol\mu_0 \right)^{\rm T} \boldsymbol\Sigma_1^{-1} \left( \boldsymbol\mu_1 - \boldsymbol\mu_0 \right) - k + \ln\frac{|\boldsymbol\Sigma_1|}{|\boldsymbol\Sigma_0|} \right\},
where | \cdot | denotes the matrix determinant, \operatorname{tr}(\cdot) the trace, \ln(\cdot) the natural logarithm, and k the dimension of the vector space.
The logarithm must be taken to base e since the two terms following the logarithm are themselves base-e logarithms of expressions that are either factors of the density function or otherwise arise naturally. The equation therefore gives a result measured in nats. Dividing the entire expression above by log_e 2 yields the divergence in bits.
When \boldsymbol\mu_1 = \boldsymbol\mu_0, the divergence simplifies to:
D_{KL}(\mathcal{N}_0 \parallel \mathcal{N}_1) = \frac{1}{2}\left\{ \operatorname{tr}\left( \boldsymbol\Sigma_1^{-1} \boldsymbol\Sigma_0 \right) - k + \ln\frac{|\boldsymbol\Sigma_1|}{|\boldsymbol\Sigma_0|} \right\}.
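A sketch of the general Kullback–Leibler formula in code, with arbitrary example parameters:

```python
import numpy as np

mu0, mu1 = np.array([0.0, 0.0]), np.array([1.0, -1.0])
Sigma0 = np.array([[1.0, 0.2],
                   [0.2, 1.0]])
Sigma1 = np.array([[2.0, 0.0],
                   [0.0, 0.5]])
k = len(mu0)

Sigma1_inv = np.linalg.inv(Sigma1)
diff = mu1 - mu0
kl = 0.5 * (np.trace(Sigma1_inv @ Sigma0)
            + diff @ Sigma1_inv @ diff
            - k
            + np.log(np.linalg.det(Sigma1) / np.linalg.det(Sigma0)))
print(kl)   # D_KL(N0 || N1) in nats
```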
The mutual information of a distribution is a special case of the Kullback–Leibler divergence in which P is the full multivariate distribution and Q is the product of the 1-dimensional marginal distributions. In the notation of the Kullback–Leibler divergence section of this article, \boldsymbol\Sigma_1 is a diagonal matrix with the diagonal entries of \boldsymbol\Sigma_0, and \boldsymbol\mu_1 = \boldsymbol\mu_0. The resulting formula for mutual information is:
I(\boldsymbol{X}) = -\frac{1}{2} \ln |\boldsymbol\rho_0|,
where \boldsymbol\rho_0 is the correlation matrix constructed from \boldsymbol\Sigma_0.
In the bivariate case the expression for the mutual information is:
I(x; y) = -\frac{1}{2} \ln\left( 1 - \rho^2 \right).
If X and Y are normally distributed and independent, this implies they are "jointly normally distributed", i.e., the pair (X, Y) must have multivariate normal distribution. However, a pair of jointly normally distributed variables need not be independent; they would be so only if uncorrelated, \rho = 0.
See also: normally distributed and uncorrelated does not imply independent. The fact that two random variables X and Y both have a normal distribution does not imply that the pair (X, Y) has a joint normal distribution. A simple example is one in which X has a normal distribution with expected value 0 and variance 1, and Y = X if |X| > c and Y = -X if |X| < c, where c > 0.
In general, random variables may be uncorrelated but statistically dependent. But if a random vector has a multivariate normal distribution then any two or more of its components that are uncorrelated are independent. This implies that any two or more of its components that are pairwise independent are independent. But, as pointed out just above, it is not true that two random variables that are (separately, marginally) normally distributed and uncorrelated are independent.
If N-dimensional x is partitioned as follows
\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} \text{ with sizes } \begin{bmatrix} q \times 1 \\ (N-q) \times 1 \end{bmatrix}
and accordingly μ and Σ are partitioned as follows
\boldsymbol\mu = \begin{bmatrix} \boldsymbol\mu_1 \\ \boldsymbol\mu_2 \end{bmatrix} \text{ with sizes } \begin{bmatrix} q \times 1 \\ (N-q) \times 1 \end{bmatrix}
\boldsymbol\Sigma = \begin{bmatrix} \boldsymbol\Sigma_{11} & \boldsymbol\Sigma_{12} \\ \boldsymbol\Sigma_{21} & \boldsymbol\Sigma_{22} \end{bmatrix} \text{ with sizes } \begin{bmatrix} q \times q & q \times (N-q) \\ (N-q) \times q & (N-q) \times (N-q) \end{bmatrix}
then the distribution of x1 conditional on x2 = a is multivariate normal[18] where
\bar{\boldsymbol\mu} = \boldsymbol\mu_1 + \boldsymbol\Sigma_{12} \boldsymbol\Sigma_{22}^{-1} \left( \mathbf{a} - \boldsymbol\mu_2 \right)
and covariance matrix
\overline{\boldsymbol\Sigma} = \boldsymbol\Sigma_{11} - \boldsymbol\Sigma_{12} \boldsymbol\Sigma_{22}^{-1} \boldsymbol\Sigma_{21}.
Here \boldsymbol\Sigma_{22}^{-1} is the generalized inverse of \boldsymbol\Sigma_{22}. The matrix \overline{\boldsymbol\Sigma} is the Schur complement of \boldsymbol\Sigma_{22} in \boldsymbol\Sigma.
Note that knowing that x_2 = \mathbf{a} alters the variance, though the new variance does not depend on the specific value of \mathbf{a}; perhaps more surprisingly, the mean is shifted by \boldsymbol\Sigma_{12} \boldsymbol\Sigma_{22}^{-1} \left( \mathbf{a} - \boldsymbol\mu_2 \right). Compare this with the situation of not knowing the value of \mathbf{a}, in which case x_1 would have distribution \mathcal{N}_q\left( \boldsymbol\mu_1, \boldsymbol\Sigma_{11} \right).
An interesting fact derived in order to prove this result is that the random vectors x_2 and y_1 = x_1 - \boldsymbol\Sigma_{12} \boldsymbol\Sigma_{22}^{-1} x_2 are independent.
The matrix Σ12Σ22−1 is known as the matrix of regression coefficients.
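A sketch of the conditional-mean and conditional-covariance formulas in code; the partition size and parameter values below are arbitrary examples:

```python
import numpy as np

# Example: 3-dimensional normal, conditioning on the last two components (q = 1)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 0.8]])
q = 1
a = np.array([0.4, -0.6])                         # observed value of x2

mu1, mu2 = mu[:q], mu[q:]
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

S22_inv = np.linalg.inv(S22)
mu_bar = mu1 + S12 @ S22_inv @ (a - mu2)          # conditional mean
Sigma_bar = S11 - S12 @ S22_inv @ S21             # conditional covariance (Schur complement)
print(mu_bar, Sigma_bar, sep="\n")
```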
In the bivariate case where x is partitioned into X_1 and X_2, the conditional distribution of X_1 given X_2 is
X_1 \mid X_2 = a \ \sim\ \mathcal{N}\left( \mu_1 + \frac{\sigma_1}{\sigma_2}\rho(a - \mu_2),\, (1 - \rho^2)\sigma_1^2 \right)
where \rho is the correlation coefficient between X_1 and X_2.
In the general case,
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right).
The conditional expectation of X1 given X2 is:
\operatorname{E}(X_1 \mid X_2 = x_2) = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)
Proof: the result is obtained by taking the expectation of the conditional distribution X_1 \mid X_2 above.
In the centered case with unit variances,
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).
The conditional expectation of X1 given X2 is
\operatorname{E}(X_1 \mid X_2 = x_2) = \rho x_2
and the conditional variance is
\operatorname{var}(X_1 \mid X_2 = x_2) = 1 - \rho^2;
thus the conditional variance does not depend on x2.
The conditional expectation of X1 given that X2 is smaller/bigger than z is:[21]
\operatorname{E}(X_1 \mid X_2 < z) = -\rho\,\frac{\varphi(z)}{\Phi(z)},
\operatorname{E}(X_1 \mid X_2 > z) = \rho\,\frac{\varphi(z)}{1 - \Phi(z)},
where the final ratio here is called the inverse Mills ratio.
Proof: the last two results are obtained using the result \operatorname{E}(X_1 \mid X_2 = x_2) = \rho x_2, so that
\operatorname{E}(X_1 \mid X_2 < z) = \rho \operatorname{E}(X_2 \mid X_2 < z),
and then using the properties of the expectation of a truncated normal distribution.
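As an illustrative Monte Carlo check of the truncated conditional expectation (an arbitrary correlation value is assumed; φ and Φ are the standard normal pdf and cdf from SciPy):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
rho, z = 0.6, 0.5                          # example correlation and threshold
cov = np.array([[1.0, rho],
                [rho, 1.0]])
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=2_000_000).T

# Analytic: E(X1 | X2 < z) = -rho * phi(z) / Phi(z)
analytic = -rho * norm.pdf(z) / norm.cdf(z)
empirical = x1[x2 < z].mean()
print(analytic, empirical)                 # should agree closely
```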
To obtain the marginal distribution over a subset of multivariate normal random variables, one only needs to drop the irrelevant variables (the variables that one wants to marginalize out) from the mean vector and the covariance matrix. The proof for this follows from the definitions of multivariate normal distributions and linear algebra.[22]
Example
Let X = [X_1, X_2, X_3] be multivariate normal random variables with mean vector \boldsymbol\mu = [\mu_1, \mu_2, \mu_3] and covariance matrix \boldsymbol\Sigma (standard parametrization for multivariate normal distributions). Then the joint distribution of X' = [X_1, X_3] is multivariate normal with mean vector \boldsymbol\mu' = [\mu_1, \mu_3] and covariance matrix
\boldsymbol\Sigma' = \begin{bmatrix} \boldsymbol\Sigma_{11} & \boldsymbol\Sigma_{13} \\ \boldsymbol\Sigma_{31} & \boldsymbol\Sigma_{33} \end{bmatrix}.
If Y = c + BX is an affine transformation of X \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma), where c is an M \times 1 vector of constants and B is a constant M \times N matrix, then Y has a multivariate normal distribution with expected value c + B\boldsymbol\mu and variance B\boldsymbol\Sigma B^{\rm T}, i.e.,
Y \sim \mathcal{N}\left( c + B\boldsymbol\mu, B\boldsymbol\Sigma B^{\rm T} \right).
In particular, any subset of the X_i has a marginal distribution that is also multivariate normal. To see this, consider the following example: to extract the subset (X_1, X_2, X_4)^{\rm T}, use
B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & 0 & 0 & \ldots & 0 \\ 0 & 0 & 0 & 1 & 0 & \ldots & 0 \end{bmatrix}
which extracts the desired elements directly.
Another corollary is that the distribution of Z = b \cdot X, where b is a constant vector with the same number of elements as X and the dot indicates the dot product, is univariate Gaussian with Z \sim \mathcal{N}\left( b \cdot \boldsymbol\mu, b^{\rm T}\boldsymbol\Sigma b \right). This result follows by using
B = \begin{bmatrix} b_1 & b_2 & \ldots & b_n \end{bmatrix} = b^{\rm T}.
An affine transformation of X such as 2X is not the same as the sum of two independent realisations of X.
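A quick sketch of the affine-transformation property, with assumed example values; the transformed mean and covariance follow directly from c + Bμ and BΣBᵀ:

```python
import numpy as np

mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 0.8]])
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, -1.0]])            # example 2 x 3 transformation
c = np.array([0.5, -0.5])

mu_Y = c + B @ mu                            # mean of Y = c + BX
Sigma_Y = B @ Sigma @ B.T                    # covariance of Y
print(mu_Y, Sigma_Y, sep="\n")
```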
See also: Confidence region.
The equidensity contours of a non-singular multivariate normal distribution are ellipsoids (i.e. affine transformations of hyperspheres) centered at the mean.[23] Hence the multivariate normal distribution is an example of the class of elliptical distributions. The directions of the principal axes of the ellipsoids are given by the eigenvectors of the covariance matrix \boldsymbol\Sigma.
If \boldsymbol\Sigma = U\Lambda U^{\rm T} = U\Lambda^{1/2}\left( U\Lambda^{1/2} \right)^{\rm T} is an eigendecomposition where the columns of U are unit eigenvectors and \Lambda is a diagonal matrix of the eigenvalues, then we have
X \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma) \iff X \sim \boldsymbol\mu + U\boldsymbol\Lambda^{1/2}\,\mathcal{N}(0, I) \iff X \sim \boldsymbol\mu + U\,\mathcal{N}(0, \boldsymbol\Lambda).
Moreover, U can be chosen to be a rotation matrix, as inverting an axis does not have any effect on N(0, Λ), but inverting a column changes the sign of U's determinant. The distribution N(μ, Σ) is in effect N(0, I) scaled by Λ1/2, rotated by U and translated by μ.
Conversely, any choice of μ, full rank matrix U, and positive diagonal entries Λi yields a non-singular multivariate normal distribution. If any Λi is zero and U is square, the resulting covariance matrix UΛUT is singular. Geometrically this means that every contour ellipsoid is infinitely thin and has zero volume in n-dimensional space, as at least one of the principal axes has length of zero; this is the degenerate case.
"The radius around the true mean in a bivariate normal random variable, re-written in polar coordinates (radius and angle), follows a Hoyt distribution."[24]
In one dimension the probability of finding a sample of the normal distribution in the interval \mu \pm \sigma is approximately 68.27%, but in higher dimensions the probability of finding a sample in the region of the standard deviation ellipse is lower, as shown in the table below.
Dimensionality | Probability
---|---
1 | 0.6827
2 | 0.3935
3 | 0.1987
4 | 0.0902
5 | 0.0374
6 | 0.0144
7 | 0.0052
8 | 0.0018
9 | 0.0006
10 | 0.0002
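These probabilities can be reproduced from the chi-squared distribution, since the squared Mahalanobis distance of a multivariate normal sample is chi-squared with k degrees of freedom; a short sketch:

```python
from scipy.stats import chi2

# P(sample lies within the 1-standard-deviation ellipsoid) in k dimensions
for k in range(1, 11):
    print(k, round(chi2.cdf(1.0, df=k), 4))   # reproduces the table above
```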
The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is straightforward.
In short, the probability density function (pdf) of a multivariate normal is
f(x) = \frac{1}{\sqrt{(2\pi)^k |\boldsymbol\Sigma|}} \exp\left( -\frac{1}{2} (x - \boldsymbol\mu)^{\rm T} \boldsymbol\Sigma^{-1} ({x} - \boldsymbol\mu) \right)
and the ML estimator of the covariance matrix from a sample of n observations is [26]
\widehat{\boldsymbol\Sigma} = \frac{1}{n} \sum_{i=1}^{n} ({x}_i - \overline{x})({x}_i - \overline{x})^{\rm T}
which is simply the sample covariance matrix. This is a biased estimator whose expectation is
\operatorname{E}\left[ \widehat{\boldsymbol\Sigma} \right] = \frac{n-1}{n} \boldsymbol\Sigma.
An unbiased sample covariance is
\widehat{\boldsymbol\Sigma} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \overline{x})(x_i - \overline{x})^{\rm T} = \frac{1}{n-1} \left[ X'\left( I - \frac{1}{n} J \right) X \right]
(matrix form; here X is the n \times K data matrix whose rows are the observations, I is the identity matrix, and J is the matrix of ones; the term in parentheses is the centering matrix, so the product is the K \times K unbiased sample covariance).
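A minimal sketch contrasting the biased maximum-likelihood estimator with the unbiased sample covariance, using synthetic example data:

```python
import numpy as np

rng = np.random.default_rng(2)
true_Sigma = np.array([[2.0, 0.7],
                       [0.7, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], true_Sigma, size=500)   # n = 500 observations

xbar = X.mean(axis=0)
centered = X - xbar
Sigma_ml = centered.T @ centered / len(X)                # biased ML estimator (divide by n)
Sigma_unbiased = centered.T @ centered / (len(X) - 1)    # unbiased (divide by n - 1)

print(Sigma_ml)
print(Sigma_unbiased)
print(np.cov(X, rowvar=False))                           # NumPy's np.cov matches the unbiased form
```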
The Fisher information matrix for estimating the parameters of a multivariate normal distribution has a closed form expression. This can be used, for example, to compute the Cramér–Rao bound for parameter estimation in this setting. See Fisher information for more details.
In Bayesian statistics, the conjugate prior of the mean vector is another multivariate normal distribution, and the conjugate prior of the covariance matrix is an inverse-Wishart distribution \mathcal{W}^{-1}. Suppose then that n observations have been made
X = \{x_1, \ldots, x_n\} \sim \mathcal{N}(\boldsymbol\mu, \boldsymbol\Sigma)
and that a conjugate prior has been assigned, where
p(\boldsymbol\mu, \boldsymbol\Sigma) = p(\boldsymbol\mu \mid \boldsymbol\Sigma)\, p(\boldsymbol\Sigma),
where
p(\boldsymbol\mu \mid \boldsymbol\Sigma) \sim \mathcal{N}(\boldsymbol\mu_0, m^{-1}\boldsymbol\Sigma),
and
p(\boldsymbol\Sigma) \sim \mathcal{W}^{-1}(\boldsymbol\Psi, n_0).
Then[26]
\begin{array}{rcl}
p(\boldsymbol\mu \mid \boldsymbol\Sigma, X) &\sim& \mathcal{N}\left( \dfrac{n\bar{x} + m\boldsymbol\mu_0}{n + m}, \dfrac{1}{n + m}\boldsymbol\Sigma \right), \\[6pt]
p(\boldsymbol\Sigma \mid X) &\sim& \mathcal{W}^{-1}\left( \boldsymbol\Psi + n\mathbf{S} + \dfrac{nm}{n + m} (\bar{x} - \boldsymbol\mu_0)(\bar{x} - \boldsymbol\mu_0)', n + n_0 \right),
\end{array}
where
\begin{align}
\bar{x} &= \frac{1}{n} \sum_{i=1}^{n} x_i, \\
\mathbf{S} &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'.
\end{align}
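A sketch of these posterior updates in code; the hyperparameters μ₀, m, Ψ, n₀ and the synthetic data below are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([1.0, -1.0], np.eye(2), size=50)   # example data, n = 50
n, k = X.shape

# Example prior hyperparameters (assumed values)
mu0 = np.zeros(k)       # prior mean
m = 1.0                 # prior strength ("pseudo-observations") for the mean
Psi = np.eye(k)         # inverse-Wishart scale matrix
n0 = k + 2              # inverse-Wishart degrees of freedom

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / n

# Posterior parameters from the conjugate update above
mu_post = (n * xbar + m * mu0) / (n + m)                # posterior location of the mean
d = (xbar - mu0)[:, None]
Psi_post = Psi + n * S + (n * m) / (n + m) * (d @ d.T)  # posterior inverse-Wishart scale
df_post = n + n0                                        # posterior degrees of freedom
print(mu_post, Psi_post, df_post, sep="\n")
```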
Multivariate normality tests check a given set of data for similarity to the multivariate normal distribution. The null hypothesis is that the data set is similar to the normal distribution, therefore a sufficiently small p-value indicates non-normal data. Multivariate normality tests include the Cox–Small test[27] and Smith and Jain's adaptation[28] of the Friedman–Rafsky test created by Larry Rafsky and Jerome Friedman.[29]
Mardia's test is based on multivariate extensions of skewness and kurtosis measures. For a sample of k-dimensional vectors we compute
\begin{align}
&\widehat{\boldsymbol\Sigma} = \frac{1}{n} \sum_{j=1}^{n} \left( \mathbf{x}_j - \bar{\mathbf{x}} \right)\left( \mathbf{x}_j - \bar{\mathbf{x}} \right)^{\rm T} \\
&A = \frac{1}{6n} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ (\mathbf{x}_i - \bar{\mathbf{x}})^{\rm T}\, \widehat{\boldsymbol\Sigma}^{-1} (\mathbf{x}_j - \bar{\mathbf{x}}) \right]^3 \\
&B = \sqrt{\frac{n}{8k(k+2)}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ (\mathbf{x}_i - \bar{\mathbf{x}})^{\rm T}\, \widehat{\boldsymbol\Sigma}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}) \right]^2 - k(k+2) \right\}
\end{align}
Under the null hypothesis of multivariate normality, the statistic A will have approximately a chi-squared distribution with \tfrac{1}{6} k(k+1)(k+2) degrees of freedom, and B will be approximately standard normal N(0,1).
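A sketch of the two Mardia statistics computed on example data; this mirrors the formulas above but is not a substitute for a validated implementation:

```python
import numpy as np
from scipy.stats import chi2, norm

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.0, 0.0, 0.0], np.eye(3), size=200)   # example data
n, k = X.shape

xbar = X.mean(axis=0)
D = X - xbar
Sigma_hat = D.T @ D / n
G = D @ np.linalg.inv(Sigma_hat) @ D.T     # entries (x_i - xbar)' Sigma^-1 (x_j - xbar)

A = (G ** 3).sum() / (6 * n)                                                    # skewness statistic
B = np.sqrt(n / (8 * k * (k + 2))) * (np.mean(np.diag(G) ** 2) - k * (k + 2))   # kurtosis statistic

df = k * (k + 1) * (k + 2) / 6
print("A:", A, "p-value:", 1 - chi2.cdf(A, df))
print("B:", B, "p-value:", 2 * (1 - norm.cdf(abs(B))))
```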
Mardia's kurtosis statistic is skewed and converges very slowly to the limiting normal distribution. For medium size samples (50 \le n < 400), the parameters of the asymptotic distribution of the kurtosis statistic are modified; for small sample tests (n < 50) empirical critical values are used.
Mardia's tests are affine invariant but not consistent. For example, the multivariate skewness test is not consistent against symmetric non-normal alternatives.[32]
The BHEP test computes the norm of the difference between the empirical characteristic function and the theoretical characteristic function of the normal distribution. Calculation of the norm is performed in the L2(μ) space of square-integrable functions with respect to the Gaussian weighting function
\mu_\beta(\mathbf{t}) = (2\pi\beta^2)^{-k/2} e^{-|\mathbf{t}|^2 / (2\beta^2)}.
The test statistic is
\begin{align}
T_\beta &= \int_{\mathbb{R}^k} \left| \frac{1}{n} \sum_{j=1}^{n} e^{i\mathbf{t}^{\rm T}\widehat{\boldsymbol\Sigma}^{-1/2}(\mathbf{x}_j - \bar{\mathbf{x}})} - e^{-|\mathbf{t}|^2/2} \right|^2 \boldsymbol\mu_\beta(\mathbf{t})\, d\mathbf{t} \\
&= \frac{1}{n^2} \sum_{i,j=1}^{n} e^{-\frac{\beta^2}{2}(\mathbf{x}_i - \mathbf{x}_j)^{\rm T}\widehat{\boldsymbol\Sigma}^{-1}(\mathbf{x}_i - \mathbf{x}_j)} - \frac{2}{n(1+\beta^2)^{k/2}} \sum_{i=1}^{n} e^{-\frac{\beta^2}{2(1+\beta^2)}(\mathbf{x}_i - \bar{\mathbf{x}})^{\rm T}\widehat{\boldsymbol\Sigma}^{-1}(\mathbf{x}_i - \bar{\mathbf{x}})} + \frac{1}{(1+2\beta^2)^{k/2}}
\end{align}
A detailed survey of these and other test procedures is available.
Suppose that observations (which are vectors) are presumed to come from one of several multivariate normal distributions, with known means and covariances. Then any given observation can be assigned to the distribution from which it has the highest probability of arising. This classification procedure is called Gaussian discriminant analysis. The classification performance, i.e. probabilities of the different classification outcomes, and the overall classification error, can be computed by the numerical method of ray-tracing (Matlab code).
A widely used method for drawing (sampling) a random vector x from the N-dimensional multivariate normal distribution with mean vector μ and covariance matrix Σ works as follows:
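The individual steps are not reproduced in this extract; as a sketch, the standard approach factors Σ (for example with a Cholesky decomposition) and applies the resulting factor to independent standard normal draws:

```python
import numpy as np

def sample_mvn(mu, Sigma, size, rng=None):
    """Draw `size` samples from N(mu, Sigma) via a Cholesky factorization of Sigma."""
    if rng is None:
        rng = np.random.default_rng()
    A = np.linalg.cholesky(Sigma)                 # A @ A.T == Sigma (Sigma positive definite)
    z = rng.standard_normal((size, len(mu)))      # independent standard normal variates
    return mu + z @ A.T                           # x = mu + A z, applied row by row

# Example usage with arbitrary parameters
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 0.5]])
samples = sample_mvn(mu, Sigma, size=100_000)
print(samples.mean(axis=0))                       # close to mu
print(np.cov(samples, rowvar=False))              # close to Sigma
```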