In probability and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables each of whose value is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value. The individual variables in a random vector are grouped together because they are all part of a single mathematical system — often they represent different properties of an individual statistical unit. For example, while a given person has a specific age, height and weight, the representation of these features of an unspecified person from within a group would be a random vector. Normally each element of a random vector is a real number.
Random vectors are often used as the underlying implementation of various types of aggregate random variables, e.g. a random matrix, random tree, random sequence, stochastic process, etc.
More formally, a multivariate random variable is a column vector X=(X_1,\ldots,X_n)^T (or its transpose, which is a row vector) whose components are random variables on the probability space (\Omega, \mathcal{F}, P), where \Omega is the sample space, \mathcal{F} is the sigma-algebra (the collection of all events), and P is the probability measure (a function returning the probability of each event).
See main article: Multivariate probability distribution. Every random vector gives rise to a probability measure on \R^n with the Borel algebra as the underlying sigma-algebra. This measure is also known as the joint probability distribution, the joint distribution, or the multivariate distribution of the random vector.
The distributions of each of the component random variables X_i are called marginal distributions. The conditional probability distribution of X_i given X_j is the probability distribution of X_i when X_j is known to be a particular value.
The cumulative distribution function F_X : \R^n \mapsto [0,1] of a random vector X=(X_1,\ldots,X_n)^T is defined as
F_X(x) = P(X_1 \le x_1, \ldots, X_n \le x_n),
where x=(x_1,\ldots,x_n)^T.
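As a concrete illustration, the sketch below (assuming, purely for illustration, a bivariate normal vector with arbitrarily chosen parameters) estimates this joint cumulative distribution function at a point by Monte Carlo sampling:

```python
import numpy as np

# Estimate F_X(x) = P(X_1 <= x_1, X_2 <= x_2) for an (assumed) bivariate
# normal random vector by Monte Carlo; the parameters are illustrative only.
rng = np.random.default_rng(0)
mean = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])
samples = rng.multivariate_normal(mean, cov, size=100_000)  # shape (N, 2)

x = np.array([0.5, 1.5])              # point at which the CDF is evaluated
F_hat = np.mean(np.all(samples <= x, axis=1))
print(F_hat)                          # Monte Carlo estimate of F_X(x)
```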
Random vectors can be subjected to the same kinds of algebraic operations as can non-random vectors: addition, subtraction, multiplication by a scalar, and the taking of inner products.
Similarly, a new random vector Y can be defined by applying an affine transformation g\colon \R^n \to \R^n to a random vector X:
Y = AX + b, where A is an n \times n matrix and b is an n \times 1 column vector.
If A is an invertible matrix and X has a probability density function f_X, then the probability density of Y is
f_Y(y) = \frac{f_X(A^{-1}(y-b))}{|\det A|}.
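A minimal numerical sketch of this density formula, assuming X is a bivariate normal vector (so the density of Y = AX + b is also available in closed form and can serve as a cross-check); the matrices and evaluation point below are arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Assumed setup: X ~ N(mu, Sigma) in R^2, Y = A X + b with A invertible.
# The change-of-variables formula f_Y(y) = f_X(A^{-1}(y - b)) / |det A|
# is checked against the known density of Y, namely N(A mu + b, A Sigma A^T).
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])            # invertible 2x2 matrix
b = np.array([1.0, -1.0])

y = np.array([0.7, 0.2])              # arbitrary evaluation point
x = np.linalg.solve(A, y - b)         # x = A^{-1}(y - b)

f_Y_formula = multivariate_normal(mu, Sigma).pdf(x) / abs(np.linalg.det(A))
f_Y_direct  = multivariate_normal(A @ mu + b, A @ Sigma @ A.T).pdf(y)
print(f_Y_formula, f_Y_direct)        # the two values agree
```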
More generally we can study invertible mappings of random vectors.[2]
Let g be a one-to-one mapping from an open subset \mathcal{D} of \R^n onto a subset \mathcal{R} of \R^n such that g has continuous partial derivatives in \mathcal{D} and the Jacobian determinant of g is zero at no point of \mathcal{D}. Assume that the real random vector X has a probability density function f_X(x) and satisfies P(X \in \mathcal{D}) = 1. Then the random vector Y = g(X) has density
\left. f_Y(y) = \frac{f_X(x)}{\left|\det \frac{\partial g(x)}{\partial x}\right|} \right|_{x=g^{-1}(y)} \mathbf{1}(y \in R_Y),
where \mathbf{1} denotes the indicator function and the set R_Y = \{y = g(x) : f_X(x) > 0\} \subseteq \mathcal{R} denotes the support of Y.
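A small sketch of the same change-of-variables formula for a nonlinear map, assuming X has two independent standard normal components and g is the componentwise exponential (so each component of Y is standard lognormal, which provides an independent cross-check):

```python
import numpy as np
from scipy.stats import norm, lognorm

# Assumed example: X = (X_1, X_2) with independent standard normal components,
# g(x) = (e^{x_1}, e^{x_2}).  The Jacobian of g at x is diag(e^{x_1}, e^{x_2}),
# so |det| = y_1 * y_2 at y = g(x), and
# f_Y(y) = f_X(log y_1, log y_2) / (y_1 * y_2) for y_1, y_2 > 0.
y = np.array([0.8, 2.5])              # arbitrary point in the support of Y
x = np.log(y)                         # x = g^{-1}(y)

f_X = norm.pdf(x[0]) * norm.pdf(x[1])          # joint density of X (independence)
f_Y_formula = f_X / (y[0] * y[1])

# Each Y_i = e^{X_i} is standard lognormal, and the Y_i are independent:
f_Y_direct = lognorm.pdf(y[0], s=1.0) * lognorm.pdf(y[1], s=1.0)
print(f_Y_formula, f_Y_direct)        # the two values agree
```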
The expected value or mean of a random vector X is a fixed vector \operatorname{E}[X] whose elements are the expected values of the respective random variables.
The covariance matrix (also called second central moment or variance-covariance matrix) of an n \times 1 random vector is an n \times n matrix whose (i,j)th element is the covariance between the i th and the j th random variables. The covariance matrix is the expected value, element by element, of the n \times n matrix computed as [X-\operatorname{E}[X]][X-\operatorname{E}[X]]^T:
\operatorname{K}_{XX} = \operatorname{Var}[X] = \operatorname{E}\left[(X-\operatorname{E}[X])(X-\operatorname{E}[X])^T\right] = \operatorname{E}[XX^T] - \operatorname{E}[X]\operatorname{E}[X]^T.
By extension, the cross-covariance matrix between two random vectors X and Y (X having n elements and Y having p elements) is the n \times p matrix
\operatorname{K}_{XY} = \operatorname{Cov}[X,Y] = \operatorname{E}\left[(X-\operatorname{E}[X])(Y-\operatorname{E}[Y])^T\right] = \operatorname{E}[XY^T] - \operatorname{E}[X]\operatorname{E}[Y]^T,
where again the matrix expectation is taken element by element in the matrix. Here the (i,j)th element is the covariance between the i th element of X and the j th element of Y.
The covariance matrix is a symmetric matrix, i.e.[2]
\operatorname{K}_{XX}^T = \operatorname{K}_{XX}
The covariance matrix is a positive semidefinite matrix, i.e.[2]
a^T \operatorname{K}_{XX}\, a \ge 0 \quad \text{for all } a \in \R^n
The cross-covariance matrix \operatorname{Cov}[Y,X] is simply the transpose of the matrix \operatorname{Cov}[X,Y], i.e.
\operatorname{K}_{YX} = \operatorname{K}_{XY}^T.
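The sketch below checks these three properties empirically on simulated data; the data-generating choices are illustrative assumptions only:

```python
import numpy as np

# Empirical check of: symmetry of K_XX, positive semidefiniteness of K_XX,
# and K_YX = K_XY^T, on arbitrary simulated data.
rng = np.random.default_rng(1)
n_samples = 50_000
X = rng.multivariate_normal([0, 0, 0], np.diag([1.0, 2.0, 0.5]), size=n_samples)
Y = X[:, :2] + rng.normal(size=(n_samples, 2))   # a 2-component vector correlated with X

K_XX = np.cov(X, rowvar=False)                                    # n x n covariance of X
K_XY = ((X - X.mean(0)).T @ (Y - Y.mean(0))) / (n_samples - 1)    # n x p cross-covariance
K_YX = ((Y - Y.mean(0)).T @ (X - X.mean(0))) / (n_samples - 1)

print(np.allclose(K_XX, K_XX.T))                  # symmetry
print(np.all(np.linalg.eigvalsh(K_XX) >= -1e-12)) # positive semidefinite (up to rounding)
print(np.allclose(K_YX, K_XY.T))                  # K_YX is the transpose of K_XY
```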
Two random vectors X=(X_1,\ldots,X_m)^T and Y=(Y_1,\ldots,Y_n)^T are called uncorrelated if
\operatorname{E}[XY^T] = \operatorname{E}[X]\operatorname{E}[Y]^T.
They are uncorrelated if and only if their cross-covariance matrix \operatorname{K}_{XY} is zero.
The correlation matrix (also called second moment) of an n \times 1 random vector is an n \times n matrix whose (i,j)th element is the correlation between the i th and the j th random variables. The correlation matrix is the expected value, element by element, of the n \times n matrix computed as XX^T:
\operatorname{R}_{XX} = \operatorname{E}[XX^T].
By extension, the cross-correlation matrix between two random vectors X and Y (X having n elements and Y having p elements) is the n \times p matrix
\operatorname{R}_{XY} = \operatorname{E}[XY^T].
The correlation matrix is related to the covariance matrix by
\operatorname{R}_{XX} = \operatorname{K}_{XX} + \operatorname{E}[X]\operatorname{E}[X]^T.
Similarly, the cross-correlation matrix is related to the cross-covariance matrix by
\operatorname{R}_{XY} = \operatorname{K}_{XY} + \operatorname{E}[X]\operatorname{E}[Y]^T.
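A short empirical sketch of the first relation, using simulated data with arbitrary parameters:

```python
import numpy as np

# Empirical check of R_XX = K_XX + E[X] E[X]^T on arbitrary simulated data.
rng = np.random.default_rng(2)
X = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.4], [0.4, 1.0]], size=200_000)

R_XX = (X.T @ X) / X.shape[0]                 # second moment E[X X^T], element by element
K_XX = np.cov(X, rowvar=False, bias=True)     # covariance matrix (1/N normalisation)
m = X.mean(axis=0)                            # E[X]
print(np.allclose(R_XX, K_XX + np.outer(m, m)))
```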
Two random vectors of the same size X=(X_1,\ldots,X_n)^T and Y=(Y_1,\ldots,Y_n)^T are called orthogonal if
\operatorname{E}[X^T Y] = 0.
See main article: Independence (probability theory). Two random vectors X and Y are called independent if for all x and y
F_{X,Y}(x,y) = F_X(x) \cdot F_Y(y),
where F_X(x) and F_Y(y) denote the cumulative distribution functions of X and Y and F_{X,Y}(x,y) denotes their joint cumulative distribution function. Independence of X and Y is often denoted by X \perp\!\!\!\perp Y. Written component-wise, X and Y are called independent if for all x_1,\ldots,x_m,y_1,\ldots,y_n
F_{X_1,\ldots,X_m,Y_1,\ldots,Y_n}(x_1,\ldots,x_m,y_1,\ldots,y_n) = F_{X_1,\ldots,X_m}(x_1,\ldots,x_m) \cdot F_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n).
The characteristic function of a random vector X with n components is a function \R^n \to \Complex that maps every vector \omega=(\omega_1,\ldots,\omega_n)^T to a complex number. It is defined by
\varphi_X(\omega) = \operatorname{E}\left[e^{i(\omega^T X)}\right] = \operatorname{E}\left[e^{i(\omega_1 X_1 + \ldots + \omega_n X_n)}\right].
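As an illustration, the sketch below compares a Monte Carlo estimate of the characteristic function with the known closed form for a Gaussian vector, \varphi_X(\omega) = \exp\left(i\omega^T\mu - \tfrac{1}{2}\omega^T\Sigma\omega\right); all parameters are illustrative:

```python
import numpy as np

# Monte Carlo estimate of phi_X(omega) = E[exp(i omega^T X)] for a Gaussian X,
# compared with the closed form exp(i omega^T mu - 0.5 omega^T Sigma omega).
rng = np.random.default_rng(3)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.8]])
X = rng.multivariate_normal(mu, Sigma, size=200_000)

omega = np.array([0.7, -0.3])
phi_mc = np.mean(np.exp(1j * (X @ omega)))
phi_exact = np.exp(1j * (omega @ mu) - 0.5 * omega @ Sigma @ omega)
print(phi_mc, phi_exact)              # agree up to Monte Carlo error
```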
One can take the expectation of a quadratic form in the random vector X as follows:
\operatorname{E}[X^T A X] = \operatorname{E}[X]^T A \operatorname{E}[X] + \operatorname{tr}(A K_{XX}),
where K_{XX} is the covariance matrix of X and \operatorname{tr} refers to the trace of a matrix, that is, to the sum of the elements on its main diagonal (from upper left to lower right).
Proof: Let z be an m \times 1 random vector with \operatorname{E}[z] = \mu and \operatorname{Cov}[z] = V and let A be an m \times m non-stochastic matrix.
Then based on the formula for the covariance, if we denote z^T = X and z^T A^T = Y, we see that
\operatorname{Cov}[X,Y] = \operatorname{E}[XY^T] - \operatorname{E}[X]\operatorname{E}[Y]^T.
Hence
\begin{align}
\operatorname{E}[XY^T] &= \operatorname{Cov}[X,Y] + \operatorname{E}[X]\operatorname{E}[Y]^T \\
\operatorname{E}[z^T A z] &= \operatorname{Cov}[z^T, z^T A^T] + \operatorname{E}[z^T]\operatorname{E}[z^T A^T]^T \\
&= \operatorname{Cov}[z^T, z^T A^T] + \mu^T(\mu^T A^T)^T \\
&= \operatorname{Cov}[z^T, z^T A^T] + \mu^T A \mu,
\end{align}
which leaves us to show that
\operatorname{Cov}[z^T, z^T A^T] = \operatorname{tr}(AV).
This is true based on the fact that one can cyclically permute matrices when taking a trace without changing the end result (e.g. \operatorname{tr}(AB) = \operatorname{tr}(BA)).
We see that
\begin{align}
\operatorname{Cov}[z^T, z^T A^T] &= \operatorname{E}\left[\left(z^T - \operatorname{E}[z^T]\right)\left(z^T A^T - \operatorname{E}\left[z^T A^T\right]\right)^T\right] \\
&= \operatorname{E}\left[(z^T - \mu^T)(z^T A^T - \mu^T A^T)^T\right] \\
&= \operatorname{E}\left[(z - \mu)^T(Az - A\mu)\right].
\end{align}
And since
(z - \mu)^T(Az - A\mu)
is a scalar, then
(z - \mu)^T(Az - A\mu) = \operatorname{tr}\left((z - \mu)^T(Az - A\mu)\right) = \operatorname{tr}\left((z - \mu)^T A(z - \mu)\right)
trivially. Using the permutation we get
\operatorname{tr}\left((z - \mu)^T A(z - \mu)\right) = \operatorname{tr}\left(A(z - \mu)(z - \mu)^T\right),
and by plugging this into the original formula we get
\begin{align}
\operatorname{Cov}\left[z^T, z^T A^T\right] &= \operatorname{E}\left[(z - \mu)^T(Az - A\mu)\right] \\
&= \operatorname{E}\left[\operatorname{tr}\left(A(z - \mu)(z - \mu)^T\right)\right] \\
&= \operatorname{tr}\left(A \cdot \operatorname{E}\left[(z - \mu)(z - \mu)^T\right]\right) \\
&= \operatorname{tr}(AV).
\end{align}
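The identity can also be checked numerically; the sketch below uses an arbitrary Gaussian vector and an arbitrary fixed matrix A (the identity itself only requires that X have the stated mean and covariance):

```python
import numpy as np

# Monte Carlo check of E[X^T A X] = E[X]^T A E[X] + tr(A K_XX)
# for an arbitrary fixed matrix A and an arbitrary Gaussian vector X.
rng = np.random.default_rng(4)
mu = np.array([1.0, -0.5, 2.0])
K = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.5],
              [0.0, 0.5, 1.5]])
A = rng.normal(size=(3, 3))            # any fixed (non-random) 3x3 matrix

X = rng.multivariate_normal(mu, K, size=500_000)
lhs = np.mean(np.einsum('ni,ij,nj->n', X, A, X))   # Monte Carlo estimate of E[X^T A X]
rhs = mu @ A @ mu + np.trace(A @ K)
print(lhs, rhs)                        # agree up to Monte Carlo error
```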
One can take the expectation of the product of two different quadratic forms in a zero-mean Gaussian random vector X as follows:
\operatorname{E}\left[(X^T A X)(X^T B X)\right] = 2\operatorname{tr}(A K_{XX} B K_{XX}) + \operatorname{tr}(A K_{XX})\operatorname{tr}(B K_{XX}),
where again K_{XX} is the covariance matrix of X.
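A numerical sketch of this identity, assuming symmetric matrices A and B and an arbitrarily chosen covariance matrix:

```python
import numpy as np

# Monte Carlo check of
# E[(X^T A X)(X^T B X)] = 2 tr(A K B K) + tr(A K) tr(B K)
# for a zero-mean Gaussian X with covariance K; A and B are taken symmetric here.
rng = np.random.default_rng(5)
K = np.array([[1.0, 0.4],
              [0.4, 2.0]])
A = np.array([[1.0, 0.2],
              [0.2, 0.5]])
B = np.array([[0.3, -0.1],
              [-0.1, 1.0]])

X = rng.multivariate_normal(np.zeros(2), K, size=1_000_000)
qA = np.einsum('ni,ij,nj->n', X, A, X)
qB = np.einsum('ni,ij,nj->n', X, B, X)
lhs = np.mean(qA * qB)
rhs = 2 * np.trace(A @ K @ B @ K) + np.trace(A @ K) * np.trace(B @ K)
print(lhs, rhs)                        # agree up to Monte Carlo error
```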
In portfolio theory in finance, an objective often is to choose a portfolio of risky assets such that the distribution of the random portfolio return has desirable properties. For example, one might want to choose the portfolio return having the lowest variance for a given expected value. Here the random vector is the vector
r of random returns on the individual assets, and the portfolio return p (a random scalar) is the inner product of the vector of random returns with a vector w of portfolio weights, the fractions of the portfolio placed in the respective assets. Since p = w^T r, the expected value of the portfolio return is w^T \operatorname{E}[r] and the variance of the portfolio return is w^T \operatorname{K}_{rr}\, w, where \operatorname{K}_{rr} is the covariance matrix of r.
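A minimal sketch of these portfolio formulas with made-up numbers for the mean vector, covariance matrix and weights:

```python
import numpy as np

# Portfolio mean and variance from the mean vector and covariance matrix of r.
# All numbers are purely illustrative.
mu = np.array([0.08, 0.12, 0.05])      # expected returns E[r]
K_rr = np.array([[0.10, 0.02, 0.01],   # covariance matrix of r
                 [0.02, 0.20, 0.03],
                 [0.01, 0.03, 0.05]])
w = np.array([0.5, 0.3, 0.2])          # portfolio weights (sum to 1)

expected_return = w @ mu               # w^T E[r]
variance = w @ K_rr @ w                # w^T K_rr w
print(expected_return, variance)
```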
In linear regression theory, we have data on n observations on a dependent variable y and n observations on each of k independent variables xj. The observations on the dependent variable are stacked into a column vector y; the observations on each independent variable are also stacked into column vectors, and these latter column vectors are combined into a design matrix X (not denoting a random vector in this context) of observations on the independent variables. Then the following regression equation is postulated as a description of the process that generated the data:
y=X\beta+e,
where β is a postulated fixed but unknown vector of k response coefficients, and e is an unknown random vector reflecting random influences on the dependent variable. By some chosen technique such as ordinary least squares, a vector \hat\beta is chosen as an estimate of β, and the estimate of the vector e, denoted \hat{e}, is computed as
\hat{e} = y - X\hat\beta.
Then the statistician must analyze the properties of \hat\beta and \hat{e}, which are viewed as random vectors, since a randomly different selection of n cases to observe would have resulted in different values for them.
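A short sketch of this procedure on synthetic data, using ordinary least squares to obtain \hat\beta and \hat{e} (all data-generating choices below are illustrative):

```python
import numpy as np

# Ordinary least squares on simulated data, producing the estimate beta_hat
# and the residual vector e_hat discussed above.
rng = np.random.default_rng(6)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # design matrix
beta = np.array([2.0, -1.0, 0.5])                                 # "true" coefficients
e = rng.normal(scale=0.3, size=n)                                  # random influences
y = X @ beta + e

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimate of beta
e_hat = y - X @ beta_hat                           # residual vector
print(beta_hat)
```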
The evolution of a k×1 random vector X through time can be modelled as a vector autoregression (VAR) as follows:
X_t = c + A_1 X_{t-1} + A_2 X_{t-2} + \dots + A_p X_{t-p} + e_t,
where the i-periods-back vector observation X_{t-i} is called the i-th lag of X, c is a k×1 vector of constants (intercepts), A_i is a time-invariant k×k matrix and e_t is a k×1 random vector of error terms.
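For illustration, a sketch simulating a bivariate VAR(1) process, i.e. the special case p = 1 of the model above with arbitrarily chosen coefficients:

```python
import numpy as np

# Simulate a bivariate VAR(1) process X_t = c + A_1 X_{t-1} + e_t
# with illustrative coefficients.
rng = np.random.default_rng(7)
c = np.array([0.1, -0.2])
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])            # eigenvalues inside the unit circle (stable)
T = 500
X = np.zeros((T, 2))
for t in range(1, T):
    e_t = rng.multivariate_normal(np.zeros(2), 0.05 * np.eye(2))
    X[t] = c + A1 @ X[t - 1] + e_t
print(X[-5:])                          # last few observations of the simulated path
```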