Multivariate random variable explained

In probability and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables each of whose values is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value. The individual variables in a random vector are grouped together because they are all part of a single mathematical system; often they represent different properties of an individual statistical unit. For example, while a given person has a specific age, height and weight, the representation of these features of an unspecified person from within a group would be a random vector. Normally each element of a random vector is a real number.

Random vectors are often used as the underlying implementation of various types of aggregate random variables, e.g. a random matrix, random tree, random sequence, stochastic process, etc.

Formally, a multivariate random variable is a column vector $X=(X_1,\ldots,X_n)^T$ (or its transpose, which is a row vector) whose components are random variables on the probability space $(\Omega,\mathcal{F},P)$, where $\Omega$ is the sample space, $\mathcal{F}$ is the sigma-algebra (the collection of all events), and $P$ is the probability measure (a function returning each event's probability).

Probability distribution

See main article: Multivariate probability distribution.

Every random vector gives rise to a probability measure on $\mathbb{R}^n$ with the Borel algebra as the underlying sigma-algebra. This measure is also known as the joint probability distribution, the joint distribution, or the multivariate distribution of the random vector.

The distributions of the component random variables $X_i$ are called marginal distributions. The conditional probability distribution of $X_i$ given $X_j$ is the probability distribution of $X_i$ when $X_j$ is known to be a particular value.

The cumulative distribution function $F_X:\mathbb{R}^n\to[0,1]$ of a random vector $X=(X_1,\ldots,X_n)^T$ is defined as^[1]

$$F_X(x)=P(X_1\le x_1,\ldots,X_n\le x_n),$$

where $x=(x_1,\ldots,x_n)^T$.

Operations on random vectors

Random vectors can be subjected to the same kinds of algebraic operations as can non-random vectors: addition, subtraction, multiplication by a scalar, and the taking of inner products.

Affine transformations

Similarly, a new random vector $Y$ can be defined by applying an affine transformation $g\colon\mathbb{R}^n\to\mathbb{R}^n$ to a random vector $X$:

$$Y=AX+b,$$

where $A$ is an $n\times n$ matrix and $b$ is an $n\times 1$ column vector.

If $A$ is an invertible matrix and $X$ has a probability density function $f_X$, then the probability density of $Y$ is

$$f_Y(y)=\frac{f_X(A^{-1}(y-b))}{|\det A|}.$$
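The affine density formula can be checked numerically. The sketch below assumes, purely for illustration, that $X$ is a standard normal vector, in which case $Y=AX+b$ is known to be normal with mean $b$ and covariance $AA^T$. The matrix `A`, the vector `b`, the evaluation point `y` and all function names are hypothetical choices, not part of the text above.

```python
import numpy as np

def std_normal_pdf(x):
    """Density of a standard normal random vector (independent N(0, 1) components)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x @ x) / (2 * np.pi) ** (x.size / 2)

def affine_pdf(f_x, A, b, y):
    """Density of Y = A X + b at y, given the density f_x of X and an invertible A."""
    x = np.linalg.solve(A, y - b)            # x = A^{-1} (y - b)
    return f_x(x) / abs(np.linalg.det(A))

def gaussian_pdf(y, mean, cov):
    """Closed-form multivariate normal density, used only as a cross-check."""
    d = y - mean
    norm = np.sqrt((2 * np.pi) ** d.size * np.linalg.det(cov))
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm

A = np.array([[2.0, 1.0], [0.5, 3.0]])       # illustrative invertible matrix
b = np.array([1.0, -2.0])
y = np.array([0.3, 0.7])

via_formula = affine_pdf(std_normal_pdf, A, b, y)
closed_form = gaussian_pdf(y, b, A @ A.T)    # Y ~ N(b, A A^T) when X ~ N(0, I)
```

The two values agree up to floating-point error, which is what the formula predicts for this particular choice of $f_X$.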

Invertible mappings

More generally we can study invertible mappings of random vectors.^[2]

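Let $g$ be a one-to-one mapping from an open subset $\mathcal{D}$ of $\mathbb{R}^n$ onto a subset $\mathcal{R}$ of $\mathbb{R}^n$, let $g$ have continuous partial derivatives in $\mathcal{D}$ and let the Jacobian determinant of $g$ be zero at no point of $\mathcal{D}$. Assume that the real random vector $X$ has a probability density function $f_X(x)$ and satisfies $P(X\in\mathcal{D})=1$. Then the random vector $Y=g(X)$ is of probability density

$$f_Y(y)=\left.f_X(x)\left|\det\frac{\partial x}{\partial y}\right|\,\right|_{x=g^{-1}(y)}\mathbf{1}(y\in R_Y)$$

where $\mathbf{1}$ denotes the indicator function and the set $R_Y=\{y=g(x):f_X(x)>0\}\subseteq\mathcal{R}$ denotes the support of $Y$. Here $\partial x/\partial y$ is the Jacobian matrix of the inverse mapping $g^{-1}$, so its determinant is the reciprocal of the Jacobian determinant of $g$.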
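A classical instance of this change of variables is the polar map. The sketch below assumes a hypothetical $X=(R,\Theta)$ with a Rayleigh-distributed radius and an independent uniform angle on the open set $(0,\infty)\times(0,2\pi)$, and takes $g(r,\theta)=(r\cos\theta,\,r\sin\theta)$. For this map the Jacobian determinant of $g$ is $r$, so the determinant for the inverse map is $1/r$. The function names are illustrative.

```python
import numpy as np

def f_x(r, theta):
    """Joint density of X = (R, Theta): Rayleigh radius, independent uniform angle."""
    inside = (r > 0) and (0 < theta < 2 * np.pi)
    return r * np.exp(-0.5 * r**2) / (2 * np.pi) if inside else 0.0

def g_inverse(y):
    """Inverse of g(r, theta) = (r cos(theta), r sin(theta))."""
    r = np.hypot(y[0], y[1])
    theta = np.arctan2(y[1], y[0]) % (2 * np.pi)   # angle mapped into [0, 2*pi)
    return r, theta

def f_y(y):
    """Density of Y = g(X) via the change-of-variables formula."""
    r, theta = g_inverse(y)
    fx = f_x(r, theta)
    if fx == 0.0:                 # y lies outside the support R_Y
        return 0.0
    abs_det_dx_dy = 1.0 / r       # |det dx/dy| for the polar map
    return fx * abs_det_dx_dy

y = np.array([0.6, -1.1])
density = f_y(y)
standard_normal = np.exp(-0.5 * y @ y) / (2 * np.pi)
```

Under these assumptions `density` matches the standard bivariate normal density at `y`, reflecting the well-known fact that this choice of radius and angle produces two independent standard normal coordinates.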

Expected value

The expected value or mean of a random vector $X$ is a fixed vector $\operatorname{E}[X]$ whose elements are the expected values of the respective random variables.^[3]

Covariance and cross-covariance

Definitions

The covariance matrix (also called second central moment or variance-covariance matrix) of an $n\times 1$ random vector is an $n\times n$ matrix whose (i,j)^th element is the covariance between the i^th and the j^th random variables. The covariance matrix is the expected value, element by element, of the $n\times n$ matrix computed as $[X-\operatorname{E}[X]][X-\operatorname{E}[X]]^T$, where the superscript T refers to the transpose of the indicated vector:^[2] ^[3]

$$\operatorname{K}_{XX}=\operatorname{E}\left[(X-\operatorname{E}[X])(X-\operatorname{E}[X])^T\right]=\operatorname{E}[XX^T]-\operatorname{E}[X]\operatorname{E}[X]^T$$
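As an illustration of the definition, the covariance matrix of an empirical distribution can be computed directly as an average of outer products. The sketch below uses synthetic data (an arbitrary mixing matrix applied to normal draws) and the names `samples`, `K` and `K_alt` are illustrative. It normalizes by the number of draws, which corresponds to `bias=True` in `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(0)
# 1000 draws of a 3-component random vector, one draw per row (synthetic data).
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 2.0]])
samples = rng.normal(size=(1000, 3)) @ mixing

mean = samples.mean(axis=0)
centered = samples - mean

# Average of the outer products (x - mean)(x - mean)^T over all draws.
K = centered.T @ centered / len(samples)

# Equivalent form: E[X X^T] - E[X] E[X]^T.
K_alt = samples.T @ samples / len(samples) - np.outer(mean, mean)
```

Both expressions agree with each other and with `np.cov(samples, rowvar=False, bias=True)` up to rounding.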

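By extension, the cross-covariance matrix between two random vectors $X$ and $Y$ ($X$ having $n$ elements and $Y$ having $p$ elements) is the $n\times p$ matrix^[3]

$$\operatorname{K}_{XY}=\operatorname{Cov}[X,Y]=\operatorname{E}\left[(X-\operatorname{E}[X])(Y-\operatorname{E}[Y])^T\right]=\operatorname{E}[XY^T]-\operatorname{E}[X]\operatorname{E}[Y]^T$$

where again the matrix expectation is taken element-by-element in the matrix. Here the (i,j)^th element is the covariance between the i^th element of $X$ and the j^th element of $Y$.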

Properties

The covariance matrix is a symmetric matrix, i.e.^[2]

$$\operatorname{K}_{XX}^T=\operatorname{K}_{XX}$$

The covariance matrix is a positive semidefinite matrix, i.e.^[2]

$$a^T\operatorname{K}_{XX}a\ge 0\quad\text{for all }a\in\mathbb{R}^n$$

The cross-covariance matrix $\operatorname{Cov}[Y,X]$ is simply the transpose of the matrix $\operatorname{Cov}[X,Y]$, i.e.

$$\operatorname{K}_{YX}=\operatorname{K}_{XY}^T$$
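These three properties also hold for sample covariance matrices, which makes them easy to check numerically. The sketch below uses synthetic data. The helper `cross_cov` is a hypothetical name for a plain sample cross-covariance normalized by the number of draws.

```python
import numpy as np

def cross_cov(x, y):
    """Sample cross-covariance of draws stored row-wise in x (N x n) and y (N x p)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    return xc.T @ yc / len(x)

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 3))
y = x[:, :2] + rng.normal(size=(500, 2))   # y is correlated with x by construction

K_xx = cross_cov(x, x)
K_xy = cross_cov(x, y)                      # shape (3, 2)
K_yx = cross_cov(y, x)                      # shape (2, 3)

is_symmetric = np.allclose(K_xx, K_xx.T)
min_eigenvalue = np.linalg.eigvalsh(K_xx).min()   # non-negative up to rounding
is_transpose = np.allclose(K_yx, K_xy.T)
```

Checking the smallest eigenvalue is one common way to test positive semidefiniteness of a symmetric matrix; a small negative value on the order of machine precision can appear from rounding.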

Uncorrelatedness

Two random vectors $X=(X_1,\ldots,X_m)^T$ and $Y=(Y_1,\ldots,Y_n)^T$ are called uncorrelated if

$$\operatorname{E}[XY^T]=\operatorname{E}[X]\operatorname{E}[Y]^T$$

They are uncorrelated if and only if their cross-covariance matrix $\operatorname{K}_{XY}$ is zero.^[3]

Correlation and cross-correlation

Definitions

The correlation matrix (also called second moment) of an $n\times 1$ random vector is an $n\times n$ matrix whose (i,j)^th element is the correlation between the i^th and the j^th random variables. The correlation matrix is the expected value, element by element, of the $n\times n$ matrix computed as $XX^T$, where the superscript T refers to the transpose of the indicated vector:^[4] ^[3]

$$\operatorname{R}_{XX}=\operatorname{E}[XX^T]$$

By extension, the cross-correlation matrix between two random vectors $X$ and $Y$ ($X$ having $n$ elements and $Y$ having $p$ elements) is the $n\times p$ matrix

$$\operatorname{R}_{XY}=\operatorname{E}[XY^T]$$

Properties

The correlation matrix is related to the covariance matrix by

$$\operatorname{R}_{XX}=\operatorname{K}_{XX}+\operatorname{E}[X]\operatorname{E}[X]^T.$$

Similarly for the cross-correlation matrix and the cross-covariance matrix:

$$\operatorname{R}_{XY}=\operatorname{K}_{XY}+\operatorname{E}[X]\operatorname{E}[Y]^T$$
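The relation between the second moment and the covariance is an algebraic identity, so it holds exactly for sample moments as well. A minimal sketch on synthetic data with a deliberately non-zero mean (all names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
# 2000 draws of a 2-component vector with non-zero mean, one draw per row.
x = rng.normal(loc=[1.0, -2.0], scale=[0.5, 1.5], size=(2000, 2))

mu = x.mean(axis=0)
R_xx = x.T @ x / len(x)                     # sample version of E[X X^T]
K_xx = (x - mu).T @ (x - mu) / len(x)       # sample covariance matrix

reconstructed = K_xx + np.outer(mu, mu)     # K_XX + E[X] E[X]^T
```

Here `reconstructed` equals `R_xx` up to rounding. For a zero-mean vector the two matrices coincide.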

Orthogonality

Two random vectors of the same size $X=(X_1,\ldots,X_n)^T$ and $Y=(Y_1,\ldots,Y_n)^T$ are called orthogonal if

$$\operatorname{E}[X^TY]=0$$

Independence

See main article: Independence (probability theory).

Two random vectors $X$ and $Y$ are called independent if for all $x$ and $y$

$$F_{X,Y}(x,y)=F_X(x)\cdot F_Y(y)$$

where $F_X(x)$ and $F_Y(y)$ denote the cumulative distribution functions of $X$ and $Y$, and $F_{X,Y}(x,y)$ denotes their joint cumulative distribution function. Independence of $X$ and $Y$ is often denoted by $X\perp\!\!\!\perp Y$.

Written component-wise, $X$ and $Y$ are called independent if for all $x_1,\ldots,x_m,y_1,\ldots,y_n$

$$F_{X_1,\ldots,X_m,Y_1,\ldots,Y_n}(x_1,\ldots,x_m,y_1,\ldots,y_n)=F_{X_1,\ldots,X_m}(x_1,\ldots,x_m)\cdot F_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n)$$

Characteristic function

The characteristic function of a random vector $X$ with $n$ components is a function $\mathbb{R}^n\to\mathbb{C}$ that maps every vector $\omega=(\omega_1,\ldots,\omega_n)^T$ to a complex number. It is defined by^[2]

$$\varphi_X(\omega)=\operatorname{E}\left[e^{i(\omega^TX)}\right]=\operatorname{E}\left[e^{i(\omega_1X_1+\ldots+\omega_nX_n)}\right]$$
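Because the characteristic function is an expectation, it can be estimated by averaging $e^{i\omega^TX}$ over draws of $X$. The sketch below assumes a Gaussian vector, for which the closed form $\exp(i\omega^T\mu-\tfrac12\omega^T\Sigma\omega)$ is available as a reference. The parameters `mu`, `cov`, `omega` and the function name `empirical_cf` are illustrative.

```python
import numpy as np

def empirical_cf(samples, omega):
    """Monte Carlo estimate of E[exp(i * omega^T X)] from draws stored row-wise."""
    return np.mean(np.exp(1j * samples @ omega))

rng = np.random.default_rng(3)
mu = np.array([0.5, -1.0])
cov = np.array([[1.0, 0.3],
                [0.3, 0.5]])
samples = rng.multivariate_normal(mu, cov, size=200_000)

omega = np.array([0.4, -0.2])
estimate = empirical_cf(samples, omega)

# Closed form for a Gaussian vector, used only as a reference value.
exact = np.exp(1j * omega @ mu - 0.5 * omega @ cov @ omega)
```

The estimate carries Monte Carlo error of order $1/\sqrt{N}$, so it agrees with the closed form only approximately.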

Further properties

Expectation of a quadratic form

One can take the expectation of a quadratic form in the random vector $X$ as follows:^[5]

$$\operatorname{E}[X^TAX]=\operatorname{E}[X]^TA\operatorname{E}[X]+\operatorname{tr}(AK_{XX}),$$

where $K_{XX}$ is the covariance matrix of $X$ and $\operatorname{tr}$ refers to the trace of a matrix, that is, to the sum of the elements on its main diagonal (from upper left to lower right). Since the quadratic form is a scalar, so is its expectation.

Proof: Let $z$ be an $m\times 1$ random vector with $\operatorname{E}[z]=\mu$ and $\operatorname{Cov}[z]=V$, and let $A$ be an $m\times m$ non-stochastic matrix.

Then based on the formula for the covariance, if we denote $z^T=X$ and $z^TA^T=Y$, we see that:

$$\operatorname{Cov}[X,Y]=\operatorname{E}[XY^T]-\operatorname{E}[X]\operatorname{E}[Y]^T$$

Hence

$$\begin{aligned} \operatorname{E}[XY^T]&=\operatorname{Cov}[X,Y]+\operatorname{E}[X]\operatorname{E}[Y]^T\\ \operatorname{E}[z^TAz]&=\operatorname{Cov}[z^T,z^TA^T]+\operatorname{E}[z^T]\operatorname{E}[z^TA^T]^T\\ &=\operatorname{Cov}[z^T,z^TA^T]+\mu^T(\mu^TA^T)^T\\ &=\operatorname{Cov}[z^T,z^TA^T]+\mu^TA\mu, \end{aligned}$$

which leaves us to show that

$$\operatorname{Cov}[z^T,z^TA^T]=\operatorname{tr}(AV).$$

This is true based on the fact that one can cyclically permute matrices when taking a trace without changing the end result (e.g. $\operatorname{tr}(AB)=\operatorname{tr}(BA)$).

We see that

$$\begin{aligned} \operatorname{Cov}[z^T,z^TA^T]&=\operatorname{E}\left[\left(z^T-\operatorname{E}[z^T]\right)\left(z^TA^T-\operatorname{E}\left[z^TA^T\right]\right)^T\right]\\ &=\operatorname{E}\left[(z^T-\mu^T)(z^TA^T-\mu^TA^T)^T\right]\\ &=\operatorname{E}\left[(z-\mu)^T(Az-A\mu)\right]. \end{aligned}$$

And since $(z-\mu)^T(Az-A\mu)$ is a scalar, then

$$(z-\mu)^T(Az-A\mu)=\operatorname{tr}\left((z-\mu)^T(Az-A\mu)\right)=\operatorname{tr}\left((z-\mu)^TA(z-\mu)\right)$$

trivially. Using the permutation we get:

$$\operatorname{tr}\left((z-\mu)^TA(z-\mu)\right)=\operatorname{tr}\left(A(z-\mu)(z-\mu)^T\right),$$

and by plugging this into the original formula we get:

$$\begin{aligned} \operatorname{Cov}\left[z^T,z^TA^T\right]&=\operatorname{E}\left[(z-\mu)^T(Az-A\mu)\right]\\ &=\operatorname{E}\left[\operatorname{tr}\left(A(z-\mu)(z-\mu)^T\right)\right]\\ &=\operatorname{tr}\left(A\cdot\operatorname{E}\left[(z-\mu)(z-\mu)^T\right]\right)\\ &=\operatorname{tr}(AV). \end{aligned}$$
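The identity $\operatorname{E}[X^TAX]=\operatorname{E}[X]^TA\operatorname{E}[X]+\operatorname{tr}(AK_{XX})$ makes no distributional assumption, so it also holds exactly for the empirical distribution of any data set, provided the covariance is normalized by the number of draws. The sketch below checks this on synthetic non-Gaussian data with an arbitrary (not necessarily symmetric) matrix `A`. All names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(size=(5000, 3))         # non-Gaussian draws, one per row
A = np.array([[1.0, 2.0, 0.0],
              [0.5, 1.0, -1.0],
              [0.0, 3.0, 2.0]])             # arbitrary non-stochastic matrix

mu = x.mean(axis=0)
K = (x - mu).T @ (x - mu) / len(x)          # covariance of the empirical distribution

# Left-hand side: average of x^T A x over all draws.
lhs = np.mean(np.einsum("ni,ij,nj->n", x, A, x))

# Right-hand side: mu^T A mu + tr(A K).
rhs = mu @ A @ mu + np.trace(A @ K)
```

The two sides agree up to floating-point error, mirroring the trace argument used in the proof.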

Expectation of the product of two different quadratic forms

One can take the expectation of the product of two different quadratic forms in a zero-mean Gaussian random vector $X$ as follows:^[5]

$$\operatorname{E}\left[(X^TAX)(X^TBX)\right]=2\operatorname{tr}(AK_{XX}BK_{XX})+\operatorname{tr}(AK_{XX})\operatorname{tr}(BK_{XX})$$

where again $K_{XX}$ is the covariance matrix of $X$. Again, since both quadratic forms are scalars and hence their product is a scalar, the expectation of their product is also a scalar.
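This fourth-moment formula can be sanity-checked by simulation. The sketch below assumes symmetric matrices `A` and `B` (the usual convention for quadratic forms, stated here as an assumption) and an illustrative covariance `K`. Because it is a Monte Carlo estimate of a heavy-tailed quantity, agreement is only approximate.

```python
import numpy as np

rng = np.random.default_rng(5)
K = np.array([[1.0, 0.4],
              [0.4, 2.0]])                  # covariance of the zero-mean Gaussian vector
A = np.array([[1.0, 0.2],
              [0.2, 0.5]])                  # symmetric by assumption
B = np.array([[0.3, -0.1],
              [-0.1, 1.0]])                 # symmetric by assumption

x = rng.multivariate_normal(np.zeros(2), K, size=400_000)
q_a = np.einsum("ni,ij,nj->n", x, A, x)     # x^T A x for every draw
q_b = np.einsum("ni,ij,nj->n", x, B, x)     # x^T B x for every draw

monte_carlo = np.mean(q_a * q_b)
closed_form = 2 * np.trace(A @ K @ B @ K) + np.trace(A @ K) * np.trace(B @ K)
```

Increasing the number of draws tightens the agreement between `monte_carlo` and `closed_form`.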

Applications

Portfolio theory

In portfolio theory in finance, an objective often is to choose a portfolio of risky assets such that the distribution of the random portfolio return has desirable properties. For example, one might want to choose the portfolio return having the lowest variance for a given expected value. Here the random vector is the vector $r$ of random returns on the individual assets, and the portfolio return p (a random scalar) is the inner product of the vector of random returns with a vector w of portfolio weights, that is, the fractions of the portfolio placed in the respective assets. Since $p=w^Tr$, the expected value of the portfolio return is $w^T\operatorname{E}(r)$ and the variance of the portfolio return can be shown to be $w^TCw$, where C is the covariance matrix of $r$.
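The portfolio mean and variance formulas translate directly into matrix products. The sketch below uses made-up expected returns, covariance matrix and weights for three hypothetical assets, then confirms on simulated returns that the sample variance of the portfolio return equals the quadratic form in the sample covariance matrix.

```python
import numpy as np

mean_returns = np.array([0.05, 0.08, 0.12])     # hypothetical E(r)
C = np.array([[0.010, 0.002, 0.001],
              [0.002, 0.030, 0.004],
              [0.001, 0.004, 0.060]])           # hypothetical covariance matrix of r
w = np.array([0.5, 0.3, 0.2])                   # portfolio weights, summing to 1

expected_return = w @ mean_returns              # w^T E(r)
variance = w @ C @ w                            # w^T C w

# Cross-check on simulated returns (normal draws are an illustrative assumption).
rng = np.random.default_rng(6)
returns = rng.multivariate_normal(mean_returns, C, size=10_000)
p = returns @ w                                 # portfolio return for each draw
C_hat = np.cov(returns, rowvar=False, bias=True)
sample_variance = np.var(p)                     # equals w @ C_hat @ w up to rounding
```

The identity between `sample_variance` and `w @ C_hat @ w` holds for any data, since it follows from the same algebra as the population formula.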

Regression theory

In linear regression theory, we have data on n observations on a dependent variable y and n observations on each of k independent variables x_j. The observations on the dependent variable are stacked into a column vector y; the observations on each independent variable are also stacked into column vectors, and these latter column vectors are combined into a design matrix X (not denoting a random vector in this context) of observations on the independent variables. Then the following regression equation is postulated as a description of the process that generated the data:

$$y=X\beta+e,$$

where β is a postulated fixed but unknown vector of k response coefficients, and e is an unknown random vector reflecting random influences on the dependent variable. By some chosen technique such as ordinary least squares, a vector $\hat\beta$ is chosen as an estimate of β, and the estimate of the vector e, denoted $\hat{e}$, is computed as

$$\hat{e}=y-X\hat\beta.$$

Then the statistician must analyze the properties of $\hat\beta$ and $\hat{e}$, which are viewed as random vectors since a randomly different selection of n cases to observe would have resulted in different values for them.
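The estimation step can be sketched with ordinary least squares on simulated data. Here the "true" coefficient vector `beta` is known only because the data are generated in the script. The sample size, the intercept column and the noise scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 200, 3

# Design matrix: an intercept column plus two simulated regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])           # known only because the data are simulated
e = rng.normal(scale=0.3, size=n)           # random influences on the dependent variable
y = X @ beta + e

# Ordinary least squares estimate of beta and the implied residual vector.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e_hat = y - X @ beta_hat
```

Re-running with a different seed yields a different `beta_hat` and `e_hat`, which is the sense in which both are random vectors. By construction of least squares, the residuals are orthogonal to every column of `X`.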

Vector time series

The evolution of a k × 1 random vector $X$ through time can be modelled as a vector autoregression (VAR) as follows:

$$X_t=c+A_1X_{t-1}+A_2X_{t-2}+\cdots+A_pX_{t-p}+e_t,$$

where the i-periods-back vector observation $X_{t-i}$ is called the i-th lag of $X$, c is a k × 1 vector of constants (intercepts), $A_i$ is a time-invariant k × k matrix and $e_t$ is a k × 1 random vector of error terms.
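A VAR recursion of this form is straightforward to simulate. The sketch below assumes independent Gaussian error terms and uses illustrative coefficient matrices for a VAR(2) with k = 2. The function name `simulate_var` and its parameters are hypothetical.

```python
import numpy as np

def simulate_var(c, A_list, x_init, n_steps, rng, noise_scale=1.0):
    """Simulate X_t = c + sum_i A_i X_{t-i} + e_t with Gaussian errors e_t.

    x_init holds the p starting vectors in time order; the result stacks
    the starting vectors and the n_steps simulated vectors row-wise.
    """
    history = [np.asarray(x, dtype=float) for x in x_init]
    for _ in range(n_steps):
        # A_list[0] multiplies the first lag, A_list[1] the second, and so on.
        lagged = sum(A @ history[-i] for i, A in enumerate(A_list, start=1))
        e_t = rng.normal(scale=noise_scale, size=c.shape)
        history.append(c + lagged + e_t)
    return np.array(history)

c = np.array([1.0, 0.0])
A1 = np.array([[0.5, 0.0],
               [0.1, 0.3]])
A2 = np.array([[0.2, 0.0],
               [0.0, 0.1]])
x_init = [np.array([1.0, 1.0]), np.array([2.0, 0.0])]

rng = np.random.default_rng(8)
path = simulate_var(c, [A1, A2], x_init, n_steps=100, rng=rng, noise_scale=0.1)
```

Setting `noise_scale=0.0` removes the error term and leaves the deterministic part of the recursion, which is convenient for checking the lag indexing by hand.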

Notes and References

[1] Gallager, Robert G. (2013). Stochastic Processes: Theory for Applications. Cambridge University Press. ISBN 978-1-107-03975-9.
[2] Lapidoth, Amos (2009). A Foundation in Digital Communication. Cambridge University Press. ISBN 978-0-521-19395-5.
[3] Gubner, John A. (2006). Probability and Random Processes for Electrical and Computer Engineers. Cambridge University Press. ISBN 978-0-521-86470-1.
[4] Papoulis, Athanasius (1991). Probability, Random Variables and Stochastic Processes (Third ed.). McGraw-Hill. ISBN 0-07-048477-5.
[5] Kendrick, David (1981). Stochastic Control for Economic Models. McGraw-Hill. ISBN 0-07-033962-7.
