In probability theory and statistics, the generalized chi-squared distribution (or generalized chi-square distribution) is the distribution of a quadratic form of a multinormal variable (normal vector), or a linear combination of different normal variables and squares of normal variables. Equivalently, it is also a linear sum of independent noncentral chi-square variables and a normal variable. There are several other such generalizations for which the same term is sometimes used; some of them are special cases of the family discussed here, for example the gamma distribution.
The generalized chi-squared variable may be described in multiple ways. One is to write it as a weighted sum of independent noncentral chi-square variables ${\chi'}^2$ and a standard normal variable $z$:

$$\tilde{\chi}(\boldsymbol{w},\boldsymbol{k},\boldsymbol{\lambda},s,m)=\sum_i w_i\,{\chi'}^2(k_i,\lambda_i)+sz+m.$$
Here the parameters are the weights $w_i$, the degrees of freedom $k_i$ and non-centralities $\lambda_i$ of the constituent non-central chi-squares, and the coefficients $s$ and $m$ of the normal term. Some important special cases of this distribution have all the weights $w_i$ of the same sign.
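As an illustration, this first parameterization lends itself directly to sampling. The following is a minimal sketch using NumPy (the parameter values are arbitrary examples, not from the source), which also checks the sample mean against the theoretical mean $\sum_i w_i(k_i+\lambda_i)+m$:

```python
import numpy as np

def sample_gx2(w, k, lam, s, m, size, rng):
    """Draw samples of a generalized chi-squared variable written as
    sum_i w_i * chi'^2(k_i, lam_i) + s*z + m, with z standard normal."""
    # one noncentral chi-square term per weight
    terms = sum(wi * rng.noncentral_chisquare(ki, li, size)
                for wi, ki, li in zip(w, k, lam))
    return terms + s * rng.standard_normal(size) + m

rng = np.random.default_rng(0)
# arbitrary example parameters
w, k, lam, s, m = [1.0, -0.5], [2, 3], [1.0, 0.0], 2.0, 4.0
x = sample_gx2(w, k, lam, s, m, 100_000, rng)

# theoretical mean: sum_i w_i*(k_i + lam_i) + m
mean_theory = sum(wi * (ki + li) for wi, ki, li in zip(w, k, lam)) + m
# theoretical variance: sum_i 2*w_i^2*(k_i + 2*lam_i) + s^2
var_theory = sum(2 * wi**2 * (ki + 2 * li) for wi, ki, li in zip(w, k, lam)) + s**2
```

Note that the weights may have mixed signs, so unlike an ordinary chi-square the support of the distribution can be the whole real line.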
Since a non-central chi-squared variable is a sum of squares of normal variables with different means, the generalized chi-square variable is also defined as a sum of squares of independent normal variables, plus an independent normal variable: that is, a quadratic in normal variables.
Another equivalent way is to formulate it as a quadratic form of a normal vector $\boldsymbol{x}$:

$$\tilde{\chi}=q(\boldsymbol{x})=\boldsymbol{x}'\mathbf{Q}_2\boldsymbol{x}+\boldsymbol{q}_1'\boldsymbol{x}+q_0.$$
Here $\mathbf{Q}_2$ is a matrix, $\boldsymbol{q}_1$ is a vector, and $q_0$ is a scalar. These, together with the mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{\Sigma}$ of the normal vector $\boldsymbol{x}$, parameterize the distribution.
For the most general case, a reduction towards a common standard form can be made by using a representation of the following form:[3]
$$X=(\boldsymbol{z}+\boldsymbol{a})^\mathrm{T}\mathbf{A}(\boldsymbol{z}+\boldsymbol{a})+\boldsymbol{c}^\mathrm{T}\boldsymbol{z}=(\boldsymbol{x}+\boldsymbol{b})^\mathrm{T}\mathbf{D}(\boldsymbol{x}+\boldsymbol{b})+\boldsymbol{d}^\mathrm{T}\boldsymbol{x}+e,$$
where D is a diagonal matrix and where x represents a vector of uncorrelated standard normal random variables.
A generalized chi-square variable or distribution can be parameterized in two ways. The first is in terms of the weights $w_i$, the degrees of freedom $k_i$ and non-centralities $\lambda_i$ of the constituent non-central chi-squares, and the coefficients $s$ and $m$ of the normal term. The second parameterization is in terms of the quadratic form of a normal vector, i.e. the matrix $\mathbf{Q}_2$, the vector $\boldsymbol{q}_1$, the scalar $q_0$, and the mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{\Sigma}$ of $\boldsymbol{x}$.
The parameters of the second expression (quadratic form of a normal vector) can also be calculated in terms of the parameters of the first expression (in terms of non-central chi-squares, a normal and a constant).
There exists Matlab code to convert from one set of parameters to another.
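One direction of this conversion can be sketched in Python via an eigendecomposition and completing the square; this is a standard argument, not the published Matlab code, and all names below are illustrative. Writing $\boldsymbol{x}=\boldsymbol{\mu}+\mathbf{C}\boldsymbol{z}$ with $\mathbf{\Sigma}=\mathbf{C}\mathbf{C}'$, the quadratic form decouples along the eigenvectors of $\mathbf{C}'\mathbf{Q}_2\mathbf{C}$; nonzero eigenvalues become weights of non-central chi-squares and the zero-eigenvalue directions collapse into the normal term:

```python
import numpy as np

def quad_to_weights(Q2, q1, q0, mu, Sigma, tol=1e-9):
    """Convert the quadratic form x'Q2 x + q1'x + q0, x ~ N(mu, Sigma),
    to the parameters (w, k, lam, s, m) of the weighted-sum form."""
    Q2 = (Q2 + Q2.T) / 2                        # symmetrize without changing the form
    C = np.linalg.cholesky(Sigma)               # x = mu + C z, z standard normal
    d, V = np.linalg.eigh(C.T @ Q2 @ C)         # eigenvalues become the weights
    b = V.T @ C.T @ (2 * Q2 @ mu + q1)          # linear coefficients, rotated frame
    c0 = mu @ Q2 @ mu + q1 @ mu + q0            # constant term

    nz = np.abs(d) > tol                        # quadratic (nonzero-weight) directions
    s = np.sqrt(np.sum(b[~nz] ** 2))            # residual normal coefficient
    m = c0 - np.sum(b[nz] ** 2 / (4 * d[nz]))   # complete the square per direction
    # group (nearly) equal eigenvalues into unique weights
    w, inv = np.unique(np.round(d[nz], 9), return_inverse=True)
    k = np.bincount(inv)                        # degrees of freedom per weight
    lam = np.bincount(inv, weights=(b[nz] / (2 * d[nz])) ** 2)  # non-centralities
    return w, k, lam, s, m

# illustrative example
Q2 = np.array([[1.0, 0.0], [0.0, -0.5]])
q1 = np.array([1.0, 1.0])
q0 = 2.0
mu = np.array([1.0, 0.0])
Sigma = np.eye(2)
w, k, lam, s, m = quad_to_weights(Q2, q1, q0, mu, Sigma)
```

A quick consistency check is that both parameterizations give the same mean: $\sum_i w_i(k_i+\lambda_i)+m$ must equal $\operatorname{tr}(\mathbf{Q}_2\mathbf{\Sigma})+\boldsymbol{\mu}'\mathbf{Q}_2\boldsymbol{\mu}+\boldsymbol{q}_1'\boldsymbol{\mu}+q_0$.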
The probability density, cumulative distribution, and inverse cumulative distribution functions of a generalized chi-squared variable do not have simple closed-form expressions. But there exist several methods to compute them numerically: Ruben's method,[5] Imhof's method, IFFT method,[6] ray method, and ellipse approximation.
Numerical algorithms [7] and computer code (Fortran and C, Matlab, R, Python, Julia) have been published that implement some of these methods to compute the PDF, CDF, and inverse CDF, and to generate random numbers.
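As a hedged illustration of the characteristic-function approach underlying Imhof's method, the CDF can be computed by numerically inverting the characteristic function with the Gil-Pelaez formula. This is a sketch, not any of the published implementations; function names are made up for this example:

```python
import numpy as np
from scipy.integrate import quad

def gx2_cf(t, w, k, lam, s, m):
    """Characteristic function of sum_i w_i chi'^2(k_i, lam_i) + s*z + m."""
    phi = np.exp(1j * t * m - 0.5 * (s * t) ** 2)   # normal term and offset
    for wi, ki, li in zip(w, k, lam):
        d = 1 - 2j * wi * t
        phi *= d ** (-ki / 2) * np.exp(1j * li * wi * t / d)
    return phi

def gx2_cdf(x, w, k, lam, s, m):
    """CDF via the Gil-Pelaez inversion formula:
    F(x) = 1/2 - (1/pi) * int_0^inf Im[e^{-itx} phi(t)] / t dt."""
    integrand = lambda t: (np.exp(-1j * t * x) * gx2_cf(t, w, k, lam, s, m)).imag / t
    val, _ = quad(integrand, 0, np.inf, limit=500)
    return 0.5 - val / np.pi

# sanity check: with one unit weight, 4 degrees of freedom, and no normal
# term, this reduces to the ordinary chi-squared distribution chi^2(4)
p = gx2_cdf(3.0, [1.0], [4.0], [0.0], 0.0, 0.0)
```

The oscillatory integrand decays only polynomially when the degrees of freedom are small, which is one reason the specialized methods in the table below are preferred over naive quadrature in the tails.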
The following table shows the best methods to use to compute the CDF and PDF for the different parts of the generalized chi-square distribution in different cases:
| $\tilde{\chi}$ | part | best cdf/pdf method(s) |
| --- | --- | --- |
| ellipse: $w_i$ all the same sign, $s=0$ | body | Ruben, Imhof, IFFT, ray |
| | finite tail | Ruben, ray (if $\lambda_i=0$) |
| | infinite tail | Ruben, ray |
| not ellipse: $w_i$ of mixed signs and/or $s\neq 0$ | body | Imhof, IFFT, ray |
| | infinite tails | ray |
The generalized chi-squared is the distribution of statistical estimates in cases where the usual statistical theory does not hold, as in the examples below.
If a predictive model is fitted by least squares, but the residuals have either autocorrelation or heteroscedasticity, then alternative models can be compared (in model selection) by relating changes in the sum of squares to an asymptotically valid generalized chi-squared distribution.[8]
If $\boldsymbol{x}$ is a normal vector, its log likelihood is a quadratic form of $\boldsymbol{x}$, and is hence distributed as a generalized chi-squared variable. The log likelihood ratio that $\boldsymbol{x}$ arises from one normal distribution versus another is also a quadratic form of $\boldsymbol{x}$, so it too follows a generalized chi-squared distribution.
In Gaussian discriminant analysis, samples from multinormal distributions are optimally separated by using a quadratic classifier, a boundary that is a quadratic function (e.g. the curve defined by setting the likelihood ratio between two Gaussians to 1). The classification error rates of different types (false positives and false negatives) are integrals of the normal distributions within the quadratic regions defined by this classifier. Since this is mathematically equivalent to integrating a quadratic form of a normal vector, the result is an integral of a generalized-chi-squared variable.
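The error-rate computation described above can be sketched by Monte Carlo integration of the sign of the quadratic log-likelihood ratio. The toy Gaussians below are illustrative; they are chosen with equal covariances so the quadratic term cancels and the true error rate is known in closed form, $\Phi(-d/2)$ with $d$ the Mahalanobis distance:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
# two toy Gaussians (equal covariance, so the true error rate is Phi(-d/2))
mu_a, mu_b = np.array([0.0, 0.0]), np.array([2.0, 0.0])
Sigma = np.eye(2)
Sinv = np.linalg.inv(Sigma)

def llr(x):
    """Log likelihood ratio log p_b(x) - log p_a(x): a quadratic form of x
    (here the purely quadratic term cancels because the covariances are equal)."""
    return (-0.5 * np.einsum('ni,ij,nj->n', x - mu_b, Sinv, x - mu_b)
            + 0.5 * np.einsum('ni,ij,nj->n', x - mu_a, Sinv, x - mu_a))

# false positives: samples from class a that the classifier assigns to b (llr > 0)
xa = rng.multivariate_normal(mu_a, Sigma, size=200_000)
err = np.mean(llr(xa) > 0)

d = np.sqrt((mu_b - mu_a) @ Sinv @ (mu_b - mu_a))  # Mahalanobis distance
err_theory = norm.cdf(-d / 2)
```

With unequal covariances the quadratic term survives, the boundary becomes a conic, and the error rate is exactly a generalized chi-squared CDF evaluated at the decision threshold.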
The following application arises in the context of Fourier analysis in signal processing, renewal theory in probability theory, and multi-antenna (MIMO) systems in wireless communication. The common factor of these areas is that the sum of exponentially distributed variables is of importance (or equivalently, the sum of squared magnitudes of circularly-symmetric, centered, complex Gaussian variables).
If $Z_i$ are $k$ independent, circularly-symmetric, centered, complex Gaussian random variables with variances $\sigma_i^2$, then the random variable

$$\tilde{Q}=\sum_{i=1}^k |Z_i|^2$$

has a generalized chi-squared distribution of a particular form. The difference from the standard chi-squared distribution is that the $Z_i$ are complex and can have different variances. If $\mu=\sigma_i^2$ for all $i$, then $\tilde{Q}$, scaled down by $\mu/2$ (i.e. multiplied by $2/\mu$), has a chi-squared distribution $\chi^2(2k)$, also known as an Erlang distribution. If the $\sigma_i^2$ have distinct values for all $i$, then $\tilde{Q}$ has the pdf

$$f(x;k,\sigma_1^2,\ldots,\sigma_k^2)=\sum_{i=1}^k \frac{e^{-x/\sigma_i^2}}{\sigma_i^2\prod_{j=1,\,j\neq i}^{k}\left(1-\dfrac{\sigma_j^2}{\sigma_i^2}\right)} \quad \text{for } x\geq 0.$$
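This distinct-variance pdf (a hypoexponential density, since each $|Z_i|^2$ is exponential with mean $\sigma_i^2$) can be checked numerically. The following sketch, with arbitrary example variances, verifies that it integrates to one and reproduces the mean $\sum_i \sigma_i^2$:

```python
import numpy as np
from scipy.integrate import quad

def pdf_distinct(x, sigma2):
    """pdf of sum of |Z_i|^2 for complex Gaussians with distinct variances sigma2."""
    sigma2 = np.asarray(sigma2, dtype=float)
    total = 0.0
    for i, s2 in enumerate(sigma2):
        others = np.delete(sigma2, i)
        # i-th term: e^{-x/s2} / (s2 * prod_{j != i} (1 - sigma_j^2 / s2))
        total += np.exp(-x / s2) / (s2 * np.prod(1 - others / s2))
    return total

sigma2 = [1.0, 2.0, 3.0]  # arbitrary distinct variances
mass, _ = quad(lambda x: pdf_distinct(x, sigma2), 0, np.inf)       # should be 1
mean, _ = quad(lambda x: x * pdf_distinct(x, sigma2), 0, np.inf)   # should be 6
```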
If there are sets of repeated variances among the $\sigma_i^2$, assume that they are divided into $M$ sets, each containing a distinct variance value. Denote $\boldsymbol{r}=(r_1,r_2,\ldots,r_M)$ to be the number of repetitions in each set, so that the $m$th set contains $r_m$ variables with variance $\sigma_m^2$. Then $\tilde{Q}$ is a linear combination of independent $\chi^2$-distributed random variables with different degrees of freedom:

$$\tilde{Q}=\sum_{m=1}^M \frac{\sigma_m^2}{2}\,Q_m, \qquad Q_m\sim\chi^2(2r_m).$$
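A quick Monte Carlo sketch (with arbitrary example values) confirms that the sum of squared magnitudes with repeated variances matches this $\chi^2$ mixture in its first two moments:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = [1.0, 1.0, 1.0, 4.0, 4.0]  # M = 2 groups with r = (3, 2)
n = 200_000

# direct construction: sum of |Z_i|^2, Z_i circularly-symmetric complex Gaussian
Z = rng.standard_normal((n, len(sigma2))) + 1j * rng.standard_normal((n, len(sigma2)))
Z *= np.sqrt(np.array(sigma2) / 2)  # variance sigma_i^2, split over Re and Im parts
Q_direct = np.sum(np.abs(Z) ** 2, axis=1)

# equivalent mixture: sum_m sigma_m^2 / 2 * Q_m with Q_m ~ chi^2(2 r_m)
r, var = np.array([3, 2]), np.array([1.0, 4.0])
Q_mix = sum(v / 2 * rng.chisquare(2 * rm, n) for v, rm in zip(var, r))

# both should have mean sum_m r_m sigma_m^2 = 11 and variance sum_m r_m sigma_m^4 = 35
```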
The pdf of $\tilde{Q}$ is

$$f(x;\boldsymbol{r},\sigma_1^2,\ldots,\sigma_M^2)=\prod_{m=1}^M \frac{1}{\sigma_m^{2r_m}} \sum_{k=1}^M \sum_{l=1}^{r_k} \frac{\Psi_{k,l,\boldsymbol{r}}}{(r_k-l)!}\,(-x)^{r_k-l}\,e^{-x/\sigma_k^2}, \quad \text{for } x\geq 0,$$
where

$$\Psi_{k,l,\boldsymbol{r}}=(-1)^{r_k-1}\sum_{\mathbf{i}\in\Omega_{k,l}} \prod_{j\neq k} \binom{i_j+r_j-1}{i_j} \left(\frac{1}{\sigma_j^2}-\frac{1}{\sigma_k^2}\right)^{-(r_j+i_j)},$$

with $\mathbf{i}=[i_1,\ldots,i_M]^T$ an integer vector, and $\Omega_{k,l}$ the set of all such vectors whose components sum to $l-1$ and satisfy $i_k=0$:

$$\Omega_{k,l}=\left\{[i_1,\ldots,i_M]\in\mathbb{Z}^M :\ \sum_{j=1}^M i_j=l-1,\ i_k=0,\ i_j\geq 0 \text{ for all } j\right\}.$$