V-statistic explained

V-statistics are a class of statistics named for Richard von Mises who developed their asymptotic distribution theory in a fundamental paper in 1947. V-statistics are closely related to U-statistics (U for "unbiased") introduced by Wassily Hoeffding in 1948. A V-statistic is a statistical function (of a sample) defined by a particular statistical functional of a probability distribution.

Statistical functions

Statistics that can be represented as functionals

T(Fn)

of the empirical distribution function

(Fn)

are called statistical functionals.[1] Differentiability of the functional T plays a key role in the von Mises approach; thus von Mises considers differentiable statistical functionals.

Examples of statistical functions

  1. The k-th central moment is the functional

    T(F)=\int(x-\mu)kdF(x)

    , where

    \mu=E[X]

    is the expected value of X. The associated statistical function is the sample k-th central moment,

    Tn=mk=T(Fn)=

    1n
    \sum
    n
    i=1

    (xi-\overlinex)k.

  2. The chi-squared goodness-of-fit statistic is a statistical function T(Fn), corresponding to the statistical functional

    T(F)=

    k
    \sum
    i=1
    (\intdF-
    2
    p
    i)
    Ai
    pi

    ,

    where Ai are the k cells and pi are the specified probabilities of the cells under the null hypothesis.

  3. The Cramér–von-Mises and Anderson–Darling goodness-of-fit statistics are based on the functional

    T(F)=\int(F(x)-

    2
    F
    0(x))

    w(x;F0)dF0(x),

    where w(xF0) is a specified weight function and F0 is a specified null distribution. If w is the identity function then T(Fn) is the well known Cramér–von-Mises goodness-of-fit statistic; if

    w(x;F0)=[F0(x)(1-F

    -1
    0(x))]
    then T(Fn) is the Anderson–Darling statistic.

Representation as a V-statistic

Suppose x1, ..., xn is a sample. In typical applications the statistical function has a representation as the V-statistic

Vmn=

1
nm
n
\sum
i1=1

n
\sum
im=1
h(x
i1

,

x
i2

,...,

x
im

),

where h is a symmetric kernel function. Serfling[2] discusses how to find the kernel in practice. Vmn is called a V-statistic of degree m.

A symmetric kernel of degree 2 is a function h(xy), such that h(x, y) = h(y, x) for all x and y in the domain of h. For samples x1, ..., xn, the corresponding V-statistic is defined

V2,n=

1
n2
n
\sum
i=1
n
\sum
j=1

h(xi,xj).

Example of a V-statistic

  1. An example of a degree-2 V-statistic is the second central moment m2.

    If h(x, y) = (x - y)2/2, the corresponding V-statistic is

    V2,n=

    1
    n2
    n
    \sum
    i=1
    n
    \sum
    j=1
    1
    2

    (xi-

    2
    x
    j)

    =

    1
    n
    n
    \sum
    i=1

    (xi-\barx)2,

    which is the maximum likelihood estimator of variance. With the same kernel, the corresponding U-statistic is the (unbiased) sample variance:

    s2= {n\choose2}-1\sumi

    1
    2

    (xi-

    2
    x=
    j)
    1
    n-1
    n
    \sum
    i=1

    (xi-\barx)2

    .

Asymptotic distribution

In examples 1–3, the asymptotic distribution of the statistic is different: in (1) it is normal, in (2) it is chi-squared, and in (3) it is a weighted sum of chi-squared variables.

Von Mises' approach is a unifying theory that covers all of the cases above. Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy," which is determined by which term is the first non-vanishing term in the Taylor expansion of the functional T. In case it is the linear term, the limit distribution is normal; otherwise higher order types of distributions arise (under suitable conditions such that a central limit theorem holds).

There are a hierarchy of cases parallel to asymptotic theory of U-statistics.[3] Let A(m) be the property defined by:

A(m):

  1. Var(h(X1, ..., Xk)) = 0 for k < m, and Var(h(X1, ..., Xk)) > 0 for k = m;
  2. nm/2Rmn tends to zero (in probability). (Rmn is the remainder term in the Taylor series for T.)

Case m = 1 (Non-degenerate kernel):

If A(1) is true, the statistic is a sample mean and the Central Limit Theorem implies that T(Fn) is asymptotically normal.

In the variance example (4), m2 is asymptotically normal with mean

\sigma2

and variance

(\mu4-\sigma4)/n

, where
4
\mu
4=E(X-E(X))
.

Case m = 2 (Degenerate kernel):

Suppose A(2) is true, and

2(X
E[h
1,X

2)]<infty,E|h(X1,X1)|<infty,

and

E[h(x,X1)]\equiv0

. Then nV2,n converges in distribution to a weighted sum of independent chi-squared variables:

nV2,n{\stackreld\longrightarrow}

infty
\sum
k=1

λk

2
Z
k,

where

Zk

are independent standard normal variables and

λk

are constants that depend on the distribution F and the functional T. In this case the asymptotic distribution is called a quadratic form of centered Gaussian random variables. The statistic V2,n is called a degenerate kernel V-statistic. The V-statistic associated with the Cramer–von Mises functional (Example 3) is an example of a degenerate kernel V-statistic.[4]

See also

References

Notes and References

  1. von Mises (1947), p. 309; Serfling (1980), p. 210.
  2. Serfling (1980, Section 6.5)
  3. Serfling (1980, Ch. 5–6); Lee (1990, Ch. 3)
  4. See Lee (1990, p. 160) for the kernel function.