V-statistic explained

V-statistics are a class of statistics named for Richard von Mises, who developed their asymptotic distribution theory in a fundamental paper in 1947. V-statistics are closely related to U-statistics (U for "unbiased"), introduced by Wassily Hoeffding in 1948. A V-statistic is a statistical function (of a sample) defined by a particular statistical functional of a probability distribution.

Statistical functions

Statistics that can be represented as functionals T(F_n) of the empirical distribution function F_n are called statistical functionals.^[1] Differentiability of the functional T plays a key role in the von Mises approach; thus von Mises considers differentiable statistical functionals.

Examples of statistical functions

The k-th central moment is the functional

T(F)=\int(x-\mu)^k\,dF(x),

where \mu=E[X] is the expected value of X. The associated statistical function is the sample k-th central moment,

T_n=m_k=T(F_n)=\frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^k.
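As a minimal sketch of the plug-in idea, the sample k-th central moment can be computed by evaluating the functional at the empirical distribution, which puts mass 1/n on each observation. The function name `central_moment` and the data are illustrative assumptions, not part of the source.

```python
def central_moment(sample, k):
    """Plug-in estimate T(F_n) of the k-th central moment."""
    n = len(sample)
    mean = sum(sample) / n  # mu evaluated at F_n is the sample mean
    # Integrating (x - mu)^k against F_n is an average over the observations.
    return sum((x - mean) ** k for x in sample) / n

# Hypothetical sample, used only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m1 = central_moment(sample, 1)  # first central moment, zero up to rounding
m2 = central_moment(sample, 2)
m3 = central_moment(sample, 3)
```

For this sample the mean is 5, so m2 is the average of the squared deviations and m3 the average of the cubed deviations.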

The chi-squared goodness-of-fit statistic is a statistical function T(F_n), corresponding to the statistical functional

T(F)=\sum_{i=1}^{k}\frac{\left(\int_{A_i}dF-p_i\right)^2}{p_i},

where A_i are the k cells and p_i are the specified probabilities of the cells under the null hypothesis.
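The following sketch evaluates this functional at the empirical distribution, where the integral of dF_n over a cell A_i is simply the fraction of observations falling in that cell. The die-roll data and the name `chi_squared_functional` are assumptions made for illustration. Note that the familiar Pearson statistic, the sum of (observed - expected)²/expected, equals n times T(F_n).

```python
from collections import Counter

def chi_squared_functional(cells, null_probs):
    """T(F_n) = sum_i (phat_i - p_i)^2 / p_i, where phat_i is the
    empirical probability of cell A_i."""
    n = len(cells)
    counts = Counter(cells)
    return sum(
        (counts.get(cell, 0) / n - p) ** 2 / p
        for cell, p in null_probs.items()
    )

# Hypothetical data: 60 die rolls, each recorded by the cell (face) it falls in.
rolls = [1] * 8 + [2] * 12 + [3] * 10 + [4] * 9 + [5] * 11 + [6] * 10
null_probs = {face: 1 / 6 for face in range(1, 7)}

t_fn = chi_squared_functional(rolls, null_probs)
pearson = len(rolls) * t_fn  # Pearson's statistic is n * T(F_n)
```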

The Cramér–von Mises and Anderson–Darling goodness-of-fit statistics are based on the functional

T(F)=\int(F(x)-F_0(x))^2\,w(x;F_0)\,dF_0(x),

where w(x; F₀) is a specified weight function and F₀ is a specified null distribution. If w is identically equal to 1, then T(F_n) is the well-known Cramér–von Mises goodness-of-fit statistic; if

w(x;F_0)=[F_0(x)(1-F_0(x))]^{-1},

then T(F_n) is the Anderson–Darling statistic.
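As a sketch of how this functional is evaluated in practice, take the weight to be constant, w = 1 (the Cramér–von Mises case), and assume F₀ is continuous. Writing u_(1) ≤ ... ≤ u_(n) for the sorted values F₀(x_i), the integral reduces to the standard computing formula T(F_n) = 1/(12n²) + (1/n) Σ (u_(i) - (2i-1)/(2n))². The function names and the uniform null distribution below are illustrative assumptions.

```python
def cramer_von_mises_functional(sample, null_cdf):
    """T(F_n) = integral of (F_n - F_0)^2 dF_0 with weight w = 1,
    via the closed-form computing formula (F_0 assumed continuous)."""
    n = len(sample)
    u = sorted(null_cdf(x) for x in sample)  # probability-integral transform
    total = sum(
        (u_i - (2 * i - 1) / (2 * n)) ** 2
        for i, u_i in enumerate(u, start=1)
    )
    return 1 / (12 * n ** 2) + total / n

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (assumed null F_0)."""
    return min(max(x, 0.0), 1.0)

# Hypothetical sample, used only for illustration.
sample = [0.10, 0.35, 0.40, 0.80]
t_fn = cramer_von_mises_functional(sample, uniform_cdf)
single = cramer_von_mises_functional([0.5], uniform_cdf)
```

For the single observation 0.5 the integral can be checked by hand: it is the integral of u² over [0, 0.5] plus the integral of (1 - u)² over [0.5, 1], which is 1/12.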

Representation as a V-statistic

Suppose x₁, ..., x_n is a sample. In typical applications the statistical function has a representation as the V-statistic

V_{mn}=\frac{1}{n^m}\sum_{i_1=1}^{n}\cdots\sum_{i_m=1}^{n}h(x_{i_1},x_{i_2},\dots,x_{i_m}),

where h is a symmetric kernel function. Serfling^[2] discusses how to find the kernel in practice. V_{mn} is called a V-statistic of degree m.

A symmetric kernel of degree 2 is a function h(x, y) such that h(x, y) = h(y, x) for all x and y in the domain of h. For samples x₁, ..., x_n, the corresponding V-statistic is defined as

V_{2,n}=\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}h(x_i,x_j).
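The definition translates directly into code: average the kernel over all n^m index tuples, with repeated indices allowed. The sketch below is a brute-force illustration (cost grows as n^m), and the function name `v_statistic` and the product kernel are assumptions chosen for the example.

```python
from itertools import product

def v_statistic(sample, kernel, degree):
    """Average of the kernel over all n^m tuples of observations,
    including tuples that repeat an observation."""
    n = len(sample)
    total = sum(kernel(*args) for args in product(sample, repeat=degree))
    return total / n ** degree

# Hypothetical sample, used only for illustration.
sample = [1.0, 2.0, 4.0]

# Degree 1 with h(x) = x gives the sample mean.
v1 = v_statistic(sample, lambda x: x, 1)

# Degree 2 with the symmetric kernel h(x, y) = x * y gives the squared mean.
v2 = v_statistic(sample, lambda x, y: x * y, 2)
```

Allowing repeated indices (the diagonal terms h(x_i, x_i)) is what distinguishes a V-statistic from the corresponding U-statistic, which averages over distinct indices only.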

Example of a V-statistic

An example of a degree-2 V-statistic is the second central moment m₂.

If h(x, y) = (x - y)²/2, the corresponding V-statistic is

V_{2,n}=\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{1}{2}(x_i-x_j)^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2,

which is the maximum likelihood estimator of variance for a normal sample. With the same kernel, the corresponding U-statistic is the (unbiased) sample variance:

s^2={n\choose 2}^{-1}\sum_{i<j}\frac{1}{2}(x_i-x_j)^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2.
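A short numerical check of these two identities, as a sketch with hypothetical data: the V-statistic averages the kernel over all ordered pairs (including i = j), the U-statistic over unordered pairs with i < j, and they are compared with the population and sample variance functions from Python's standard `statistics` module.

```python
from itertools import combinations, product
from statistics import pvariance, variance

def h(x, y):
    """Symmetric variance kernel h(x, y) = (x - y)^2 / 2."""
    return (x - y) ** 2 / 2

def v_stat(sample):
    """V-statistic: average of h over all n^2 ordered pairs."""
    n = len(sample)
    return sum(h(x, y) for x, y in product(sample, repeat=2)) / n ** 2

def u_stat(sample):
    """U-statistic: average of h over the n-choose-2 pairs with i < j."""
    pairs = list(combinations(sample, 2))
    return sum(h(x, y) for x, y in pairs) / len(pairs)

# Hypothetical sample, used only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
v = v_stat(sample)  # matches pvariance(sample), divisor n
u = u_stat(sample)  # matches variance(sample), divisor n - 1
```

Because the diagonal terms h(x_i, x_i) vanish for this kernel, the two statistics differ only by the factor (n - 1)/n.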

Asymptotic distribution

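In the three examples of statistical functions above, the asymptotic distribution of the statistic differs: for the central moment it is normal, for the chi-squared statistic it is chi-squared, and for the Cramér–von Mises and Anderson–Darling statistics it is a weighted sum of chi-squared variables.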

Von Mises' approach is a unifying theory that covers all of the cases above. Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy," which is determined by which term is the first non-vanishing term in the Taylor expansion of the functional T. In case it is the linear term, the limit distribution is normal; otherwise higher order types of distributions arise (under suitable conditions such that a central limit theorem holds).

There is a hierarchy of cases parallel to the asymptotic theory of U-statistics.^[3] Let A(m) be the property defined by:

A(m):

Var(h(X₁, ..., X_k)) = 0 for k < m, and Var(h(X₁, ..., X_k)) > 0 for k = m;
n^{m/2}R_{mn} tends to zero (in probability), where R_{mn} is the remainder term in the Taylor series for T.

Case m = 1 (Non-degenerate kernel):

If A(1) is true, the statistic is asymptotically equivalent to a sample mean, and the central limit theorem implies that T(F_n) is asymptotically normal.

In the variance example above, m₂ is asymptotically normal with mean σ² and variance (μ₄ - σ⁴)/n, where μ₄ = E[(X - E(X))⁴] is the fourth central moment.
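A small simulation can make the stated mean and variance concrete. The sketch below assumes standard normal data, for which σ² = 1 and μ₄ = 3, so n times the variance of m₂ should be close to μ₄ - σ⁴ = 2. The sample size, number of replications, and seed are arbitrary choices for illustration, and the results are only approximate.

```python
import random

def m2(xs):
    """Second sample central moment (divisor n)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n, reps = 200, 2000

# Draw `reps` independent samples of size n and record m2 for each.
draws = [m2([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]

center = sum(draws) / reps  # compare with sigma^2 = 1
scaled_var = n * m2(draws)  # compare with mu_4 - sigma^4 = 3 - 1 = 2
```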

Case m = 2 (Degenerate kernel):

Suppose A(2) is true, and

E[h^2(X_1,X_2)]<\infty,\quad E|h(X_1,X_1)|<\infty,

and

E[h(x,X_1)]\equiv 0.

Then nV_{2,n} converges in distribution to a weighted sum of independent chi-squared variables:

nV_{2,n}\ {\stackrel{d}{\longrightarrow}}\ \sum_{k=1}^{\infty}\lambda_k Z_k^2,

where Z_k are independent standard normal variables and λ_k are constants that depend on the distribution F and the functional T. In this case the asymptotic distribution is called a quadratic form of centered Gaussian random variables. The statistic V_{2,n} is called a degenerate kernel V-statistic. The V-statistic associated with the Cramér–von Mises functional above is an example of a degenerate kernel V-statistic.^[4]
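A simple illustration of the degenerate case, offered as a sketch rather than an example from the source: take the hypothetical kernel h(x, y) = xy with E[X] = 0 and Var(X) = σ². Then E[h(x, X₁)] = x E[X₁] = 0 for every x, the double sum factorises so that V_{2,n} is the squared sample mean, and n V_{2,n} = (√n x̄)² converges in distribution to σ² Z². This is the weighted sum above with a single nonzero weight, λ₁ = σ². The simulation assumes standard normal data (σ² = 1), so the limit is chi-squared with one degree of freedom.

```python
import random

def v2_product_kernel(sample):
    """V-statistic for the kernel h(x, y) = x * y.
    The double sum over (i, j) factorises as (sum of x)^2 / n^2."""
    n = len(sample)
    return (sum(sample) / n) ** 2

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n, reps = 50, 5000

# n * V_{2,n} for each of `reps` independent standard normal samples.
draws = [
    n * v2_product_kernel([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_nv = sum(draws) / reps  # the limit sigma^2 * Z^2 has mean sigma^2 = 1
# The 95% quantile of chi-squared with 1 degree of freedom is about 3.841.
frac_below = sum(d <= 3.841 for d in draws) / reps
```

Note the scaling: the statistic is multiplied by n rather than √n, which is the signature of a degenerate kernel.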

References

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics. 19 (3): 293–325. JSTOR 2235637. doi:10.1214/aoms/1177730196.
Koroljuk, V.S.; Borovskich, Yu.V. (1994). Theory of U-statistics. English translation by P.V. Malyshev and D.V. Malyshev from the 1989 Ukrainian edition. Dordrecht: Kluwer Academic Publishers. ISBN 0-7923-2608-3.
Lee, A.J. (1990). U-Statistics: theory and practice. New York: Marcel Dekker, Inc. ISBN 0-8247-8253-4.
Neuhaus, G. (1977). Functional limit theorems for U-statistics in the degenerate case. Journal of Multivariate Analysis. 7 (3): 424–439. doi:10.1016/0047-259X(77)90083-5.
Rosenblatt, M. (1952). Limit theorems associated with variants of the von Mises statistic. Annals of Mathematical Statistics. 23 (4): 617–623. JSTOR 2236587. doi:10.1214/aoms/1177729341.
Serfling, R.J. (1980). Approximation theorems of mathematical statistics. New York: John Wiley & Sons. ISBN 0-471-02403-1.
Taylor, R.L.; Daffer, P.Z.; Patterson, R.F. (1985). Limit theorems for sums of exchangeable random variables. New Jersey: Rowman and Allanheld.
von Mises, R. (1947). On the asymptotic distribution of differentiable statistical functions. Annals of Mathematical Statistics. 18 (2): 309–348. JSTOR 2235734. doi:10.1214/aoms/1177730385.

Notes and References

von Mises (1947), p. 309; Serfling (1980), p. 210.
Serfling (1980, Section 6.5)
Serfling (1980, Ch. 5–6); Lee (1990, Ch. 3)
See Lee (1990, p. 160) for the kernel function.
