Glivenko–Cantelli theorem explained
In the theory of probability, the Glivenko–Cantelli theorem (sometimes referred to as the Fundamental Theorem of Statistics), named after Valery Ivanovich Glivenko and Francesco Paolo Cantelli, describes the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed observations grows.[1] Specifically, the empirical distribution function converges uniformly to the true distribution function almost surely.
The uniform convergence of more general empirical measures becomes an important property of the Glivenko–Cantelli classes of functions or sets.[2] The Glivenko–Cantelli classes arise in Vapnik–Chervonenkis theory, with applications to machine learning. Applications can be found in econometrics making use of M-estimators.
Statement
Assume that
are
independent and identically distributed random variables in
with common
cumulative distribution function
. The
empirical distribution function for
is defined by
(x)=
l|\left\{ i \midXi\leqx, 1\leqi\leqn\right\}r|
where
is the
indicator function of the set
For every (fixed)
is a sequence of random variables which converge to
almost surely by the strong
law of large numbers. Glivenko and Cantelli strengthened this result by proving
uniform convergence of
to
Theorem
\|Fn-F\|infty=\supxl|Fn(x)-F(x)r|\longrightarrow0
almost surely.
[3] This theorem originates with Valery Glivenko[4] and Francesco Cantelli,[5] in 1933.
- Remarks:
is a stationary
ergodic process, then
converges almost surely to
F(x)=\operatornameEl[
r]~.
The Glivenko–Cantelli theorem gives a stronger mode of convergence than this in the
iid case.
- An even stronger uniform convergence result for the empirical distribution function is available in the form of an extended type of law of the iterated logarithm.[3] See asymptotic properties of the empirical distribution function for this and related results.
Proof
For simplicity, consider a case of continuous random variable
. Fix
-infty=x0<x1< … <xm-1<xm=infty
such that
for
. Now for all
there exists
such that
.
\begin{align}
Fn(x)-F(x)&\leqFn(xj)-F(xj-1)=Fn(xj)-F(x
&\geqFn(xj-1)-F(xj)=Fn(xj-1)-F(xj-1)-
Therefore,
\|Fn-F\|infty=\supx\in|Fn(x)-F(x)|\leqmaxj\in\{1,...,m\
} |F_n(x_j)-F(x_j)| + \frac1m.
Since by strong law of large numbers, we can guarantee that for any positive and any integer such that , we can find such that for all
, we have
. Combined with the above result, this further implies that
, which is the definition of almost sure convergence.
Empirical measures
One can generalize the empirical distribution function by replacing the set
by an arbitrary set
C from a class of sets
to obtain an
empirical measure indexed by sets
Where
is the
indicator function of each set
.
Further generalization is the map induced by
on measurable real-valued functions
f, which is given by
f\mapstoPnf=\intSfdPn=
f(Xi),f\inl{F}.
Then it becomes an important property of these classes whether the strong law of large numbers holds uniformly on
or
.
Glivenko–Cantelli class
Consider a set
with a sigma algebra of
Borel subsets and a
probability measure
For a class of subsets,
l{C}\subsetl\{C:Cismeasurablesubsetofl{S}r\}
and a class of functions
l{F}\subsetl\{f:l{S}\toR,fismeasurable r\}
define random variables
} \bigl| \mathbb_n(C) - \mathbb(C) \bigr|
} \bigl| \mathbb_n f - \mathbb f \bigr|
where
is the empirical measure,
is the corresponding map, and
assuming that it exists.
Definitions
is called a
Glivenko–Cantelli class (or
GC class, or sometimes
strong GC class) with respect to a probability measure if
almost surely as
is a
weak Glivenko-Cantelli class with respect to if it instead satisfies the weaker condition
in probability as
- A class is called a universal Glivenko–Cantelli class if it is a GC class with respect to any probability measure
on
.
- A class is a weak uniform Glivenko–Cantelli class if the convergence occurs uniformly over all probability measures
on
: For every
,
\supP,A)}\Pr\left(l\|Pn-Pr\|l{C}>\varepsilon\right)\to0
as
- A class is a (strong) uniform Glivenko-Cantelli class if it satisfies the stronger condition that for every
,
\supP,A)}\Pr\left(\supml\|Pm-Pr\|l{C}>\varepsilon\right)\to0
as
Glivenko–Cantelli classes of functions (as well as their uniform and universal forms) are defined similarly, replacing all instances of
with
.
The weak and strong versions of the various Glivenko-Cantelli properties often coincide under certain regularity conditions. The following definition commonly appears in such regularity conditions:
is
image-admissible Suslin if there exists a Suslin space
and a surjection
such that the map
is measurable
.
- A class of measurable sets
is image-admissible Suslin if the class of functions
is image-admissible Suslin, where
denotes the
indicator function for the set
.
Theorems
The following two theorems give sufficient conditions for the weak and strong versions of the Glivenko-Cantelli property to be equivalent.
Theorem (Talagrand, 1987)[6]
Let
be a class of functions that is integrable
, and define
l{F}0=\{f-Pf\midf\inl{F}\}
. Then the following are equivalent:
is a weak Glivenko-Cantelli class and
is dominated by an integrable function
is a Glivenko-Cantelli class
Theorem (Dudley, Giné, and Zinn, 1991)[7]
Suppose that a function class
is bounded. Also suppose that the set
l{F}0=\{f-inff\midf\inl{F}\}
is image-admissible Suslin. Then
is a weak uniform Glivenko-Cantelli class if and only if it is a strong uniform Glivenko-Cantelli class.
The following theorem is central to statistical learning of binary classification tasks.
Theorem (Vapnik and Chervonenkis, 1968)[8]
Under certain consistency conditions, a universally measurable class of sets
is a uniform Glivenko-Cantelli class if and only if it is a Vapnik–Chervonenkis class.
There exist a variety of consistency conditions for the equivalence of uniform Glivenko-Cantelli and Vapnik-Chervonenkis classes. In particular, either of the following conditions for a class
suffice:
[9]
is image-admissible Suslin.
is universally
separable: There exists a countable subset
of
such that each set
can be written as the pointwise limit of sets in
.
Examples
and
{lC}=\{(-infty,t]:t\in{R}\}
. The classical Glivenko–Cantelli theorem implies that this class is a universal GC class. Furthermore, by
Kolmogorov's theorem,
\supP\in(S,A)}\|Pn-P\|lC\simn-1/2
, that is
is uniformly Glivenko–Cantelli class.
- Let P be a nonatomic probability measure on S and
be a class of all finite subsets in
S. Because
An=\{X1,\ldots,Xn\}\inl{C}
,
,
, we have that
and so
is
not a GC class with respect to
P.
See also
Further reading
- Book: Dudley, R.M. . Richard M. Dudley . 1999 . Uniform Central Limit Theorems . Cambridge University Press . 0-521-46102-2 .
- Book: Pitman, E.J.G. . 1979 . Some Basic Theory for Statistical Inference . London, UK . Chapman and Hall . 0-470-26554-X . The Sample Distribution Function . 79–97 .
- Book: Shorack, G.R. . Galen Shorack . Jon A. Wellner . Wellner . J.A. . 1986 . Empirical Processes with Applications to Statistics . Wiley . 0-471-86725-X .
- Book: Aad van der Vaart . van der Vaart . A.W. . Wellner . J.A. . 1996 . Weak Convergence and Empirical Processes . Springer . 0-387-94640-3 .
- Book: Aad van der Vaart . Aad W. . van der Vaart . Jon A. . Wellner . 1996 . Glivenko-Cantelli Theorems . Springer .
- Book: Aad van der Vaart . Aad W. . van der Vaart . Jon A. . Wellner . 2000 . Preservation Theorems for Glivenko–Cantelli and Uniform Glivenko–Cantelli Classes . Springer .
Notes and References
- Howard G.Tucker. 1959 . A Generalization of the Glivenko–Cantelli Theorem . The Annals of Mathematical Statistics . 30. 3 . 10.1214/aoms/1177706212 . 828–830. 2237422 . free .
- Book: van der Vaart, A. W. . 1998 . Asymptotic Statistics . limited . Cambridge University Press . 978-0-521-78450-4 . 279 .
- Book: van der Vaart, A.W. . 1998 . Asymptotic Statistics . Cambridge University Press . 978-0-521-78450-4 . limited.
- Glivenko, V. . 1933 . Sulla determinazione empirica delle leggi di probabilità . it . Giorn. Ist. Ital. Attuari . 4 . 92-99.
- Cantelli, F.P. . Francesco Paolo Cantelli . 1933 . Sulla determinazione empirica delle leggi di probabilità . Giorn. Ist. Ital. Attuari . 4 . 421-424.
- Talagrand . M. . Michel Talagrand . 1987 . The Glivenko-Cantelli Problem . Annals of Probability . 15 . 837-870 . 10.1214/AOP/1176992069.
- Dudley . Richard M. . Richard M. Dudley . Giné . Eva . Zinn . Joel C. . 1991 . Uniform and universal Glivenko-Cantelli classes . Journal of Theoretical Probability . 4 . 485-510 . 10.1007/BF01210321.
- Vapnik . V.N. . Vladimir Vapnik . Chervonenkis . A.Ya. . Alexey Chervonenkis . 1971 . On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. Theory of Probability & Its Applications. 16 . 2 . 264–280 . 10.1137/1116025 .
- Pestov . Vladimir . PAC learnability versus VC dimension: A footnote to a basic result of statistical learning . The 2011 International Joint Conference on Neural Networks . 1141-1145 . 2011 . 10.1109/IJCNN.2011.6033352. 1104.2097 .