Cramér–von Mises criterion
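In statistics, the Cramér–von Mises criterion is used for judging the goodness of fit of a cumulative distribution function F* compared to a given empirical distribution function F_n, or for comparing two empirical distributions. It is also used as a part of other algorithms, such as minimum distance estimation. It is defined as
\omega^2 = \int_{-\infty}^{\infty} \left[F_n(x) - F^*(x)\right]^2 \, dF^*(x)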
In one-sample applications F* is the theoretical distribution and F_n is the empirically observed distribution. Alternatively, the two distributions can both be empirically estimated ones; this is called the two-sample case.
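As a rough numerical check of the definition, the integral can be approximated directly. The sketch below assumes a continuous F*, so that substituting u = F*(x) turns the integral into one over [0, 1], with F_n(x) equal to the share of transformed observations F*(x_i) that are at most u. The function names and the midpoint-rule step count are illustrative choices, not part of any library.

```python
import bisect
from typing import Callable, Sequence


def omega_squared_numeric(
    data: Sequence[float],
    cdf: Callable[[float], float],
    steps: int = 100_000,
) -> float:
    """Approximate omega^2 = integral of [F_n(x) - F*(x)]^2 dF*(x).

    Assumes `cdf` (playing the role of F*) is continuous, so the substitution
    u = F*(x) gives integral_0^1 [G_n(u) - u]^2 du, where G_n(u) is the share
    of values F*(x_i) that are <= u. The midpoint rule evaluates that integral.
    """
    us = sorted(cdf(x) for x in data)
    n = len(us)
    if n == 0:
        raise ValueError("need at least one observation")
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) / steps                  # midpoint of the k-th subinterval
        g = bisect.bisect_right(us, u) / n     # empirical CDF after the substitution
        total += (g - u) ** 2
    return total / steps


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1] (an example choice of F*)."""
    return min(max(x, 0.0), 1.0)


print(omega_squared_numeric([0.1, 0.4, 0.7], uniform_cdf))  # approximately 0.02
```

For the three points 0.1, 0.4 and 0.7 tested against the uniform distribution, the exact integral is 0.02, which agrees with T/n from the closed form given for the one-sample test.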
The criterion is named after Harald Cramér and Richard Edler von Mises, who first proposed it in 1928–1930.[1][2] The generalization to two samples is due to Anderson.[3]
The Cramér–von Mises test is an alternative to the Kolmogorov–Smirnov test (1933).[4]
Cramér–von Mises test (one sample)
Let x_1, x_2, ..., x_n be the observed values, in increasing order. Then the statistic is[3][5]
T = n\omega^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left[\frac{2i-1}{2n} - F(x_i)\right]^2.
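The closed form for T can be recovered from the definition of omega^2 when F is continuous. A short sketch of the algebra, assuming the substitution u = F(x), writing u_i = F(x_i) with u_0 = 0 and u_{n+1} = 1, and using that F_n equals i/n on [u_i, u_{i+1}); the symbols a_i and h are introduced here only for the derivation:

```latex
\begin{aligned}
\omega^2 &= \sum_{i=0}^{n} \int_{u_i}^{u_{i+1}} \left(u - \tfrac{i}{n}\right)^2 \, du
          = \frac{1}{3} \sum_{i=1}^{n} \left[\left(u_i - \tfrac{i-1}{n}\right)^3 - \left(u_i - \tfrac{i}{n}\right)^3\right] \\
&\quad \text{(the boundary terms at } u_0 = 0 \text{ and } u_{n+1} = 1 \text{ vanish)} \\
&= \frac{1}{3} \sum_{i=1}^{n} \left(6 a_i^2 h + 2 h^3\right),
   \qquad a_i = u_i - \tfrac{2i-1}{2n}, \quad h = \tfrac{1}{2n} \\
&= \frac{1}{n} \sum_{i=1}^{n} a_i^2 + \frac{1}{12 n^2}, \\
n\omega^2 &= \frac{1}{12n} + \sum_{i=1}^{n} \left[\frac{2i-1}{2n} - F(x_i)\right]^2 .
\end{aligned}
```

The third line uses the identity (a + h)^3 - (a - h)^3 = 6a^2 h + 2h^3, since u_i - (i-1)/n = a_i + h and u_i - i/n = a_i - h.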
If this value is larger than the tabulated value, then the hypothesis that the data came from the distribution F can be rejected.
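A minimal sketch of the one-sample statistic, assuming F is supplied as a Python callable; `cvm_one_sample` and `uniform_cdf` are names chosen here for illustration. It sorts the data itself, since the formula indexes the observations in increasing order.

```python
from typing import Callable, Sequence


def cvm_one_sample(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Return T = n * omega^2 for the one-sample Cramér–von Mises test.

    `cdf` is the hypothesised distribution F; `data` need not be pre-sorted.
    """
    xs = sorted(data)                  # the formula indexes x_1 <= ... <= x_n
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    t = 1.0 / (12 * n)
    for i, x in enumerate(xs, start=1):
        t += ((2 * i - 1) / (2 * n) - cdf(x)) ** 2
    return t


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1], used here as an example F."""
    return min(max(x, 0.0), 1.0)


print(cvm_one_sample([0.7, 0.1, 0.4], uniform_cdf))  # approximately 0.06
```

The function only produces T; deciding whether to reject still requires the tabulated critical value for the chosen significance level. If SciPy is available, `scipy.stats.cramervonmises` should report the same statistic together with a p-value and can serve as a cross-check.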
Watson test
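A modified version of the Cramér–von Mises test is the Watson test,[6] which uses the statistic U^2, defined by[5]
U^2 = T - n\left(\bar{F} - \tfrac{1}{2}\right)^2,
where \bar{F} is the mean of the fitted distribution function values:
\bar{F} = \frac{1}{n} \sum_{i=1}^{n} F(x_i).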
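A minimal sketch of U^2, assuming F is given as a Python callable and recomputing T from the one-sample closed form; `watson_u2` and `uniform_cdf` are names chosen here for illustration. Watson introduced the statistic for observations on a circle, so as a quick check the usage lines shift every point by the same amount modulo 1 (moving the origin of the circle) under a uniform F and compare the two values.

```python
from typing import Callable, Sequence


def watson_u2(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Return Watson's U^2 = T - n * (Fbar - 1/2)^2."""
    # F is nondecreasing, so sorting the F(x_i) matches sorting the x_i.
    us = sorted(cdf(x) for x in data)
    n = len(us)
    if n == 0:
        raise ValueError("need at least one observation")
    t = 1.0 / (12 * n) + sum(((2 * i - 1) / (2 * n) - u) ** 2
                             for i, u in enumerate(us, start=1))
    f_bar = sum(us) / n                 # mean of the fitted CDF values
    return t - n * (f_bar - 0.5) ** 2


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1], an example F."""
    return min(max(x, 0.0), 1.0)


sample = [0.1, 0.2]
rotated = [(v + 0.5) % 1.0 for v in sample]   # move the origin of the circle
print(watson_u2(sample, uniform_cdf), watson_u2(rotated, uniform_cdf))  # both about 0.1217
```

For the same two points the plain T changes under the shift (about 0.367 before and 0.167 after), while U^2 stays at 73/600 in both cases.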
Cramér–von Mises test (two samples)
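Let x_1, x_2, ..., x_N and y_1, y_2, ..., y_M be the observed values in the first and second sample respectively, in increasing order. Let r_1, r_2, ..., r_N be the ranks of the xs in the combined sample, and let s_1, s_2, ..., s_M be the ranks of the ys in the combined sample. Anderson[3] shows that
T = \frac{NM}{N+M}\omega^2 = \frac{U}{NM(N+M)} - \frac{4MN-1}{6(M+N)},
where U is defined as
U = N\sum_{i=1}^{N}(r_i - i)^2 + M\sum_{j=1}^{M}(s_j - j)^2.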
If the value of T is larger than the tabulated values,[3] the hypothesis that the two samples come from the same distribution can be rejected. (Some books give critical values for U, which is more convenient, as it avoids the need to compute T via the expression above. The conclusion will be the same.)
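The rank expression can be checked against the definition numerically. The sketch below computes T once from Anderson's formula and once directly, taking omega^2 in the two-sample case to be the integral of [F_N(x) - G_M(x)]^2 with respect to the pooled empirical distribution of all N + M observations (our reading of Anderson's setup; F_N and G_M, the empirical CDFs of the two samples, are names introduced here). Both function names are hypothetical, and ties are rejected because the plain rank formula assumes none.

```python
from bisect import bisect_right
from typing import Sequence


def cvm_two_sample(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample T from Anderson's rank formula (assumes no tied values)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    combined = sorted(xs + ys)
    if len(set(combined)) != len(combined):
        raise ValueError("tied values present; the plain rank formula assumes none")
    rank = {v: k for k, v in enumerate(combined, start=1)}  # rank in the pooled sample
    u = (n * sum((rank[v] - i) ** 2 for i, v in enumerate(xs, start=1))
         + m * sum((rank[v] - j) ** 2 for j, v in enumerate(ys, start=1)))
    return u / (n * m * (n + m)) - (4 * m * n - 1) / (6 * (m + n))


def cvm_two_sample_direct(x: Sequence[float], y: Sequence[float]) -> float:
    """Same T from the definition: NM/(N+M) times the integral of [F_N - G_M]^2
    against the pooled empirical distribution (mass 1/(N+M) at each point)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    total = sum((bisect_right(xs, z) / n - bisect_right(ys, z) / m) ** 2
                for z in xs + ys)
    return n * m / (n + m) * total / (n + m)


print(cvm_two_sample([1, 4], [2, 3, 5]), cvm_two_sample_direct([1, 4], [2, 3, 5]))  # both about 0.1
```

In the example the ranks are r = (1, 4) and s = (2, 3, 5), so U = 2(0 + 4) + 3(1 + 1 + 4) = 26 and T = 26/30 - 23/30 = 0.1.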
The above assumes there are no duplicates in the x, y, and r sequences. So x_i is unique, and its rank is i in the sorted list x_1, ..., x_N. If there are duplicates, and x_i through x_j are a run of identical values in the sorted list, then one common approach is the midrank[7] method: assign each duplicate a "rank" of (i + j)/2. In the above equations, in the expressions (r_i - i)^2 and (s_j - j)^2, duplicates can modify all four variables r_i, i, s_j, and j.
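One reading of the midrank rule, sketched below, replaces both the pooled ranks (r_i, s_j) and the within-sample positions (i, j) with midranks before applying Anderson's formula. The function names are hypothetical, and the sketch does not address whether the tabulated critical values remain accurate when ties are present.

```python
from typing import Dict, Sequence


def midranks(values: Sequence[float]) -> Dict[float, float]:
    """Map each distinct value to its midrank: a run of identical values at
    sorted positions i..j (1-based) all receive (i + j) / 2."""
    s = sorted(values)
    ranks: Dict[float, float] = {}
    start = 0
    while start < len(s):
        end = start
        while end + 1 < len(s) and s[end + 1] == s[start]:
            end += 1
        ranks[s[start]] = ((start + 1) + (end + 1)) / 2
        start = end + 1
    return ranks


def cvm_two_sample_midrank(x: Sequence[float], y: Sequence[float]) -> float:
    """Anderson's formula with midranks substituted for r_i, i, s_j and j."""
    n, m = len(x), len(y)
    pooled = midranks(list(x) + list(y))   # replaces r_i and s_j
    own_x = midranks(x)                    # replaces the within-sample index i
    own_y = midranks(y)                    # replaces the within-sample index j
    u = (n * sum((pooled[v] - own_x[v]) ** 2 for v in x)
         + m * sum((pooled[v] - own_y[v]) ** 2 for v in y))
    return u / (n * m * (n + m)) - (4 * m * n - 1) / (6 * (m + n))


print(midranks([1, 2, 2, 3]))              # {1: 1.0, 2: 2.5, 3: 4.0}
print(cvm_two_sample_midrank([1, 2, 2], [2, 3]))
```

Without ties every midrank equals the ordinary rank, so the result coincides with the plain rank formula. For x = (1, 2, 2) and y = (2, 3), the pooled midranks are 1, 3, 3, 3, 5 and the within-sample midranks are (1, 2.5, 2.5) and (1, 2), giving U = 27.5 and T = 0.15.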
References
- D'Agostino, R. B.; Stephens, M. A. (1986). Goodness-of-Fit Techniques. Chapter "Tests Based on EDF Statistics" by M. A. Stephens. Marcel Dekker, New York. ISBN 0-8247-7487-6.
Notes and References
- Cramér, H. (1928). "On the Composition of Elementary Errors". Scandinavian Actuarial Journal, 1928 (1): 13–74. doi:10.1080/03461238.1928.10416862.
- von Mises, R. E. (1928). Wahrscheinlichkeit, Statistik und Wahrheit. Julius Springer.
- Anderson, T. W. (1962). "On the Distribution of the Two-Sample Cramer–von Mises Criterion". Annals of Mathematical Statistics, 33 (3): 1148–1159. Institute of Mathematical Statistics. ISSN 0003-4851. doi:10.1214/aoms/1177704477.
- Kolmogorov, A. N. (1933). "Sulla determinazione empirica di una legge di distribuzione". Giorn. Ist. Ital. Attuari, 4: 83–91.
- Pearson, E. S.
- Watson, G. S. (1961). "Goodness-Of-Fit Tests on a Circle". Biometrika, 48 (1/2): 109–114.
- Ruymgaart, F. H. (1980). "A unified approach to the asymptotic distribution theory of certain midrank statistics". In: Statistique non Parametrique Asymptotique, 1–18, J. P. Raoult (Ed.), Lecture Notes in Mathematics, No. 821, Springer, Berlin.