Lexis ratio explained

The Lexis ratio^[1] is a statistical measure used to evaluate differences between the statistical properties of random mechanisms whose outcome is two-valued, for example "success" or "failure", "win" or "lose". The idea is that the probability of success might vary between different sets of trials in different situations. The ratio is little used today, having been largely replaced by the chi-squared test for the homogeneity of samples.

This measure compares the between-set variance of the sample proportions (evaluated for each set) with what the variance should be if there were no difference in the true proportions of success across the different sets. Thus the measure is used to evaluate how data compare with a fixed-probability-of-success Bernoulli distribution. The Lexis ratio is sometimes denoted L or Q, where

$$L^2 = Q^2 = \frac{s^2}{\sigma_0^2}$$

where $s^2$ is the (weighted) sample variance derived from the observed proportions of success in sets in "Lexis trials" and $\sigma_0^2$ is the variance computed from the expected Bernoulli distribution on the basis of the overall average proportion of success. Trials where L falls significantly above or below 1 are known as supernormal and subnormal, respectively.

This ratio (Q) can be used to distinguish between three types of variation in sampling for attributes: Bernoullian, Lexian and Poissonian.

Definition

Let there be k samples of sizes n₁, n₂, n₃, ..., n_k, and let the proportions of the attribute being examined in these samples be p₁, p₂, p₃, ..., p_k respectively, with p the overall proportion of the attribute across all samples. Then the Lexis ratio is

$$\frac{\sum_{i=1}^{k} n_i (p_i - p)^2}{(k-1)\,p(1-p)}$$
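The ratio above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `lexis_ratio` and its arguments are hypothetical, and the sketch assumes that p is the pooled proportion (total successes divided by total sample size).

```python
def lexis_ratio(successes, sizes):
    """Return sum(n_i * (p_i - p)**2) / ((k - 1) * p * (1 - p)).

    successes[i] is the number of items with the attribute in sample i,
    sizes[i] is the size n_i of sample i.
    """
    if len(successes) != len(sizes) or len(sizes) < 2:
        raise ValueError("need at least two samples, with one size per sample")
    k = len(sizes)
    # Observed proportion p_i in each sample.
    props = [x / n for x, n in zip(successes, sizes)]
    # Assumption: p is the pooled proportion over all samples.
    p = sum(successes) / sum(sizes)
    if p == 0.0 or p == 1.0:
        raise ValueError("ratio is undefined when the pooled proportion is 0 or 1")
    numerator = sum(n * (p_i - p) ** 2 for n, p_i in zip(sizes, props))
    return numerator / ((k - 1) * p * (1 - p))


# Three samples of 60 with 12, 30 and 18 successes: the proportions
# 0.2, 0.5 and 0.3 are spread widely around the pooled value 1/3.
ratio = lexis_ratio([12, 30, 18], [60, 60, 60])
```

For this made-up data the ratio comes out at about 6.3, well above 1, while samples that all share the same observed proportion give a ratio of 0.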

If the Lexis ratio is significantly below 1, the sampling is referred to as Poissonian (or subnormal); if it is equal to 1, the sampling is referred to as Bernoullian (or normal); and if it is above 1, it is referred to as Lexian (or supranormal). Chuprov showed in 1922 that in the case of statistical homogeneity

$$E(Q) = 1$$

and

$$\operatorname{var}(Q) = \frac{2}{n-1}$$

where E is the expectation and var is the variance. The formula for the variance is approximate and holds only for large values of n.
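The expectation result can be checked informally by simulation. The sketch below is an illustration under stated assumptions, not a proof: it draws k sets of n Bernoulli trials with a single fixed probability (the homogeneous case), computes the ratio for each replication, and averages. The function names, the default parameter values and the seed are all hypothetical choices. The sample variance is returned for inspection only and is not compared with the approximate formula.

```python
import random


def ratio_equal_sizes(counts, n):
    """Lexis ratio for k sets that all have the same size n."""
    k = len(counts)
    p = sum(counts) / (k * n)  # pooled proportion (assumption)
    if p == 0.0 or p == 1.0:
        raise ValueError("ratio is undefined when the pooled proportion is 0 or 1")
    numerator = sum(n * (x / n - p) ** 2 for x in counts)
    return numerator / ((k - 1) * p * (1 - p))


def simulate_homogeneous(k=10, n=50, p=0.3, reps=2000, seed=0):
    """Return (mean, sample variance) of the ratio under a fixed p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ratios = []
    for _ in range(reps):
        # Number of successes in each of the k sets of n trials.
        counts = [sum(rng.random() < p for _ in range(n)) for _ in range(k)]
        ratios.append(ratio_equal_sizes(counts, n))
    mean = sum(ratios) / reps
    variance = sum((q - mean) ** 2 for q in ratios) / (reps - 1)
    return mean, variance


mean_q, var_q = simulate_homogeneous()
```

With these settings the simulated mean should lie close to 1, in line with E(Q) = 1 under homogeneity.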

An alternative definition is

$$\frac{s^2}{\sigma_0^2}$$

where $s^2$ is the (weighted) sample variance derived from the observed proportions of success in sets in "Lexis trials" and $\sigma_0^2$ is the variance computed from the expected Bernoulli distribution on the basis of the overall average proportion of success.

Lexis variation

A closely related concept is the Lexis variation. Let k samples each of size n be drawn at random. Let the probability of success be constant within each sample, and let the actual probabilities of success in the k samples be p₁, p₂, ..., p_k.

The average probability of success (p) is

$$p = \frac{1}{k}\sum_{i=1}^{k} p_i$$

The variance in the number of successes is

var(successes)=np(1-p)+n(n-1)var(p_i)

where var(p_i) is the variance of the p_i.
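As a small numerical check of this formula, the sketch below evaluates it for made-up probabilities and compares the result with the variance computed directly from an equal-weight mixture of binomial distributions. It assumes that var(p_i) is the population variance of the p_i (dividing by k) and that each of the k samples is equally likely; the function names are hypothetical.

```python
import math


def lexis_variance(n, probs):
    """n*p*(1-p) + n*(n-1)*var(p_i), with p the mean of the p_i."""
    k = len(probs)
    p = sum(probs) / k
    # Assumption: population variance of the p_i (divide by k, not k - 1).
    var_p = sum((p_i - p) ** 2 for p_i in probs) / k
    return n * p * (1 - p) + n * (n - 1) * var_p


def mixture_variance(n, probs):
    """Variance of the success count when a sample is picked uniformly
    at random and then n trials are run with that sample's p_i."""
    k = len(probs)
    mean = sum(n * p_i for p_i in probs) / k
    # E[X^2] for Binomial(n, p_i) is n*p_i*(1 - p_i) + (n*p_i)**2.
    second_moment = sum(n * p_i * (1 - p_i) + (n * p_i) ** 2 for p_i in probs) / k
    return second_moment - mean ** 2


by_formula = lexis_variance(10, [0.2, 0.4, 0.6])
by_mixture = mixture_variance(10, [0.2, 0.4, 0.6])
```

Both routes give approximately 4.8 for this example, against 2.4 when all three probabilities equal 0.4, so the extra term n(n-1)var(p_i) accounts for the whole of the added dispersion.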

If all the p_i are equal the sampling is said to be Bernoullian; where the p_i differ the sampling is said to be Lexian and the dispersion is said to be supranormal.

Lexian sampling occurs when sampling from non-homogeneous strata.

History

Wilhelm Lexis introduced this statistic to test the then commonly held assumption that sampling data could be regarded as homogeneous.

Notes and References

[1] Lexis W (1877) Zur Theorie der Massenerscheinungen in der menschlichen Gesellschaft.