Big O in probability notation explained

The order in probability notation is used in probability theory and statistical theory in direct parallel to the big-O notation that is standard in mathematics. Where the big-O notation deals with the convergence of sequences or sets of ordinary numbers, the order in probability notation deals with convergence of sets of random variables, where convergence is in the sense of convergence in probability.^[1]

Definitions

Small o: convergence in probability

For a set of random variables X_n and corresponding set of constants a_n (both indexed by n, which need not be discrete), the notation

X_n=o_p(a_n)

means that the set of values X_n/a_n converges to zero in probability as n approaches an appropriate limit. Equivalently, X_n = o_p(a_n) can be written as X_n/a_n = o_p(1), i.e.

\lim_n P\left[\left|\frac{X_n}{a_n}\right|\geq\varepsilon\right]=0,

for every positive ε.^[2]
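
The definition can be checked numerically on a toy case. The following is a minimal sketch, not part of the cited sources: it assumes X_n is the mean of n independent Uniform(-1, 1) draws and a_n = 1, and it estimates P(|X_n/a_n| ≥ ε) by simulation. The function name `tail_probability` and its parameters are hypothetical.

```python
import random


def tail_probability(n, eps, trials=2000, seed=0):
    """Monte Carlo estimate of P(|X_n / a_n| >= eps) with a_n = 1.

    Assumed toy model: X_n is the mean of n i.i.d. Uniform(-1, 1) draws,
    so X_n converges to zero in probability by the law of large numbers.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    hits = 0
    for _ in range(trials):
        x_n = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
        if abs(x_n) >= eps:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # The estimated probability should shrink towards zero as n grows.
    for n in (10, 100, 1000):
        print(n, tail_probability(n, eps=0.1))
```

Under these assumptions the estimated probability should fall towards zero as n grows for any fixed ε, which is what X_n = o_p(1) asserts.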

Big O: stochastic boundedness

The notation

X_n=O_p(a_n) \text{ as } n\to\infty

means that the set of values X_n/a_n is stochastically bounded. That is, for any ε > 0, there exists a finite M > 0 and a finite N > 0 such that

P\left(\left|\frac{X_n}{a_n}\right|>M\right)<\varepsilon, \quad \forall n>N.
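
Stochastic boundedness can be illustrated in the same way. This sketch assumes a toy model that is not taken from the sources: X_n is the sum of n independent Uniform(-1, 1) draws and a_n = sqrt(n), so that X_n/a_n has variance 1/3 for every n. The helper names `chebyshev_bound` and `exceed_probability` are hypothetical.

```python
import math
import random


def chebyshev_bound(eps, variance):
    """Return an M with variance / M**2 < eps, so that Chebyshev's
    inequality guarantees P(|Y| > M) < eps for any centred Y with
    this variance."""
    return 1.01 * math.sqrt(variance / eps)


def exceed_probability(n, m, trials=2000, seed=0):
    """Monte Carlo estimate of P(|X_n / a_n| > m).

    Assumed toy model: X_n is the sum of n i.i.d. Uniform(-1, 1) draws
    and a_n = sqrt(n), so Var(X_n / a_n) = 1/3 for every n.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    a_n = math.sqrt(n)
    hits = 0
    for _ in range(trials):
        x_n = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if abs(x_n / a_n) > m:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    eps = 0.05
    m = chebyshev_bound(eps, variance=1.0 / 3.0)
    # One M works for every n: the estimates should all stay below eps.
    for n in (10, 100, 1000):
        print(n, exceed_probability(n, m))
```

The point of the sketch is that a single M, chosen from ε alone, keeps the exceedance probability below ε for every n. The ratio X_n/a_n does not need to converge to zero for this to hold.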

Comparison of the two definitions

The difference between the definitions is subtle. If one uses the definition of the limit, one gets:

Big O_p(1):

\forall\varepsilon \quad \exists N_\varepsilon,\delta_\varepsilon \quad \text{such that} \quad P(|X_n|\geq\delta_\varepsilon)\leq\varepsilon \quad \forall n>N_\varepsilon

Small o_p(1):

\forall\varepsilon,\delta \quad \exists N_{\varepsilon,\delta} \quad \text{such that} \quad P(|X_n|\geq\delta)\leq\varepsilon \quad \forall n>N_{\varepsilon,\delta}

The difference lies in the δ: for stochastic boundedness, it suffices that there exists one (arbitrarily large) δ that satisfies the inequality, and δ is allowed to depend on ε (hence the δ_ε). For convergence, on the other hand, the statement has to hold not only for one but for every (arbitrarily small) δ. In a sense, this means that the sequence must be bounded, with a bound that gets smaller as the sample size increases.

This suggests that if a sequence is o_p(1), then it is O_p(1), i.e. convergence in probability implies stochastic boundedness. But the reverse does not hold.
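
Both directions can be made explicit. The following worked argument is a sketch added for illustration. The constant sequence X_n = Z is a standard counterexample, chosen here as an assumption and not taken from the cited sources.

```latex
% o_p(1) implies O_p(1): apply the o_p(1) condition with \delta = 1.
\text{Let } X_n = o_p(1) \text{ and fix } \varepsilon > 0.
\text{ Then there is } N_{\varepsilon,1} \text{ with }
P(|X_n| \geq 1) \leq \varepsilon \quad \forall n > N_{\varepsilon,1},
\text{ so } \delta_\varepsilon = 1,\ N_\varepsilon = N_{\varepsilon,1}
\text{ satisfy the } O_p(1) \text{ condition.}

% O_p(1) does not imply o_p(1): take X_n = Z for all n, with Z \sim N(0,1).
\text{For every } \varepsilon > 0 \text{ there is } \delta_\varepsilon
\text{ with } P(|Z| \geq \delta_\varepsilon) \leq \varepsilon,
\text{ so } X_n = O_p(1).
\text{ However, } P(|X_n| \geq 1) = P(|Z| \geq 1) \approx 0.317
\text{ for every } n,
\text{ so the } o_p(1) \text{ condition fails for } \delta = 1,\ \varepsilon = 0.1.
```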

Example

If (X_n) is a stochastic sequence such that each element has finite variance, then

X_n-E(X_n)=O_p\left(\sqrt{\operatorname{var}(X_n)}\right)

(see Theorem 14.4-1 in Bishop et al.).

If, moreover, a_n^{-2}\operatorname{var}(X_n)=\operatorname{var}(a_n^{-1}X_n) is a null sequence for a sequence (a_n) of real numbers, then a_n^{-1}(X_n-E(X_n)) converges to zero in probability by Chebyshev's inequality, so

X_n-E(X_n)=o_p(a_n).
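
The Chebyshev steps behind both statements can be written out as follows. This is a sketch of the reasoning, not a reproduction of the proof in Bishop et al. It assumes var(X_n) > 0 for every n, and the shorthand \sigma_n is introduced here for illustration.

```latex
% Notation (assumed): \sigma_n^2 = \operatorname{var}(X_n) > 0.

% Stochastic boundedness. For any M > 0, Chebyshev's inequality gives
P\left(\left|\frac{X_n-E(X_n)}{\sigma_n}\right| > M\right) \leq \frac{1}{M^2}.
% Given \varepsilon > 0, the choice M = \sqrt{2/\varepsilon} yields
P\left(\left|\frac{X_n-E(X_n)}{\sigma_n}\right| > M\right)
  \leq \frac{\varepsilon}{2} < \varepsilon \quad \forall n,
% which is the O_p condition with a_n = \sigma_n.

% Convergence in probability. For any \varepsilon > 0,
P\left(\left|a_n^{-1}\bigl(X_n-E(X_n)\bigr)\right| \geq \varepsilon\right)
  \leq \frac{\operatorname{var}(a_n^{-1}X_n)}{\varepsilon^2}
  = \frac{a_n^{-2}\operatorname{var}(X_n)}{\varepsilon^2}
  \longrightarrow 0,
% because a_n^{-2}\operatorname{var}(X_n) is a null sequence.
```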

References

Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP.
Bishop, Yvonne M., et al.