Cantelli's inequality explained

In probability theory, Cantelli's inequality (also called the Chebyshev-Cantelli inequality and the one-sided Chebyshev inequality) is an improved version of Chebyshev's inequality for one-sided tail bounds.[1] [2] [3] The inequality states that, for

λ>0,

\Pr(X-E[X]\geλ) \le

\sigma2
\sigma22

,

where

X

is a real-valued random variable,

\Pr

is the probability measure,

E[X]

is the expected value of

X

,

\sigma2

is the variance of

X

.

Applying the Cantelli inequality to

-X

gives a bound on the lower tail,

\Pr(X-E[X]\le) \le

\sigma2
\sigma22

.

While the inequality is often attributed to Francesco Paolo Cantelli who published it in 1928,[4] it originates in Chebyshev's work of 1874.[5] When bounding the event random variable deviates from its mean in only one direction (positive or negative), Cantelli's inequality gives an improvement over Chebyshev's inequality. The Chebyshev inequality has "higher moments versions" and "vector versions", and so does the Cantelli inequality.

Comparison to Chebyshev's inequality

For one-sided tail bounds, Cantelli's inequality is better, since Chebyshev's inequality can only get

\Pr(X-E[X]\geqλ)\leq\Pr(|X-E[X]|\geλ) \le

\sigma2
λ2

.

On the other hand, for two-sided tail bounds, Cantelli's inequality gives

\Pr(|X-E[X]|\geλ) =\Pr(X-E[X]\geλ)+\Pr(X-E[X]\le) \le

2\sigma2
\sigma22

,

which is always worse than Chebyshev's inequality (when

λ\geq\sigma

; otherwise, both inequalities bound a probability by a value greater than one, and so are trivial).

Proof

Let

X

be a real-valued random variable with finite variance

\sigma2

and expectation

\mu

, and define

Y=X-E[X]

(so that

E[Y]=0

and

\operatorname{Var}(Y)=\sigma2

).

Then, for any

u\geq0

, we have

\Pr(X-E[X]\geqλ) =\Pr(Y\geqλ)=\Pr(Y+u\geqλ+u) \leq\Pr((Y+u)2\geq(λ+u)2) \leq

E[(Y+u)2]
(λ+u)2

=

\sigma2+u2
(λ+u)2

.

the last inequality being a consequence of Markov's inequality. As the above holds for any choice of

u\inR

, we can choose to apply it with the value that minimizes the function

u\geq0\mapsto

\sigma2+u2
(λ+u)2
. By differentiating, this can be seen to be

u\ast=

\sigma2
λ
, leading to

\Pr(X-E[X]\geqλ)\leq

2
\sigma+
2
u
\ast
(λ+
2
u
\ast)

=

\sigma2
λ2+\sigma2

if

λ>0

Generalizations

Various stronger inequalities can be shown.He, Zhang, and Zhang showed[6] (Corollary 2.3) when

E[X]=0,E[X2]=1

and

λ\ge0

:

\Pr(X\geλ)\le1-(2\sqrt{3}-3)

(1+λ2)2
E[X4]+6λ2+λ4

.

In the case

λ=0

this matches a bound in Berger's "The Fourth Moment Method",[7]

\Pr(X\ge0)\ge

2\sqrt{3
-3}{E[X

4]}.

This improves over Cantelli's inequality in that we can get a non-zero lower bound, even when

E[X]=0

.

See also

Notes and References

  1. Book: Boucheron, Stéphane. Concentration inequalities : a nonasymptotic theory of independence. 2013. Oxford University Press. Gábor Lugosi, Pascal Massart. 978-0-19-953525-5. Oxford. 829910957.
  2. http://www.cse.buffalo.edu/~hungngo/classes/2011/Spring-694/lectures/l4.pdf "Tail and Concentration Inequalities" by Hung Q. Ngo
  3. http://www.econ.upf.edu/~lugosi/anu.pdf "Concentration-of-measure inequalities" by Gábor Lugosi
  4. Cantelli, F. P. (1928), "Sui confini della probabilita," Atti del Congresso Internazional del Matematici, Bologna, 6, 47-5
  5. https://www.jstor.org/stable/3087296 Ghosh, B.K., 2002. Probability inequalities related to Markov's theorem. The American Statistician, 56(3), pp.186-190
  6. He . S. . Zhang . J. . Zhang . S. . 11298475 . 2010 . Bounding probability of small deviation: A fourth moment approach . Mathematics of Operations Research . 35 . 1. 208–232 . 10.1287/moor.1090.0438 .
  7. Berger . Bonnie . August 1997 . The Fourth Moment Method . SIAM Journal on Computing . en . 26 . 4 . 1188–1207 . 10.1137/S0097539792240005 . 14313557 . 0097-5397.