Jensen's inequality

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906,[1] building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.[2] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations.[3]

Jensen's inequality generalizes the statement that the secant line of a convex function lies above the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for t ∈ [0,1]),

t f(x_1) + (1-t) f(x_2),

while the graph of the function is the convex function of the weighted means,

f(t x_1 + (1-t) x_2).

Thus, Jensen's inequality is

f(t x_1 + (1-t) x_2) \leq t f(x_1) + (1-t) f(x_2).
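
As a concrete check (an illustration added here, not part of the original statement), the following Python sketch verifies the two-point inequality for the assumed convex function f(x) = x^2:

    # Two-point Jensen: f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2).
    # f(x) = x**2 is an assumed convex example.
    def f(x):
        return x ** 2

    x1, x2 = -1.0, 3.0
    for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
        lhs = f(t * x1 + (1 - t) * x2)     # graph of f at the weighted mean
        rhs = t * f(x1) + (1 - t) * f(x2)  # corresponding point on the secant line
        assert lhs <= rhs
        print(f"t={t:.2f}: f(mean)={lhs:.4f} <= secant={rhs:.4f}")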

In the context of probability theory, it is generally stated in the following form: if X is a random variable and \varphi is a convex function, then

\varphi(\operatorname{E}[X]) \leq \operatorname{E}\left[\varphi(X)\right].

The difference between the two sides of the inequality,

\operatorname{E}\left[\varphi(X)\right] - \varphi\left(\operatorname{E}[X]\right),

is called the Jensen gap.[4]
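
For the assumed choice \varphi(x) = x^2 the Jensen gap equals \operatorname{Var}(X), which a short Monte Carlo sketch in Python can make concrete; the Exp(1) distribution below is an arbitrary illustrative example:

    import random

    # Monte Carlo estimate of the Jensen gap E[phi(X)] - phi(E[X]).
    # With phi(x) = x**2 the gap equals Var(X); for X ~ Exp(1), Var(X) = 1.
    random.seed(0)
    samples = [random.expovariate(1.0) for _ in range(100_000)]

    phi = lambda x: x ** 2
    mean_x = sum(samples) / len(samples)
    mean_phi = sum(phi(x) for x in samples) / len(samples)

    gap = mean_phi - phi(mean_x)
    print(f"E[phi(X)] = {mean_phi:.3f}, phi(E[X]) = {phi(mean_x):.3f}, gap = {gap:.3f}")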

Statements

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.

Finite form

For a real convex function \varphi, numbers x_1, x_2, \ldots, x_n in its domain, and positive weights a_i, Jensen's inequality can be stated as:

\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \leq \frac{\sum a_i \varphi(x_i)}{\sum a_i} \qquad (1)

and the inequality is reversed if \varphi is concave, which is:

\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \geq \frac{\sum a_i \varphi(x_i)}{\sum a_i} \qquad (2)

Equality holds if and only if x_1 = x_2 = \cdots = x_n or \varphi is linear on a domain containing x_1, x_2, \ldots, x_n.
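
For concreteness, a minimal Python sketch (with \varphi(x) = x^2 as an assumed convex example) checking both the weighted inequality (1) and the equality case:

    # Weighted finite-form Jensen check with phi(x) = x**2 (assumed example).
    def weighted_jensen(phi, xs, ws):
        total = sum(ws)
        lhs = phi(sum(w * x for w, x in zip(ws, xs)) / total)
        rhs = sum(w * phi(x) for w, x in zip(ws, xs)) / total
        return lhs, rhs

    phi = lambda x: x ** 2

    lhs, rhs = weighted_jensen(phi, [1.0, 2.0, 5.0], [0.2, 0.3, 0.5])
    assert lhs <= rhs                  # strict inequality for distinct x_i
    print(lhs, "<=", rhs)

    lhs, rhs = weighted_jensen(phi, [3.0, 3.0, 3.0], [0.2, 0.3, 0.5])
    assert abs(lhs - rhs) < 1e-9       # equality when x_1 = x_2 = x_3
    print(lhs, "==", rhs)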

As a particular case, if the weights a_i are all equal, then (1) and (2) become:

\varphi\left(\frac{\sum x_i}{n}\right) \leq \frac{\sum \varphi(x_i)}{n} \qquad (3)

\varphi\left(\frac{\sum x_i}{n}\right) \geq \frac{\sum \varphi(x_i)}{n} \qquad (4)

For instance, the function \log(x) is concave, so substituting \varphi(x) = \log(x) in the previous formula establishes the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:

\log\left(\frac{\sum_{i=1}^{n} x_i}{n}\right) \geq \frac{\sum_{i=1}^{n} \log(x_i)}{n}

Taking exponentials of both sides, \exp\left(\log\left(\frac{\sum_{i=1}^{n} x_i}{n}\right)\right) \geq \exp\left(\frac{\sum_{i=1}^{n} \log(x_i)}{n}\right), which simplifies to

\frac{x_1 + x_2 + \cdots + x_n}{n} \geq \sqrt[n]{x_1 x_2 \cdots x_n}
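
A quick numeric check of the resulting AM-GM inequality, with arbitrary example data:

    import math

    # Arithmetic mean >= geometric mean for positive numbers (example data).
    xs = [1.0, 4.0, 9.0, 16.0]
    n = len(xs)

    arithmetic = sum(xs) / n
    geometric = math.exp(sum(math.log(x) for x in xs) / n)  # (x1*...*xn)**(1/n)

    assert arithmetic >= geometric
    print(f"AM = {arithmetic:.4f} >= GM = {geometric:.4f}")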

A common application has x as a function of another variable (or set of variables) t, that is, x_i = g(t_i). All of this carries over directly to the general continuous case: the weights a_i are replaced by a non-negative integrable function, such as a probability distribution, and the summations are replaced by integrals.

Measure-theoretic form

Let (\Omega, \mathcal{A}, \mu) be a probability space. Let f : \Omega \to \mathbb{R} be a \mu-measurable function and \varphi : \mathbb{R} \to \mathbb{R} be convex. Then:[5]

\varphi\left(\int_\Omega f \,\mathrm{d}\mu\right) \leq \int_\Omega \varphi \circ f \,\mathrm{d}\mu

In real analysis, we may require an estimate on

\varphi\left(\int_a^b f(x)\,dx\right)

where a, b \in \mathbb{R}, and f : [a,b] \to \mathbb{R} is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of [a,b] need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get[6]

\varphi\left(\frac{1}{b-a}\int_a^b f(x)\,dx\right) \leq \frac{1}{b-a}\int_a^b \varphi(f(x))\,dx.
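
As a sketch of this integral form (assuming SciPy is available for quadrature; f(x) = e^x and \varphi(y) = y^2 are illustrative choices, not from the source):

    import math
    from scipy.integrate import quad

    # Check phi((1/(b-a)) * int_a^b f) <= (1/(b-a)) * int_a^b phi(f)
    # for the assumed choices f(x) = exp(x) and phi(y) = y**2.
    a, b = 0.0, 2.0
    f = math.exp
    phi = lambda y: y ** 2

    mean_f = quad(f, a, b)[0] / (b - a)                        # average of f on [a, b]
    mean_phi_f = quad(lambda x: phi(f(x)), a, b)[0] / (b - a)  # average of phi(f)

    assert phi(mean_f) <= mean_phi_f
    print(f"phi(mean) = {phi(mean_f):.4f} <= mean of phi = {mean_phi_f:.4f}")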

Probabilistic form

The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let (\Omega, \mathfrak{F}, \operatorname{P}) be a probability space, X an integrable real-valued random variable, and \varphi a convex function. Then:[7]

\varphi\left(\operatorname{E}[X]\right) \leq \operatorname{E}\left[\varphi(X)\right].

In this probability setting, the measure \mu is interpreted as the probability \operatorname{P}, the integral with respect to \mu as an expected value \operatorname{E}, and the function f as a random variable X.

Note that equality holds if and only if \varphi is a linear function on some convex set A such that \operatorname{P}(X \in A) = 1 (which follows by inspecting the measure-theoretic proof below).

General inequality in a probabilistic setting

More generally, let T be a real topological vector space, and X a T-valued integrable random variable. In this general setting, integrable means that there exists an element \operatorname{E}[X] in T such that, for any element z in the dual space of T, \operatorname{E}\left|\langle z, X \rangle\right| < \infty and \langle z, \operatorname{E}[X] \rangle = \operatorname{E}\left[\langle z, X \rangle\right]. Then, for any measurable convex function \varphi and any sub-σ-algebra \mathfrak{G} of \mathfrak{F}:

\varphi\left(\operatorname{E}\left[X \mid \mathfrak{G}\right]\right) \leq \operatorname{E}\left[\varphi(X) \mid \mathfrak{G}\right].

Here \operatorname{E}[\,\cdot \mid \mathfrak{G}\,] stands for the expectation conditioned on the σ-algebra \mathfrak{G}. This general statement reduces to the previous ones when the topological vector space T is the real axis and \mathfrak{G} is the trivial σ-algebra \{\varnothing, \Omega\}.
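
To make the conditional form concrete in the scalar case, here is a small Python sketch (an added illustration, not from the source) in which the sub-σ-algebra is generated by a two-cell partition, so the conditional expectations reduce to per-cell averages; \varphi(x) = x^2 is the assumed convex function:

    import random

    # Conditional Jensen: phi(E[X | G]) <= E[phi(X) | G], checked cell by cell
    # for a sigma-algebra G generated by a two-cell partition (toy example).
    random.seed(1)
    phi = lambda x: x ** 2

    cells = {
        "G1": [random.gauss(0, 1) for _ in range(50_000)],
        "G2": [random.gauss(3, 2) for _ in range(50_000)],
    }

    for name, xs in cells.items():
        cond_mean = sum(xs) / len(xs)                      # E[X | cell]
        cond_mean_phi = sum(phi(x) for x in xs) / len(xs)  # E[phi(X) | cell]
        assert phi(cond_mean) <= cond_mean_phi
        print(f"{name}: phi(E[X|G]) = {phi(cond_mean):.3f} <= E[phi(X)|G] = {cond_mean_phi:.3f}")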

Notes and References

  1. Jensen, J. L. W. V. (1906). "Sur les fonctions convexes et les inégalités entre les valeurs moyennes". Acta Mathematica. 30 (1): 175–193. doi:10.1007/BF02418571.
  2. Guessab, A.; Schmeisser, G. (2013). "Necessary and sufficient conditions for the validity of Jensen's inequality". Archiv der Mathematik. 100 (6): 561–570. doi:10.1007/s00013-013-0522-3.
  3. Dekking, F. M.; Kraaikamp, C.; Lopuhaa, H. P.; Meester, L. E. (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer Texts in Statistics. London: Springer. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
  4. Gao, Xiang; Sitharam, Meera; Roitberg, Adrian (2019). "Bounds on the Jensen Gap, and Implications for Mean-Concentrated Distributions". The Australian Journal of Mathematical Analysis and Applications. 16 (2). arXiv:1712.05267.
  5. Durrett, Rick (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. p. 25. ISBN 978-1108473682.
  6. Niculescu, Constantin P. "Integral inequalities", p. 12.
  7. Durrett, Rick (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. p. 29. ISBN 978-1108473682.