In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906,[1] building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.[2] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations.[3]
Jensen's inequality generalizes the statement that the secant line of a convex function lies above the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for t ∈ [0,1]),
t f(x_1) + (1 - t) f(x_2),
while the graph of the function is the convex function of the weighted means,
f(t x_1 + (1 - t) x_2).
Thus, Jensen's inequality is
f(t x_1 + (1 - t) x_2) \leq t f(x_1) + (1 - t) f(x_2).
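As an informal numerical illustration, the following Python snippet checks the two-point inequality for the convex function f(x) = x^2; the sample points x1, x2 and the grid of weights t are arbitrary choices made only for this example.

```python
# Two-point form of Jensen's inequality, checked numerically for the
# convex function f(x) = x**2; points and weights are illustrative only.
f = lambda x: x ** 2

x1, x2 = -1.0, 3.0
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    lhs = f(t * x1 + (1 - t) * x2)       # f evaluated at the weighted mean
    rhs = t * f(x1) + (1 - t) * f(x2)    # weighted mean of the values of f
    assert lhs <= rhs + 1e-12
    print(f"t={t:.2f}: f(t*x1+(1-t)*x2)={lhs:.4f} <= t*f(x1)+(1-t)*f(x2)={rhs:.4f}")
```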
In the context of probability theory, it is generally stated in the following form: if X is a random variable and \varphi is a convex function, then
\varphi\left(\operatorname{E}[X]\right) \leq \operatorname{E}\left[\varphi(X)\right].
The difference between the two sides of the inequality,
\operatorname{E}\left[\varphi(X)\right] - \varphi\left(\operatorname{E}[X]\right),
is called the Jensen gap.
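A Monte Carlo estimate of the Jensen gap can serve as a quick sanity check. The sketch below uses the convex function \varphi(x) = e^x and X uniform on [0, 1]; the distribution, sample size and choice of \varphi are illustrative assumptions, not part of the statement.

```python
import math
import random

# Estimate E[phi(X)] - phi(E[X]) for phi = exp and X ~ Uniform(0, 1);
# the gap is non-negative by Jensen's inequality (here it is roughly
# (e - 1) - e**0.5, about 0.07, for this particular choice).
random.seed(0)
samples = [random.uniform(0.0, 1.0) for _ in range(100_000)]

phi = math.exp
mean_x = sum(samples) / len(samples)                    # estimate of E[X]
mean_phi = sum(phi(x) for x in samples) / len(samples)  # estimate of E[phi(X)]

gap = mean_phi - phi(mean_x)                            # estimated Jensen gap
print(f"E[phi(X)] ~ {mean_phi:.4f}, phi(E[X]) ~ {phi(mean_x):.4f}, gap ~ {gap:.4f}")
```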
The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.
In the language of real analysis, Jensen's inequality states that for a convex function \varphi, numbers x_1, x_2, \ldots, x_n in its domain, and positive weights a_i,
\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \leq \frac{\sum a_i \varphi(x_i)}{\sum a_i},
and the inequality is reversed if \varphi is concave. Equality holds if and only if x_1 = x_2 = \cdots = x_n or \varphi is linear on a domain containing x_1, x_2, \ldots, x_n.
As a particular case, if the weights a_i are all equal, this becomes
\varphi\left(\frac{\sum x_i}{n}\right) \leq \frac{\sum \varphi(x_i)}{n}.
For instance, the function \log(x) is concave, so substituting \varphi(x) = \log(x) in the previous formula yields the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:
\log\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \geq \frac{\log(x_1) + \log(x_2) + \cdots + \log(x_n)}{n},
that is,
\frac{x_1 + x_2 + \cdots + x_n}{n} \geq \sqrt[n]{x_1 x_2 \cdots x_n}.
A common application has x as a function of another variable (or set of variables) t, that is, x_i = g(t_i). All of this carries directly over to the general continuous case: the weights a_i are replaced by a non-negative integrable function f(x), such as a probability distribution, and the summations are replaced by integrals.
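As an informal check of the weighted finite form (here with the concave function log, i.e. the weighted AM-GM inequality), the following Python sketch uses arbitrarily chosen positive numbers and weights:

```python
import math

# Weighted finite form of Jensen's inequality for the concave function log:
# log of the weighted mean >= weighted mean of the logs (weighted AM-GM).
# The numbers and weights below are arbitrary illustrative choices.
x = [0.5, 2.0, 3.0, 10.0]
a = [1.0, 2.0, 3.0, 4.0]                 # positive weights, not normalized

total = sum(a)
weighted_mean = sum(ai * xi for ai, xi in zip(a, x)) / total
weighted_mean_of_logs = sum(ai * math.log(xi) for ai, xi in zip(a, x)) / total

# For concave phi = log the inequality is reversed relative to the convex case.
print(f"log(weighted mean)    = {math.log(weighted_mean):.4f}")
print(f"weighted mean of logs = {weighted_mean_of_logs:.4f}")
assert math.log(weighted_mean) >= weighted_mean_of_logs
```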
Let (\Omega, A, \mu) be a measure space such that \mu(\Omega) = 1. If f \colon \Omega \to \mathbb{R} is a \mu-integrable function and \varphi \colon \mathbb{R} \to \mathbb{R} is a convex function, then:
\varphi\left(\int_\Omega f \, d\mu\right) \leq \int_\Omega \varphi \circ f \, d\mu.
In real analysis, we may require an estimate on
\varphi\left(\int_a^b f(x)\, dx\right)
where a, b \in \mathbb{R} and f \colon [a,b] \to \mathbb{R} is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of [a,b] need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get
\varphi\left(\frac{1}{b-a}\int_a^b f(x)\, dx\right) \leq \frac{1}{b-a}\int_a^b \varphi(f(x))\, dx.
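The rescaled integral inequality can be checked numerically with a simple Riemann sum; in the Python sketch below, the interval, the integrand f and the convex function \varphi are all illustrative choices.

```python
import math

# Check phi((1/(b-a)) * int_a^b f(x) dx) <= (1/(b-a)) * int_a^b phi(f(x)) dx
# with a midpoint Riemann sum; f, phi and [a, b] are illustrative choices.
a, b, n = 0.0, 2.0, 10_000
f = lambda x: math.sin(x) + 2.0          # non-negative on [0, 2]
phi = lambda y: y ** 2                   # convex

xs = [a + (b - a) * (k + 0.5) / n for k in range(n)]   # midpoint nodes
avg_f = sum(f(x) for x in xs) / n                      # ~ (1/(b-a)) * int f
avg_phi_f = sum(phi(f(x)) for x in xs) / n             # ~ (1/(b-a)) * int phi(f)

print(f"phi(average of f)    = {phi(avg_f):.5f}")
print(f"average of phi(f(x)) = {avg_phi_f:.5f}")
assert phi(avg_f) <= avg_phi_f
```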
The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let
(\Omega, \mathfrak{F}, \operatorname{P}) be a probability space, X an integrable real-valued random variable, and \varphi a convex function. Then:
\varphi\left(\operatorname{E}[X]\right) \leq \operatorname{E}\left[\varphi(X)\right].
In this probability setting, the measure \mu is intended as a probability \operatorname{P}, the integral with respect to \mu as an expected value \operatorname{E}, and the function f as a random variable X.
Note that the equality holds if and only if \varphi is a linear function on some convex set A such that \operatorname{P}(X \in A) = 1.
More generally, let T be a real topological vector space, and X a T-valued integrable random variable. In this general setting, integrable means that there exists an element
\operatorname{E}[X] in T, such that for any element z in the dual space of T:
\operatorname{E}\left|\langle z, X \rangle\right| < \infty, \quad \text{and} \quad \langle z, \operatorname{E}[X] \rangle = \operatorname{E}\left[\langle z, X \rangle\right].
Then, for any measurable convex function \varphi and any sub-\sigma-algebra \mathfrak{G} of \mathfrak{F}:
\varphi\left(\operatorname{E}\left[X \mid \mathfrak{G}\right]\right) \leq \operatorname{E}\left[\varphi(X) \mid \mathfrak{G}\right].
Here \operatorname{E}[\,\cdot \mid \mathfrak{G}] stands for the expectation conditioned to the \sigma-algebra \mathfrak{G}. This general statement reduces to the previous ones when the topological vector space T is the real axis, and \mathfrak{G} is the trivial \sigma-algebra \{\varnothing, \Omega\} (where \varnothing is the empty set, and \Omega is the sample space).
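In the special case where \mathfrak{G} is generated by a finite partition, the conditional expectation is simply an average over each cell of the partition, so the conditional form can be illustrated with a short simulation. In the Python sketch below, the two-cell partition, the distribution of X and the convex function are all hypothetical choices made for the example.

```python
import random

# Conditional form phi(E[X | G]) <= E[phi(X) | G], illustrated when G is
# generated by a two-cell partition: conditioning then amounts to averaging
# within each cell. Distribution, partition and phi are illustrative choices.
random.seed(1)
phi = lambda x: abs(x) ** 3              # convex on the real line

# Sample pairs (X, cell label); the label determines the partition cell.
data = [(random.gauss(0.0, 1.0), random.choice("AB")) for _ in range(50_000)]

for cell in "AB":
    xs = [x for x, label in data if label == cell]
    cond_mean = sum(xs) / len(xs)                        # ~ E[X | cell]
    cond_mean_phi = sum(phi(x) for x in xs) / len(xs)    # ~ E[phi(X) | cell]
    print(f"cell {cell}: phi(E[X|G]) ~ {phi(cond_mean):.4f} "
          f"<= E[phi(X)|G] ~ {cond_mean_phi:.4f}")
    assert phi(cond_mean) <= cond_mean_phi
```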