Law of total expectation

The proposition in probability theory known as the law of total expectation,[1] the law of iterated expectations[2] (LIE), Adam's law,[3] the tower rule,[4] and the smoothing theorem,[5] among other names, states that if X is a random variable whose expected value E(X) is defined, and Y is any random variable on the same probability space, then

\operatorname{E}(X)=\operatorname{E}(\operatorname{E}(X\mid Y)),

i.e., the expected value of the conditional expected value of X given Y is the same as the expected value of X.

Note: The conditional expected value E(X | Y), with Y a random variable, is not a simple number; it is a random variable whose value depends on the value of Y. That is, the conditional expected value of X given the event Y = y is a number and it is a function of y. If we write g(y) for the value of E(X | Y = y) then the random variable E(X | Y) is g(Y).
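The distinction between the function g and the random variable g(Y) can be made concrete on a small sample space. The following is a minimal sketch under assumed data: two fair dice, with Y the first die and X the sum of both. This setup and the names `outcomes`, `g`, and `cond_exp` are illustrative and do not come from the text above.

```python
from fractions import Fraction
from itertools import product

# Assumed setup: two fair dice; Y is the first die, X is the sum of both.
outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs
prob = Fraction(1, len(outcomes))

def g(y):
    """Return E(X | Y = y): the average of X over outcomes whose first die is y."""
    matching = [d1 + d2 for d1, d2 in outcomes if d1 == y]
    return Fraction(sum(matching), len(matching))

# g is an ordinary function of the number y ...
g_values = {y: g(y) for y in range(1, 7)}

# ... while E(X | Y) = g(Y) is a random variable: one value per outcome.
cond_exp = {omega: g(omega[0]) for omega in outcomes}

# Averaging g(Y) over the sample space recovers E(X).
mean_of_g = sum(prob * cond_exp[omega] for omega in outcomes)
mean_of_x = sum(prob * (d1 + d2) for d1, d2 in outcomes)
```

Exact rational arithmetic is used so that the two means can be compared for equality rather than up to rounding.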

One special case states that if

\{A_i\}

is a finite or countable partition of the sample space, then

\operatorname{E}(X)=\sum_i\operatorname{E}(X\mid A_i)\operatorname{P}(A_i).
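The partition form can be checked directly on a finite example. The sketch below assumes X is the face of a fair six-sided die and uses an arbitrarily chosen three-cell partition. Both the die and the partition are illustrative assumptions.

```python
from fractions import Fraction

# Assumed example: X is the face shown by a fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# An arbitrary partition of the sample space into three cells.
partition = [{1}, {2, 3}, {4, 5, 6}]

def prob_of(cell):
    """P(A): total probability mass of the cell."""
    return sum(pmf[w] for w in cell)

def cond_mean(cell):
    """E(X | A) = sum over w in A of w * P({w}) / P(A)."""
    return sum(w * pmf[w] for w in cell) / prob_of(cell)

# Right-hand side of the partition formula and the direct expectation.
weighted_sum = sum(cond_mean(cell) * prob_of(cell) for cell in partition)
direct_mean = sum(w * p for w, p in pmf.items())
```

Any other partition of the six faces into cells of positive probability would give the same weighted sum.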

Example

Suppose that only two factories supply light bulbs to the market. Factory X's bulbs work for an average of 5000 hours, whereas factory Y's bulbs work for an average of 4000 hours. It is known that factory X supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for?

Applying the law of total expectation, we have:

\begin{align} \operatorname{E}(L)&=\operatorname{E}(L\mid X)\operatorname{P}(X)+\operatorname{E}(L\mid Y)\operatorname{P}(Y)\\[3pt] &=5000(0.6)+4000(0.4)\\[2pt] &=4600 \end{align}

where

E(L) is the expected life of the bulb;

P(X) = 6/10 is the probability that the purchased bulb was manufactured by factory X;

P(Y) = 4/10 is the probability that the purchased bulb was manufactured by factory Y;

E(L | X) = 5000 is the expected lifetime of a bulb manufactured by X;

E(L | Y) = 4000 is the expected lifetime of a bulb manufactured by Y.

Thus each purchased light bulb has an expected lifetime of 4600 hours.
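The figure of 4600 hours can also be approximated by simulation. The example specifies only each factory's mean lifetime, so the sketch below additionally assumes that lifetimes are exponentially distributed. That assumption, the sample size, and the function name `simulate_bulb_lifetimes` are all illustrative.

```python
import random

def simulate_bulb_lifetimes(n, seed=0):
    """Estimate E(L) by drawing n bulbs: pick a factory, then draw a lifetime."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Factory X with probability 0.6, otherwise factory Y.
        mean_life = 5000.0 if rng.random() < 0.6 else 4000.0
        # Assumption: lifetimes are exponential with the factory's mean.
        total += rng.expovariate(1.0 / mean_life)
    return total / n

# Exact value from the law of total expectation.
exact = 5000 * 0.6 + 4000 * 0.4

# Monte Carlo estimate; it should land close to the exact value.
estimate = simulate_bulb_lifetimes(200_000)
print(exact, estimate)
```

Only the factory means and market shares enter the exact calculation, so any other lifetime distribution with the same means would give the same expected value.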

Informal proof

When a joint probability density function is well defined and the expectations are integrable, we write for the general case (with \Pr[X=x] and similar expressions denoting densities rather than point probabilities)

\begin{align} \operatorname E(X) &= \int x \Pr[X=x] ~dx \\ \operatorname E(X\mid Y=y) &= \int x \Pr[X=x\mid Y=y] ~dx \\ \operatorname E(\operatorname E(X\mid Y)) &= \int \left(\int x \Pr[X=x\mid Y=y] ~dx \right) \Pr[Y=y] ~dy \\ &= \int \int x \Pr[X = x, Y= y] ~dx ~dy \\ &= \int x \left(\int \Pr[X = x, Y = y] ~dy \right) ~dx \\ &= \int x \Pr[X = x] ~dx \\ &= \operatorname E(X)\,. \end{align}

The exchange of the order of integration in the fifth line is justified by the integrability assumption. A similar derivation works for discrete distributions using summation instead of integration. For the specific case of a partition, give each cell of the partition a unique label and let the random variable Y be the function on the sample space that assigns a cell's label to each point in that cell.
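The discrete version of this derivation can be traced numerically. The sketch below uses a small joint probability mass function whose values are made up for illustration. It computes the iterated expectation by summing E(X | Y = y) P(Y = y) over y and compares it with the direct sum for E(X).

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y), chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

def marginal_y(y):
    """P(Y = y) = sum over x of P(X = x, Y = y)."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_mean_x(y):
    """E(X | Y = y) = sum over x of x * P(X = x, Y = y) / P(Y = y)."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / marginal_y(y)

y_values = {y for _, y in joint}

# E(E(X | Y)): weight each conditional mean by the marginal of Y.
iterated = sum(cond_mean_x(y) * marginal_y(y) for y in y_values)

# E(X): sum x * P(X = x, Y = y) over the whole table.
direct = sum(x * p for (x, _), p in joint.items())
```

The factor P(Y = y) cancels inside each term of `iterated`, which is the discrete counterpart of passing from the third to the fourth line of the derivation.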

Proof in the general case

Let

(\Omega,\mathcal{F},\operatorname{P})

be a probability space on which two sub σ-algebras

\mathcal{G}_1\subseteq\mathcal{G}_2\subseteq\mathcal{F}

are defined. For a random variable X on such a space, the smoothing law states that if E[X] is defined, i.e.

\min(\operatorname{E}[X_+],\operatorname{E}[X_-])<\infty,

then

\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]=\operatorname{E}[X\mid\mathcal{G}_1]\quad\text{(a.s.)}.

Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]\text{ is }\mathcal{G}_1\text{-measurable}

\int_{G_1}\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]\,d\operatorname{P}=\int_{G_1}X\,d\operatorname{P},\quad\text{for all }G_1\in\mathcal{G}_1.

The first of these properties holds by definition of the conditional expectation. To prove the second one,

\begin{align} \min\left(\int_{G_1}X_+\,d\operatorname{P},\int_{G_1}X_-\,d\operatorname{P}\right)&\leq\min\left(\int_\Omega X_+\,d\operatorname{P},\int_\Omega X_-\,d\operatorname{P}\right)\\[4pt] &=\min(\operatorname{E}[X_+],\operatorname{E}[X_-])<\infty, \end{align}

so the integral

\int_{G_1}X\,d\operatorname{P}

is defined (not equal to ∞ − ∞).

The second property thus holds since

G_1\in\mathcal{G}_1\subseteq\mathcal{G}_2

implies

\int_{G_1}\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]\,d\operatorname{P}=\int_{G_1}\operatorname{E}[X\mid\mathcal{G}_2]\,d\operatorname{P}=\int_{G_1}X\,d\operatorname{P}.
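On a finite sample space every sub σ-algebra is generated by a partition, and conditional expectation reduces to averaging over the cells of that partition, so the smoothing law can be checked by direct computation. The following is a minimal sketch under assumed data: the six-point space, the uniform measure, the variable X and the two nested partitions are all hypothetical choices.

```python
from fractions import Fraction

# Hypothetical finite probability space: Omega = {0, ..., 5} with uniform P.
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
X = {w: w * w for w in omega}

# Partitions generating the two sub sigma-algebras. Each G1 cell is a union
# of G2 cells, so the sigma-algebra generated by G1 is contained in that of G2.
G2 = [{0}, {1, 2}, {3}, {4, 5}]
G1 = [{0, 1, 2}, {3, 4, 5}]

def cond_exp(Z, partition):
    """Return E[Z | partition] as a dict w -> value, constant on each cell."""
    result = {}
    for cell in partition:
        mass = sum(P[w] for w in cell)
        avg = sum(Z[w] * P[w] for w in cell) / mass
        for w in cell:
            result[w] = avg
    return result

lhs = cond_exp(cond_exp(X, G2), G1)  # E[E[X | G2] | G1]
rhs = cond_exp(X, G1)                # E[X | G1]
```

Here `lhs` and `rhs` agree at every point of the space. In the general statement the equality holds only almost surely, a distinction that disappears in this example because every point has positive probability.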

Corollary. In the special case when

\mathcal{G}_1=\{\emptyset,\Omega\}

and

\mathcal{G}_2=\sigma(Y),

the smoothing law reduces to

\operatorname{E}[\operatorname{E}[X\mid Y]]=\operatorname{E}[X].

Alternative proof for

\operatorname{E}[\operatorname{E}[X\mid Y]]=\operatorname{E}[X].

This is a simple consequence of the measure-theoretic definition of conditional expectation. By definition,

\operatorname{E}[X\mid Y]:=\operatorname{E}[X\mid\sigma(Y)]

is a σ(Y)-measurable random variable that satisfies

\int_A\operatorname{E}[X\mid Y]\,d\operatorname{P}=\int_A X\,d\operatorname{P},

for every measurable set A ∈ σ(Y). Taking A = Ω proves the claim.

Notes and References

  1. Weiss, Neil A. (2005). A Course in Probability. Boston: Addison–Wesley. pp. 380–383. ISBN 0-321-18954-X.
  2. "Law of Iterated Expectation". Brilliant Math & Science Wiki. brilliant.org. Retrieved 2018-03-28.
  3. "Adam's and Eve's Laws". Retrieved 2022-04-19.
  4. Rhee, Chang-han (Sep 20, 2011). "Probability and Statistics".
  5. Wolpert, Robert (November 18, 2010). "Conditional Expectation".