The proposition in probability theory known as the law of total expectation,[1] the law of iterated expectations[2] (LIE), Adam's law,[3] the tower rule,[4] and the smoothing theorem,[5] among other names, states that if X is a random variable whose expected value \operatorname{E}(X) is defined, and Y is any random variable on the same probability space, then

\operatorname{E}(X)=\operatorname{E}(\operatorname{E}(X\mid Y)),

i.e., the expected value of the conditional expected value of X given Y is the same as the expected value of X.
Note: The conditional expected value E(X | Y), with Y a random variable, is not a simple number; it is a random variable whose value depends on the value of Y. That is, the conditional expected value of X given the event Y = y is a number that is a function of y. If we write g(y) for the value of E(X | Y = y), then the random variable E(X | Y) is g(Y).
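The distinction between the number E(X | Y = y) and the random variable E(X | Y) = g(Y) can be illustrated with a small simulation. The setup below (a die Y plus a coin, so that g(y) = y + 0.5) is purely illustrative and not from the source; the point is that averaging the realized values of g(Y) gives the same limit as averaging X.

```python
import random

# Hypothetical example: Y is a fair die, X = Y + fair coin.
# Then E(X | Y = y) = y + 0.5 is a number for each y,
# while E(X | Y) = g(Y) is itself a random variable.

def g(y):
    # conditional expected value of X given Y = y
    return y + 0.5

random.seed(0)
n = 200_000
sum_x = 0.0
sum_gy = 0.0
for _ in range(n):
    y = random.randint(1, 6)      # realize Y
    x = y + random.randint(0, 1)  # realize X given Y
    sum_x += x
    sum_gy += g(y)                # realize the random variable g(Y)

mean_x = sum_x / n
mean_gy = sum_gy / n
print(mean_x, mean_gy)  # both approach E(X) = E(g(Y)) = 4.0
```

Here E(Y) = 3.5 and the coin adds 0.5 on average, so both sample means converge to 4.0, as the law of total expectation predicts.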
One special case states that if \{A_i\} is a finite or countable partition of the sample space, then

\operatorname{E}(X)=\sum_i \operatorname{E}(X\mid A_i)\operatorname{P}(A_i).
Suppose that only two factories supply light bulbs to the market. Factory X's bulbs last 5000 hours on average, whereas factory Y's bulbs last 4000 hours on average. Factory X supplies 60% of the total bulbs available. What is the expected lifetime of a purchased bulb?
Applying the law of total expectation, we have:
\begin{align} \operatorname{E}(L)&=\operatorname{E}(L\mid X)\operatorname{P}(X)+\operatorname{E}(L\mid Y)\operatorname{P}(Y)\\[3pt] &=5000(0.6)+4000(0.4)\\[2pt] &=4600 \end{align}
where

- \operatorname{E}(L) is the expected lifetime of the bulb;
- \operatorname{P}(X)={6\over10} is the probability that the purchased bulb was manufactured by factory X;
- \operatorname{P}(Y)={4\over10} is the probability that the purchased bulb was manufactured by factory Y;
- \operatorname{E}(L\mid X)=5000 is the expected lifetime of a bulb manufactured by X;
- \operatorname{E}(L\mid Y)=4000 is the expected lifetime of a bulb manufactured by Y.
Thus each purchased light bulb has an expected lifetime of 4600 hours.
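The arithmetic above can be checked both exactly and by simulation. In the sketch below the lifetime distributions are assumed exponential purely for concreteness; the law of total expectation only uses the conditional means, so any distributions with means 5000 and 4000 would give the same answer.

```python
import random

P_X, P_Y = 0.6, 0.4              # market shares of factories X and Y
MEAN_X, MEAN_Y = 5000.0, 4000.0  # average lifetimes in hours

# Exact computation via the law of total expectation:
expected_life = MEAN_X * P_X + MEAN_Y * P_Y
print(expected_life)  # 4600.0

# Monte Carlo check (exponential lifetimes are an assumption,
# not part of the original example):
random.seed(1)
n = 100_000
total = 0.0
for _ in range(n):
    mean = MEAN_X if random.random() < P_X else MEAN_Y
    total += random.expovariate(1.0 / mean)
print(total / n)  # close to 4600
```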
When a joint probability density function is well defined and the expectations are integrable, we write for the general case

\begin{align} \operatorname{E}(X)&=\int x\Pr[X=x]\,dx,\\ \operatorname{E}(X\mid Y=y)&=\int x\Pr[X=x\mid Y=y]\,dx,\\ \operatorname{E}(\operatorname{E}(X\mid Y))&=\int\left(\int x\Pr[X=x\mid Y=y]\,dx\right)\Pr[Y=y]\,dy\\ &=\int\int x\Pr[X=x,Y=y]\,dx\,dy\\ &=\int x\Pr[X=x]\,dx=\operatorname{E}(X). \end{align}

A similar derivation works for discrete distributions using summation instead of integration. For the specific case of a partition, give each cell of the partition a unique label and let the random variable Y be the function of the sample space that assigns a cell's label to each point in that cell.
Let (\Omega,\mathcal{F},\operatorname{P}) be a probability space on which two sub-σ-algebras \mathcal{G}_1\subseteq\mathcal{G}_2\subseteq\mathcal{F} are defined. For a random variable X on such a space, the smoothing law states that if \operatorname{E}[X] is defined, i.e. \min(\operatorname{E}[X_+],\operatorname{E}[X_-])<\infty, then

\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]=\operatorname{E}[X\mid\mathcal{G}_1]\quad\text{(a.s.)}.
Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:
- \operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1] is \mathcal{G}_1-measurable;
- \int_{G_1}\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]\,d\operatorname{P}=\int_{G_1}X\,d\operatorname{P}, for all G_1\in\mathcal{G}_1.
The first of these properties holds by definition of the conditional expectation. To prove the second one,
\begin{align} \min\left(\int_{G_1}X_+\,d\operatorname{P},\int_{G_1}X_-\,d\operatorname{P}\right)&\leq\min\left(\int_\Omega X_+\,d\operatorname{P},\int_\Omega X_-\,d\operatorname{P}\right)\\[4pt] &=\min(\operatorname{E}[X_+],\operatorname{E}[X_-])<\infty, \end{align}
so the integral \textstyle\int_{G_1}X\,d\operatorname{P} is defined (i.e. it does not have the form \infty-\infty).
The second property thus holds, since G_1\in\mathcal{G}_1\subseteq\mathcal{G}_2 implies

\int_{G_1}\operatorname{E}[\operatorname{E}[X\mid\mathcal{G}_2]\mid\mathcal{G}_1]\,d\operatorname{P}=\int_{G_1}\operatorname{E}[X\mid\mathcal{G}_2]\,d\operatorname{P}=\int_{G_1}X\,d\operatorname{P}.
Corollary. In the special case when \mathcal{G}_1=\{\emptyset,\Omega\} and \mathcal{G}_2=\sigma(Y), the smoothing law reduces to

\operatorname{E}[\operatorname{E}[X\mid Y]]=\operatorname{E}[X].
Alternative proof for \operatorname{E}[\operatorname{E}[X\mid Y]]=\operatorname{E}[X]:

This is a simple consequence of the measure-theoretic definition of conditional expectation. By definition, \operatorname{E}[X\mid Y]:=\operatorname{E}[X\mid\sigma(Y)] is a \sigma(Y)-measurable random variable that satisfies

\int_A\operatorname{E}[X\mid Y]\,d\operatorname{P}=\int_A X\,d\operatorname{P},

for every measurable set A\in\sigma(Y). Taking A=\Omega gives the claim.
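The defining property can be verified exactly on a finite probability space. The setup below (a six-point space with uniform measure, Y the parity of the outcome) is an illustrative choice, not from the source; exact rational arithmetic shows the integrals agree on every set in \sigma(Y), and taking A = \Omega recovers E[E[X|Y]] = E[X].

```python
from fractions import Fraction as F

# Illustrative finite space: Omega = {0,...,5}, uniform P,
# X(w) = w, Y(w) = w mod 2 (so sigma(Y) has atoms {evens}, {odds}).
omega = list(range(6))
P = {w: F(1, 6) for w in omega}
X = {w: w for w in omega}
Y = {w: w % 2 for w in omega}

def cond_exp(w):
    # E[X | Y] is constant on each atom {Y = Y(w)} of sigma(Y)
    atom = [v for v in omega if Y[v] == Y[w]]
    return sum(X[v] * P[v] for v in atom) / sum(P[v] for v in atom)

# Defining property: for every A in sigma(Y),
# integral of E[X|Y] over A equals integral of X over A.
atoms = [[w for w in omega if Y[w] == y] for y in (0, 1)]
for A in atoms + [omega, []]:
    lhs = sum(cond_exp(w) * P[w] for w in A)
    rhs = sum(X[w] * P[w] for w in A)
    assert lhs == rhs

# A = Omega gives E[E[X|Y]] = E[X] = 5/2.
print(sum(cond_exp(w) * P[w] for w in omega))  # 5/2
```

Here E[X | Y] takes the value 2 on the even atom and 3 on the odd atom, and averaging those with weight 1/2 each gives 5/2, the unconditional mean of X.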