In statistics, the Rao–Blackwell theorem, sometimes referred to as the Rao–Blackwell–Kolmogorov theorem, is a result that characterizes the transformation of an arbitrarily crude estimator into an estimator that is optimal by the mean-squared-error criterion or any of a variety of similar criteria.
The Rao–Blackwell theorem states that if g(X) is any kind of estimator of a parameter θ, then the conditional expectation of g(X) given T(X), where T is a sufficient statistic, is typically a better estimator of θ, and is never worse. Sometimes one can very easily construct a very crude estimator g(X), and then evaluate that conditional expected value to get an estimator that is in various senses optimal.
The theorem is named after C.R. Rao and David Blackwell. The process of transforming an estimator using the Rao–Blackwell theorem can be referred to as Rao–Blackwellization. The transformed estimator is called the Rao–Blackwell estimator.
In other words, a sufficient statistic T(X) for a parameter θ is a statistic such that the conditional probability of the data X, given T(X), does not depend on the parameter θ.
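To illustrate this definition (a minimal sketch, not part of the original article; the Bernoulli model and sample values are our own choices): for i.i.d. Bernoulli(θ) observations, the sum T = X1 + … + Xn is sufficient, and the conditional probability of any particular sample given T equals 1/C(n, t), which is free of θ.

```python
from math import comb

def conditional_prob(x, theta):
    """P(X = x | T = sum(x)) for i.i.d. Bernoulli(theta) observations x."""
    n, t = len(x), sum(x)
    p_x = theta ** t * (1 - theta) ** (n - t)               # P(X = x)
    p_t = comb(n, t) * theta ** t * (1 - theta) ** (n - t)  # P(T = t)
    return p_x / p_t                                        # = 1 / C(n, t)

# The same value, 1 / C(5, 2) = 0.1, results for every theta:
sample = (1, 0, 1, 0, 0)
probs = [conditional_prob(sample, th) for th in (0.1, 0.5, 0.9)]
```

Sufficiency is what makes the conditional expectation in the theorem a legitimate estimator: it can be computed without knowing θ.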
One case of the Rao–Blackwell theorem states:
The mean squared error of the Rao–Blackwell estimator does not exceed that of the original estimator.
In other words,

\operatorname{E}((\delta_1(X)-\theta)^2)\leq\operatorname{E}((\delta(X)-\theta)^2),

where \delta_1(X)=\operatorname{E}(\delta(X)\mid T(X)) is the Rao–Blackwell estimator constructed from the original estimator \delta(X) and the sufficient statistic T(X).
The essential tools of the proof besides the definition above are the law of total expectation and the fact that for any random variable Y, \operatorname{E}(Y^2) cannot be less than [\operatorname{E}(Y)]^2. That inequality is a case of Jensen's inequality, although it may also be shown to follow instantly from the frequently mentioned fact that
0\leq\operatorname{Var}(Y)=\operatorname{E}((Y-\operatorname{E}(Y))^2)=\operatorname{E}(Y^2)-(\operatorname{E}(Y))^2.
More precisely, the mean square error of the Rao–Blackwell estimator has the following decomposition[1]

\operatorname{E}[(\delta_1(X)-\theta)^2]=\operatorname{E}[(\delta(X)-\theta)^2]-\operatorname{E}[\operatorname{Var}(\delta(X)\mid T(X))]
Since

\operatorname{E}[\operatorname{Var}(\delta(X)\mid T(X))]\ge 0,

the mean square error of the Rao–Blackwell estimator can never exceed that of the original estimator.
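The decomposition can be checked numerically. The following Monte Carlo sketch is illustrative; the normal model and parameter values are our own choices, not from the text. It uses i.i.d. N(θ, 1) observations, the crude estimator δ(X) = X1, and T = ΣXi, for which the Rao–Blackwell estimator E(X1 | T) = T/n is the sample mean and Var(δ(X) | T) = 1 − 1/n exactly.

```python
import random

# Monte Carlo check of the decomposition, assuming i.i.d. N(theta, 1) data
# (our own illustrative model): crude estimator delta(X) = X_1, sufficient
# statistic T = sum(X), Rao-Blackwell estimator E(X_1 | T) = T / n.
random.seed(0)
theta, n, reps = 0.0, 5, 200_000
se_crude = se_rb = 0.0
for _ in range(reps):
    xs = [random.gauss(theta, 1.0) for _ in range(n)]
    delta = xs[0]              # crude estimator
    delta1 = sum(xs) / n       # Rao-Blackwellized estimator (sample mean)
    se_crude += (delta - theta) ** 2
    se_rb += (delta1 - theta) ** 2
mse_crude = se_crude / reps    # approximately Var(X_1) = 1
mse_rb = se_rb / reps          # approximately 1 / n = 0.2
# Here Var(delta(X) | T) = 1 - 1/n = 0.8 exactly, so the decomposition
# predicts mse_crude - mse_rb to be near 0.8.
```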
The more general version of the Rao–Blackwell theorem speaks of the "expected loss" or risk function:
\operatorname{E}(L(\delta_1(X)))\leq\operatorname{E}(L(\delta(X)))
where the "loss function" L may be any convex function. If the loss function is twice-differentiable, as in the case of mean squared error, then we have the sharper inequality[1]
\operatorname{E}(L(\delta(X)))-\operatorname{E}(L(\delta_1(X)))\ge\frac{1}{2}\operatorname{E}_T\left[\inf_x L''(x)\operatorname{Var}(\delta(X)\mid T)\right].
The improved estimator is unbiased if and only if the original estimator is unbiased, as may be seen at once by using the law of total expectation. The theorem holds regardless of whether biased or unbiased estimators are used.
The theorem seems very weak: it says only that the Rao–Blackwell estimator is no worse than the original estimator. In practice, however, the improvement is often enormous.
Phone calls arrive at a switchboard according to a Poisson process at an average rate of λ per minute. This rate is not observable, but the numbers X1, ..., Xn of phone calls that arrived during n successive one-minute periods are observed. It is desired to estimate the probability e^{-λ} that the next one-minute period passes with no phone calls.
An extremely crude estimator of the desired probability is
\delta_0=\begin{cases}1&\text{if }X_1=0,\\0&\text{otherwise,}\end{cases}
i.e., it estimates this probability to be 1 if no phone calls arrived in the first minute and zero otherwise. Despite the apparent limitations of this estimator, the result given by its Rao–Blackwellization is a very good estimator.
The sum

S_n=\sum_{i=1}^n X_i=X_1+\cdots+X_n
can readily be shown to be a sufficient statistic for λ, i.e., the conditional distribution of the data X1, ..., Xn, given this sum, does not depend on λ. Therefore, we find the Rao–Blackwell estimator
\delta_1=\operatorname{E}(\delta_0\mid S_n=s_n).
After doing some algebra we have

\begin{align}\delta_1&=\operatorname{E}\left(\mathbf{1}_{\{X_1=0\}}\mid S_n=s_n\right)\\&=P(X_1=0\mid S_n=s_n)\\&=\left(1-\frac{1}{n}\right)^{s_n},\end{align}

since, conditional on S_n=s_n, each of the s_n calls falls in the first minute independently with probability 1/n.
Since the average number of calls arriving during the first n minutes is nλ, one might not be surprised if this estimator has a fairly high probability (if n is big) of being close to
\left(1-\frac{1}{n}\right)^{n\lambda}\approx e^{-\lambda}.
So δ1 is clearly a very much improved estimator of that last quantity. In fact, since Sn is complete and δ0 is unbiased, δ1 is the unique minimum variance unbiased estimator by the Lehmann–Scheffé theorem.
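The size of the improvement can be checked by simulation. The sketch below is illustrative only: the values λ = 2 and n = 10, and the stdlib-only `poisson` helper, are our own choices, not part of the original example.

```python
import math
import random

def poisson(lam, rng):
    """Sample from Poisson(lam) via Knuth's product-of-uniforms method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(0)
lam, n, reps = 2.0, 10, 100_000
target = math.exp(-lam)                 # probability of a call-free minute
se0 = se1 = 0.0
for _ in range(reps):
    xs = [poisson(lam, rng) for _ in range(n)]
    d0 = 1.0 if xs[0] == 0 else 0.0     # crude estimator delta_0
    d1 = (1 - 1 / n) ** sum(xs)         # Rao-Blackwell estimator delta_1
    se0 += (d0 - target) ** 2
    se1 += (d1 - target) ** 2
mse0, mse1 = se0 / reps, se1 / reps     # mse1 is far smaller than mse0
```

With these settings the mean squared error of δ1 is smaller than that of δ0 by more than an order of magnitude.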
Rao–Blackwellization is an idempotent operation. Using it to improve the already improved estimator does not obtain a further improvement, but merely returns as its output the same improved estimator.
If the conditioning statistic is both complete and sufficient, and the starting estimator is unbiased, then the Rao–Blackwell estimator is the unique "best unbiased estimator": see Lehmann–Scheffé theorem.
An example of an improvable Rao–Blackwell improvement, when using a minimal sufficient statistic that is not complete, was provided by Galili and Meilijson in 2016.[2] Let X_1,\ldots,X_n be an i.i.d. sample from

X\sim U\left((1-k)\theta,(1+k)\theta\right),

a uniform distribution with unknown mean \operatorname{E}[X]=\theta and known design parameter k\in(0,1). In the search for "best" possible unbiased estimators of \theta, it is natural to consider X_1 as an initial (crude) unbiased estimator for \theta. Since X_1 is not a function of T=\left(X_{(1)},X_{(n)}\right), the minimal sufficient statistic for \theta (where X_{(1)}=\min_i X_i and X_{(n)}=\max_i X_i), it may be improved using the Rao–Blackwell theorem as follows:

\hat{\theta}_{RB}=\operatorname{E}_\theta\left[X_1\mid X_{(1)},X_{(n)}\right]=\frac{X_{(1)}+X_{(n)}}{2}.
However, the following unbiased estimator can be shown to have lower variance:

\hat{\theta}_{LV}=\frac{1}{2\left(\frac{k^2(n-1)}{n+1}+1\right)}\left[(1-k)X_{(1)}+(1+k)X_{(n)}\right].
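The variance ranking can be verified numerically. The sketch below is illustrative; the values k = 0.5, θ = 1, and n = 10 are our own choices, not from the source. It compares the crude estimator X1, the Rao–Blackwell midrange, and the lower-variance estimator above.

```python
import random

# Monte Carlo comparison for the uniform model U((1-k)theta, (1+k)theta),
# with illustrative values k = 0.5, theta = 1, n = 10 (our own choices).
rng = random.Random(0)
k, theta, n, reps = 0.5, 1.0, 10, 200_000
lo, hi = (1 - k) * theta, (1 + k) * theta
c = 2 * (k * k * (n - 1) / (n + 1) + 1)   # normalizer making theta_LV unbiased
sq_x1 = sq_rb = sq_lv = 0.0
for _ in range(reps):
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    x1, xn = min(xs), max(xs)
    rb = (x1 + xn) / 2                     # Rao-Blackwell midrange
    lv = ((1 - k) * x1 + (1 + k) * xn) / c
    sq_x1 += (xs[0] - theta) ** 2
    sq_rb += (rb - theta) ** 2
    sq_lv += (lv - theta) ** 2
mse_x1, mse_rb, mse_lv = sq_x1 / reps, sq_rb / reps, sq_lv / reps
```

With these settings the midrange improves on X1 by more than an order of magnitude, and θ̂_LV improves further still.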
And in fact, it could be even further improved when using the following estimator:

\hat{\theta}_{\text{BAYES}}=\frac{n+1}{n}\left[1-\frac{\left(\frac{X_{(1)}}{1-k}\Big/\frac{X_{(n)}}{1+k}\right)-1}{\left(\frac{X_{(1)}}{1-k}\Big/\frac{X_{(n)}}{1+k}\right)^{n+1}-1}\right]\frac{X_{(n)}}{1+k}.
The model is a scale model. Optimal equivariant estimators can then be derived for loss functions that are invariant.[3]