In statistics, efficiency is a measure of the quality of an estimator, of an experimental design, or of a hypothesis-testing procedure. Essentially, a more efficient estimator needs fewer input data or observations than a less efficient one to achieve the Cramér–Rao bound. An efficient estimator is characterized by having the smallest possible variance, indicating that there is a small deviation between the estimated value and the "true" value in the L2 norm sense.
The relative efficiency of two procedures is the ratio of their efficiencies, although often this concept is used where the comparison is made between a given procedure and a notional "best possible" procedure. The efficiencies and the relative efficiency of two procedures theoretically depend on the sample size available for the given procedure, but it is often possible to use the asymptotic relative efficiency (defined as the limit of the relative efficiencies as the sample size grows) as the principal comparison measure.
The efficiency of an unbiased estimator, T, of a parameter θ is defined as [1]
e(T) = \frac{1/\mathcal{I}(\theta)}{\operatorname{var}(T)}
where \mathcal{I}(\theta) is the Fisher information of the sample. Thus e(T) is the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér–Rao bound can be used to show that e(T) \leq 1.
An efficient estimator is an estimator that estimates the quantity of interest in some “best possible” manner. The notion of “best possible” relies upon the choice of a particular loss function — the function which quantifies the relative degree of undesirability of estimation errors of different magnitudes. The most common choice of the loss function is quadratic, resulting in the mean squared error criterion of optimality.
In general, the spread of an estimator around the parameter θ is a measure of estimator efficiency and performance. This performance can be calculated by finding the mean squared error. More formally, let T be an estimator for the parameter θ. The mean squared error of T is the value
\operatorname{MSE}(T) = \operatorname{E}[(T-\theta)^2],
i.e. the expected value of the squared difference between the estimator and the parameter. Expanding the square decomposes the MSE into variance and squared bias:
\begin{align} \operatorname{MSE}(T) &= \operatorname{E}[(T-\theta)^2] = \operatorname{E}[(T-\operatorname{E}[T]+\operatorname{E}[T]-\theta)^2]\\[5pt] &= \operatorname{E}[(T-\operatorname{E}[T])^2] + 2\operatorname{E}[T-\operatorname{E}[T]](\operatorname{E}[T]-\theta) + (\operatorname{E}[T]-\theta)^2\\[5pt] &= \operatorname{var}(T) + (\operatorname{E}[T]-\theta)^2 \end{align}
An estimator T_1 performs better than an estimator T_2 if
\operatorname{MSE}(T_1) < \operatorname{MSE}(T_2).
For the more specific case of two unbiased estimators of the same parameter θ, the comparison reduces to one of variances: T_2 is more efficient than T_1 if
\operatorname{var}(T_1) > \operatorname{var}(T_2)
for all values of θ. This follows from the decomposition above, since for an unbiased estimator \operatorname{E}[T] = \theta, so the bias term (\operatorname{E}[T]-\theta)^2 vanishes and
\operatorname{MSE}(T) = \operatorname{var}(T).
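The decomposition above can be verified numerically. The following Python sketch (an illustration only, assuming NumPy; the biased maximum-likelihood variance estimator of a normal sample is used as T purely as an example) estimates the MSE directly and via the variance-plus-squared-bias decomposition:

import numpy as np

# Monte Carlo check of MSE(T) = var(T) + (E[T] - theta)^2 for a biased estimator.
# T is the maximum-likelihood variance estimator (divides by n), which is biased
# for the population variance theta = sigma^2.
rng = np.random.default_rng(0)
theta = 4.0                 # true population variance
n, reps = 10, 200_000

samples = rng.normal(0.0, np.sqrt(theta), size=(reps, n))
T = samples.var(axis=1, ddof=0)                 # biased estimator of theta

mse_direct = np.mean((T - theta) ** 2)
mse_decomposed = T.var() + (T.mean() - theta) ** 2

print(mse_direct, mse_decomposed)               # agree up to Monte Carlo error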
If an unbiased estimator T of a parameter θ attains
e(T) = 1
for all values of the parameter, then the estimator is called efficient.
Equivalently, the estimator achieves equality in the Cramér–Rao inequality for all θ. The Cramér–Rao lower bound is a lower bound on the variance of an unbiased estimator, representing the "best" an unbiased estimator can be.
An efficient estimator is also the minimum variance unbiased estimator (MVUE). This is because an efficient estimator attains equality in the Cramér–Rao inequality for all parameter values, which means it attains the minimum variance for all parameters (the definition of the MVUE). The MVUE, even if it exists, is not necessarily efficient, because "minimum" does not imply that equality holds in the Cramér–Rao inequality.
Thus an efficient estimator need not exist, but if it does, it is the MVUE.
Suppose \{ f_\theta : \theta \in \Theta \} is a parametric model and X = (X_1, \ldots, X_n) are the data sampled from this model. Let T = T(X) be an estimator for the parameter θ. If this estimator is unbiased (that is, \operatorname{E}[T] = \theta), then the Cramér–Rao inequality states that the variance of this estimator is bounded from below:
\operatorname{var}[T] \geq \mathcal{I}_\theta^{-1},
where \mathcal{I}_\theta is the Fisher information of the model at the point θ. An unbiased estimator whose variance attains this lower bound for every θ is said to be finite-sample efficient.
Historically, finite-sample efficiency was an early optimality criterion. However, this criterion has some limitations, chief among them that finite-sample efficient estimators exist only for a limited class of models.
As an example, among the models encountered in practice, efficient estimators exist for: the mean μ of the normal distribution (but not the variance σ²), the parameter λ of the Poisson distribution, and the probability p in the binomial or multinomial distribution.
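As a rough numerical illustration of the bound for one of the cases just listed (a Python sketch, assuming NumPy; the rate λ = 3 and the sample size are arbitrary choices): the Fisher information of a single Poisson(λ) observation is 1/λ, so the Cramér–Rao bound for an unbiased estimator based on n observations is λ/n, which the sample mean attains.

import numpy as np

# The sample mean of n Poisson(lam) observations is unbiased for lam, and its
# variance equals the Cramer-Rao bound lam/n (Fisher information per observation = 1/lam).
rng = np.random.default_rng(1)
lam, n, reps = 3.0, 25, 200_000

sample_means = rng.poisson(lam, size=(reps, n)).mean(axis=1)
cr_bound = lam / n

print(sample_means.var(), cr_bound)   # empirical variance is close to the bound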
Consider the model of a normal distribution with unknown mean but known variance: \{ P_\theta = \mathcal{N}(\theta, \sigma^2) : \theta \in \mathbb{R} \}. The data consist of n independent and identically distributed observations from this model: X = (x_1, \ldots, x_n). We estimate the parameter θ using the sample mean of all observations:
T(X) = \frac{1}{n}\sum_{i=1}^{n} x_i \,.
This estimator has mean θ and variance σ²/n, which is equal to the reciprocal of the Fisher information from the sample; thus the sample mean is a finite-sample efficient estimator for the mean of the normal distribution.
Asymptotic efficiency requires consistency, an asymptotically normal distribution of the estimator, and an asymptotic variance-covariance matrix no worse than that of any other estimator.[6]
Consider a sample of size N drawn from a normal distribution with mean \mu and unit variance, i.e., X_n \sim \mathcal{N}(\mu, 1). The sample mean \overline{X} of the sample X_1, X_2, \ldots, X_N is defined as
\overline{X} = \frac{1}{N}\sum_{n=1}^{N} X_n \sim \mathcal{N}\left(\mu, \frac{1}{N}\right).
The variance of the mean, 1/N (the square of the standard error), is equal to the reciprocal of the Fisher information from the sample, and thus, by the Cramér–Rao inequality, the sample mean is efficient in the sense that its efficiency is unity (100%).
Now consider the sample median, \widetilde{X}. This is an unbiased and consistent estimator of \mu. For large N the sample median is approximately normally distributed with mean \mu and variance \pi/(2N):
\widetilde{X} \sim \mathcal{N}\left(\mu, \frac{\pi}{2N}\right).
The efficiency of the median for large N is thus
e\left(\widetilde{X}\right) = \left(\frac{1}{N}\right)\left(\frac{\pi}{2N}\right)^{-1} = 2/\pi \approx 0.64.
In other words, the relative variance of the median will be \pi/2 \approx 1.57; that is, the variance of the median is about 57% greater than the variance of the mean.
Note that this is the asymptotic efficiency, that is, the efficiency in the limit as the sample size N tends to infinity. For finite values of N the efficiency is somewhat higher than this.
The sample mean is thus more efficient than the sample median in this example. However, there may be measures by which the median performs better. For example, the median is far more robust to outliers, so that if the Gaussian model is questionable or approximate, there may be advantages to using the median (see Robust statistics).
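The asymptotic figure 2/π can be checked by simulation. The Python sketch below (illustrative only, assuming NumPy; μ = 0, N = 1000 and the number of replications are arbitrary choices) compares the empirical variances of the sample mean and the sample median:

import numpy as np

# Compare the sample mean and the sample median as estimators of the mean of N(mu, 1).
rng = np.random.default_rng(2)
mu, N, reps = 0.0, 1_000, 20_000

data = rng.normal(mu, 1.0, size=(reps, N))
var_mean = data.mean(axis=1).var()            # approximately 1/N
var_median = np.median(data, axis=1).var()    # approximately pi/(2N)

print(N * var_mean, N * var_median)           # about 1.0 and pi/2 (about 1.57)
print(var_mean / var_median)                  # efficiency of the median, about 2/pi (0.64)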
If T_1 and T_2 are estimators for the parameter θ, then T_1 is said to dominate T_2 if (1) its mean squared error (MSE) is smaller for at least some value of θ, and (2) its MSE does not exceed that of T_2 for any value of θ.
Formally, T_1 dominates T_2 if
\operatorname{E}[(T_1-\theta)^2] \leq \operatorname{E}[(T_2-\theta)^2]
holds for all θ, with strict inequality holding somewhere.
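As a concrete (and purely illustrative) instance of dominance: for data X_1, \ldots, X_n from N(θ, σ²), both the sample mean and the single observation X_1 are unbiased for θ, but the sample mean has MSE σ²/n ≤ σ² at every θ, so it dominates X_1. A short Python sketch of this, assuming NumPy:

import numpy as np

# The sample mean dominates the single-observation estimator T2 = X_1:
# both are unbiased for theta, but MSE(mean) = sigma^2/n < sigma^2 = MSE(X_1)
# at every value of theta.
rng = np.random.default_rng(3)
sigma, n, reps = 2.0, 10, 100_000

for theta in (-5.0, 0.0, 3.0):                           # a few parameter values
    data = rng.normal(theta, sigma, size=(reps, n))
    mse_mean = np.mean((data.mean(axis=1) - theta) ** 2)
    mse_x1 = np.mean((data[:, 0] - theta) ** 2)
    print(theta, mse_mean, mse_x1)                       # mse_mean is smaller each time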
The relative efficiency of two unbiased estimators is defined as[9]
e(T_1, T_2) = \frac{\operatorname{E}[(T_2-\theta)^2]}{\operatorname{E}[(T_1-\theta)^2]} = \frac{\operatorname{var}(T_2)}{\operatorname{var}(T_1)} .
Although e is in general a function of θ, in many cases the dependence drops out; if so, e being greater than one would indicate that T_1 is preferable, whatever the true value of θ.
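In practice the relative efficiency can be estimated directly from simulated replications of the two estimators. The Python sketch below (illustrative; it reuses the mean-versus-median setting from the earlier example and assumes NumPy) computes e(T_1, T_2) as a ratio of empirical variances:

import numpy as np

def relative_efficiency(t1, t2):
    # e(T1, T2) = var(T2) / var(T1) for unbiased estimators,
    # estimated from arrays of simulated estimates.
    return np.var(t2) / np.var(t1)

rng = np.random.default_rng(4)
data = rng.normal(0.0, 1.0, size=(20_000, 1_000))

t1 = data.mean(axis=1)            # sample mean
t2 = np.median(data, axis=1)      # sample median

print(relative_efficiency(t1, t2))   # about pi/2 (1.57): T1, the mean, is preferable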
An alternative to relative efficiency for comparing estimators is the Pitman closeness criterion. This replaces the comparison of mean squared errors with comparing how often one estimator produces estimates closer to the true value than another estimator.
In estimating the mean of uncorrelated, identically distributed variables we can take advantage of the fact that the variance of the sum is the sum of the variances. In this case efficiency can be defined as the square of the coefficient of variation, i.e.,[10]
e \equiv \left(\frac{\sigma}{\mu}\right)^2
Relative efficiency of two such estimators can thus be interpreted as the relative sample size of one required to achieve the certainty of the other. Proof:
\frac{e_1}{e_2} = \frac{s_1^2}{s_2^2} .
Now because s_1^2 = n_1\sigma^2 and s_2^2 = n_2\sigma^2, it follows that
\frac{e_1}{e_2} = \frac{n_1}{n_2} .
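The relative-sample-size interpretation can be made concrete with the earlier mean-versus-median example: since the median's asymptotic variance is π/2 times that of the mean, the median needs roughly π/2 times as many observations to match the variance of the mean. A rough Python sketch (assuming NumPy; the baseline sample size of 200 is arbitrary):

import numpy as np

# The median needs about pi/2 times as many observations as the mean to reach
# the same variance when estimating the centre of N(mu, 1).
rng = np.random.default_rng(5)
reps = 50_000
n_mean = 200
n_median = int(round(n_mean * np.pi / 2))    # about 314 observations

var_mean = rng.normal(0.0, 1.0, size=(reps, n_mean)).mean(axis=1).var()
var_median = np.median(rng.normal(0.0, 1.0, size=(reps, n_median)), axis=1).var()

print(var_mean, var_median)    # roughly equal, illustrating the relative sample size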
Efficiency of an estimator may change significantly if the distribution changes, often dropping. This is one of the motivations of robust statistics – an estimator such as the sample mean is an efficient estimator of the population mean of a normal distribution, for example, but can be an inefficient estimator of a mixture distribution of two normal distributions with the same mean and different variances. For example, if a distribution is a combination of 98% N(μ, σ) and 2% N(μ, 10σ), the presence of extreme values from the latter distribution (often "contaminating outliers") significantly reduces the efficiency of the sample mean as an estimator of μ. By contrast, the trimmed mean is less efficient for a normal distribution, but is more robust (i.e., less affected) by changes in the distribution, and thus may be more efficient for a mixture distribution. Similarly, the shape of a distribution, such as skewness or heavy tails, can significantly reduce the efficiency of estimators that assume a symmetric distribution or thin tails.
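The effect described above can be illustrated by simulation. The Python sketch below (illustrative only; it assumes NumPy and SciPy's trim_mean, and the choices μ = 0, σ = 1, n = 100 and a 10% trimming fraction are arbitrary) compares the variance of the sample mean with that of a trimmed mean under the clean normal and under the 98%/2% mixture:

import numpy as np
from scipy.stats import trim_mean

# Sample mean vs 10%-trimmed mean under a clean normal and under the
# contaminated mixture 0.98*N(mu, sigma) + 0.02*N(mu, 10*sigma).
rng = np.random.default_rng(6)
mu, sigma, n, reps = 0.0, 1.0, 100, 20_000

def draw(contaminated):
    if contaminated:
        scale = np.where(rng.random((reps, n)) < 0.02, 10 * sigma, sigma)
    else:
        scale = sigma
    return rng.normal(mu, scale, size=(reps, n))

for label, contaminated in (("clean normal", False), ("mixture", True)):
    x = draw(contaminated)
    print(label, x.mean(axis=1).var(), trim_mean(x, 0.1, axis=1).var())
# Under the clean normal the sample mean has the smaller variance; under the
# mixture the trimmed mean does, i.e. it is the more efficient of the two there.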
While efficiency is a desirable quality of an estimator, it must be weighed against other considerations, and an estimator that is efficient for certain distributions may well be inefficient for other distributions. Most significantly, estimators that are efficient for clean data from a simple distribution, such as the normal distribution (which is symmetric, unimodal, and has thin tails) may not be robust to contamination by outliers, and may be inefficient for more complicated distributions. In robust statistics, more importance is placed on robustness and applicability to a wide variety of distributions, rather than efficiency on a single distribution. M-estimators are a general class of estimators motivated by these concerns. They can be designed to yield both robustness and high relative efficiency, though possibly lower efficiency than traditional estimators for some cases. They can be very computationally complicated, however.
A more traditional alternative are L-estimators, which are very simple statistics that are easy to compute and interpret, in many cases robust, and often sufficiently efficient for initial estimates. See applications of L-estimators for further discussion.
Efficiency in statistics is important because it allows one to compare the performance of various estimators. Although an unbiased estimator is usually favored over a biased one, a more efficient biased estimator can sometimes be more valuable than a less efficient unbiased estimator. For example, this can occur when the values of the biased estimator gather around a number closer to the true value. Thus, estimator performance can often be compared simply by examining mean squared errors or variances.
For comparing significance tests, a meaningful measure of efficiency can be defined based on the sample size required for the test to achieve a given power.
Pitman efficiency and Bahadur efficiency (or Hodges–Lehmann efficiency)[11] [12] [13] relate to the comparison of the performance of statistical hypothesis testing procedures.
For experimental designs, efficiency relates to the ability of a design to achieve the objective of the study with minimal expenditure of resources such as time and money. In simple cases, the relative efficiency of designs can be expressed as the ratio of the sample sizes required to achieve a given objective.[14]