In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished.[1] For example, the sample mean is a commonly used estimator of the population mean.
There are point and interval estimators. A point estimator yields a single-valued result, in contrast to an interval estimator, whose result is a range of plausible values. "Single value" does not necessarily mean "single number"; it includes vector-valued and function-valued estimators.
Estimation theory is concerned with the properties of estimators; that is, with defining properties that can be used to compare different estimators (different rules for creating estimates) for the same quantity, based on the same data. Such properties can be used to determine the best rules to use under given circumstances. However, in robust statistics, statistical theory goes on to consider the balance between having good properties, if tightly defined assumptions hold, and having worse properties that hold under wider conditions.
An "estimator" or "point estimate" is a statistic (that is, a function of the data) that is used to infer the value of an unknown parameter in a statistical model. A common way of phrasing it is "the estimator is the method selected to obtain an estimate of an unknown parameter". The parameter being estimated is sometimes called the estimand. It can be either finite-dimensional (in parametric and semi-parametric models), or infinite-dimensional (semi-parametric and non-parametric models).[2] If the parameter is denoted $\theta$, then the estimator is traditionally written by adding a circumflex over the symbol: $\widehat{\theta}$.
The definition places virtually no restrictions on which functions of the data can be called "estimators". The attractiveness of different estimators can be judged by looking at their properties, such as unbiasedness, mean squared error, consistency, and asymptotic distribution. The construction and comparison of estimators are the subjects of estimation theory. In the context of decision theory, an estimator is a type of decision rule, and its performance may be evaluated through the use of loss functions.
When the word "estimator" is used without a qualifier, it usually refers to point estimation. The estimate in this case is a single point in the parameter space. There also exists another type of estimator: interval estimators, where the estimates are subsets of the parameter space.
The problem of density estimation arises in two applications: first, in estimating the probability density functions of random variables, and second, in estimating the spectral density function of a time series. In these problems the estimates are functions that can be thought of as point estimates in an infinite-dimensional space, and there are corresponding interval estimation problems.
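Suppose a fixed parameter $\theta$ needs to be estimated. Then an "estimator" is a function that maps the sample space to a set of sample estimates. An estimator of $\theta$ is usually denoted by the symbol $\widehat{\theta}$. It is often convenient to express the theory using the algebra of random variables: if $X$ denotes a random variable corresponding to the observed data, the estimator (itself treated as a random variable) is written as a function of that random variable, $\widehat{\theta}(X)$. The estimate for a particular observed data value $x$ (i.e. for $X=x$) is then $\widehat{\theta}(x)$, which is a fixed value. Often an abbreviated notation is used in which $\widehat{\theta}$ is interpreted directly as a random variable, but this can cause confusion.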
The following definitions and attributes are relevant.[3]
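For a given sample $x$, the "error" of the estimator $\widehat{\theta}$ is defined as
$$e(x)=\widehat{\theta}(x)-\theta,$$
where $\theta$ is the parameter being estimated. The error $e$ depends not only on the estimator (the estimation formula or procedure) but also on the sample.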
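The mean squared error of $\widehat{\theta}$ is defined as the expected value (the probability-weighted average over all samples) of the squared errors; that is,
$$\operatorname{MSE}(\widehat{\theta})=\operatorname{E}[(\widehat{\theta}(X)-\theta)^2].$$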
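For a given sample $x$, the sampling deviation of the estimator $\widehat{\theta}$ is defined as
$$d(x)=\widehat{\theta}(x)-\operatorname{E}(\widehat{\theta}(X))=\widehat{\theta}(x)-\operatorname{E}(\widehat{\theta}),$$
where $\operatorname{E}(\widehat{\theta}(X))$ is the expected value of the estimator. The sampling deviation $d$ depends not only on the estimator but also on the sample.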
The variance of $\widehat{\theta}$ is the expected value of the squared sampling deviations; that is,
$$\operatorname{Var}(\widehat{\theta})=\operatorname{E}[(\widehat{\theta}-\operatorname{E}[\widehat{\theta}])^2].$$
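The bias of $\widehat{\theta}$ is defined as
$$B(\widehat{\theta})=\operatorname{E}(\widehat{\theta})-\theta.$$
The bias of $\widehat{\theta}$ is a function of the true value of $\theta$, so saying that the bias of $\widehat{\theta}$ is $b$ means that for every $\theta$ the bias of $\widehat{\theta}$ is $b$.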
There are two kinds of estimators: biased estimators and unbiased estimators. Whether an estimator is biased or not can be identified by comparing $\operatorname{E}(\widehat{\theta})-\theta$ with zero:
If $\operatorname{E}(\widehat{\theta})-\theta\neq 0$, then $\widehat{\theta}$ is biased.
If $\operatorname{E}(\widehat{\theta})-\theta=0$, then $\widehat{\theta}$ is unbiased.
The bias is also the expected value of the error, since $\operatorname{E}(\widehat{\theta})-\theta=\operatorname{E}(\widehat{\theta}-\theta)$.
The estimator $\widehat{\theta}$ is an unbiased estimator of $\theta$ if and only if $B(\widehat{\theta})=0$.
An alternative to the version of "unbiased" above, is "median-unbiased", where the median of the distribution of estimates agrees with the true value; thus, in the long run half the estimates will be too low and half too high. While this applies immediately only to scalar-valued estimators, it can be extended to any measure of central tendency of a distribution: see median-unbiased estimators.
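In a practical problem, $\widehat{\theta}$ is typically built from a known functional relationship between $\theta$ and an observable quantity. For example, suppose an outcome occurs with probability $p_1=\tfrac{1}{4}(\theta+2)$, where $0<\theta<1$. For $n$ independent observations, the number $N_1$ of such outcomes can be modeled with a $\operatorname{Bin}(n,p_1)$ distribution, and this count yields the following estimator of $\theta$:
$$\widehat{\theta}=\frac{4}{n}N_1-2.$$
One can show that $\widehat{\theta}$ is an unbiased estimator of $\theta$:
$$\begin{aligned}
\operatorname{E}[\widehat{\theta}] &= \operatorname{E}\left[\frac{4}{n}N_1-2\right] \\
&= \frac{4}{n}\operatorname{E}[N_1]-2 \\
&= \frac{4}{n}\,np_1-2 \\
&= 4p_1-2 \\
&= 4\cdot\frac{1}{4}(\theta+2)-2 \\
&= \theta+2-2 \\
&= \theta.
\end{aligned}$$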
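The unbiasedness derived above can also be checked by simulation. The following is a minimal sketch, not part of the original text: the function names, the value theta = 0.4, the sample size, the number of trials and the seed are all illustrative assumptions. It draws many binomial counts N1, applies the estimator 4/n * N1 - 2 to each, and averages the results.

```python
import random

def theta_hat(n1, n):
    """Estimator from the text: (4/n) * N1 - 2."""
    return 4.0 / n * n1 - 2.0

def simulate_mean_estimate(theta, n, trials, seed=0):
    """Average the estimator over many simulated samples (illustrative helper)."""
    rng = random.Random(seed)          # fixed seed: an assumption, for reproducibility
    p1 = (theta + 2.0) / 4.0           # success probability p1 = (theta + 2) / 4
    total = 0.0
    for _ in range(trials):
        # N1 ~ Bin(n, p1), drawn as a sum of n Bernoulli trials
        n1 = sum(1 for _ in range(n) if rng.random() < p1)
        total += theta_hat(n1, n)
    return total / trials

theta = 0.4                            # hypothetical true parameter, 0 < theta < 1
avg = simulate_mean_estimate(theta, n=50, trials=20000)
print(f"true theta = {theta}, average estimate = {avg:.4f}")
```

If the estimator is unbiased, the printed average should lie close to the true value, with the gap shrinking as the number of trials grows.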
A desired property for estimators is unbiasedness: an unbiased estimator has no systematic tendency to produce estimates larger or smaller than the true value of the parameter. Additionally, among unbiased estimators, those with smaller variance are preferred because their estimates tend to lie closer to the "true" value of the parameter. The unbiased estimator with the smallest variance is known as the minimum-variance unbiased estimator (MVUE).
To check whether an estimator is unbiased, verify the equation $\operatorname{E}(\widehat{\theta})-\theta=0$. Equivalently, an estimator $T$ of a parameter of interest $\theta$ is unbiased if $\operatorname{E}[T]=\theta$.
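When two estimators $\theta_2$ and $\theta_1$ are both centered on $\theta$, the one whose distribution has the smaller spread is preferred.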
Expectation: When the quantity of interest is the expectation $\mu$ of the model distribution, the sample mean is an unbiased estimator; it satisfies the two equations below.
1. $\overline{X}_n=\dfrac{X_1+X_2+\cdots+X_n}{n}$
2. $\operatorname{E}\left[\overline{X}_n\right]=\mu$
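Similarly, when the quantity of interest is the variance $\sigma^2$ of the model distribution, the sample variance is an unbiased estimator; it satisfies the two equations below.
1. $S_n^2=\dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar{X}_n\right)^2$
2. $\operatorname{E}\left[S_n^2\right]=\sigma^2$
The divisor is $n-1$ rather than $n$ because dividing by $n$ gives an estimator with a negative bias, whose estimates are on average too small for $\sigma^2$. With the divisor $n-1$, $S_n^2$ is unbiased for $\sigma^2$.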
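The effect of the divisor in the sample variance can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the original text: it uses normally distributed data with sigma = 2 (so the true variance is 4), samples of size 5, a fixed seed, and a hypothetical `ddof` parameter that switches between dividing by n - 1 (`ddof=1`) and by n (`ddof=0`).

```python
import random

def sample_variance(xs, ddof=1):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

def average_variance_estimate(n, trials, ddof, sigma=2.0, seed=1):
    """Average the variance estimate over many simulated normal samples."""
    rng = random.Random(seed)          # same seed gives the same samples for both divisors
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += sample_variance(xs, ddof)
    return total / trials

unbiased = average_variance_estimate(n=5, trials=40000, ddof=1)  # divisor n - 1
biased = average_variance_estimate(n=5, trials=40000, ddof=0)    # divisor n
print(f"true variance = 4.0, n-1 divisor: {unbiased:.3f}, n divisor: {biased:.3f}")
```

With these settings the n - 1 version should average close to the true variance, while the n version should come out smaller by roughly the factor (n - 1)/n.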
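The mean squared error, variance, and bias are related:
$$\operatorname{MSE}(\widehat{\theta})=\operatorname{Var}(\widehat{\theta})+(B(\widehat{\theta}))^2.$$
In particular, for an unbiased estimator the variance equals the mean squared error.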
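The standard deviation of an estimator $\widehat{\theta}$ of $\theta$ (the square root of the variance), or an estimate of the standard deviation of an estimator $\widehat{\theta}$ of $\theta$, is called the standard error of $\widehat{\theta}$.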
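The relation between mean squared error, variance, and bias can be checked on simulated estimates. The sketch below is illustrative only: the deliberately biased estimator `shrunk_mean` (0.8 times the sample mean), the true value theta = 3, the sample size, and the seed are all assumptions introduced for this example. Empirical moments computed with divisor m satisfy the decomposition exactly, up to floating-point rounding.

```python
import random

def shrunk_mean(xs, c=0.8):
    """A deliberately biased estimator of the mean: c times the sample mean."""
    return c * sum(xs) / len(xs)

rng = random.Random(2)                 # fixed seed, an assumption for reproducibility
theta = 3.0                            # hypothetical true mean
estimates = []
for _ in range(20000):
    xs = [rng.gauss(theta, 1.0) for _ in range(10)]
    estimates.append(shrunk_mean(xs))

m = len(estimates)
mean_est = sum(estimates) / m
bias = mean_est - theta                                       # empirical bias
variance = sum((e - mean_est) ** 2 for e in estimates) / m    # empirical variance
mse = sum((e - theta) ** 2 for e in estimates) / m            # empirical MSE

print(f"MSE = {mse:.4f}, Var + Bias^2 = {variance + bias ** 2:.4f}")
```

Here the bias should be near (0.8 - 1) * 3 = -0.6, so most of the mean squared error comes from the squared bias rather than from the variance.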
See main article: Consistent estimator. A consistent sequence of estimators is a sequence of estimators that converge in probability to the quantity being estimated as the index (usually the sample size) grows without bound. In other words, increasing the sample size increases the probability of the estimator being close to the population parameter.
Mathematically, a sequence of estimators $\{t_n;\ n\ge 0\}$ is a consistent estimator for parameter $\theta$ if and only if, for all $\varepsilon>0$, no matter how small, we have
$$\lim_{n\to\infty}\Pr\left\{\left|t_n-\theta\right|<\varepsilon\right\}=1.$$
The consistency defined above may be called weak consistency. The sequence is strongly consistent, if it converges almost surely to the true value.
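Weak consistency can be illustrated by estimating the probability that an estimator falls within a fixed tolerance of the true value for increasing sample sizes. The sketch below uses the sample mean of normal data; the function name `prob_within_eps`, the tolerance 0.1, the true mean 1.0, and the seed are assumptions chosen for illustration.

```python
import random

def prob_within_eps(n, eps, trials=2000, mu=1.0, sigma=1.0, seed=3):
    """Monte Carlo estimate of Pr(|sample mean - mu| < eps) for sample size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(mean - mu) < eps:
            hits += 1
    return hits / trials

# The estimated probability should rise towards 1 as n grows.
probs = {n: prob_within_eps(n, eps=0.1) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(f"n = {n:5d}: Pr(|mean - mu| < 0.1) is approximately {p:.3f}")
```

A simulation of this kind only suggests the limiting behaviour; it does not prove consistency, which is a statement about the limit as n grows without bound.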
An estimator that converges to a multiple of a parameter can be made into a consistent estimator by multiplying the estimator by a scale factor, namely the true value divided by the asymptotic value of the estimator. This occurs frequently in estimation of scale parameters by measures of statistical dispersion.
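An estimator is Fisher consistent if it is the same functional of the empirical distribution function as the parameter is of the true distribution function:
$$\widehat{\theta}=h(T_n),\qquad \theta=h(T_\theta),$$
where $T_n$ and $T_\theta$ are the empirical distribution function and the theoretical distribution function, respectively. Simple examples are the estimators of the mean and the variance: the mean is estimated by $\widehat{\mu}=\bar{X}$ and the variance by $\widehat{\sigma}^2=SSD/n$, where SSD is the sum of squared deviations from the sample mean.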
See main article: Asymptotic normality. An asymptotically normal estimator is a consistent estimator whose distribution around the true parameter $\theta$ approaches a normal distribution with standard deviation shrinking in proportion to $1/\sqrt{n}$ as the sample size $n$ grows. Using $\xrightarrow{D}$ to denote convergence in distribution, $t_n$ is asymptotically normal if
$$\sqrt{n}(t_n-\theta)\xrightarrow{D}N(0,V)$$
for some $V$.
In this formulation V/n can be called the asymptotic variance of the estimator. However, some authors also call V the asymptotic variance. Note that convergence will not necessarily have occurred for any finite n; therefore this value is only an approximation to the true variance of the estimator, while in the limit the asymptotic variance (V/n) is simply zero. To be more specific, the distribution of the estimator $t_n$ converges weakly to a Dirac delta function centered at $\theta$.
The central limit theorem implies asymptotic normality of the sample mean $\bar{X}$ as an estimator of the true mean.
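Asymptotic normality of the sample mean can be examined by simulation. The sketch below is an illustration under assumptions not in the original text: the data are exponential with rate 1 (so the true mean is 1 and V = 1), the sample size is 200, and the seed is fixed. It computes the scaled error sqrt(n) * (mean - 1) for many samples and checks two features of the N(0, 1) limit: a second moment near 1 and about 95% of values inside plus or minus 1.96.

```python
import math
import random

def scaled_errors(n, trials, seed=4):
    """Return sqrt(n) * (sample mean - 1) for exponential(1) samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append(math.sqrt(n) * (mean - 1.0))
    return out

z = scaled_errors(n=200, trials=5000)
# Second moment about zero, which estimates V because the true mean is known.
var_z = sum(v * v for v in z) / len(z)
# Fraction of scaled errors inside the central 95% range of N(0, 1).
frac = sum(1 for v in z if abs(v) < 1.96) / len(z)
print(f"estimated V = {var_z:.3f}, fraction within 1.96 = {frac:.3f}")
```

For a finite n the match is only approximate, in line with the remark above that the asymptotic variance is an approximation to the true variance.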
See main article: Efficiency (statistics).
The efficiency of an estimator describes how well it estimates the quantity of interest in a "minimum error" manner. In practice there is no single best estimator in an absolute sense; one estimator can only be better than another. How efficient an estimator is depends on the choice of a particular loss function, and it is reflected by two naturally desirable properties of estimators: to be unbiased, $\operatorname{E}(\widehat{\theta})-\theta=0$, and to have minimal mean squared error (MSE), $\operatorname{E}[(\widehat{\theta}-\theta)^2]$. These cannot in general both be satisfied simultaneously: a biased estimator may have a lower mean squared error than any unbiased estimator. The MSE is related to the estimator bias by
$$\operatorname{E}[(\widehat{\theta}-\theta)^2]=(\operatorname{E}(\widehat{\theta})-\theta)^2+\operatorname{Var}(\widehat{\theta}).$$
In this decomposition, the left-hand side is the mean squared error, the first term on the right is the square of the estimator bias, and the second term is the variance of the estimator. The quality of an estimator can be judged by comparing its variance, its squared bias, or its MSE with those of a competitor: other things being equal, a good (more efficient) estimator has a smaller variance, a smaller squared bias, and a smaller MSE than a bad one. Suppose there are two estimators, where $\theta_1$ is the good estimator and $\theta_2$ is the bad one. These comparisons can then be written as
$$\operatorname{Var}(\theta_1)<\operatorname{Var}(\theta_2),$$
$$\left|\operatorname{E}(\theta_1)-\theta\right|<\left|\operatorname{E}(\theta_2)-\theta\right|,$$
$$\operatorname{MSE}(\theta_1)<\operatorname{MSE}(\theta_2).$$
Besides using formulas, the efficiency of an estimator can also be judged graphically. If an estimator is efficient, its frequency vs. value graph shows a curve with high frequency at the center and low frequency on the two sides. If an estimator is not efficient, the curve in the frequency vs. value graph is relatively flatter. Put simply, a good estimator has a narrow curve, while a bad estimator has a wide curve. Plotting the two curves on one graph with a shared y-axis makes the difference more obvious.
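The comparison of two estimators by the spread of their distributions can be sketched numerically. The example below is an illustration, not part of the original text: it assumes standard normal data, samples of size 25, and a fixed seed, and compares the sample mean with the sample median as estimators of the center. Both are centered on the true value here, so the one with the smaller variance corresponds to the narrower curve.

```python
import random
import statistics

def estimator_variance(estimator, n=25, trials=10000, seed=5):
    """Empirical variance of an estimator over simulated standard normal samples."""
    rng = random.Random(seed)          # same seed gives both estimators the same samples
    estimates = [
        estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.pvariance(estimates)

var_mean = estimator_variance(statistics.fmean)     # sample mean
var_median = estimator_variance(statistics.median)  # sample median
print(f"Var(mean) = {var_mean:.4f}, Var(median) = {var_median:.4f}")
```

For normal data the sample mean should show the smaller variance, so it is the more efficient of the two under these assumptions; for other distributions the ranking can differ.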
Among unbiased estimators, there often exists one with the lowest variance, called the minimum variance unbiased estimator (MVUE). In some cases an unbiased efficient estimator exists, which, in addition to having the lowest variance among unbiased estimators, attains the Cramér–Rao bound, which is an absolute lower bound on the variance of unbiased estimators of a parameter.
Concerning such "best unbiased estimators", see also Cramér–Rao bound, Gauss–Markov theorem, Lehmann–Scheffé theorem, Rao–Blackwell theorem.
See main article: Robust estimator.