In statistics, a consistent estimator or asymptotically consistent estimator is an estimator—a rule for computing estimates of a parameter θ0—having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to θ0. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to θ0 converges to one.
In practice one constructs an estimator as a function of an available sample of size n, and then imagines being able to keep collecting data and expanding the sample ad infinitum. In this way one would obtain a sequence of estimates indexed by n, and consistency is a property of what occurs as the sample size “grows to infinity”. If the sequence of estimates can be mathematically shown to converge in probability to the true value θ0, it is called a consistent estimator; otherwise the estimator is said to be inconsistent.
Consistency as defined here is sometimes referred to as weak consistency. When we replace convergence in probability with almost sure convergence, then the estimator is said to be strongly consistent. Consistency is related to bias; see bias versus consistency.
Formally speaking, an estimator Tn of parameter θ is said to be weakly consistent, if it converges in probability to the true value of the parameter:
\underset{n\to\infty}{\operatorname{plim}}\; T_n=\theta,
that is, if for every ε > 0,
\lim_{n\to\infty}\Pr\left(|T_n-\theta|>\varepsilon\right)=0.
An estimator Tn of parameter θ is said to be strongly consistent, if it converges almost surely to the true value of the parameter:
\Pr\left(\lim_{n\to\infty}T_n=\theta\right)=1.
A more rigorous definition takes into account the fact that θ is actually unknown, and thus the convergence in probability must take place for every possible value of this parameter. Suppose {pθ: θ ∈ Θ} is a family of distributions (the parametric model), and X^θ = {X1, X2, … : Xi ~ pθ} is an infinite sample from the distribution pθ. Let {Tn(X^θ)} be a sequence of estimators for some parameter g(θ). Usually, Tn will be based on the first n observations of a sample. Then this sequence is said to be (weakly) consistent if
\underset{n\to\infty}{\operatorname{plim}}\; T_n(X^{\theta})=g(\theta), \quad \text{for all } \theta\in\Theta.
This definition uses g(θ) instead of simply θ, because often one is interested in estimating a certain function or a sub-vector of the underlying parameter. In the next example, we estimate the location parameter of the model, but not the scale:
Suppose one has a sequence of statistically independent observations from a normal N(μ, σ²) distribution. To estimate μ based on the first n observations, one can use the sample mean: Tn = (X1 + ... + Xn)/n. This defines a sequence of estimators, indexed by the sample size n.
From the properties of the normal distribution, we know the sampling distribution of this statistic: Tn is itself normally distributed, with mean μ and variance σ²/n. Equivalently, (Tn − μ)/(σ/√n) has a standard normal distribution:
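\Pr\left[|T_n-\mu|\geq\varepsilon\right]=\Pr\left[\frac{\sqrt{n}\,|T_n-\mu|}{\sigma}\geq\frac{\sqrt{n}\,\varepsilon}{\sigma}\right]=2\left(1-\Phi\left(\frac{\sqrt{n}\,\varepsilon}{\sigma}\right)\right)\to 0
as n tends to infinity, for any fixed ε > 0, where Φ is the cumulative distribution function of the standard normal distribution. Therefore, the sequence Tn of sample means is consistent for the population mean μ.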
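The tail probability in this example can be checked numerically. The following is a minimal sketch, not a definitive implementation: the values of μ, σ, and ε, the number of replications, and the function names are all assumptions chosen for illustration. It evaluates the closed form 2(1 − Φ(√n ε/σ)) and compares it with a Monte Carlo estimate for a few sample sizes.

```python
import random
from statistics import NormalDist, fmean


def exact_tail(n, eps, sigma):
    """Closed form 2 * (1 - Phi(sqrt(n) * eps / sigma)) for the sample mean
    of n independent N(mu, sigma^2) observations."""
    return 2.0 * (1.0 - NormalDist().cdf(n ** 0.5 * eps / sigma))


def simulated_tail(n, eps, mu, sigma, reps=2000, seed=0):
    """Monte Carlo estimate of Pr[|T_n - mu| >= eps].
    reps and seed are illustrative settings, not taken from the text."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        t_n = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(t_n - mu) >= eps:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    mu, sigma, eps = 1.0, 2.0, 0.5  # assumed values for illustration only
    for n in (1, 10, 100):
        print(n, exact_tail(n, eps, sigma), simulated_tail(n, eps, mu, sigma))
```

Both columns should shrink toward zero as n grows, which is what consistency of the sample mean requires.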
The notion of asymptotic consistency is very close to, almost synonymous with, the notion of convergence in probability. As such, any theorem, lemma, or property that establishes convergence in probability may be used to prove consistency. Many such tools exist:
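To demonstrate consistency directly from the definition, one can use a Markov-type inequality, valid for any nonnegative function h with h(ε) > 0:
\Pr\left[h(T_n-\theta)\geq h(\varepsilon)\right]\leq\frac{\operatorname{E}\left[h(T_n-\theta)\right]}{h(\varepsilon)},
the most common choices for h being the absolute value (Markov's inequality) and the square (Chebyshev's inequality).
The continuous mapping theorem, for a function g continuous at θ:
T_n \xrightarrow{p} \theta \ \Rightarrow\ g(T_n) \xrightarrow{p} g(\theta)
Slutsky's theorem: if Tn converges in distribution to α and Sn converges in probability to a constant β, then
\begin{align} &T_n+S_n \xrightarrow{d} \alpha+\beta,\\ &T_nS_n \xrightarrow{d} \alpha\beta,\\ &T_n/S_n \xrightarrow{d} \alpha/\beta, \text{ provided that } \beta\neq 0 \end{align}
The law of large numbers, when the estimator is an average of i.i.d. terms g(Xi) with finite expectation:
\frac{1}{n}\sum_{i=1}^{n} g(X_i) \xrightarrow{p} \operatorname{E}[g(X)]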
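As a worked illustration of the first tool, consider the sample mean Tn of independent observations with mean μ and finite variance σ² (an assumption made for this sketch; normality is not needed). Taking h to be the square gives a Chebyshev bound that already establishes consistency:

```latex
\Pr\left[|T_n-\mu|\geq\varepsilon\right]
  = \Pr\left[(T_n-\mu)^2\geq\varepsilon^2\right]
  \leq \frac{\operatorname{E}\left[(T_n-\mu)^2\right]}{\varepsilon^2}
  = \frac{\sigma^2}{n\,\varepsilon^2}
  \to 0 \quad \text{as } n\to\infty .
```

The bound is cruder than the exact normal tail probability, but it only uses the variance of Tn.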
An estimator can be unbiased but not consistent. For example, for an iid sample {x1, ..., xn} one can use Tn(X) = xn as the estimator of the mean E[X]. Note that here the sampling distribution of Tn is the same as the underlying distribution (for any n, as it ignores all points but the last), so E[Tn(X)] = E[X] and it is unbiased, but it does not converge to any value.
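A small simulation can make this contrast concrete. The sketch below is illustrative only: it assumes normally distributed data, and the parameter values and the helper names `tail_frequency` and `last_observation` are hypothetical. It estimates how often an estimate misses the true mean by more than ε, both for the last-observation estimator and for the sample mean.

```python
import random
from statistics import fmean


def tail_frequency(estimator, n, mu=0.0, sigma=1.0, eps=0.5, reps=2000, seed=1):
    """Fraction of simulated samples of size n whose estimate misses mu by
    more than eps. All default values are assumptions for illustration."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(estimator(sample) - mu) > eps:
            misses += 1
    return misses / reps


def last_observation(sample):
    """Uses only the final data point and ignores all earlier ones."""
    return sample[-1]


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, tail_frequency(last_observation, n), tail_frequency(fmean, n))
```

Under these assumptions the miss frequency of the last-observation estimator should stay roughly constant in n, while that of the sample mean should fall toward zero.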
However, if a sequence of estimators is unbiased and converges in mean to a value (for example, because its variance tends to zero), then it is consistent, as the limit must then be the correct value.
Alternatively, an estimator can be biased but consistent. For example, if the mean is estimated by
\frac{1}{n}\sum_{i=1}^{n} x_i+\frac{1}{n}
it is biased, but as n → ∞ it approaches the correct value, and so it is consistent.
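Important examples include the sample variance and sample standard deviation. Without Bessel's correction (that is, when using the sample size n instead of the degrees of freedom n − 1), these are both negatively biased but consistent estimators. With the correction, the sample variance is unbiased, while the sample standard deviation is still biased, but less so, and both remain consistent: the correction factor converges to 1 as the sample size grows.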
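The effect of Bessel's correction on bias can be sketched by simulation. The example below is a rough illustration under assumed settings (standard normal data, a hypothetical helper `mean_of_estimates`, arbitrary replication count). It uses `statistics.pvariance`, which divides by n, and `statistics.variance`, which divides by n − 1, and averages each over many simulated samples.

```python
import random
from statistics import fmean, pvariance, variance


def mean_of_estimates(estimator, n, sigma=1.0, reps=2000, seed=2):
    """Average of a variance estimator over many simulated samples of size n,
    approximating its expectation. reps and seed are illustrative choices."""
    rng = random.Random(seed)
    return fmean(
        estimator([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(reps)
    )


if __name__ == "__main__":
    # True variance is 1.0 here (assumed). The uncorrected estimator has
    # expectation (n - 1) / n, which approaches 1 as n grows.
    for n in (2, 5, 50):
        print(n, mean_of_estimates(pvariance, n), mean_of_estimates(variance, n))
```

For small n the uncorrected average should sit noticeably below the true variance, and the gap should close as n increases, in line with the estimator being biased but consistent.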
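Here is another example. Let Tn be a sequence of estimators for θ with
\Pr(T_n)=\begin{cases} 1-1/n, & \text{if } T_n=\theta\\ 1/n, & \text{if } T_n=n\delta+\theta \end{cases}
We can see that Tn converges in probability to θ, while E[Tn] = θ + δ for every n, so the bias does not converge to zero.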
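Both properties of this two-point estimator can be verified with exact arithmetic. The following sketch is only an illustration: the values chosen for θ, δ, and ε and all function names are hypothetical. It computes the expectation of Tn and the probability that Tn lies more than ε away from θ.

```python
from fractions import Fraction


def two_point_estimator(n, theta, delta):
    """Distribution of T_n from the example, as (value, probability) pairs:
    theta with probability 1 - 1/n, and n*delta + theta with probability 1/n."""
    p_far = Fraction(1, n)
    return [(theta, 1 - p_far), (n * delta + theta, p_far)]


def expectation(dist):
    """Exact expected value of a finite discrete distribution."""
    return sum(value * prob for value, prob in dist)


def prob_far_from(dist, theta, eps):
    """Exact Pr(|T_n - theta| > eps)."""
    return sum(prob for value, prob in dist if abs(value - theta) > eps)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    theta, delta, eps = Fraction(3), Fraction(2), Fraction(1, 10)
    for n in (1, 10, 100, 1000):
        dist = two_point_estimator(n, theta, delta)
        print(n, expectation(dist), prob_far_from(dist, theta, eps))
```

The expectation stays at θ + δ for every n, while the probability of being far from θ equals 1/n and therefore vanishes.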