Semiparametric model
In statistics, a semiparametric model is a statistical model that has parametric and nonparametric components.
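A statistical model is a parameterized family of distributions $\{P_\theta : \theta \in \Theta\}$ indexed by a parameter $\theta$.
- With a parametric model, the indexing parameter $\theta$ is a vector in $k$-dimensional Euclidean space, for some nonnegative integer $k$.[1] Thus, $\theta$ is finite-dimensional, and $\Theta \subseteq \mathbb{R}^k$.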
- With a nonparametric model, the set of possible values of the parameter $\theta$ is a subset of some space $V$, which is not necessarily finite-dimensional. For example, we might consider the set of all distributions with mean 0. Such spaces are vector spaces with topological structure, but may not be finite-dimensional as vector spaces. Thus, $\Theta \subseteq V$ for some possibly infinite-dimensional space $V$.
- With a semiparametric model, the parameter has both a finite-dimensional component and an infinite-dimensional component (often a real-valued function defined on the real line). Thus, $\Theta \subseteq \mathbb{R}^k \times V$, where $V$ is an infinite-dimensional space.
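For concreteness, one possible instance of each kind of parameter space is written out below. The normal family, the class of mean-zero distributions, and the symmetric location model (observations $X = \mu + \varepsilon$ with $\varepsilon$ drawn from an unknown density $f$ symmetric about 0) are standard textbook illustrations chosen here, not examples taken from the cited sources.

```latex
\begin{aligned}
&\text{Parametric (normal family, } k = 2\text{):} &&
  \theta = (\mu, \sigma),\quad \Theta = \mathbb{R} \times (0, \infty) \subseteq \mathbb{R}^2 \\
&\text{Nonparametric (mean-zero distributions):} &&
  \Theta = \Big\{ P : \textstyle\int |x|\,dP(x) < \infty,\ \int x\,dP(x) = 0 \Big\} \\
&\text{Semiparametric (symmetric location):} &&
  \theta = (\mu, f),\quad \Theta = \mathbb{R} \times \{ f : f \text{ a density},\ f(-e) = f(e) \text{ for all } e \}
\end{aligned}
```

In the last case $\mu$ is the finite-dimensional component and the density $f$ is the infinite-dimensional one.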
It may appear at first that semiparametric models include nonparametric models, since they have an infinite-dimensional as well as a finite-dimensional component. However, a semiparametric model is considered to be "smaller" than a completely nonparametric model because we are often interested only in the finite-dimensional component of $\theta$. That is, the infinite-dimensional component is regarded as a nuisance parameter.[2] In nonparametric models, by contrast, the primary interest is in estimating the infinite-dimensional parameter. Thus the estimation task is statistically harder in nonparametric models.
Estimators for these models often use smoothing or kernel methods to handle the infinite-dimensional component.
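A common illustration of both ideas (a finite-dimensional parameter of interest, an unknown nuisance function, and kernel smoothing) is the partially linear model $y = \beta x + g(z) + \varepsilon$, estimated with Robinson's double-residual approach: smooth $y$ and $x$ on $z$, then regress the $y$-residuals on the $x$-residuals. The sketch below is a minimal version under stated assumptions: a scalar $x$, a Gaussian Nadaraya-Watson smoother with a fixed, hand-picked bandwidth (no bandwidth selection), and a hypothetical simulated design. The names `nw_smooth`, `robinson_beta` and `simulate` are illustrative, not a library API.

```python
import math
import random


def nw_smooth(z, v, h):
    """Nadaraya-Watson estimate of E[v | z] at each observed z_i (Gaussian kernel, bandwidth h)."""
    fitted = []
    for zi in z:
        weights = [math.exp(-0.5 * ((zi - zj) / h) ** 2) for zj in z]
        fitted.append(sum(w * vj for w, vj in zip(weights, v)) / sum(weights))
    return fitted


def robinson_beta(y, x, z, h=0.1):
    """Estimate beta in y = beta * x + g(z) + noise without modeling g (the nuisance)."""
    # Remove the kernel-smoothed dependence on z from both y and x.
    ry = [yi - mi for yi, mi in zip(y, nw_smooth(z, y, h))]
    rx = [xi - mi for xi, mi in zip(x, nw_smooth(z, x, h))]
    # No-intercept least squares of the y-residuals on the x-residuals.
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)


def simulate(n, beta, seed=0):
    """Hypothetical design: g(z) = sin(2*pi*z), and x is correlated with z."""
    rng = random.Random(seed)
    z = [rng.random() for _ in range(n)]
    x = [math.cos(2 * math.pi * zi) + rng.gauss(0, 1) for zi in z]
    y = [beta * xi + math.sin(2 * math.pi * zi) + rng.gauss(0, 0.5) for xi, zi in zip(x, z)]
    return y, x, z


if __name__ == "__main__":
    y, x, z = simulate(400, beta=2.0)
    print(robinson_beta(y, x, z))  # should land near the true value 2.0
```

The function $g$ is never given a parametric form; it is only smoothed away, which is what makes it a nuisance parameter here. The smoother is $O(n^2)$, which is fine for a sketch but not for large samples.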
Example
A well-known example of a semiparametric model is the Cox proportional hazards model.[3] If we are interested in studying the time $T$ to an event such as death due to cancer or failure of a light bulb, the Cox model specifies the following distribution function for $T$:

$$F(t) = 1 - \exp\left(-\int_0^t \lambda_0(u)\, e^{\beta^\top x}\, du\right),$$

where $x$ is the covariate vector, and $\beta$ and $\lambda_0(u)$ are unknown parameters.
Thus $\theta = (\beta, \lambda_0(u))$. Here $\beta$ is finite-dimensional and is of interest; $\lambda_0(u)$ is an unknown non-negative function of time (known as the baseline hazard function) and is often a nuisance parameter. The set of possible candidates for $\lambda_0(u)$ is infinite-dimensional.
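The distribution function above can be evaluated numerically for any given baseline hazard, and the sense in which $\lambda_0$ is a nuisance shows up in Cox's partial likelihood, which depends on the event times only through their ordering and so can be maximized in $\beta$ without estimating $\lambda_0$. The sketch below assumes a scalar covariate (so $\beta^\top x$ reduces to $\beta x$), no censoring, no tied event times, and a hypothetical simulated design with a constant baseline hazard. The names `cox_cdf`, `simulate` and `fit_cox_beta` are illustrative; real analyses would use a survival-analysis package that handles censoring and ties.

```python
import math
import random


def cox_cdf(t, x, beta, baseline_hazard, steps=1000):
    """F(t) = 1 - exp(-int_0^t lambda0(u) * e^(beta*x) du), integral by the trapezoidal rule."""
    h = t / steps
    cum_hazard = sum(
        0.5 * h * (baseline_hazard(i * h) + baseline_hazard((i + 1) * h))
        for i in range(steps)
    )
    return 1.0 - math.exp(-cum_hazard * math.exp(beta * x))


def simulate(n, beta, lam0=0.1, seed=1):
    """Hypothetical design: scalar covariate x ~ N(0, 1) and constant baseline hazard lam0."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    # With a constant baseline hazard, T is exponential with rate lam0 * e^(beta*x).
    times = [rng.expovariate(lam0 * math.exp(beta * x)) for x in xs]
    return times, xs


def fit_cox_beta(times, xs, iters=25):
    """Maximize the Cox partial likelihood in a scalar beta by Newton-Raphson.

    Assumes no censoring and no tied times. The baseline hazard never appears:
    only the ordering of the times enters the partial likelihood.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    x_sorted = [xs[i] for i in order]
    beta = 0.0
    for _ in range(iters):
        score = info = 0.0
        s0 = s1 = s2 = 0.0
        # Scan from the latest time backwards so each risk set {j : t_j >= t_i}
        # is built up incrementally.
        for x in reversed(x_sorted):
            w = math.exp(beta * x)
            s0 += w
            s1 += w * x
            s2 += w * x * x
            mean = s1 / s0
            score += x - mean
            info += s2 / s0 - mean * mean
        beta += score / info
    return beta


if __name__ == "__main__":
    times, xs = simulate(500, beta=0.7)
    print(fit_cox_beta(times, xs))  # should be close to 0.7
    # A strictly increasing transform of time corresponds to a different
    # baseline hazard and leaves the estimate unchanged.
    print(fit_cox_beta([math.sqrt(t) for t in times], xs))
```

The second call illustrates the point of the semiparametric formulation: changing the baseline hazard (here by transforming the time axis) does not change the estimate of $\beta$.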
References
- Begun, Janet M.; Hall, W. J.; Huang, Wei-Min; Wellner, Jon A. (1983), "Information and asymptotic efficiency in parametric–nonparametric models", Annals of Statistics, 11 (2): 432–452.
Notes and References
- .
- .
- Balakrishnan, N.; Rao, C. R., eds. (2004), Handbook of Statistics 23: Advances in Survival Analysis, Elsevier, p. 126.