Semiparametric model explained

In statistics, a semiparametric model is a statistical model that has parametric and nonparametric components.

A statistical model is a parameterized family of distributions $\{P_\theta : \theta \in \Theta\}$ indexed by a parameter $\theta$.

A parametric model is a model in which the indexing parameter $\theta$ is a vector in $k$-dimensional Euclidean space, for some nonnegative integer $k$.[1] Thus, $\theta$ is finite-dimensional, and $\Theta \subseteq \mathbb{R}^k$.

With a nonparametric model, the set of possible values of the parameter $\theta$ is a subset of some space $V$, which is not necessarily finite-dimensional. For example, we might consider the set of all distributions with mean 0. Such spaces are vector spaces with topological structure, but may not be finite-dimensional as vector spaces. Thus, $\Theta \subseteq V$ for some possibly infinite-dimensional space $V$.

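With a semiparametric model, the parameter has both a finite-dimensional component and an infinite-dimensional component. Thus, $\Theta \subseteq \mathbb{R}^k \times V$, where $V$ is an infinite-dimensional space.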
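
To make the three parameter spaces concrete, the following sketch compares three regression models side by side. The setting is an illustrative assumption and is not taken from the text above: a response $Y$, covariates $X \in \mathbb{R}^k$ and $Z \in [0,1]$, an independent error $\varepsilon \sim N(0,1)$, and $V$ taken to be, say, the space of continuous functions on $[0,1]$.

```latex
\begin{align*}
\text{Parametric:} \quad & Y = \beta^{\top} X + \varepsilon,
  & \theta &= \beta, & \Theta &\subseteq \mathbb{R}^k \\
\text{Nonparametric:} \quad & Y = g(Z) + \varepsilon,
  & \theta &= g, & \Theta &\subseteq V \\
\text{Semiparametric:} \quad & Y = \beta^{\top} X + g(Z) + \varepsilon,
  & \theta &= (\beta, g), & \Theta &\subseteq \mathbb{R}^k \times V
\end{align*}
```

In the last line the unknown function $g$ plays the role of the infinite-dimensional component, while $\beta$ is the finite-dimensional one.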

It may appear at first that semiparametric models include nonparametric models, since they have an infinite-dimensional as well as a finite-dimensional component. However, a semiparametric model is considered to be "smaller" than a completely nonparametric model because we are often interested only in the finite-dimensional component of $\theta$. That is, the infinite-dimensional component is regarded as a nuisance parameter.[2] In nonparametric models, by contrast, the primary interest is in estimating the infinite-dimensional parameter. Thus the estimation task is statistically harder in nonparametric models.
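
The idea of a nuisance parameter can be illustrated with a small simulation. The sketch below assumes a symmetric location model, $X = \mu + \varepsilon$ with $\varepsilon$ drawn from an unknown density symmetric about 0, which is an example chosen here for illustration and not one discussed in the text. The function names, the two noise distributions and the sample size are all hypothetical choices. The sample median estimates the finite-dimensional component $\mu$ without ever estimating the noise density.

```python
import random
import statistics


def sample_symmetric_location(mu, noise_sampler, n, rng):
    """Draw n observations X = mu + eps, with eps produced by noise_sampler(rng)."""
    return [mu + noise_sampler(rng) for _ in range(n)]


def laplace_noise(rng):
    # The difference of two independent Exp(1) draws is Laplace(0, 1),
    # which is symmetric about 0.
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def uniform_noise(rng):
    # Uniform on [-1, 1], also symmetric about 0.
    return rng.uniform(-1.0, 1.0)


rng = random.Random(0)   # fixed seed so the run is reproducible
mu_true = 2.0            # finite-dimensional parameter of interest (assumed value)

estimates = {}
for name, sampler in [("laplace", laplace_noise), ("uniform", uniform_noise)]:
    xs = sample_symmetric_location(mu_true, sampler, 5000, rng)
    # The median targets mu for any noise density symmetric about 0,
    # so the density itself (the nuisance parameter) is never estimated.
    estimates[name] = statistics.median(xs)

for name, est in estimates.items():
    print(f"{name}: median estimate of mu = {est:.3f}")
```

Both estimates should land near the assumed value of 2.0 even though the two noise densities differ, which is the sense in which the density is a nuisance here.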

These models often use smoothing or kernels.

Example

A well-known example of a semiparametric model is the Cox proportional hazards model.[3] If we are interested in studying the time $T$ to an event such as death due to cancer or failure of a light bulb, the Cox model specifies the following distribution function for $T$:

$$F(t) = 1 - \exp\left(-\int_0^t \lambda_0(u)\, e^{\beta^{\top} x}\, du\right),$$

where $x$ is the covariate vector, and $\beta$ and $\lambda_0(u)$ are unknown parameters, so that $\theta = (\beta, \lambda_0(u))$. Here $\beta$ is finite-dimensional and is of interest; $\lambda_0(u)$ is an unknown non-negative function of time (known as the baseline hazard function) and is often a nuisance parameter. The set of possible candidates for $\lambda_0(u)$ is infinite-dimensional.
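
As a rough numerical illustration of the distribution function above, the sketch below evaluates $F(t)$ for a given coefficient vector and a given baseline hazard. The function name `cox_cdf`, the trapezoidal integration, and the example values of `beta`, `x` and the two baseline hazards are assumptions made for this sketch. In practice the baseline hazard is unknown, and this code does not show how the model is fitted.

```python
import math


def cox_cdf(t, x, beta, baseline_hazard, n_steps=1000):
    """Evaluate F(t) = 1 - exp(-int_0^t lambda0(u) * exp(beta . x) du).

    The integral of the baseline hazard is approximated with the
    trapezoidal rule on n_steps equal subintervals of [0, t].
    """
    if t <= 0:
        return 0.0
    # exp(beta . x) does not depend on u, so it factors out of the integral.
    risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    h = t / n_steps
    total = 0.5 * (baseline_hazard(0.0) + baseline_hazard(t))
    for i in range(1, n_steps):
        total += baseline_hazard(i * h)
    cumulative_baseline = total * h
    return 1.0 - math.exp(-cumulative_baseline * risk)


# Hypothetical inputs, chosen only for illustration.
beta = [0.5, -1.0]   # finite-dimensional parameter of interest
x = [1.0, 0.2]       # covariate vector


def constant_hazard(u):
    return 0.1       # one candidate baseline hazard


def rising_hazard(u):
    return 0.02 * u  # another candidate baseline hazard


F_const = cox_cdf(5.0, x, beta, constant_hazard)
F_rising = cox_cdf(5.0, x, beta, rising_hazard)
print(F_const, F_rising)
```

The same `beta` and `x` give different values of $F(5)$ under the two baseline hazards, which reflects that the distribution of $T$ depends on both components of $\theta$.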

Notes and References

  1. .
  2. .
  3. Balakrishnan, N.; Rao, C. R. (2004). Handbook of Statistics 23: Advances in Survival Analysis. Elsevier. p. 126.