Generalized Pareto distribution explained
In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often used to model the tails of another distribution. It is specified by three parameters: location $\mu$, scale $\sigma$, and shape $\xi$.[1][2] Sometimes it is specified by only scale and shape[3] and sometimes only by its shape parameter. Some references give the shape parameter as $\kappa = -\xi$.[4]
Definition
The standard cumulative distribution function (cdf) of the GPD is defined by[5]
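$$F_{\xi}(z) = \begin{cases} 1 - (1 + \xi z)^{-1/\xi} & \text{for } \xi \neq 0,\\ 1 - e^{-z} & \text{for } \xi = 0, \end{cases}$$
where the support is $z \geq 0$ for $\xi \geq 0$ and $0 \leq z \leq -1/\xi$ for $\xi < 0$. The corresponding probability density function (pdf) is
$$f_{\xi}(z) = \begin{cases} (1 + \xi z)^{-\frac{\xi + 1}{\xi}} & \text{for } \xi \neq 0,\\ e^{-z} & \text{for } \xi = 0. \end{cases}$$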
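To make the two cases and the shape-dependent support concrete, here is a minimal Python sketch of the standard cdf and pdf using only the standard library. The function names `gpd_cdf_std` and `gpd_pdf_std` are hypothetical illustrations, and clipping values outside the support to 0 or 1 is an assumption of this sketch.

```python
import math


def gpd_cdf_std(z: float, xi: float) -> float:
    """Standard GPD cdf F_xi(z), clipped to 0 below and 1 above the support."""
    if z <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0.0:  # z is beyond the upper endpoint -1/xi (possible only for xi < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def gpd_pdf_std(z: float, xi: float) -> float:
    """Standard GPD pdf f_xi(z), zero outside the support."""
    if z < 0.0:
        return 0.0
    if xi == 0.0:
        return math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0.0:  # at or beyond the upper endpoint -1/xi (xi < 0)
        return 0.0
    return t ** (-(xi + 1.0) / xi)


# Central-difference check that the pdf is the derivative of the cdf.
for xi in (-0.5, 0.0, 0.5):
    z, h = 0.7, 1e-6
    slope = (gpd_cdf_std(z + h, xi) - gpd_cdf_std(z - h, xi)) / (2 * h)
    print(xi, slope, gpd_pdf_std(z, xi))
```

Working with $t = 1 + \xi z$ and returning early when $t \le 0$ avoids raising a negative number to a fractional power near the upper endpoint.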
Characterization
The related location-scale family of distributions is obtained by replacing the argument $z$ by $\frac{x - \mu}{\sigma}$ and adjusting the support accordingly.
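The cumulative distribution function of $X \sim \operatorname{GPD}(\mu, \sigma, \xi)$ ($\mu \in \mathbb{R}$, $\sigma > 0$, and $\xi \in \mathbb{R}$) is
$$F_{(\mu,\sigma,\xi)}(x) = \begin{cases} 1 - \left(1 + \frac{\xi (x - \mu)}{\sigma}\right)^{-1/\xi} & \text{for } \xi \neq 0,\\ 1 - \exp\left(-\frac{x - \mu}{\sigma}\right) & \text{for } \xi = 0, \end{cases}$$
where the support of $X$ is $x \geq \mu$ when $\xi \geq 0$, and $\mu \leq x \leq \mu - \sigma/\xi$ when $\xi < 0$.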
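The probability density function (pdf) of $X \sim \operatorname{GPD}(\mu, \sigma, \xi)$ is
$$f_{(\mu,\sigma,\xi)}(x) = \frac{1}{\sigma}\left(1 + \frac{\xi (x - \mu)}{\sigma}\right)^{\left(-\frac{1}{\xi} - 1\right)},$$
with the limit $\frac{1}{\sigma} e^{-(x - \mu)/\sigma}$ for $\xi = 0$, again for $x \geq \mu$ when $\xi \geq 0$, and $\mu \leq x \leq \mu - \sigma/\xi$ when $\xi < 0$.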
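The location-scale versions follow by evaluating the standard formulas at $z = (x - \mu)/\sigma$ and dividing the pdf by $\sigma$. Below is a minimal, standard-library Python sketch under that reading; the names `gpd_cdf`, `gpd_pdf` and `_standardize` are hypothetical, not a library API.

```python
import math


def _standardize(x: float, mu: float, sigma: float) -> float:
    """Map x to z = (x - mu) / sigma, rejecting non-positive scales."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


def gpd_cdf(x: float, mu: float = 0.0, sigma: float = 1.0, xi: float = 0.0) -> float:
    """cdf of GPD(mu, sigma, xi); 0 below mu and 1 above mu - sigma/xi when xi < 0."""
    z = _standardize(x, mu, sigma)
    if z <= 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0.0:
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def gpd_pdf(x: float, mu: float = 0.0, sigma: float = 1.0, xi: float = 0.0) -> float:
    """pdf of GPD(mu, sigma, xi): (1/sigma) * (1 + xi*z)^(-1/xi - 1), zero off the support."""
    z = _standardize(x, mu, sigma)
    if z < 0.0:
        return 0.0
    if xi == 0.0:
        return math.exp(-z) / sigma
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0
    return t ** (-1.0 / xi - 1.0) / sigma


# Upper endpoint for mu = 1, sigma = 2, xi = -0.5 is mu - sigma/xi = 5.
print(gpd_cdf(5.01, mu=1.0, sigma=2.0, xi=-0.5))
```

For example, with $\mu = 1$, $\sigma = 2$, $\xi = -0.5$ the support is $[1, 5]$, so the cdf is already 1 at $x = 5.01$.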
The pdf is a solution of the following differential equation:
$$\left\{\begin{array}{l} f'(x)\,(-\mu\xi + \sigma + \xi x) + (\xi + 1) f(x) = 0,\\ f(0) = \dfrac{\left(1 - \frac{\mu\xi}{\sigma}\right)^{-\frac{1}{\xi} - 1}}{\sigma} \end{array}\right\}$$
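One way to sanity-check the differential equation is numerically: approximate $f'(x)$ by a central difference, confirm the residual of the equation is close to zero, and confirm that the stated initial value equals the pdf at $x = 0$. The sketch below assumes $\xi \neq 0$ and evaluation points inside the support; the helper names are hypothetical.

```python
def gpd_pdf(x, mu, sigma, xi):
    """GPD pdf for xi != 0; assumes x lies inside the support (no checks)."""
    return (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi - 1.0) / sigma


def ode_residual(x, mu, sigma, xi, h=1e-5):
    """Left-hand side f'(x)(-mu*xi + sigma + xi*x) + (xi + 1) f(x), with f' by central difference."""
    f_prime = (gpd_pdf(x + h, mu, sigma, xi) - gpd_pdf(x - h, mu, sigma, xi)) / (2.0 * h)
    return f_prime * (-mu * xi + sigma + xi * x) + (xi + 1.0) * gpd_pdf(x, mu, sigma, xi)


def initial_value(mu, sigma, xi):
    """The stated initial condition f(0) = (1 - mu*xi/sigma)^(-1/xi - 1) / sigma."""
    return (1.0 - mu * xi / sigma) ** (-1.0 / xi - 1.0) / sigma


for mu, sigma, xi in [(-1.0, 2.0, 0.5), (-0.5, 1.0, -0.3)]:
    residuals = [ode_residual(x, mu, sigma, xi) for x in (0.3, 1.5)]
    print(residuals, initial_value(mu, sigma, xi) - gpd_pdf(0.0, mu, sigma, xi))
```

The parameter choices place $x = 0$, $0.3$ and $1.5$ inside the support in both cases, so the initial condition is meaningful there.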
Special cases
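- If the shape $\xi$ and location $\mu$ are both zero, the GPD is equivalent to the exponential distribution (with mean $\sigma$).
- With shape $\xi = -1$, the GPD is equivalent to the continuous uniform distribution $U(\mu, \mu + \sigma)$, which is $U(0, \sigma)$ when $\mu = 0$.[6]
- With shape $\xi > 0$ and location $\mu = \sigma/\xi$, the GPD is equivalent to the Pareto distribution with scale $x_m = \sigma/\xi$ and shape $\alpha = 1/\xi$.
- If $X \sim \operatorname{GPD}(\mu = 0, \sigma, \xi)$, then $Y = \log(X) \sim \operatorname{exGPD}(\sigma, \xi)$ (https://www.tandfonline.com/doi/abs/10.1080/03610926.2018.1441418). (exGPD stands for the exponentiated generalized Pareto distribution.)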
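The first three equivalences can be checked numerically by comparing cdfs at a few points. The sketch below (standard library only; all names hypothetical) evaluates the GPD cdf inside the support and compares it with the exponential, uniform, and Pareto cdfs, using the parameter mappings stated in the list.

```python
import math


def gpd_cdf(x, mu, sigma, xi):
    """GPD(mu, sigma, xi) cdf; assumes x is inside the support."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return 1.0 - math.exp(-z)
    return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)


def max_gap(f, g, xs):
    """Largest absolute difference between two cdfs over the points xs."""
    return max(abs(f(x) - g(x)) for x in xs)


sigma = 2.0

# xi = 0, mu = 0: exponential distribution with mean sigma.
gap_exp = max_gap(lambda x: gpd_cdf(x, 0.0, sigma, 0.0),
                  lambda x: 1.0 - math.exp(-x / sigma), [0.1, 1.0, 5.0])

# xi = -1: uniform on [mu, mu + sigma].
mu = 1.0
gap_unif = max_gap(lambda x: gpd_cdf(x, mu, sigma, -1.0),
                   lambda x: (x - mu) / sigma, [1.2, 2.0, 2.9])

# xi > 0, mu = sigma/xi: Pareto with scale x_m = sigma/xi and shape alpha = 1/xi.
xi = 0.5
x_m, alpha = sigma / xi, 1.0 / xi
gap_pareto = max_gap(lambda x: gpd_cdf(x, x_m, sigma, xi),
                     lambda x: 1.0 - (x / x_m) ** (-alpha), [4.5, 10.0, 100.0])

print(gap_exp, gap_unif, gap_pareto)  # all at floating-point rounding level
```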
Generating generalized Pareto random variables
Generating GPD random variables
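If $U$ is uniformly distributed on $(0, 1]$, then
$$X = \mu + \frac{\sigma (U^{-\xi} - 1)}{\xi} \sim \operatorname{GPD}(\mu, \sigma, \xi \neq 0)$$
and
$$X = \mu - \sigma \ln(U) \sim \operatorname{GPD}(\mu, \sigma, \xi = 0).$$
Both formulas are obtained by inversion of the cdf: solve $F_{(\mu,\sigma,\xi)}(X) = 1 - U$ for $X$, using that $1 - U$ is again uniformly distributed.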
In the MATLAB Statistics Toolbox, the "gprnd" command generates generalized Pareto random numbers.
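The inversion formulas translate directly into a sampler. This is a minimal standard-library Python sketch, not the MATLAB routine; the function name `gpd_sample` is hypothetical. Because Python's `random.random()` returns values in $[0, 1)$, the code uses `1 - random()` to obtain $U$ on $(0, 1]$ as in the text. (SciPy users could instead use `scipy.stats.genpareto` with `c=xi`, `loc=mu`, `scale=sigma`, whose shape follows the same sign convention.)

```python
import math
import random


def gpd_sample(n, mu=0.0, sigma=1.0, xi=0.0, rng=None):
    """Draw n GPD(mu, sigma, xi) variates by inverting the cdf."""
    if rng is None:
        rng = random.Random()
    samples = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) and u**(-xi) are finite
        if xi == 0.0:
            samples.append(mu - sigma * math.log(u))
        else:
            samples.append(mu + sigma * (u ** (-xi) - 1.0) / xi)
    return samples


rng = random.Random(12345)
draws = gpd_sample(100_000, mu=1.0, sigma=2.0, xi=0.2, rng=rng)
# For xi < 1 the GPD mean is mu + sigma / (1 - xi) = 3.5 here.
print(sum(draws) / len(draws))
```

Passing an explicit `random.Random` instance keeps runs reproducible without touching the global generator.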
GPD as an Exponential-Gamma Mixture
A GPD random variable can also be expressed as an exponential random variable with a Gamma-distributed rate parameter. If
$$X \mid \Lambda \sim \operatorname{Exp}(\Lambda) \quad \text{and} \quad \Lambda \sim \operatorname{Gamma}(\alpha, \beta),$$
where $\alpha$ is the shape and $\beta$ the rate of the Gamma distribution, then
$$X \sim \operatorname{GPD}(\mu = 0,\ \sigma = \beta/\alpha,\ \xi = 1/\alpha).$$
Note, however, that since the parameters of the Gamma distribution must be greater than zero, this representation carries the additional restriction that $\xi$ must be positive.
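The mixture can be simulated directly. In this minimal standard-library sketch (function names hypothetical), note that Python's `random.gammavariate(alpha, beta)` takes a shape and a *scale*, so the Gamma rate $\beta$ is passed as the scale `1/beta`, while `random.expovariate` takes a rate. The empirical survival function is compared with the GPD survival function $(1 + \xi x/\sigma)^{-1/\xi}$ at $\xi = 1/\alpha$, $\sigma = \beta/\alpha$.

```python
import random


def gpd_via_mixture(n, alpha, beta, rng):
    """X | Lambda ~ Exp(Lambda), Lambda ~ Gamma(shape=alpha, rate=beta)."""
    draws = []
    for _ in range(n):
        lam = rng.gammavariate(alpha, 1.0 / beta)  # second argument is a scale, i.e. 1/rate
        draws.append(rng.expovariate(lam))          # expovariate takes the rate Lambda
    return draws


def gpd_survival(x, sigma, xi):
    """P(X > x) for GPD(0, sigma, xi) with xi > 0."""
    return (1.0 + xi * x / sigma) ** (-1.0 / xi)


alpha, beta = 2.0, 3.0
xi, sigma = 1.0 / alpha, beta / alpha  # parameter mapping of the mixture
draws = gpd_via_mixture(100_000, alpha, beta, random.Random(7))
for x in (1.0, 5.0):
    empirical = sum(d > x for d in draws) / len(draws)
    print(x, empirical, gpd_survival(x, sigma, xi))
```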
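In addition to this mixture (or compound) expression, the generalized Pareto distribution can also be expressed as a simple ratio. Concretely, for $\xi > 0$, $Y \sim \operatorname{Exp}(1)$ and $Z \sim \operatorname{Gamma}(1/\xi, 1)$, we have
$$\mu + \sigma \frac{Y}{\xi Z} \sim \operatorname{GPD}(\mu, \sigma, \xi).$$
This is a consequence of the mixture after setting $\beta = \alpha$ and taking into account that the rate parameters of the exponential and gamma distribution are simply inverse multiplicative constants: with $\alpha = \beta = 1/\xi$ one can write $\Lambda = \xi Z$ and $X = Y/\Lambda = Y/(\xi Z) \sim \operatorname{GPD}(0, 1, \xi)$, and scaling by $\sigma$ and shifting by $\mu$ gives the result.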
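The ratio form gives an even shorter sampler for $\xi > 0$. In this minimal standard-library sketch (name hypothetical), the check compares the fraction of draws below the GPD median $\mu + \sigma(2^{\xi} - 1)/\xi$, obtained by solving $F(x) = 1/2$, with 0.5.

```python
import random


def gpd_via_ratio(n, mu, sigma, xi, rng):
    """mu + sigma * Y / (xi * Z) with Y ~ Exp(1), Z ~ Gamma(1/xi, 1); requires xi > 0."""
    if xi <= 0.0:
        raise ValueError("the ratio representation requires xi > 0")
    draws = []
    for _ in range(n):
        y = rng.expovariate(1.0)
        z = rng.gammavariate(1.0 / xi, 1.0)  # shape 1/xi, scale (= rate) 1
        draws.append(mu + sigma * y / (xi * z))
    return draws


mu, sigma, xi = 1.0, 2.0, 0.5
draws = gpd_via_ratio(100_000, mu, sigma, xi, random.Random(11))
median = mu + sigma * (2.0 ** xi - 1.0) / xi
print(sum(d <= median for d in draws) / len(draws))  # should be close to 0.5
```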
Exponentiated generalized Pareto distribution
The exponentiated generalized Pareto distribution (exGPD)
If $X \sim \operatorname{GPD}(\mu = 0, \sigma, \xi)$, then $Y = \log(X)$ is distributed according to the exponentiated generalized Pareto distribution, denoted by $Y \sim \operatorname{exGPD}(\sigma, \xi)$.
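The probability density function (pdf) of $Y \sim \operatorname{exGPD}(\sigma, \xi)$, $\sigma > 0$, obtained from the GPD pdf by the change of variables $x = e^{y}$ (that is, $g_{(\sigma,\xi)}(y) = f_{(0,\sigma,\xi)}(e^{y})\, e^{y}$), is
$$g_{(\sigma,\xi)}(y) = \begin{cases} \frac{e^{y}}{\sigma}\left(1 + \frac{\xi e^{y}}{\sigma}\right)^{-1/\xi - 1} & \text{for } \xi \neq 0,\\ \frac{1}{\sigma} e^{y - e^{y}/\sigma} & \text{for } \xi = 0, \end{cases}$$
where the support is $-\infty < y < \infty$ for $\xi \geq 0$, and $-\infty < y \leq \log(-\sigma/\xi)$ for $\xi < 0$.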
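As a check on the pdf and its support, the sketch below codes $g_{(\sigma,\xi)}$ and integrates it with the trapezoidal rule over a wide finite range, which should give a value close to 1. The truncation limits and grid size are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def exgpd_pdf(y, sigma, xi):
    """Density of Y = log(X), X ~ GPD(0, sigma, xi); zero above log(-sigma/xi) when xi < 0."""
    ey = math.exp(y)
    if xi == 0.0:
        return math.exp(y - ey / sigma) / sigma
    t = 1.0 + xi * ey / sigma
    if t <= 0.0:
        return 0.0
    return ey / sigma * t ** (-1.0 / xi - 1.0)


def trapezoid(f, a, b, n=100_000):
    """Composite trapezoidal rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h


# (sigma, xi, upper integration limit); for xi < 0 the limit is the support endpoint.
cases = [(1.5, 0.4, 40.0), (1.0, 0.0, 5.0), (1.0, -0.5, math.log(2.0))]
for sigma, xi, upper in cases:
    print(sigma, xi, trapezoid(lambda y: exgpd_pdf(y, sigma, xi), -40.0, upper))
```

The left tail behaves like $e^{y}/\sigma$, so the mass below $y = -40$ is negligible in every case.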
For all $\xi$, $\log\sigma$ becomes the location parameter. See the right panel for the pdf when the shape $\xi$ is positive.
The exGPD has finite moments of all orders for all $\sigma > 0$ and $-\infty < \xi < \infty$.
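The moment-generating function of $Y \sim \operatorname{exGPD}(\sigma, \xi)$ is
$$M_Y(s) = E[e^{sY}] = \begin{cases} -\frac{1}{\xi}\left(-\frac{\sigma}{\xi}\right)^{s} B(s + 1, -1/\xi) & \text{for } s \in (-1, \infty),\ \xi < 0,\\ \frac{1}{\xi}\left(\frac{\sigma}{\xi}\right)^{s} B(s + 1, 1/\xi - s) & \text{for } s \in (-1, 1/\xi),\ \xi > 0,\\ \sigma^{s}\,\Gamma(1 + s) & \text{for } s \in (-1, \infty),\ \xi = 0, \end{cases}$$
where $B(a, b)$ and $\Gamma(a)$ denote the beta function and gamma function, respectively.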
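Since $e^{sY} = X^{s}$, the MGF is the $s$-th moment of $X$, which suggests a simple sanity check: $M_Y(0) = 1$, and for $\xi < 1$, $M_Y(1) = E[X] = \sigma/(1 - \xi)$. Below is a minimal sketch that uses `math.lgamma` for the beta function; the function names are hypothetical.

```python
import math


def log_beta(a, b):
    """log B(a, b) for a, b > 0 via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def exgpd_mgf(s, sigma, xi):
    """M_Y(s) for Y ~ exGPD(sigma, xi), following the three cases of the formula."""
    if s <= -1.0:
        raise ValueError("the MGF requires s > -1")
    if xi < 0.0:
        return (-1.0 / xi) * (-sigma / xi) ** s * math.exp(log_beta(s + 1.0, -1.0 / xi))
    if xi > 0.0:
        if s >= 1.0 / xi:
            raise ValueError("for xi > 0 the MGF requires s < 1/xi")
        return (1.0 / xi) * (sigma / xi) ** s * math.exp(log_beta(s + 1.0, 1.0 / xi - s))
    return sigma ** s * math.gamma(1.0 + s)


for xi in (-0.5, 0.0, 0.3):
    # M_Y(0) should be 1 and M_Y(1) should equal E[X] = sigma / (1 - xi).
    print(xi, exgpd_mgf(0.0, 2.0, xi), exgpd_mgf(1.0, 2.0, xi), 2.0 / (1.0 - xi))
```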
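The expected value of $Y \sim \operatorname{exGPD}(\sigma, \xi)$ depends on the scale $\sigma$ and shape $\xi$ parameters, while $\xi$ enters through the digamma function $\psi$:
$$E[Y] = \begin{cases} \log\left(-\frac{\sigma}{\xi}\right) + \psi(1) - \psi(-1/\xi + 1) & \text{for } \xi < 0,\\ \log\left(\frac{\sigma}{\xi}\right) + \psi(1) - \psi(1/\xi) & \text{for } \xi > 0,\\ \log\sigma + \psi(1) & \text{for } \xi = 0. \end{cases}$$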
Note that for a fixed value of $\xi \in (-\infty, \infty)$, $\log\sigma$ plays the role of the location parameter under the exponentiated generalized Pareto distribution.
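Python's standard library has no digamma function, so this sketch implements one with the recurrence $\psi(x) = \psi(x + 1) - 1/x$ followed by the standard asymptotic series. That numerical method is an illustrative choice, not part of the source; `scipy.special.digamma` could be used instead where SciPy is available. The checks use closed-form cases: $\xi = 0$ gives $\log\sigma - \gamma$ (with $\gamma$ the Euler–Mascheroni constant), and $\xi = -1$ (uniform $X$) gives $\log\sigma - 1$. Function names are hypothetical.

```python
import math


def digamma(x):
    """psi(x) for x > 0: shift x above 10 with psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def exgpd_mean(sigma, xi):
    """E[Y] for Y ~ exGPD(sigma, xi), case by case as in the formula."""
    if xi < 0.0:
        return math.log(-sigma / xi) + digamma(1.0) - digamma(-1.0 / xi + 1.0)
    if xi > 0.0:
        return math.log(sigma / xi) + digamma(1.0) - digamma(1.0 / xi)
    return math.log(sigma) + digamma(1.0)


EULER_GAMMA = 0.5772156649015329
print(exgpd_mean(1.0, 0.0), -EULER_GAMMA)
print(exgpd_mean(2.0, -1.0), math.log(2.0) - 1.0)
```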
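The variance of $Y \sim \operatorname{exGPD}(\sigma, \xi)$ depends only on the shape parameter $\xi$ (not on $\sigma$), through the polygamma function of order 1 (also called the trigamma function) $\psi'$:
$$\operatorname{Var}[Y] = \begin{cases} \psi'(1) - \psi'(-1/\xi + 1) & \text{for } \xi < 0,\\ \psi'(1) + \psi'(1/\xi) & \text{for } \xi > 0,\\ \psi'(1) & \text{for } \xi = 0. \end{cases}$$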
See the right panel for the variance as a function of $\xi$. Note that $\psi'(1) = \pi^{2}/6 \approx 1.644934$.
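A companion sketch for the variance implements the trigamma function with the recurrence $\psi'(x) = \psi'(x + 1) + 1/x^{2}$ and its asymptotic series (again an illustrative numerical choice; `scipy.special.polygamma(1, x)` is an alternative). The function takes only $\xi$, mirroring the fact that $\operatorname{Var}[Y]$ does not involve $\sigma$. Known cases serve as checks: $\xi = 0$ gives $\pi^{2}/6$, and $\xi = -1$ ($X$ uniform, so $-\log X$ is standard exponential) gives 1. Function names are hypothetical.

```python
import math


def trigamma(x):
    """psi'(x) for x > 0: shift x above 10 with psi'(x) = psi'(x + 1) + 1/x^2, then use the asymptotic series."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    while x < 10.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi'(x) ~ 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    result += inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))
    return result


def exgpd_variance(xi):
    """Var[Y] for Y ~ exGPD(sigma, xi); sigma does not appear."""
    if xi < 0.0:
        return trigamma(1.0) - trigamma(-1.0 / xi + 1.0)
    if xi > 0.0:
        return trigamma(1.0) + trigamma(1.0 / xi)
    return trigamma(1.0)


print(exgpd_variance(0.0), math.pi ** 2 / 6)  # psi'(1) ~ 1.644934
print(exgpd_variance(-1.0))                    # uniform X: Var[log X] = 1
```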
Note that the roles of the scale parameter $\sigma$ and the shape parameter $\xi$ under $Y \sim \operatorname{exGPD}(\sigma, \xi)$ are separably interpretable, which may lead to more robust and efficient estimation of $\xi$ than working with $X \sim \operatorname{GPD}(\sigma, \xi)$ directly (https://www.tandfonline.com/doi/abs/10.1080/03610926.2018.1441418). Under $X \sim \operatorname{GPD}(\mu = 0, \sigma, \xi)$, by contrast, the roles of the two parameters are intertwined (at least up to the second central moment); see the variance $\operatorname{Var}(X) = \frac{\sigma^{2}}{(1 - \xi)^{2}(1 - 2\xi)}$ (for $\xi < 1/2$), in which both parameters appear.
The Hill estimator
Assume that $X_{1:n} = (X_1, \ldots, X_n)$ are $n$ observations (not necessarily i.i.d.) from an unknown heavy-tailed distribution $F$ such that its tail distribution is regularly varying with tail index $1/\xi$ (hence, the corresponding shape parameter is $\xi$). Specifically, the tail distribution is described as
$$\bar{F}(x) = 1 - F(x) = L(x) \cdot x^{-1/\xi}, \quad \text{for some } \xi > 0, \text{ where } L \text{ is a slowly varying function.}$$
It is of particular interest in extreme value theory to estimate the shape parameter $\xi$, especially when $\xi$ is positive (the so-called heavy-tailed case).
Let $F_u(y) = P(X - u \leq y \mid X > u)$ be their conditional excess distribution function over a threshold $u$. The Pickands–Balkema–de Haan theorem (Pickands, 1975; Balkema and de Haan, 1974) states that for a large class of underlying distribution functions $F$, and large $u$, $F_u$ is well approximated by the generalized Pareto distribution (GPD), which motivated Peaks Over Threshold (POT) methods to estimate $\xi$: the GPD plays the key role in the POT approach.
A renowned estimator using the POT methodology is the Hill estimator. Its technical formulation is as follows. For $1 \leq i \leq n$, write $X_{(i)}$ for the $i$-th largest value of $X_1, \ldots, X_n$. Then, with this notation, the Hill estimator (see page 190 of Reference 5 by Embrechts et al., https://books.google.com/books?id=o-clBQAAQBAJ&dq=modeeling+extreme+events+for+insurance&pg=PA1) based on the $k$ upper order statistics is defined as
$$\widehat{\xi}_{k}^{\text{Hill}} = \widehat{\xi}_{k}^{\text{Hill}}(X_{1:n}) = \frac{1}{k - 1}\sum_{j=1}^{k-1} \log\left(\frac{X_{(j)}}{X_{(k)}}\right), \quad \text{for } 2 \leq k \leq n.$$
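In practice, the Hill estimator is used as follows. First, calculate the estimator $\widehat{\xi}_{k}^{\text{Hill}}$ at each integer $k \in \{2, \ldots, n\}$, and then plot the ordered pairs $\{(k, \widehat{\xi}_{k}^{\text{Hill}})\}_{k=2}^{n}$. Then, select from the set of Hill estimators $\{\widehat{\xi}_{k}^{\text{Hill}}\}_{k=2}^{n}$ those which are roughly constant with respect to $k$: these stable values are regarded as reasonable estimates for the shape parameter $\xi$. If $X_1, \ldots, X_n$ are i.i.d., then the Hill estimator is a consistent estimator for the shape parameter $\xi$ as $k \to \infty$ with $k/n \to 0$ (https://www.jstor.org/stable/1427870).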
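Recomputing each $\widehat{\xi}_{k}^{\text{Hill}}$ from scratch costs $O(k)$ work, so the full Hill plot costs $O(n^{2})$. Rewriting the estimator as $\frac{1}{k-1}\sum_{j=1}^{k-1}\log X_{(j)} - \log X_{(k)}$ lets a running sum produce every point of the plot after a single sort. A minimal sketch follows (name `hill_path` hypothetical; for simplicity it assumes all observations are positive). Choosing the stable region is still left to inspection, as described above.

```python
import math


def hill_path(data):
    """Return [(k, xi_hat_k) for k = 2..n] using a running sum of log order statistics."""
    desc = sorted(data, reverse=True)  # desc[0] = X_(1)
    if len(desc) < 2 or desc[-1] <= 0.0:
        raise ValueError("need at least two observations, all positive")
    logs = [math.log(x) for x in desc]
    path = []
    running = 0.0  # log X_(1) + ... + log X_(k-1)
    for k in range(2, len(desc) + 1):
        running += logs[k - 2]
        path.append((k, running / (k - 1) - logs[k - 1]))
    return path


for k, xi_hat in hill_path([1.0, 2.0, 4.0, 8.0]):
    print(k, xi_hat)  # k = 2, 3, 4 give ln 2, 1.5 ln 2, 2 ln 2
```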
Note that the Hill estimator $\widehat{\xi}_{k}^{\text{Hill}}$ makes use of the log-transformation of the observations $X_{1:n}$. (The Pickands estimator $\widehat{\xi}_{k}^{\text{Pickands}}$ also employs the log-transformation, but in a slightly different way; see https://www.jstor.org/stable/2242785.)
Further reading
- Pickands, James (1975). "Statistical inference using extreme order statistics". Annals of Statistics 3: 119–131. doi:10.1214/aos/1176343003.
- Balkema, A.; de Haan, Laurens (1974). "Residual life time at great age". Annals of Probability 2 (5): 792–804. doi:10.1214/aop/1176996548.
- Lee, Seyoon; Kim, J. H. K. (2018). "Exponentiated generalized Pareto distribution: Properties and applications towards extreme value theory". Communications in Statistics - Theory and Methods 48 (8): 1–25. arXiv:1708.01686. doi:10.1080/03610926.2018.1441418. S2CID 88514574.
- Johnson, N. L.; Kotz, S.; Balakrishnan, N. (1994). Continuous Univariate Distributions, Volume 1 (2nd ed.). New York: Wiley. ISBN 978-0-471-58495-7. Chapter 20, Section 12: Generalized Pareto Distributions.
- Arnold, Barry C. (2011). "Chapter 7: Pareto and Generalized Pareto Distributions". In Chotikapanich, Duangkamon (ed.). Modeling Distributions and Lorenz Curves. New York: Springer. ISBN 9780387727967. https://books.google.com/books?id=fUJZZLj1kbwC&pg=PA119
- Arnold, B. C.; Laguna, L. (1977). On generalized Pareto distributions with applications to income data. Ames, Iowa: Iowa State University, Department of Economics.
Notes and References
- Coles, Stuart (2001-12-12). An Introduction to Statistical Modeling of Extreme Values. Springer. p. 75. ISBN 9781852334598.
- Dargahi-Noubary, G. R. (1989). "On tail estimation: An improved method". Mathematical Geology 21 (8): 829–842. Bibcode:1989MatGe..21..829D. doi:10.1007/BF00894450. S2CID 122710961.
- Hosking, J. R. M.; Wallis, J. R. (1987). "Parameter and Quantile Estimation for the Generalized Pareto Distribution". Technometrics 29 (3): 339–349. doi:10.2307/1269343. JSTOR 1269343.
- Davison, A. C. (1984-09-30). "Modelling Excesses over High Thresholds, with an Application". In de Oliveira, J. Tiago (ed.). Statistical Extremes and Applications. Kluwer. p. 462. ISBN 9789027718044. https://books.google.com/books?id=6M03_6rm8-oC&pg=PA462
- Embrechts, Paul; Klüppelberg, Claudia; Mikosch, Thomas (1997-01-01). Modelling extremal events for insurance and finance. Springer. p. 162. ISBN 9783540609315.
- Castillo, Enrique; Hadi, Ali S. (1997). "Fitting the generalized Pareto distribution to data". Journal of the American Statistical Association 92 (440): 1609–1620.