Generalized Pareto distribution explained

In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often used to model the tails of another distribution. It is specified by three parameters: location

\mu

, scale

\sigma

, and shape

\xi

.^[1] ^[2] Sometimes it is specified by only scale and shape^[3] and sometimes only by its shape parameter. Some references give the shape parameter as

\kappa=-\xi

.^[4]

Definition

The standard cumulative distribution function (cdf) of the GPD is defined by^[5]

F_\xi(z)=\begin{cases} 1-\left(1+\xi z\right)^{-1/\xi} & \text{for } \xi\neq 0,\\ 1-e^{-z} & \text{for } \xi=0, \end{cases}

where the support is

z\geq0

for

\xi\geq0

and

0\leq z\leq -1/\xi

for

\xi<0

. The corresponding probability density function (pdf) is

f_\xi(z)=\begin{cases} (1+\xi z)^{-\frac{\xi+1}{\xi}} & \text{for } \xi\neq 0,\\ e^{-z} & \text{for } \xi=0. \end{cases}

Characterization

The related location-scale family of distributions is obtained by replacing the argument z by

\frac{x-\mu}{\sigma}

and adjusting the support accordingly.

The cumulative distribution function of

X\sim\operatorname{GPD}(\mu,\sigma,\xi)

, with

\mu\in\mathbb{R},\ \sigma>0,\ \xi\in\mathbb{R}

, is

F_{(\mu,\sigma,\xi)}(x)=\begin{cases} 1-\left(1+\frac{\xi(x-\mu)}{\sigma}\right)^{-1/\xi} & \text{for } \xi\neq 0,\\ 1-\exp\left(-\frac{x-\mu}{\sigma}\right) & \text{for } \xi=0, \end{cases}

where the support is

x\geqslant\mu

when

\xi\geqslant0

, and

\mu\leqslant x\leqslant\mu-\sigma/\xi

when

\xi<0

.

The probability density function (pdf) of

X\sim\operatorname{GPD}(\mu,\sigma,\xi)

is

f_{(\mu,\sigma,\xi)}(x)=\frac{1}{\sigma}\left(1+\frac{\xi(x-\mu)}{\sigma}\right)^{\left(-\frac{1}{\xi}-1\right)}

again, for

x\geqslant\mu

when

\xi\geqslant0

, and

\mu\leqslant x\leqslant\mu-\sigma/\xi

when

\xi<0

.
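The cdf and pdf of the location-scale family can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names gpd_cdf and gpd_pdf are chosen here for the example, values outside the support are mapped to 0 or 1, and the density at the upper endpoint (for negative shape) is set to 0 by convention.

```python
import math

def gpd_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """CDF of GPD(mu, sigma, xi): 0 below the support, 1 above it."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-z)
    if xi < 0 and z >= -1.0 / xi:
        return 1.0  # at or beyond the upper endpoint mu - sigma/xi
    return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)

def gpd_pdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """PDF of GPD(mu, sigma, xi): 0 outside the support."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z < 0:
        return 0.0
    if xi == 0:
        return math.exp(-z) / sigma
    if xi < 0 and z >= -1.0 / xi:
        return 0.0  # convention chosen for this sketch at the endpoint
    return (1.0 + xi * z) ** (-1.0 / xi - 1.0) / sigma

# Example: the pdf should match a central finite difference of the cdf.
x, mu, sigma, xi = 2.0, 1.0, 2.0, 0.5
cdf_value = gpd_cdf(x, mu, sigma, xi)
pdf_value = gpd_pdf(x, mu, sigma, xi)
h = 1e-6
slope = (gpd_cdf(x + h, mu, sigma, xi) - gpd_cdf(x - h, mu, sigma, xi)) / (2 * h)
print(cdf_value, pdf_value, slope)
```

The finite-difference slope agreeing with the pdf is a quick consistency check between the two formulas.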

The pdf is a solution of the following differential equation:

\left\{\begin{array}{l} f'(x)(-\mu\xi+\sigma+\xi x)+(\xi+1)f(x)=0,\\ f(0)=\frac{\left(1-\frac{\mu\xi}{\sigma}\right)^{-\frac{1}{\xi}-1}}{\sigma} \end{array}\right\}

Special cases

If the shape

\xi

and location

\mu

are both zero, the GPD is equivalent to the exponential distribution.

With shape

\xi=-1

and location

\mu=0

, the GPD is equivalent to the continuous uniform distribution

U(0,\sigma)

.^[6]

With shape

\xi>0

and location

\mu=\sigma/\xi

, the GPD is equivalent to the Pareto distribution with scale

x_m=\sigma/\xi

and shape

\alpha=1/\xi

.

If

X\sim\operatorname{GPD}(\mu=0,\sigma,\xi)

, then

Y=\log(X)\sim\operatorname{exGPD}(\sigma,\xi)

https://www.tandfonline.com/doi/abs/10.1080/03610926.2018.1441418. (exGPD stands for the exponentiated generalized Pareto distribution.)

GPD is similar to the Burr distribution.
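The first three special cases can be checked numerically by comparing cdfs at a test point. The sketch below is illustrative only: the helper names are chosen for this example, the exponential case uses rate 1/σ (which is what the GPD cdf reduces to when the shape and location are zero), the uniform case assumes location 0, and the Pareto case uses location σ/ξ, the value for which the two cdfs coincide algebraically.

```python
import math

def gpd_cdf(x, mu, sigma, xi):
    """GPD cdf, valid for x inside the support (no clipping in this sketch)."""
    z = (x - mu) / sigma
    if xi == 0:
        return 1.0 - math.exp(-z)
    return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)

def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x)

def uniform_cdf(x, a, b):
    return (x - a) / (b - a)

def pareto_cdf(x, x_m, alpha):
    return 1.0 - (x_m / x) ** alpha

sigma, xi = 2.0, 0.5
differences = [
    # shape 0, location 0: exponential with rate 1/sigma
    gpd_cdf(1.3, 0.0, sigma, 0.0) - exponential_cdf(1.3, 1.0 / sigma),
    # shape -1, location 0: uniform on (0, sigma)
    gpd_cdf(1.3, 0.0, sigma, -1.0) - uniform_cdf(1.3, 0.0, sigma),
    # shape xi > 0, location sigma/xi: Pareto(x_m = sigma/xi, alpha = 1/xi)
    gpd_cdf(7.0, sigma / xi, sigma, xi) - pareto_cdf(7.0, sigma / xi, 1.0 / xi),
]
max_abs_diff = max(abs(d) for d in differences)
print(max_abs_diff)
```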

Generating generalized Pareto random variables

Generating GPD random variables

If U is uniformly distributed on (0, 1], then

X=\mu+\frac{\sigma(U^{-\xi}-1)}{\xi}\sim\operatorname{GPD}(\mu,\sigma,\xi\neq 0)

and

X=\mu-\sigma\ln(U)\sim\operatorname{GPD}(\mu,\sigma,\xi=0).

Both formulas are obtained by inversion of the cdf.

In the MATLAB Statistics Toolbox, the "gprnd" command generates generalized Pareto random numbers.
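The inversion formulas translate directly into code. The following sketch uses only the Python standard library; the function name gpd_sample, the seed, and the parameter values are assumptions made for illustration. It draws U on (0, 1] as 1 minus a uniform draw on [0, 1), then compares the empirical fraction of draws below a point with the cdf from the Characterization section.

```python
import math
import random

def gpd_sample(mu, sigma, xi, rng=random):
    """One GPD(mu, sigma, xi) draw by inverting the cdf."""
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    if xi == 0:
        return mu - sigma * math.log(u)
    return mu + sigma * (u ** (-xi) - 1.0) / xi

rng = random.Random(12345)  # fixed seed so the run is repeatable
mu, sigma, xi = 0.0, 2.0, 0.25
draws = [gpd_sample(mu, sigma, xi, rng) for _ in range(100_000)]

# Empirical P(X <= 2) versus the closed-form cdf at 2.
empirical = sum(d <= 2.0 for d in draws) / len(draws)
theoretical = 1.0 - (1.0 + xi * (2.0 - mu) / sigma) ** (-1.0 / xi)
print(empirical, theoretical)
```

With 100,000 draws the two values should agree to roughly two decimal places; the exact empirical figure depends on the seed.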

GPD as an Exponential-Gamma Mixture

A GPD random variable can also be expressed as an exponential random variable with a gamma-distributed rate parameter. If

X|\Lambda\sim\operatorname{Exp}(\Lambda)

and

\Lambda\sim\operatorname{Gamma}(\alpha,\beta)

(in the shape-rate parameterization), then

X\sim\operatorname{GPD}(\xi=1/\alpha,\ \sigma=\beta/\alpha)

. Note, however, that because the parameters of the gamma distribution must be greater than zero, this representation requires that

\xi

be positive.

In addition to this mixture (or compound) expression, the generalized Pareto distribution can also be expressed as a simple ratio. Concretely, for independent

Y\sim\operatorname{Exponential}(1)

and

Z\sim\operatorname{Gamma}(1/\xi,1)

, we have

\mu+\sigma\frac{Y}{\xi Z}\sim\operatorname{GPD}(\mu,\sigma,\xi)

. This is a consequence of the mixture after setting

\beta=\alpha

and taking into account that the rate parameters of the exponential and gamma distributions act simply as inverse multiplicative constants.
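Both representations can be checked by simulation. The sketch below is a rough Monte Carlo comparison, not a proof; the function names, seed, and parameter values are assumptions for illustration. Note that Python's random.gammavariate takes a shape and a scale, so a gamma rate β is passed as the scale 1/β.

```python
import random

def gpd_via_mixture(alpha, beta, rng):
    """Exponential draw whose rate is Gamma(shape=alpha, rate=beta)."""
    lam = rng.gammavariate(alpha, 1.0 / beta)  # second argument is a scale
    return rng.expovariate(lam)

def gpd_via_ratio(mu, sigma, xi, rng):
    """mu + sigma * Y / (xi * Z) with independent Y ~ Exp(1), Z ~ Gamma(1/xi, 1)."""
    y = rng.expovariate(1.0)
    z = rng.gammavariate(1.0 / xi, 1.0)
    return mu + sigma * y / (xi * z)

rng = random.Random(2024)  # fixed seed so the run is repeatable
alpha, beta = 4.0, 8.0
xi, sigma = 1.0 / alpha, beta / alpha  # xi = 0.25, sigma = 2.0

n = 100_000
mixture_draws = [gpd_via_mixture(alpha, beta, rng) for _ in range(n)]
ratio_draws = [gpd_via_ratio(0.0, sigma, xi, rng) for _ in range(n)]

# Compare empirical P(X <= 2) with the GPD(0, sigma, xi) cdf at 2.
target = 1.0 - (1.0 + xi * 2.0 / sigma) ** (-1.0 / xi)
frac_mixture = sum(d <= 2.0 for d in mixture_draws) / n
frac_ratio = sum(d <= 2.0 for d in ratio_draws) / n
print(target, frac_mixture, frac_ratio)
```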

Exponentiated generalized Pareto distribution

The exponentiated generalized Pareto distribution (exGPD)

If

X\sim\operatorname{GPD}(\mu=0,\sigma,\xi)

, then

Y=\log(X)

is distributed according to the exponentiated generalized Pareto distribution, denoted by

Y\sim\operatorname{exGPD}(\sigma,\xi)

.

The probability density function (pdf) of

Y\sim\operatorname{exGPD}(\sigma,\xi)

(with

\sigma>0

) is

g_{(\sigma,\xi)}(y)=\begin{cases} \frac{e^y}{\sigma}\left(1+\frac{\xi e^y}{\sigma}\right)^{-1/\xi-1} & \text{for } \xi\neq 0,\\ \frac{1}{\sigma}e^{y-e^y/\sigma} & \text{for } \xi=0, \end{cases}

where the support is

-\infty<y<\infty

for

\xi\geq0

, and

-\infty<y\leq\log(-\sigma/\xi)

for

\xi<0

.

For all

\xi

, the quantity

\log\sigma

acts as the location parameter. See the right panel for the pdf when the shape

\xi

is positive.
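The location role of log σ can be illustrated numerically. In the sketch below (the function name exgpd_pdf and the parameter values are assumptions for illustration), the density uses the exponent −1/ξ − 1 that results from applying the change of variables Y = log X to the GPD pdf. The code checks that the density with scale σ equals the density with scale 1 shifted by log σ, and that the density integrates to approximately 1 under a simple trapezoid rule.

```python
import math

def exgpd_pdf(y, sigma, xi):
    """Density of Y = log(X) for X ~ GPD(mu=0, sigma, xi)."""
    t = math.exp(y) / sigma
    if xi == 0:
        return t * math.exp(-t)
    if xi < 0 and t >= -1.0 / xi:
        return 0.0  # y is at or beyond the upper endpoint log(-sigma/xi)
    return t * (1.0 + xi * t) ** (-1.0 / xi - 1.0)

sigma, xi = 2.0, 0.5

# Shifting y by log(sigma) reduces the scale-sigma density to the scale-1 density.
y0 = 0.7
shift_gap = abs(exgpd_pdf(y0, sigma, xi) - exgpd_pdf(y0 - math.log(sigma), 1.0, xi))

# Trapezoid rule on [-30, 30] with step 0.01 as a rough normalization check.
step = 0.01
values = [exgpd_pdf(-30.0 + step * i, sigma, xi) for i in range(6001)]
total_mass = step * (sum(values) - 0.5 * (values[0] + values[-1]))
print(shift_gap, total_mass)
```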

The exGPD has finite moments of all orders for all

\sigma>0

and

-\infty<\xi<\infty

.

The moment-generating function of

Y\sim\operatorname{exGPD}(\sigma,\xi)

is

M_Y(s)=E[e^{sY}]=\begin{cases} -\frac{1}{\xi}\left(-\frac{\sigma}{\xi}\right)^{s}B(s+1,-1/\xi) & \text{for } s\in(-1,\infty),\ \xi<0,\\ \frac{1}{\xi}\left(\frac{\sigma}{\xi}\right)^{s}B(s+1,1/\xi-s) & \text{for } s\in(-1,1/\xi),\ \xi>0,\\ \sigma^{s}\Gamma(1+s) & \text{for } s\in(-1,\infty),\ \xi=0, \end{cases}

where

B(a,b)

and

\Gamma(a)

denote the beta function and gamma function, respectively.

The expected value of

Y\sim\operatorname{exGPD}(\sigma,\xi)

depends on both the scale

\sigma

and the shape

\xi

, with

\xi

entering through the digamma function:

E[Y]=\begin{cases} \log\left(-\frac{\sigma}{\xi}\right)+\psi(1)-\psi(-1/\xi+1) & \text{for } \xi<0,\\ \log\left(\frac{\sigma}{\xi}\right)+\psi(1)-\psi(1/\xi) & \text{for } \xi>0,\\ \log\sigma+\psi(1) & \text{for } \xi=0. \end{cases}

Note that for a fixed value of

\xi\in(-\infty,\infty)

,

\log\sigma

acts as the location parameter of the exponentiated generalized Pareto distribution.

The variance of

Y\sim\operatorname{exGPD}(\sigma,\xi)

depends only on the shape parameter

\xi

, through the polygamma function of order 1 (also called the trigamma function):

\operatorname{Var}[Y]=\begin{cases} \psi'(1)-\psi'(-1/\xi+1) & \text{for } \xi<0,\\ \psi'(1)+\psi'(1/\xi) & \text{for } \xi>0,\\ \psi'(1) & \text{for } \xi=0. \end{cases}

See the right panel for the variance as a function of

\xi

. Note that

\psi'(1)=\pi^2/6\approx 1.644934

.
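The mean and variance formulas follow from the moment-generating function by differentiating its logarithm. As a sketch for the case of positive shape, writing the beta function in terms of gamma functions, B(s+1, 1/ξ − s) = Γ(s+1)Γ(1/ξ − s)/Γ(1/ξ + 1), gives:

```latex
\begin{aligned}
\log M_Y(s) &= -\log\xi + s\log\frac{\sigma}{\xi} + \log\Gamma(s+1) + \log\Gamma(1/\xi-s) - \log\Gamma(1/\xi+1),\\
\frac{d}{ds}\log M_Y(s) &= \log\frac{\sigma}{\xi} + \psi(s+1) - \psi(1/\xi-s),\\
\frac{d^2}{ds^2}\log M_Y(s) &= \psi'(s+1) + \psi'(1/\xi-s).
\end{aligned}
```

Evaluating the first and second derivatives at s = 0 yields E[Y] = log(σ/ξ) + ψ(1) − ψ(1/ξ) and Var[Y] = ψ'(1) + ψ'(1/ξ), in agreement with the case of positive shape in the formulas for the mean and variance. The cases of negative and zero shape follow in the same way from the corresponding branches of the moment-generating function.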

Note that the roles of the scale parameter

\sigma

and the shape parameter

\xi

under

Y\sim\operatorname{exGPD}(\sigma,\xi)

are separately interpretable, which may lead to more robust and efficient estimation of

\xi

than working with

X\sim\operatorname{GPD}(\sigma,\xi)

directly (https://www.tandfonline.com/doi/abs/10.1080/03610926.2018.1441418). Under

X\sim\operatorname{GPD}(\mu=0,\sigma,\xi)

the roles of the two parameters are entangled (at least up to the second central moment); see the formula for the variance

\operatorname{Var}(X)

, in which both parameters appear.

The Hill's estimator

Assume that

X_{1:n}=(X_1,\ldots,X_n)

are

n

observations (which need not be i.i.d.) from an unknown heavy-tailed distribution

F

such that its tail distribution is regularly varying with tail index

1/\xi

(hence, the corresponding shape parameter is

\xi

). To be specific, the tail distribution is described as

\bar{F}(x)=1-F(x)=L(x)\cdot x^{-1/\xi},\quad\text{for some }\xi>0,\text{ where }L\text{ is a slowly varying function.}

It is of particular interest in extreme value theory to estimate the shape parameter

\xi

, especially when

\xi

is positive (the so-called heavy-tailed case).

Let

F_u

be the conditional excess distribution function of the observations over a threshold

u

. The Pickands–Balkema–de Haan theorem (Pickands, 1975; Balkema and de Haan, 1974) states that for a large class of underlying distribution functions

F

and large

u

,

F_u

is well approximated by the generalized Pareto distribution (GPD). This motivated peaks-over-threshold (POT) methods for estimating

\xi

, in which the GPD plays the key role.

A well-known estimator using the POT methodology is Hill's estimator, formulated as follows. For

1\leq i\leq n

, write

X_{(i)}

for the

i

-th largest value of

X_1,\ldots,X_n

. Then, with this notation, Hill's estimator (see page 190 of Reference 5 by Embrechts et al., https://books.google.com/books?id=o-clBQAAQBAJ&dq=modeeling+extreme+events+for+insurance&pg=PA1) based on the

k

upper order statistics is defined as

\widehat{\xi}_k^{\text{Hill}}=\widehat{\xi}_k^{\text{Hill}}(X_{1:n})=\frac{1}{k-1}\sum_{j=1}^{k-1}\log\left(\frac{X_{(j)}}{X_{(k)}}\right),\quad\text{for }2\leq k\leq n.

In practice, the Hill estimator is used as follows. First, calculate the estimator

\widehat{\xi}_k^{\text{Hill}}

at each integer

k\in\{2,\ldots,n\}

, and then plot the ordered pairs

\{(k,\widehat{\xi}_k^{\text{Hill}})\}_{k=2}^{n}

. Then, select from the set of Hill estimators

\{\widehat{\xi}_k^{\text{Hill}}\}_{k=2}^{n}

those that are roughly constant with respect to

k

: these stable values are regarded as reasonable estimates for the shape parameter

\xi

. If

X_1,\ldots,X_n

are i.i.d., then Hill's estimator is a consistent estimator for the shape parameter

\xi

(https://www.jstor.org/stable/1427870).
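The estimator and the points of the associated plot can be computed as in the following sketch. The function names hill_estimator and hill_plot_points are chosen for this example, and the code assumes that all observations are strictly positive so that the logarithms are defined. The plot points are computed with a running sum of log order statistics, so the data are sorted only once.

```python
import math

def hill_estimator(data, k):
    """Hill estimate of xi from the k upper order statistics (2 <= k <= n)."""
    n = len(data)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    x = sorted(data, reverse=True)  # x[0] is X_(1), the largest value
    if x[k - 1] <= 0:
        raise ValueError("the k largest observations must be positive")
    return sum(math.log(x[j] / x[k - 1]) for j in range(k - 1)) / (k - 1)

def hill_plot_points(data):
    """Pairs (k, Hill estimate) for k = 2, ..., n; assumes positive data."""
    logs = [math.log(v) for v in sorted(data, reverse=True)]
    points, running = [], 0.0
    for k in range(2, len(logs) + 1):
        running += logs[k - 2]  # sum of log X_(1), ..., log X_(k-1)
        points.append((k, running / (k - 1) - logs[k - 1]))
    return points

# Tiny worked example: the observations are 1, e, e^2, e^3.
data = [1.0, math.e, math.e ** 2, math.e ** 3]
points = hill_plot_points(data)
print(points)  # approximately [(2, 1.0), (3, 1.5), (4, 2.0)]
```

In this toy example the estimates keep rising with k, so there is no stable region; with real heavy-tailed data one would look for a range of k over which the plotted values are roughly flat.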

Note that the Hill estimator

\widehat{\xi}_k^{\text{Hill}}

makes use of the log-transformation of the observations

X_{1:n}=(X_1,\ldots,X_n)

. (Pickands' estimator

\widehat{\xi}_k^{\text{Pickands}}

also employs the log-transformation, but in a slightly different way; see https://www.jstor.org/stable/2242785.)

External links

Mathworks: Generalized Pareto distribution

Notes and References

1. Coles, Stuart (2001-12-12). An Introduction to Statistical Modeling of Extreme Values. Springer. p. 75. ISBN 9781852334598.
2. Dargahi-Noubary, G. R. (1989). "On tail estimation: An improved method". Mathematical Geology. 21 (8): 829–842. doi:10.1007/BF00894450. Bibcode:1989MatGe..21..829D. S2CID 122710961.
3. Hosking, J. R. M.; Wallis, J. R. (1987). "Parameter and Quantile Estimation for the Generalized Pareto Distribution". Technometrics. 29 (3): 339–349. doi:10.2307/1269343. JSTOR 1269343.
4. Davison, A. C. (1984-09-30). "Modelling Excesses over High Thresholds, with an Application". In de Oliveira, J. Tiago (ed.). Statistical Extremes and Applications. Kluwer. p. 462. ISBN 9789027718044. https://books.google.com/books?id=6M03_6rm8-oC&pg=PA462
5. Embrechts, Paul; Klüppelberg, Claudia; Mikosch, Thomas (1997-01-01). Modelling extremal events for insurance and finance. Springer. p. 162. ISBN 9783540609315.
6. Castillo, Enrique; Hadi, Ali S. (1997). "Fitting the generalized Pareto distribution to data". Journal of the American Statistical Association. 92 (440): 1609–1620.
