Geometric distribution

Geometric
Type: discrete probability distribution (probability mass function)
Parameters: 0 < p \leq 1, success probability (real)

Number of trials X, supported on k \in N = \{1, 2, 3, \ldots\}:
Pmf: (1-p)^{k-1} p
Cdf: 1 - (1-p)^{\lfloor x \rfloor} for x \geq 1; 0 for x < 1
Mean: \tfrac{1}{p}
Median: \left\lceil \tfrac{-1}{\log_2(1-p)} \right\rceil (not unique if -1/\log_2(1-p) is an integer)
Mode: 1
Variance: \tfrac{1-p}{p^2}
Skewness: \tfrac{2-p}{\sqrt{1-p}}
Excess kurtosis: 6 + \tfrac{p^2}{1-p}
Entropy: \tfrac{-(1-p)\log(1-p) - p\log p}{p}
Mgf: \tfrac{p e^t}{1-(1-p)e^t}, for t < -\ln(1-p)
Char: \tfrac{p e^{it}}{1-(1-p)e^{it}}
Pgf: \tfrac{p z}{1-(1-p)z}

Number of failures Y = X - 1, supported on k \in N_0 = \{0, 1, 2, \ldots\}:
Pmf: (1-p)^{k} p
Cdf: 1 - (1-p)^{\lfloor x \rfloor + 1} for x \geq 0; 0 for x < 0
Mean: \tfrac{1-p}{p}
Median: \left\lceil \tfrac{-1}{\log_2(1-p)} \right\rceil - 1 (not unique if -1/\log_2(1-p) is an integer)
Mode: 0
Variance: \tfrac{1-p}{p^2}
Skewness: \tfrac{2-p}{\sqrt{1-p}}
Excess kurtosis: 6 + \tfrac{p^2}{1-p}
Entropy: \tfrac{-(1-p)\log(1-p) - p\log p}{p}
Mgf: \tfrac{p}{1-(1-p)e^t}, for t < -\ln(1-p)
Char: \tfrac{p}{1-(1-p)e^{it}}
Pgf: \tfrac{p}{1-(1-p)z}

In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions:

the probability distribution of the number X of Bernoulli trials needed to get one success, supported on N=\{1,2,3,\ldots\};

the probability distribution of the number Y=X-1 of failures before the first success, supported on N_0=\{0,1,2,\ldots\}.

These two different geometric distributions should not be confused with each other. Often, the name shifted geometric distribution is adopted for the former one (the distribution of X); however, to avoid ambiguity, it is considered wise to indicate which is intended, by mentioning the support explicitly.

The geometric distribution gives the probability that the first occurrence of success requires k independent trials, each with success probability p. If the probability of success on each trial is p, then the probability that the k-th trial is the first success is

\Pr(X=k)=(1-p)^{k-1}p

for k=1,2,3,4,\ldots

The above form of the geometric distribution is used for modeling the number of trials up to and including the first success. By contrast, the following form of the geometric distribution is used for modeling the number of failures until the first success:

\Pr(Y=k)=\Pr(X=k+1)=(1-p)^{k}p

for k=0,1,2,3,\ldots

The geometric distribution gets its name because its probabilities follow a geometric sequence. It is sometimes called the Furry distribution after Wendell H. Furry.
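
The two conventions differ only by a shift of one, which is easy to check numerically. The sketch below is an illustrative addition (not from the source), assuming SciPy is available; scipy.stats.geom uses the trials-counting convention.

```python
# Illustrative sketch (not from the source): the two PMF conventions side by side,
# checked against SciPy, which uses the trials-counting convention for geom.
from scipy.stats import geom

p = 0.25

def pmf_trials(k, p):
    """P(X = k) = (1-p)**(k-1) * p for k = 1, 2, 3, ... (number of trials)."""
    return (1 - p) ** (k - 1) * p

def pmf_failures(k, p):
    """P(Y = k) = (1-p)**k * p for k = 0, 1, 2, ... (number of failures)."""
    return (1 - p) ** k * p

for k in range(1, 6):
    # Trials convention matches scipy.stats.geom directly.
    assert abs(pmf_trials(k, p) - geom.pmf(k, p)) < 1e-12
    # Shifting the support by -1 gives the failures convention.
    assert abs(pmf_failures(k - 1, p) - geom.pmf(k - 1, p, loc=-1)) < 1e-12

print(pmf_trials(3, p), pmf_failures(2, p))  # both equal 0.75**2 * 0.25
```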

Definition

The geometric distribution is the discrete probability distribution that describes when the first success in an infinite sequence of independent and identically distributed Bernoulli trials occurs. Its probability mass function depends on its parameterization and support. When supported on N, the probability mass function is

P(X=k)=(1-p)^{k-1}p

where k=1,2,3,\ldots is the number of trials and p is the probability of success in each trial.[1]

The support may also be N_0, defining Y=X-1. This alters the probability mass function into

P(Y=k)=(1-p)^{k}p

where k=0,1,2,\ldots is the number of failures before the first success.[2]

An alternative parameterization of the distribution gives the probability mass function

P(Y=k)=\left(\frac{P}{Q}\right)^{k}\left(1-\frac{P}{Q}\right)

where P=\frac{1-p}{p} and Q=\frac{1}{p}.

An example of a geometric distribution arises from rolling a six-sided die until a "1" appears. Each roll is independent with a 1/6 chance of success. The number of rolls needed follows a geometric distribution with p=1/6.
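
As an illustrative (non-source) check of this example, the short simulation below rolls a virtual die until a "1" appears and compares the empirical mean with 1/p = 6.

```python
# Illustrative simulation: roll a fair die until a "1" appears and compare the
# empirical mean number of rolls with the theoretical value 1/p = 6.
import random

def rolls_until_one(rng):
    """Number of rolls of a fair six-sided die needed to see the first "1"."""
    rolls = 0
    while True:
        rolls += 1
        if rng.randint(1, 6) == 1:
            return rolls

rng = random.Random(0)
samples = [rolls_until_one(rng) for _ in range(100_000)]
print(sum(samples) / len(samples))                   # close to 6
print(sum(k == 1 for k in samples) / len(samples))   # close to 1/6
```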

Properties

Memorylessness

See main article: Memorylessness. The geometric distribution is the only memoryless discrete probability distribution.[3] It is the discrete version of the same property found in the exponential distribution.[4] The property asserts that the number of previously failed trials does not affect the number of future trials needed for a success.

Because there are two definitions of the geometric distribution, there are also two definitions of memorylessness for discrete random variables.[5] Expressed in terms of conditional probability, the two definitions are

\Pr(X>m+n\mid X>n)=\Pr(X>m),

and

\Pr(Y>m+n\mid Y\geq n)=\Pr(Y>m),

where m and n are natural numbers, X is a geometrically distributed random variable defined over N, and Y is a geometrically distributed random variable defined over N_0. Note that these definitions are not equivalent for discrete random variables; Y does not satisfy the first equation and X does not satisfy the second.
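
A quick numerical check of the first identity, using \Pr(X>k)=(1-p)^{k}, follows; it is an illustrative sketch with arbitrarily chosen p, m, and n.

```python
# Illustrative check of memorylessness for X supported on {1, 2, 3, ...};
# the values of p, m, and n are arbitrary.
p, m, n = 0.3, 4, 7

def tail(k, p):
    """P(X > k) = (1-p)**k for the trials-counting geometric distribution."""
    return (1 - p) ** k

lhs = tail(m + n, p) / tail(n, p)   # P(X > m+n | X > n)
rhs = tail(m, p)                    # P(X > m)
print(abs(lhs - rhs) < 1e-12)       # True: earlier failures are "forgotten"
```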

Moments and cumulants

The expected value and variance of a geometrically distributed random variable X defined over N are

\operatorname{E}(X)=\frac{1}{p},\qquad\operatorname{Var}(X)=\frac{1-p}{p^{2}}.

For a geometrically distributed random variable Y defined over N_0, the expected value changes into

\operatorname{E}(Y)=\frac{1-p}{p},

while the variance stays the same.[6]

For example, when rolling a six-sided die until landing on a "1", the average number of rolls needed is \frac{1}{1/6}=6 and the average number of failures is \frac{1-1/6}{1/6}=5.

The moment generating function of the geometric distribution when defined over N and N_0 respectively is[7]

\begin{align}
M_X(t)&=\frac{pe^{t}}{1-(1-p)e^{t}}\\
M_Y(t)&=\frac{p}{1-(1-p)e^{t}},\qquad t<-\ln(1-p)
\end{align}

The moments for the number of failures before the first success are given by

\begin{align}
\operatorname{E}(Y^{n})&=\sum_{k=0}^{\infty}(1-p)^{k}p\,k^{n}\\
&=p\operatorname{Li}_{-n}(1-p)&(\text{for }n\neq 0)
\end{align}

where \operatorname{Li}_{-n}(1-p) is the polylogarithm function.[8]
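
As a sanity check, the polylogarithm formula can be compared with a direct evaluation of the defining sum; this is an illustrative sketch assuming the mpmath library is available.

```python
# Illustrative sketch (assumes mpmath): compare E[Y^n] = p * Li_{-n}(1-p)
# with a direct evaluation of the defining sum.
import mpmath

p = mpmath.mpf("0.3")
for n in (1, 2, 3):
    closed_form = p * mpmath.polylog(-n, 1 - p)
    direct = mpmath.nsum(lambda k: (1 - p) ** k * p * k ** n, [0, mpmath.inf])
    print(n, closed_form, direct)   # the two values agree
```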

The cumulant generating function of the geometric distribution defined over N_0 is

K(t)=\ln p-\ln(1-(1-p)e^{t})

The cumulants \kappa_r satisfy the recursion

\kappa_{r+1}=q\frac{d\kappa_r}{dq},\qquad r=1,2,\dotsc

where q=1-p, when defined over N_0.

Proof of expected value

Consider the expected value \operatorname{E}(X) of X as above, i.e. the average number of trials until a success. On the first trial, we either succeed with probability p, or we fail with probability 1-p. If we fail, the remaining mean number of trials until a success is identical to the original mean; this follows from the fact that all trials are independent. From this we get the formula:

\operatorname{E}(X)=p\cdot 1+(1-p)(1+\operatorname{E}(X)),

which, if solved for \operatorname{E}(X), gives:

\operatorname{E}(X)=\frac{1}{p}.

The expected value of Y can be found from the linearity of expectation, \operatorname{E}(Y)=\operatorname{E}(X-1)=\operatorname{E}(X)-1=\frac{1}{p}-1=\frac{1-p}{p}. It can also be shown in the following way:

\begin{align}
\operatorname{E}(Y)&=\sum_{k=0}^{\infty}(1-p)^{k}p\,k\\
&=p\sum_{k=0}^{\infty}(1-p)^{k}k\\
&=p(1-p)\sum_{k=0}^{\infty}(1-p)^{k-1}k\\
&=p(1-p)\left[\frac{d}{dp}\left(-\sum_{k=0}^{\infty}(1-p)^{k}\right)\right]\\
&=p(1-p)\frac{d}{dp}\left(-\frac{1}{p}\right)=p(1-p)\frac{1}{p^{2}}=\frac{1-p}{p}.
\end{align}

The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge.
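
A two-line numerical confirmation of the result follows (illustrative only; the value of p is arbitrary).

```python
# Illustrative numerical confirmation of E[Y] = (1-p)/p; p is arbitrary.
p = 0.4
partial_sum = sum(k * (1 - p) ** k * p for k in range(10_000))
print(partial_sum, (1 - p) / p)   # both approximately 1.5
```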

Summary statistics

The mean of the geometric distribution is its expected value which is, as previously discussed in § Moments and cumulants, \frac{1}{p} or \frac{1-p}{p} when defined over N or N_0 respectively.

The median of the geometric distribution is \left\lceil\frac{-\log 2}{\log(1-p)}\right\rceil when defined over N[9] and \left\lfloor\frac{-\log 2}{\log(1-p)}\right\rfloor when defined over N_0.
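
The ceiling formula for the support N can be checked against a brute-force search for the smallest k whose CDF reaches 1/2; this is an illustrative sketch.

```python
# Illustrative check of the median formula (support {1, 2, 3, ...}) against a
# brute-force search for the smallest k with CDF(k) >= 1/2.
import math

def median_formula(p):
    return math.ceil(-math.log(2) / math.log(1 - p))

def median_bruteforce(p):
    k = 1
    while 1 - (1 - p) ** k < 0.5:   # CDF(k) = 1 - (1-p)**k
        k += 1
    return k

for p in (0.05, 0.2, 0.37, 0.8):
    print(p, median_formula(p), median_bruteforce(p))   # the two columns agree
```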

The mode of the geometric distribution is the first value in the support set. This is 1 when defined over N and 0 when defined over N_0.

The skewness of the geometric distribution is \frac{2-p}{\sqrt{1-p}}.

The kurtosis of the geometric distribution is 9+\frac{p^{2}}{1-p}. The excess kurtosis of a distribution is the difference between its kurtosis and the kurtosis of a normal distribution, 3.[10] Therefore, the excess kurtosis of the geometric distribution is 6+\frac{p^{2}}{1-p}. Since \frac{p^{2}}{1-p}\geq 0, the excess kurtosis is always positive, so the distribution is leptokurtic. In other words, the tail of a geometric distribution decays more slowly than a Gaussian tail.
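
These closed forms can be compared with library values; the sketch below is illustrative and assumes SciPy is available (scipy.stats.geom reports excess kurtosis).

```python
# Illustrative sketch (assumes SciPy): compare the closed-form skewness and excess
# kurtosis with the values reported by scipy.stats.geom (excess kurtosis).
import math
from scipy.stats import geom

p = 0.35
mean, var, skew, ex_kurt = geom.stats(p, moments="mvsk")
print(skew, (2 - p) / math.sqrt(1 - p))      # both approximately 2.05
print(ex_kurt, 6 + p ** 2 / (1 - p))         # both approximately 6.19
```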

General properties

The probability generating functions of geometric random variables X and Y defined over N and N_0 are, respectively,

\begin{align}
G_X(s)&=\frac{sp}{1-s(1-p)},\\[10pt]
G_Y(s)&=\frac{p}{1-s(1-p)},\qquad|s|<(1-p)^{-1}.
\end{align}

The characteristic function \varphi(t) is equal to G(e^{it}), so the geometric distribution's characteristic function, when defined over N and N_0 respectively, is[11]

\begin{align}
\varphi_X(t)&=\frac{pe^{it}}{1-(1-p)e^{it}},\\[10pt]
\varphi_Y(t)&=\frac{p}{1-(1-p)e^{it}}.
\end{align}

The entropy of a geometric distribution with parameter p is \frac{-(1-p)\log(1-p)-p\log p}{p}.

The geometric distribution defined over N_0 is infinitely divisible, that is, for any positive integer n, there exist n independent identically distributed random variables whose sum has the same geometric distribution. This is because the negative binomial distribution can be derived from a Poisson-stopped sum of logarithmic random variables.

The decimal digits of the geometrically distributed random variable Y are a sequence of independent (and not identically distributed) random variables. For example, the hundreds digit D has this probability distribution:

\Pr(D=d)=\frac{q^{100d}}{1+q^{100}+q^{200}+\cdots+q^{900}},

where q=1-p, and similarly for the other digits, and, more generally, similarly for numeral systems with other bases than 10. When the base is 2, this shows that a geometrically distributed random variable can be written as a sum of independent random variables whose probability distributions are indecomposable.

Related distributions

The sum of r independent geometric random variables with parameter p is a negative binomial random variable with parameters r and p.[14] The geometric distribution is a special case of the negative binomial distribution, with r=1.

The minimum of n independent geometric random variables with parameters p_1,\ldots,p_n is also geometrically distributed with parameter 1-\prod_{i=1}^{n}(1-p_i).[15] (A simulation sketch illustrating this fact appears at the end of this section.)
Suppose 0<r<1 and for k=1,2,3,\ldots the random variable X_k has a Poisson distribution with expected value r^{k}/k. Then \sum_{k=1}^{\infty}kX_k has a geometric distribution taking values in N_0, with expected value r/(1-r).

The floor \lfloor E\rfloor of an exponential random variable E with rate \lambda creates a geometric distribution with parameter p=1-e^{-\lambda} defined over N_0. This can be used to generate geometrically distributed random numbers as detailed in § Random variate generation.

If X is geometrically distributed with parameter p=1/n, then the distribution of X/n approaches an exponential distribution with expected value 1 as n\to\infty, since

\begin{align}
\Pr(X/n>a)=\Pr(X>na)&=(1-p)^{na}=\left(1-\frac{1}{n}\right)^{na}=\left[\left(1-\frac{1}{n}\right)^{n}\right]^{a}\\
&\to[e^{-1}]^{a}=e^{-a}\text{ as }n\to\infty.
\end{align}

More generally, if p=\lambda/n, where \lambda is a parameter, then as n\to\infty the distribution of X/n approaches an exponential distribution with rate \lambda:

\Pr(X>nx)=\lim_{n\to\infty}(1-\lambda/n)^{nx}=e^{-\lambda x},

therefore the distribution function of X/n converges to 1-e^{-\lambda x}, which is that of an exponential random variable.
The index of dispersion of the geometric distribution is \frac{1}{p} and its coefficient of variation is \frac{1}{\sqrt{1-p}}. The distribution is overdispersed.
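
The minimum property mentioned above can be illustrated by a small simulation; the parameter values and seed below are arbitrary, and the sampler is a standard inverse-transform sketch rather than anything prescribed by the source.

```python
# Illustrative simulation of the minimum property: min(X_1, ..., X_n) of independent
# geometric variables with parameters p_1, ..., p_n is geometric with parameter
# 1 - (1-p_1)*...*(1-p_n). The parameter values and seed are arbitrary.
import math
import random

rng = random.Random(42)
params = [0.1, 0.25, 0.4]
p_min = 1 - math.prod(1 - p for p in params)

def sample_trials(p, rng):
    """Sample the number of trials until the first success (support 1, 2, 3, ...)."""
    u = 1.0 - rng.random()            # u in (0, 1]
    return max(1, math.ceil(math.log(u) / math.log(1 - p)))

samples = [min(sample_trials(p, rng) for p in params) for _ in range(200_000)]
print(sum(samples) / len(samples), 1 / p_min)   # both close to 1/p_min ≈ 1.68
```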

Statistical inference

The true parameter p of an unknown geometric distribution can be inferred through estimators and conjugate distributions.

Method of moments

Provided they exist, the first l moments of a probability distribution can be estimated from a sample x_1,\ldots,x_n using the formula

m_i=\frac{1}{n}\sum_{j=1}^{n}x_j^{i}

where m_i is the i-th sample moment and 1\leq i\leq l.[16] Estimating \operatorname{E}(X) with m_1 gives the sample mean, denoted \bar{x}. Substituting this estimate in the formula for the expected value of a geometric distribution and solving for p gives the estimators \hat{p}=\frac{1}{\bar{x}} and \hat{p}=\frac{1}{\bar{x}+1} when supported on N and N_0 respectively. These estimators are biased since \operatorname{E}\left(\frac{1}{\bar{x}}\right)>\frac{1}{\operatorname{E}(\bar{x})}=p as a result of Jensen's inequality.[17]

Maximum likelihood estimation

The maximum likelihood estimator of p is the value that maximizes the likelihood function given a sample. By finding the zero of the derivative of the log-likelihood function when the distribution is defined over N, the maximum likelihood estimator can be found to be \hat{p}=\frac{1}{\bar{x}}, where \bar{x} is the sample mean.[18] If the domain is N_0, then the estimator shifts to \hat{p}=\frac{1}{\bar{x}+1}. As previously discussed in § Method of moments, these estimators are biased.

Regardless of the domain, the bias is equal to

b\equiv\operatorname{E}\left[\hat{p}_\mathrm{mle}-p\right]=\frac{p(1-p)}{n}

which yields the bias-corrected maximum likelihood estimator

\hat{p}^{*}_\mathrm{mle}=\hat{p}_\mathrm{mle}-\hat{b}.
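
A small sketch of these estimators on a simulated sample of trial counts follows; the true p, sample size, and seed are arbitrary and the sampler is an illustrative inverse-transform helper, not part of the source.

```python
# Illustrative sketch of the MLE and its bias correction on a simulated sample of
# trial counts; the true p, sample size, and seed are arbitrary.
import math
import random

rng = random.Random(1)
p_true = 0.2

def sample_trials(p, rng):
    u = 1.0 - rng.random()                                # u in (0, 1]
    return max(1, math.ceil(math.log(u) / math.log(1 - p)))

sample = [sample_trials(p_true, rng) for _ in range(50)]

x_bar = sum(sample) / len(sample)
p_mle = 1 / x_bar                                # MLE for support {1, 2, 3, ...}
bias_hat = p_mle * (1 - p_mle) / len(sample)     # plug-in estimate of p(1-p)/n
p_corrected = p_mle - bias_hat
print(p_mle, p_corrected)                        # both near the true p = 0.2
```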

Bayesian inference

In Bayesian inference, the parameter p is a random variable from a prior distribution with a posterior distribution calculated using Bayes' theorem after observing samples. If a beta distribution is chosen as the prior distribution, then the posterior will also be a beta distribution, and it is called the conjugate distribution. In particular, if a \mathrm{Beta}(\alpha,\beta) prior is selected, then the posterior, after observing samples k_1,\ldots,k_n\in N, is[19]

p\sim\mathrm{Beta}\left(\alpha+n,\ \beta+\sum_{i=1}^{n}(k_i-1)\right).

Alternatively, if the samples are in N_0, the posterior distribution is[20]

p\sim\mathrm{Beta}\left(\alpha+n,\ \beta+\sum_{i=1}^{n}k_i\right).

Since the expected value of a \mathrm{Beta}(\alpha,\beta) distribution is \frac{\alpha}{\alpha+\beta}, as \alpha and \beta approach zero, the posterior mean approaches its maximum likelihood estimate.
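
A minimal sketch of the conjugate update for samples supported on N follows; the prior hyperparameters and data are made up for the example, not taken from the source.

```python
# Illustrative conjugate Beta update for trial-count samples k_i in {1, 2, 3, ...};
# the prior hyperparameters and the data are made up for the example.
alpha, beta = 2.0, 2.0            # Beta(2, 2) prior on p
sample = [3, 1, 7, 2, 4]          # observed numbers of trials until success

alpha_post = alpha + len(sample)
beta_post = beta + sum(k - 1 for k in sample)
posterior_mean = alpha_post / (alpha_post + beta_post)
print(alpha_post, beta_post, posterior_mean)   # Beta(7, 14), posterior mean 1/3
```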

Random variate generation

The geometric distribution can be generated experimentally from i.i.d. standard uniform random variables by finding the first such random variable to be less than or equal to p. However, the number of random variables needed is also geometrically distributed and the algorithm slows as p decreases.[21]

Random generation can be done in constant time by truncating exponential random numbers. An exponential random variable E can become geometrically distributed with parameter p through \lceil -E/\log(1-p)\rceil. In turn, E can be generated from a standard uniform random variable U, altering the formula into \lceil\log(U)/\log(1-p)\rceil.[22]
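
A minimal sketch of this constant-time generator follows; the guard against U = 0 is an implementation detail of the sketch, not part of the source formula.

```python
# Illustrative implementation of the constant-time generator described above:
# ceil(log(U) / log(1 - p)) for a standard uniform U, giving the trial count.
import math
import random

def sample_geometric(p, rng):
    """Return a Geometric(p) variate counting trials (support 1, 2, 3, ...)."""
    u = rng.random()
    while u == 0.0:               # guard: log(0) is undefined (essentially never happens)
        u = rng.random()
    return math.ceil(math.log(u) / math.log(1.0 - p))

rng = random.Random(7)
p = 0.3
draws = [sample_geometric(p, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))    # close to 1/p ≈ 3.33
```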

Applications

The geometric distribution is used in many disciplines. In queueing theory, the M/M/1 queue has a steady state following a geometric distribution.[23] In stochastic processes, the Yule–Furry process is geometrically distributed.[24] The distribution also arises when modeling the lifetime of a device in discrete contexts. It has also been used to fit data, including modeling patients spreading COVID-19.[25]


Notes and References

  1. Book: Nagel . Werner . Probability and Conditional Expectation: Fundamentals for the Empirical Sciences . Steyer . Rolf . 2017-04-04 . Wiley . 978-1-119-24352-6 . 1st . Wiley Series in Probability and Statistics . en . 10.1002/9781119243496.
  2. Book: Chattamvelli . Rajan . Discrete Distributions in Engineering and the Applied Sciences . Shanmugam . Ramalingam . Springer International Publishing . 2020 . 978-3-031-01297-6 . Synthesis Lectures on Mathematics & Statistics . Cham . en . 10.1007/978-3-031-02425-2.
  3. Book: Dekking . Frederik Michel . A Modern Introduction to Probability and Statistics . Kraaikamp . Cornelis . Lopuhaä . Hendrik Paul . Meester . Ludolf Erwin . 2005 . Springer London . 978-1-85233-896-1 . Springer Texts in Statistics . London . 50 . en . 10.1007/1-84628-168-7.
  4. Book: Johnson . Norman L. . Univariate Discrete Distributions . Kemp . Adrienne W. . Kotz . Samuel . 2005-08-19 . Wiley . 978-0-471-27246-5 . 1 . Wiley Series in Probability and Statistics . en . 10.1002/0471715816.
  5. Web site: Weisstein . Eric W. . Memoryless . 2024-07-25 . mathworld.wolfram.com . en.
  6. Book: Forbes . Catherine . Statistical Distributions . Evans . Merran . Hastings . Nicholas . Peacock . Brian . 2010-11-29 . Wiley . 978-0-470-39063-4 . 1st . en . 10.1002/9780470627242.
  7. Book: Bertsekas, Dimitri P. . Introduction to probability . Tsitsiklis . John N. . Athena Scientific . 2008 . 978-1-886529-23-6 . 2nd . Optimization and computation series . Belmont . 235 . en.
  8. Web site: Weisstein . Eric W. . Geometric Distribution . 2024-07-13 . . en.
  9. Book: Aggarwal, Charu C. . Probability and Statistics for Machine Learning: A Textbook . Springer Nature Switzerland . 2024 . 978-3-031-53281-8 . Cham . 138 . en . 10.1007/978-3-031-53282-5.
  10. Book: Chan, Stanley . Introduction to Probability for Data Science . . 2021 . 978-1-60785-747-1 . 1st . en.
  11. Book: International Encyclopedia of Statistical Science . Springer Berlin Heidelberg . 2011 . 978-3-642-04897-5 . Lovric . Miodrag . 1st . Berlin, Heidelberg . en . 10.1007/978-3-642-04898-2.
  12. Lisman . J. H. C. . Zuylen . M. C. A. van . March 1972 . Note on the generation of most probable frequency distributions . . en . 26 . 1 . 19–23 . 10.1111/j.1467-9574.1972.tb00152.x . 0039-0402.
  13. Gallager. R.. van Voorhis. D.. March 1975. Optimal source codes for geometrically distributed integer alphabets (Corresp.). IEEE Transactions on Information Theory. 21. 2. 228–230. 10.1109/TIT.1975.1055357. 0018-9448.
  14. Book: Pitman, Jim . Probability . 1993 . Springer New York . 978-0-387-94594-1 . New York, NY . 372 . en . 10.1007/978-1-4612-4374-8.
  15. Ciardo . Gianfranco . Leemis . Lawrence M. . Nicol . David . 1 June 1995 . On the minimum of independent geometrically distributed random variables . Statistics & Probability Letters . en . 23 . 4 . 313–326 . 10.1016/0167-7152(94)00130-Z . 1505801 . free . 2060/19940028569.
  16. Book: Evans . Michael . Probability and Statistics: The Science of Uncertainty . Rosenthal . Jeffrey . 2023 . 978-1429224628 . 2nd . Macmillan Learning . en.
  17. Book: Held . Leonhard . Likelihood and Bayesian Inference: With Applications in Biology and Medicine . Sabanés Bové . Daniel . 2020 . Springer Berlin Heidelberg . 978-3-662-60791-6 . Statistics for Biology and Health . Berlin, Heidelberg . en . 10.1007/978-3-662-60792-3.
  18. Web site: Siegrist . Kyle . 2020-05-05 . 7.3: Maximum Likelihood . 2024-06-20 . Statistics LibreTexts . en.
  19. 10.1.1.157.5540 . Daniel . Fink . A Compendium of Conjugate Priors.
  20. Web site: 3. Conjugate families of distributions. https://web.archive.org/web/20100408092905/http://halweb.uc3m.es/esp/Personal/personas/mwiper/docencia/English/PhD_Bayesian_Statistics/ch3_2009.pdf . 2010-04-08 . live.
  21. Book: Devroye, Luc . Non-Uniform Random Variate Generation . Springer New York . 1986 . 978-1-4613-8645-2 . New York, NY . en . 10.1007/978-1-4613-8643-8.
  22. Book: Knuth, Donald Ervin . The Art of Computer Programming . . 1997 . 978-0-201-89683-1 . 3rd . 2 . Reading, Mass . 136 . en.
  23. Book: Daskin, Mark S. . Bite-Sized Operations Management . Springer International Publishing . 2021 . 978-3-031-01365-2 . Synthesis Lectures on Operations Research and Applications . Cham . 127 . en . 10.1007/978-3-031-02493-1.
  24. Book: Madhira, Sivaprasad . Introduction to Stochastic Processes Using R . Deshmukh . Shailaja . Springer Nature Singapore . 2023 . 978-981-99-5600-5 . Singapore . 449 . en . 10.1007/978-981-99-5601-2.
  25. Polymenis . Athanase . 2021-10-01 . An application of the geometric distribution for assessing the risk of infection with SARS-CoV-2 by location . Asian Journal of Medical Sciences . 12 . 10 . 8–11 . 10.3126/ajms.v12i10.38783 . 2091-0576. free .