Poisson-Dirichlet distribution

In probability theory, Poisson-Dirichlet distributions are probability distributions on the set of nonnegative, non-increasing sequences with sum 1, depending on two parameters $\alpha\in[0,1)$ and $\theta\in(-\alpha,\infty)$. They can be defined as follows. One considers independent random variables $(Y_n)_{n\geq1}$ such that $Y_n$ follows the beta distribution with parameters $1-\alpha$ and $\theta+n\alpha$. Then the Poisson-Dirichlet distribution $PD(\alpha,\theta)$ with parameters $\alpha$ and $\theta$ is the law of the random non-increasing sequence obtained by ranking $Y_1$ and the products $Y_n\prod_{k=1}^{n-1}(1-Y_k)$ for $n\geq2$. This definition is due to Jim Pitman and Marc Yor.[1] [2] It generalizes Kingman's law, which corresponds to the particular case $\alpha=0$.[3]
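The construction above can be sketched as a truncated stick-breaking sampler. This is only an illustrative approximation, not a reference implementation: the function name `sample_pd`, the truncation level `n_sticks`, and the optional `rng` argument are assumptions introduced here, and an exact draw would require infinitely many terms.

```python
import random


def sample_pd(alpha, theta, n_sticks=1000, rng=None):
    """Approximate draw from PD(alpha, theta) by truncated stick-breaking.

    Hypothetical helper: only the first n_sticks terms are generated,
    so the returned weights sum to slightly less than 1.
    """
    if not 0 <= alpha < 1 or theta <= -alpha:
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = rng or random.Random()
    remaining = 1.0  # running product of (1 - Y_k) for k < n
    weights = []
    for n in range(1, n_sticks + 1):
        # Y_n ~ Beta(1 - alpha, theta + n * alpha), independent across n
        y = rng.betavariate(1 - alpha, theta + n * alpha)
        weights.append(remaining * y)  # Y_n * prod_{k<n} (1 - Y_k)
        remaining *= 1 - y
    weights.sort(reverse=True)  # rank the terms in non-increasing order
    return weights
```

Calling `sample_pd(0.0, 1.0)` gives an approximate draw from $PD(0,1)$. The mass left out by the truncation equals the final value of `remaining`, which shrinks quickly for $\alpha=0$ and more slowly as $\alpha$ approaches 1.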

Number theory

Patrick Billingsley[4] proved the following result: if $n$ is a uniform random integer in $\{2,3,\dots,N\}$, if $k\geq1$ is a fixed integer, and if $p_1\geq p_2\geq\dots\geq p_k$ are the $k$ largest prime divisors of $n$ (with $p_j$ arbitrarily defined if $n$ has fewer than $j$ prime factors), then the joint distribution of $(\log p_1/\log n,\log p_2/\log n,\dots,\log p_k/\log n)$ converges to the law of the first $k$ elements of a $PD(0,1)$-distributed random sequence as $N$ goes to infinity.
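As a small illustration of the quantities in this statement, the following sketch computes the vector $(\log p_1/\log n,\dots,\log p_k/\log n)$ for a given integer. The function names are hypothetical, and two conventions are assumptions made here rather than taken from the source: prime factors are counted with multiplicity, and a missing $p_j$ is set to 1 so that its ratio is 0.

```python
import math


def prime_factors_desc(n):
    """Prime factors of n >= 2, with multiplicity, largest first.

    Plain trial division, adequate only for small n.
    """
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # leftover prime, larger than all found so far
    return factors[::-1]


def log_ratios(n, k):
    """Return (log p_1 / log n, ..., log p_k / log n) for the k largest factors.

    Assumption: missing factors are padded with 1, giving a ratio of 0.
    """
    factors = prime_factors_desc(n)
    padded = factors[:k] + [1] * (k - len(factors))
    return [math.log(p) / math.log(n) for p in padded]
```

For example, `log_ratios(360, 3)` uses the factors 5, 3, 3 of $360=2^3\cdot3^2\cdot5$. With the multiplicity convention, the ratios over all factors of $n$ sum to 1, which mirrors the sum-1 constraint on Poisson-Dirichlet sequences.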

The Poisson-Dirichlet distribution with parameters $\alpha=0$ and $\theta=1$ is also the limiting distribution, as $N$ goes to infinity, of the sequence $(\ell_1/N,\ell_2/N,\ell_3/N,\dots)$, where $\ell_j$ is the length of the $j$-th largest cycle of a uniformly distributed permutation of $N$ elements. If, for $\theta>0$, one replaces the uniform distribution by the distribution $P_{N,\theta}$ on the symmetric group $\mathfrak{S}_N$ such that $P_{N,\theta}(\sigma)=\frac{\theta^{n(\sigma)}}{\theta(\theta+1)\cdots(\theta+N-1)}$, where $n(\sigma)$ is the number of cycles of the permutation $\sigma$, then the limit is the Poisson-Dirichlet distribution with parameters $\alpha=0$ and $\theta$. The probability distribution $P_{N,\theta}$ is called Ewens's distribution,[5] and comes from Ewens's sampling formula, first introduced by Warren Ewens in population genetics to describe the probabilities associated with counts of how many different alleles are observed a given number of times in a sample.
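A brute-force check can make the Ewens weights concrete: each permutation $\sigma$ of $N$ elements receives weight $\theta^{n(\sigma)}$, and the normalizing constant is the rising factorial $\theta(\theta+1)\cdots(\theta+N-1)$. The sketch below, with hypothetical function names, evaluates this probability and is meant only for small $N$, since it relies on enumerating all $N!$ permutations to verify the normalization.

```python
from itertools import permutations


def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple mapping i -> perm[i]."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:  # walk around the cycle containing start
                seen[i] = True
                i = perm[i]
    return cycles


def rising_factorial(theta, n):
    """theta * (theta + 1) * ... * (theta + n - 1)."""
    result = 1.0
    for j in range(n):
        result *= theta + j
    return result


def ewens_probability(perm, theta):
    """Ewens probability of perm for theta > 0."""
    return theta ** cycle_count(perm) / rising_factorial(theta, len(perm))


# Summing over all permutations of 5 elements should give 1 up to rounding.
total = sum(ewens_probability(p, 2.5) for p in permutations(range(5)))
```

With $\theta=1$ every permutation gets probability $1/N!$, recovering the uniform case; larger values of $\theta$ favor permutations with many cycles.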

Notes and References

  1. Pitman, Jim; Yor, Marc (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability 25 (2): 855-900. doi:10.1214/aop/1024404422. MR 1434129. Zbl 0880.60076. CiteSeerX 10.1.1.69.1273.
  2. Bourgade, Paul. Lois de Poisson-Dirichlet. Master's thesis.
  3. Kingman, J. F. C. (1975). Random discrete distributions. J. Roy. Statist. Soc. Ser. B 37: 1-22.
  4. Billingsley, P. (1972). On the distribution of large prime divisors. Periodica Mathematica Hungarica 2: 283-289.
  5. Ewens, Warren (1972). The sampling theory of selectively neutral alleles. Theoretical Population Biology 3: 87-112.