In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a restaurant.Imagine a restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there (i.e., they are more likely to sit at a table with many customers than few), or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables (or blocks of the partition). The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final distribution. This property greatly simplifies a number of problems in population genetics, linguistic analysis, and image recognition.
The restaurant analogy first appeared in a 1985 write-up by David Aldous,[1] where it was attributed to Jim Pitman (who additionally credits Lester Dubins).
An equivalent partition process was published a year earlier by Fred Hoppe,[2] using an "urn scheme" akin to Pólya's urn. In comparison with Hoppe's urn model, the Chinese restaurant process has the advantage that it naturally lends itself to describing random permutations via their cycle structure, in addition to describing random partitions.
For any positive integer
n
l{P}n
\{1,2,3,...,n\}\triangleq[n]
\prodnl{P}n
The value of the process at time
n
Bn
[n]
n=1
B1=\{\{1\}\}
n+1
n+1
Bn
|b|/(n+1)
|b|
Bn
1/(n+1)
\{1,...,n\}
[n-1]
n
Bn
Bn-1
The probability assigned to any particular partition (ignoring the order in which customers sit around any particular table) is
\Pr(Bn=B)=
\prodb\in(|b|-1)! | |
n! |
, B\inl{P}n
where
b
B
|b|
b
The definition can be generalized by introducing a parameter
\theta>0
\theta | |
n+\theta |
|b|
|b| | |
n+\theta |
\theta=1
\theta
An equivalent, but subtly different way to define the Chinese restaurant process, is to let new customers choose companions rather than tables.[3] Customer
n+1
n
1 | |
n+\theta |
\theta | |
n+\theta |
|b|
The Chinese restaurant table distribution (CRT) is the probability distribution on the number of tables in the Chinese restaurant process.[4] It can be understood as the sum of
n
\begin{align} K&=
n | |
\sum | |
i=1 |
bi\\[4pt] bi&\sim\operatorname{Bernoulli}\left(
\theta | |
i-1+\theta |
\right) \end{align}
The probability mass function of
K
f(k)=
\Gamma(\theta) | |
\Gamma(n+\theta) |
|s(n,k)|\thetak, k=1,...,n,
where
s
This construction can be generalized to a model with two parameters,
\theta
\alpha
n+1
|B|
\theta+|B|\alpha | |
n+\theta |
,
or at an occupied table
b
|b|
|b|-\alpha | |
n+\theta |
.
In order for the construction to define a valid probability measure it is necessary to suppose that either
\alpha<0
\theta=-L\alpha
L\in\{1,2,,...\}
0\leq\alpha<1
\theta>-\alpha
Under this model the probability assigned to any particular partition
B
[n]
\theta,\alpha
\Pr(Bn=B\mid\theta,\alpha)=
(\theta+\alpha)|B|-1, | |
(\theta+1)n-1, |
\prodb\in(1-\alpha)|b|-1,
where, the Pochhammer k-symbol is defined as follows: by convention,
(a)0,k=1
m>0
(a)m,k=
m-1 | |
\prod | |
i=0 |
(a+ik)=\begin{cases} am&ifk=0,
| ||||
\ \\ k |
)\overline&ifk>0,
| ||||
\ \\ \left|k\right| |
)\underline&ifk<0\end{cases}
where
x\overline
m-1 | |
=\prod | |
i=0 |
(x+i)
x\underline
m-1 | |
=\prod | |
i=0 |
(x-i)
\alpha<0
\theta=-L\alpha
(\theta+\alpha)|B|-1,=(|\alpha|(L-1))|B|-1,
|B|>L
L
For the case when
\theta>0
0<\alpha<1
\Pr(Bn=B\mid\theta,\alpha)=
\Gamma(\theta) | |
\Gamma(\theta+n) |
\dfrac{\alpha|B|\Gamma(\theta/\alpha+|B|)}{\Gamma(\theta/\alpha)}\prodb\in\dfrac{\Gamma(|b|-\alpha)}{\Gamma(1-\alpha)}.
In the one-parameter case, where
\alpha
\theta>0
\Pr(Bn=B\mid\theta)=
\Gamma(\theta)\theta|B| | |
\Gamma(\theta+n) |
\prodb\in\Gamma(|b|).
Or, when
\theta
0<\alpha<1
\Pr(Bn=B\mid\alpha)=
\alpha|B|-1\Gamma(|B|) | |
{\Gamma(n) |
As before, the probability assigned to any particular partition depends only on the block sizes, so as before the random partition is exchangeable in the sense described above. The consistency property still holds, as before, by construction.
If
\alpha=0
n
\theta
Here is one way to derive this partition probability. Let
Ci
i
i=1,2,3,...
\Pr(Ci=c\midC1,\ldots,Ci-1) =\begin{cases} \dfrac{\theta+|B|\alpha}{\theta+i-1}&ifc\innewblock,\ \\ \dfrac{|b|-\alpha}{\theta+i-1}&ifc\inb; \end{cases}
The probability that
Bn
\{1,...,n\}
i
1
n
b
b
|b|-1
b
b
b
b
b
\theta ⋅ 1 ⋅ 2 ⋅ 3
\Pr(Bn=B)
For the one parameter case, with
\alpha=0
0<\theta<infty
n
n | |
\begin{align} \sum | |
k=1 |
\theta | |
\theta+k-1 |
=\theta ⋅ (\Psi(\theta+n)-\Psi(\theta)) \end{align}
where
\Psi(\theta)
\alpha>0
\begin{align} | \Gamma(\theta+n+\alpha)\Gamma(\theta+1) |
\alpha\Gamma(\theta+n)\Gamma(\theta+\alpha) |
-
\theta | |
\alpha, \end{align} |
however, note that the
\Gamma( ⋅ )
For the parameter choice
\alpha<0
\theta=-L\alpha
L\in\{1,2,3,\ldots\}
L
L
\{1,2,\ldots,L\}
[n]=\{1,2,\ldots,n\}
p=(p1,p2,\ldots,pL)
\gamma=-\alpha>0
n
p
p
\elln+1
n
P(\elln+1=i\mid\ell1,\ldots,\elln)=
\gamma+\left|{bi | |
\right| |
}{L\gamma+n}
\left|{bi}\right|\ge0
i
\alpha=-\gamma
\theta=L\gamma
|bi|-\alpha | |
n+\theta |
|bi|\ge1
L-|B|
\sum | |
i:|bi|=0 |
P(\elln+1=i\mid\ell1,\ldots,\elln)=
(L-|B|)\gamma | |
n+L\gamma |
=
\theta+|B|\alpha | |
n+\theta |
The marginal probability for the labels is given by
P(\ell1,\ldots,\elln)=P(\ell1)\prod
n-1 | |
t=1 |
P(\ellt+1\mid\ell1,\ldots,\ellt)=
| ||||||||||||||
}}{(L\gamma) |
\overline
P(\ell | ||||
|
x\overline
m-1 | |
=\prod | |
i=0 |
(x+i)
B
\left|B\right|\leL
L\underline{\left|B\right|
\left|B\right|-1 | |
}=\prod | |
i=0 |
(L-i)
Pr(Bn=B\mid\gamma,L)= L\underline{\left|B\right|
which can be verified to agree with the general version of the partition probability that is given above in terms of the Pochhammer k-symbol. Notice again, that if
B
|B|>L
L\underline{|B|
logL\underline{|B|
-infty
|B|>L
Consider on the one hand, the one-parameter Chinese restaurant process, with
\alpha=0
\theta>0
CRP(\alpha=0,\theta)
L
\gamma= | \theta |
L |
CRP(\alpha=- | \theta |
L |
,\theta)
CRP(0,\theta)
L
The two-parameter Chinese restaurant process can equivalently be defined in terms of a stick-breaking process.[9] For the case where
0\le\alpha<1
\theta>-\alpha
p=(p1,p2,\ldots)
p1
p2,p3,\ldots
fk
k
fk\simB(1-\alpha,\theta+k\alpha), fork\ge1and0\le\alpha<1
The categorical probabilities are:
pk=fk\prod
k-1 | |
i=1 |
(1-fk), wheretheemptyproductevaluatestoone.
For the parameter settings
\alpha<0
\theta=-\alphaL
L
p=(p1,\ldots,pL)
p
fk\simB(-\alpha,\theta+k\alpha), for1\lek\leL-1and\alpha<0
fL=1
It is possible to adapt the model such that each data point is no longer uniquely associated with a class (i.e., we are no longer constructing a partition), but may be associated with any combination of the classes. This strains the restaurant-tables analogy and so is instead likened to a process in which a series of diners samples from some subset of an infinite selection of dishes on offer at a buffet. The probability that a particular diner samples a particular dish is proportional to the popularity of the dish among diners so far, and in addition the diner may sample from the untested dishes. This has been named the Indian buffet processand can be used to infer latent features in data.[10]
The Chinese restaurant process is closely connected to Dirichlet processes and Pólya's urn scheme, and therefore useful in applications of Bayesian statistics including nonparametric Bayesian methods. The Generalized Chinese Restaurant Process is closely related to Pitman–Yor process. These processes have been used in many applications, including modeling text, clustering biological microarray data,[11] biodiversity modelling, and image reconstruction [12] [13]