Boltzmann sampler explained
A Boltzmann sampler is an algorithm intended for random sampling of combinatorial structures. If the object size is viewed as its energy, and the argument of the corresponding generating function is interpreted in terms of the temperature of the physical system, then a Boltzmann sampler returns an object from a classical Boltzmann distribution.
The concept of Boltzmann sampler was proposed by Philippe Duchon, Philippe Flajolet, Guy Louchard and Gilles Schaeffer in 2004.[1]
Description
The concept of Boltzmann sampling is closely related to the symbolic method in combinatorics.Let
be a
combinatorial class with an ordinary
generating function
which has a nonzero radius of convergence
, i.e. is
complex analytic. Formally speaking, if each object
is equipped with a non-negative integer
size
, then the generating function
is defined as
C(z)=\sumcz\omega(c)=
anzn,
where
denotes the number of objects
of size
. The size function is typically used to denote the number of vertices in a tree or in a graph, the number of letters in a word, etc.
A Boltzmann sampler for the class
with a parameter
such that
, denoted as
returns an object
with probability
P(\GammalC(z)=c)=\dfrac{z\omega(c)
}.
Construction
Finite sets
If
is finite, then an element
is drawn with probability proportional to
.
Disjoint union
If the target class is a disjoint union of two other classes,
, and the generating functions
and
of
and
are known, then the Boltzmann sampler for
can be obtained as
\left(\operatorname{Bern}\left(
\right)\longrightarrow\GammalA(z)\mid\GammalB(z)\right)
where
stands for "if the random variable
is 1, then execute
, else execute
". More generally, if the disjoint union is taken over a finite set, the resulting Boltzmann sampler can be represented using a random choice with probabilities proportional to the values of the generating functions.
Cartesian product
If
is a class constructed of ordered pairs
where
and
, then the corresponding Boltzmann sampler
can be obtained as
\GammalC(z)=(\GammalA(z),\GammalB(z)),
i.e. by forming a pair
with
and
drawn independently from
and
.
Sequence
If
is composed of all the finite sequences of elements of
with size of a sequence additively inherited from sizes of components, then the generating function of
is expressed as
C(z)=
A(z)k=\dfrac{1}{1-A(z)}
, where
is the generating function of
. Alternatively, the class
admits a recursive representation
This gives two possibilities for
.
\GammalC(z)=\Gamma(1+lA x lC)(z)
=\left(\operatorname{Bern}\left(
\right)\longrightarrow1|(\GammalA(z),\GammalC(z))\right)
\GammalC(z)=\left(\operatorname{Geom}(A(z))\Longrightarrow\GammalC(z)
\right)
where
stands for "draw a random variable
; if the value
is returned, then execute
independently
times and return the sequence obtained". Here,
stands for the geometric distribution
P(\operatorname{Geom}(p)=k)=pk(1-p)
.
Recursive classes
As the first construction of the sequence operator suggests, Boltzmann samplers can be used recursively. If the target class
is a part of the system
\begin{cases}
lC1=\Phi1(lC1,\ldots,lCn,lZ),\\
\vdots\\
lCn=\Phin(lC1,\ldots,lCn,lZ),
\end{cases}
where each of the expressions
involves only disjoint union, cartesian product and sequence operator, then the corresponding Boltzmann sampler is well defined. Given the argument value
, the numerical values of the generating functions can be obtained by Newton iteration.
[2] Labelled structures
Boltzmann sampling can be applied to labelled structures. For a labelled combinatorial class
, exponential generating function is used instead:
C(z)=\sumc\dfrac{z\omega(c)
} = \sum_^\infty a_n \dfrac,
where
denotes the number of labelled objects
of size
. The operation of cartesian product and sequence need to be adjusted to take labelling into account, and the principle of construction remains the same.
In the labelled case, the Boltzmann sampler for a labelled class
is required to output an object
with probability
Labelled sets
In the labelled universe, a class
can be composed of all the finite sets of elements of a class
with order-consistent relabellings. In this case, the exponential generating function of the class
is written as
C(z)=
\dfrac{A(z)k}{k!}=eA(z)
where
is the exponential generating function of the class
. The Boltzmann sampler for
can be described as
\GammalC(z)=\left(\operatorname{Poisson}(A(z))\Longrightarrow\GammalC(z)
\right)
where
\operatorname{Poisson}(λ)
stands for the standard
Poisson distribution P(\operatorname{Poisson}(λ)=k)=e-λ\dfrac{λk}{k!}
.
Labelled cycles
In the cycle construction, a class
is composed of all the finite sequences of elements of a class
, where two sequences are considered equivalent if they can be obtained by a cyclic shift. The exponential generating function of the class
is written as
where
is the exponential generating function of the class
. The Boltzmann sampler for
can be described as
\GammalC(z)=\left(\operatorname{Loga}(A(z))\Longrightarrow\GammalC(z)
\right)
where
describes the
log-law distribution P(\operatorname{Loga}(λ)=k)=\dfrac{1}{log(1-λ)-1
} \dfrac.
Properties
Let
denote the random size of the generated object from
. Then, the size has the first and the second moment satisfying
Ez(N)=z\dfrac{C'(z)}{C(z)};
Ez(N2)=\dfrac{z2C''(z)+zC'(z)}{C(z)};
z\dfrac{d}{dz}Ez(N)=\operatorname{Var}z(N)
.
Examples
Binary trees
The class
of
binary trees can be defined by the recursive specification
and its generating function
satisfies an equation
and can be evaluated as a solution of the quadratic equation
B(z)=\dfrac{1-\sqrt{1-4z2}}{2z}
The resulting Boltzmann sampler can be described recursively by
\GammalB(z)=\left(\operatorname{Bern}\left(
\right)\longrightarrow
lZ\mid\left(lZ,\GammalB(z),\GammalB(z)\right)\right)
Set partitions
Consider various partitions of the set
into several non-empty classes, being disordered between themselves.Using symbolic method, the class
of set partitions can be expressed as
lC=\operatorname{Set}(\operatorname{Set}>0(lZ)).
The corresponding generating function is equal to
. Therefore, Boltzmann sampler can be described as
\GammalC=\left(\operatorname{Poisson}(ez-1)\Longrightarrow\left(\operatorname{Poisson}>0(z)\LongrightarrowlZ\right)\right),
where the positive Poisson distribution
\operatorname{Poisson}>0(λ)
is a Poisson distribution with a parameter
conditioned to take only positive values.
Further generalisations
The original Boltzmann samplers described by Philippe Duchon, Philippe Flajolet, Guy Louchard and Gilles Schaeffer only support basic unlabelled operations of disjoint union, cartesian product and sequence, and two additional operations for labelled classes, namely the set and the cycle construction. Since then, the scope of combinatorial classes for which a Boltzmann sampler can be constructed, has expanded.
Unlabelled structures
The admissible operations for unlabelled classes include such additional operations as Multiset, Cycle and Powerset. Boltzmann samplers for these operations have been described by Philippe Flajolet, Éric Fusy and Carine Pivoteau.[3]
Differential specifications
Let
be a labelled combinatorial class. The derivative operation is defined as follows: take a labelled object
and replace an atom with the largest label with a distinguished atom without a label, therefore reducing a size of the resulting object by 1. If
is the exponential generating function of the class
, then the exponential generating function of the derivative class
is given by
A differential specification is a recursive specification of type
where the expression
involves only standard operations of union, product, sequence, cycle and set, and does not involve differentiation.
Boltzmann samplers for differential specifications have been constructed by Olivier Bodini, Olivier Roussel and Michèle Soria.[4]
Multi-parametric Boltzmann samplers
A multi-parametric Boltzmann distribution for multiparametric combinatorial classes is defined similarly to the classical case. Assume that each object
is equipped with the composition size
\omega(c)=(\omega1(c),\ldots,\omegad(c))
which is a vector of non-negative integer numbers. Each of the size functions
can reflect one of the parameters of a
data structure, such as the number of leaves of certain colour in a tree, the height of the tree, etc. The corresponding
multivariate generating function
is then associated with a multi-parametric class, and is defined as
A
Boltzmann sampler for the multiparametric class
with a vector parameter
\boldsymbolz=(z1,\ldots,zd)
inside the domain of
analyticity of
, denoted as
returns an object
with probability
P(\GammalC(z1,\ldots,zd)=c)=
.
Multiparametric Boltzmann samplers have been constructed by Olivier Bodini and Yann Ponty.[5] A polynomial-time algorithm for finding the numerical values of the parameters
given the target parameter expectations, can be obtained by formulating an auxiliary convex optimisation problem
Applications
Boltzmann sampling can be used to generate algebraic data types for the sake of property-based testing.[6]
Software
- Random Discrete Objects Suite (RDOS): http://lipn.fr/rdos/
- Combstruct package in Maple: https://www.maplesoft.com/support/help/Maple/view.aspx?path=combstruct
- Haskell package Boltzmann Brain: https://github.com/maciej-bendkowski/boltzmann-brain
Notes and References
- Duchon. Philippe. Flajolet. Philippe. Louchard. Guy. Schaeffer. Gilles. July 2004. Boltzmann Samplers for the Random Generation of Combinatorial Structures. Combinatorics, Probability and Computing. en. 13. 4–5. 577–625. 10.1017/S0963548304006315. 1634696 . 0963-5483.
- Pivoteau. Carine. Salvy. Bruno. Soria. Michèle. November 2012. Algorithms for combinatorial structures: Well-founded systems and Newton iterations. Journal of Combinatorial Theory, Series A. 119. 8. 1711–1773. 10.1016/j.jcta.2012.05.007. 0097-3165. free. 1109.2688.
- Flajolet. Philippe. Fusy. Éric. Pivoteau. Carine. 2007-01-06. Boltzmann Sampling of Unlabelled Structures. 2007 Proceedings of the Fourth Workshop on Analytic Algorithmics and Combinatorics (ANALCO). 201–211. Philadelphia, PA. Society for Industrial and Applied Mathematics. 10.1137/1.9781611972979.5. 978-1-61197-297-9. free.
- Bodini. Olivier. Roussel. Olivier. Soria. Michèle. December 2012. Boltzmann samplers for first-order differential specifications. Discrete Applied Mathematics. 160. 18. 2563–2572. 10.1016/j.dam.2012.05.022. 0166-218X. free.
- Book: Bodini, Olivier Ponty, Yann. Multi-dimensional Boltzmann Sampling of Languages. 695180521.
- Book: Lampropoulos . Leonidas . Gallois-Wong . Diane . Hriţcu . Cătălin . Hughes . John . Pierce . Benjamin C. . Xia . Li-yao . Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages . Beginner's luck: A language for property-based generators . 2017-01-01 . https://doi.org/10.1145/3009837.3009868 . POPL '17 . New York, NY, USA . Association for Computing Machinery . 114–129 . 10.1145/3009837.3009868 . 1607.05443 . 978-1-4503-4660-3. 14378582 .