Giry monad explained

In mathematics, the Giry monad is a construction that assigns to a measurable space a space of probability measures over it, equipped with a canonical sigma-algebra. It is one of the main examples of a probability monad.

It is implicitly used in probability theory whenever one considers probability measures which depend measurably on a parameter (giving rise to Markov kernels), or when one has probability measures over probability measures (such as in de Finetti's theorem).

Like many iterable constructions, it has the category-theoretic structure of a monad, on the category of measurable spaces.

Construction

The Giry monad, like every monad, consists of three structures:

A functorial assignment, which in this case assigns to a measurable space $X$ a space of probability measures $PX$ over it;

A natural map $\delta:X\to PX$ called the unit, which in this case assigns to each element of a space the Dirac measure over it;

A natural map $\mathcal{E}:PPX\to PX$ called the multiplication, which in this case assigns to each probability measure over probability measures its expected value.

The space of probability measures

Let $(X,\mathcal{F})$ be a measurable space. Denote by $PX$ the set of probability measures over $(X,\mathcal{F})$. We equip the set $PX$ with a sigma-algebra as follows. First of all, for every measurable set $A\in\mathcal{F}$, define the map $\varepsilon_A:PX\to\mathbb{R}$ by $p\longmapsto p(A)$. We then define the sigma-algebra $\mathcal{PF}$ on $PX$ to be the smallest sigma-algebra which makes the maps $\varepsilon_A$ measurable, for all $A\in\mathcal{F}$ (where $\mathbb{R}$ is assumed equipped with the Borel sigma-algebra).

Equivalently, $\mathcal{PF}$ can be defined as the smallest sigma-algebra on $PX$ which makes the maps $p\longmapsto\int_X f\,dp$ measurable for all bounded measurable $f:X\to\mathbb{R}$.

The assignment $(X,\mathcal{F})\mapsto(PX,\mathcal{PF})$ is part of an endofunctor on the category of measurable spaces, usually denoted again by $P$. Its action on morphisms, i.e. on measurable maps, is via the pushforward of measures. Namely, given a measurable map $f:(X,\mathcal{F})\to(Y,\mathcal{G})$, one assigns to $f$ the map $f_*:(PX,\mathcal{PF})\to(PY,\mathcal{PG})$ defined by

$$f_*p(B)=p(f^{-1}(B))$$

for all $p\in PX$ and all measurable sets $B\in\mathcal{G}$.
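The pushforward can be illustrated in the simplest setting, finitely supported measures on a discrete space. The sketch below assumes a measure is stored as a Python dict from outcomes to exact probabilities; the name `pushforward` and the die example are illustrative choices, not part of the construction above.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(f, p):
    """Return f_* p for a finitely supported p given as {outcome: probability}."""
    q = defaultdict(Fraction)
    for x, weight in p.items():
        # Every x with f(x) = y contributes its mass to y,
        # so q[y] ends up equal to p(f^{-1}({y})).
        q[f(x)] += weight
    return dict(q)


# A fair six-sided die, pushed forward along the parity map.
die = {face: Fraction(1, 6) for face in range(1, 7)}
parity = pushforward(lambda face: face % 2, die)
```

Here `parity` assigns mass 1/2 to each of `0` and `1`. In this finite setting, pushing forward along a composite map agrees with pushing forward twice, which is the functoriality of $P$.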

The Dirac delta map

Given a measurable space $(X,\mathcal{F})$, the map $\delta:(X,\mathcal{F})\to(PX,\mathcal{PF})$ maps an element $x\in X$ to the Dirac measure $\delta_x\in PX$, defined on measurable subsets $A\in\mathcal{F}$ by

$$\delta_x(A)=1_A(x)=\begin{cases}1&\text{if }x\in A,\\0&\text{if }x\notin A.\end{cases}$$
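As a short check that $\delta$ is measurable for the sigma-algebra on $PX$: that sigma-algebra is generated by the evaluation maps $\varepsilon_A$, so it suffices that each composite $\varepsilon_A\circ\delta$ is measurable. This is a sketch using only the definitions above:

```latex
(\varepsilon_A\circ\delta)(x) = \delta_x(A) = 1_A(x),
\qquad\text{and}\qquad
(\varepsilon_A\circ\delta)^{-1}(\{1\}) = A \in \mathcal{F}.
```

So $\varepsilon_A\circ\delta$ is the indicator function of a measurable set, which is measurable.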

The expectation map

Let $\mu\in PPX$, i.e. a probability measure over the probability measures over $(X,\mathcal{F})$. We define the probability measure $\mathcal{E}\mu\in PX$ by

$$\mathcal{E}\mu(A)=\int_{PX}p(A)\,\mu(dp)$$

for all measurable $A\in\mathcal{F}$. This gives a measurable, natural map $\mathcal{E}:(PPX,\mathcal{PPF})\to(PX,\mathcal{PF})$.

Example: mixture distributions

A mixture distribution, or more generally a compound distribution, can be seen as an application of the map $\mathcal{E}$. Let's see this for the case of a finite mixture. Let $p_1,\dots,p_n$ be probability measures on $(X,\mathcal{F})$, and consider the probability measure $q$ given by the mixture

$$q(A)=\sum_{i=1}^{n}w_i\,p_i(A)$$

for all measurable $A\in\mathcal{F}$, for some weights $w_i\ge 0$ satisfying $w_1+\dots+w_n=1$. We can view the mixture $q$ as the average $q=\mathcal{E}\mu$, where the measure on measures $\mu\in PPX$, which in this case is discrete, is given by

$$\mu=\sum_{i=1}^{n}w_i\,\delta_{p_i}.$$

More generally, the map $\mathcal{E}:PPX\to PX$ can be seen as the most general, non-parametric way to form arbitrary mixture or compound distributions.
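The finite mixture above can be computed directly. The following is a minimal sketch, assuming finitely supported measures stored as dicts and a discrete measure on measures stored as a list of `(weight, measure)` pairs; the function name `expectation` and the two coins are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def expectation(mu):
    """E: PPX -> PX for a finitely supported mu given as [(weight, p), ...],
    where each p is a dict {outcome: probability}."""
    q = defaultdict(Fraction)
    for weight, p in mu:
        for x, prob in p.items():
            q[x] += weight * prob  # (E mu)({x}) = sum_i w_i * p_i({x})
    return dict(q)


fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
biased = {"H": Fraction(9, 10), "T": Fraction(1, 10)}

# mu = 3/4 * delta_fair + 1/4 * delta_biased
mu = [(Fraction(3, 4), fair), (Fraction(1, 4), biased)]
q = expectation(mu)
```

With these weights, `q` gives heads probability 3/4 · 1/2 + 1/4 · 9/10 = 3/5 and tails probability 2/5.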

The triple $(P,\delta,\mathcal{E})$ is called the Giry monad.
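Spelled out, being a monad means that the unit and the multiplication satisfy the usual unit and associativity laws. The block below restates these standard laws in the notation of this article; the component subscripts are added here only for readability.

```latex
\mathcal{E}_X\circ\delta_{PX} = \mathrm{id}_{PX},
\qquad
\mathcal{E}_X\circ P\delta_X = \mathrm{id}_{PX},
\qquad
\mathcal{E}_X\circ P\mathcal{E}_X = \mathcal{E}_X\circ\mathcal{E}_{PX}.

% First law, checked on a measurable set A for p in PX:
\mathcal{E}(\delta_p)(A) = \int_{PX} p'(A)\,\delta_p(dp') = p(A).
```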

Relationship with Markov kernels

A key property of the sigma-algebra $\mathcal{PF}$ is that, given measurable spaces $(X,\mathcal{F})$ and $(Y,\mathcal{G})$, we have a bijective correspondence between measurable functions $(X,\mathcal{F})\to(PY,\mathcal{PG})$ and Markov kernels $(X,\mathcal{F})\to(Y,\mathcal{G})$. This allows one to view a Markov kernel, equivalently, as a measurably parametrized probability measure.

In more detail, given a measurable function $f:(X,\mathcal{F})\to(PY,\mathcal{PG})$, one can obtain the Markov kernel $f^\flat:(X,\mathcal{F})\to(Y,\mathcal{G})$ as follows,

$$f^\flat(B|x)=f(x)(B)$$

for every $x\in X$ and every measurable $B\in\mathcal{G}$ (note that $f(x)\in PY$ is a probability measure). Conversely, given a Markov kernel $k:(X,\mathcal{F})\to(Y,\mathcal{G})$, one can form the measurable function $k^\sharp:(X,\mathcal{F})\to(PY,\mathcal{PG})$ mapping $x\in X$ to the probability measure $k^\sharp(x)\in PY$ defined by

$$k^\sharp(x)(B)=k(B|x)$$

for every measurable $B\in\mathcal{G}$. The two assignments are mutually inverse.

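From the point of view of category theory, this correspondence can be interpreted as an adjunction

$$\mathrm{Hom}_{\mathrm{Meas}}(X,PY)\cong\mathrm{Hom}_{\mathrm{Stoch}}(X,Y)$$

between the category of measurable spaces and the category of Markov kernels. In particular, the category of Markov kernels can be seen as the Kleisli category of the Giry monad.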
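Composition in the Kleisli category is composition of Markov kernels: for kernels $k:X\to Y$ and $h:Y\to Z$, the composite is $(h\circ k)(C|x)=\int_Y h(C|y)\,k(dy|x)$. The sketch below illustrates this in the finite case, where the integral is a sum. It assumes a kernel is a Python function returning a dict of exact probabilities; `kleisli_compose` and the two-state `weather` kernel are made-up names for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def kleisli_compose(k, h):
    """Compose finite Markov kernels k: X -> PY and h: Y -> PZ into X -> PZ."""
    def composite(x):
        out = defaultdict(Fraction)
        for y, p_y in k(x).items():
            for z, p_z in h(y).items():
                out[z] += p_y * p_z  # sum over the intermediate state y
        return dict(out)
    return composite


def weather(today):
    """A toy one-day transition kernel on the states 'sun' and 'rain'."""
    if today == "sun":
        return {"sun": Fraction(3, 4), "rain": Fraction(1, 4)}
    return {"sun": Fraction(1, 2), "rain": Fraction(1, 2)}


# Two-day transition kernel: weather composed with itself.
two_days = kleisli_compose(weather, weather)
```

For example, `two_days("sun")` assigns 3/4 · 3/4 + 1/4 · 1/2 = 11/16 to `"sun"`. Composing with the kernel `lambda x: {x: Fraction(1)}`, which is the unit $\delta$ in this representation, leaves a kernel unchanged.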

Product distributions

Given measurable spaces $(X,\mathcal{F})$ and $(Y,\mathcal{G})$, forming the product measure $p\otimes q$ of $p\in PX$ and $q\in PY$ gives a natural, measurable map

$$(PX,\mathcal{PF})\times(PY,\mathcal{PG})\to(P(X\times Y),\mathcal{P(F\times G)})$$

usually denoted by $\nabla$ or by $\otimes$.

The map $\nabla:PX\times PY\to P(X\times Y)$ is in general not an isomorphism, since there are probability measures on $X\times Y$ which are not product distributions, for example in case of correlation. However, the map $\nabla:PX\times PY\to P(X\times Y)$ and the isomorphism $1\cong P1$ make the Giry monad a monoidal monad, and so in particular a commutative strong monad.
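The failure of $\nabla$ to be an isomorphism can be seen on a two-point space. The sketch below assumes finitely supported measures stored as dicts; `product` and `marginals` are hypothetical helper names. It builds a perfectly correlated joint distribution and shows that the product of its marginals is a different measure.

```python
from fractions import Fraction


def product(p, q):
    """The product measure of p and q: mass p(x) * q(y) at each pair (x, y)."""
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}


def marginals(joint):
    """Both marginals of a finitely supported measure on X x Y."""
    px, py = {}, {}
    for (x, y), w in joint.items():
        px[x] = px.get(x, Fraction(0)) + w
        py[y] = py.get(y, Fraction(0)) + w
    return px, py


coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Two perfectly correlated coin flips: all mass on the diagonal.
correlated = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
px, py = marginals(correlated)

# Both marginals are fair coins, yet their product puts mass 1/4 on (0, 1),
# where the correlated measure has mass 0.
independent = product(px, py)
```

So `correlated` has the same marginals as `independent` but is not in the image of the product map.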

Further properties

If a measurable space $(X,\mathcal{F})$ is standard Borel, so is $(PX,\mathcal{PF})$. Therefore the Giry monad restricts to the full subcategory of standard Borel spaces.

The algebras for the Giry monad include compact convex subsets of Euclidean spaces, as well as the extended positive real line $[0,\infty]$, with the algebra structure map given by taking expected values. For example, for $[0,\infty]$, the structure map $e:P[0,\infty]\to[0,\infty]$ is given by

$$p\longmapsto\int_{[0,\infty)}x\,p(dx)$$

whenever $p$ is supported on $[0,\infty)$ and has finite expected value, and $e(p)=\infty$ otherwise.
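An algebra structure map $e$ must satisfy $e(\delta_x)=x$ and $e\circ Pe=e\circ\mathcal{E}$. The sketch below checks both on one finitely supported example. It assumes measures stored as dicts with exact weights; the names `e` and `flatten` and the sample numbers are illustrative. A finitely supported measure on $[0,\infty)$ always has finite expected value, so only mass at the point $\infty$ triggers the infinite case here.

```python
import math
from fractions import Fraction


def e(p):
    """Sketch of the structure map P[0, inf] -> [0, inf] for finitely supported p."""
    if any(x == math.inf and w > 0 for x, w in p.items()):
        return math.inf  # positive mass at infinity gives e(p) = infinity
    return sum(w * x for x, w in p.items())


def flatten(mu):
    """E: PPX -> PX for mu given as [(weight, p), ...]."""
    q = {}
    for weight, p in mu:
        for x, prob in p.items():
            q[x] = q.get(x, 0) + weight * prob
    return q


p1 = {0: Fraction(1, 2), 4: Fraction(1, 2)}  # expected value 2
p2 = {1: Fraction(1, 4), 9: Fraction(3, 4)}  # expected value 7
mu = [(Fraction(1, 3), p1), (Fraction(2, 3), p2)]

unit_law = e({5: Fraction(1)}) == 5  # e(delta_x) = x
lhs = e(flatten(mu))  # e after E
rhs = e({e(p1): Fraction(1, 3), e(p2): Fraction(2, 3)})  # e after Pe
```

Both `lhs` and `rhs` equal 16/3: averaging the averages agrees with averaging the mixture, which is the law of total expectation.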
