Disintegration theorem explained

In mathematics, the disintegration theorem is a result in measure theory and probability theory. It rigorously defines the idea of a non-trivial "restriction" of a measure to a measure zero subset of the measure space in question. It is related to the existence of conditional probability measures. In a sense, "disintegration" is the opposite process to the construction of a product measure.

Motivation

Consider the unit square $S = [0,1] \times [0,1]$ in the Euclidean plane $\mathbb{R}^2$. Consider the probability measure $\mu$ defined on $S$ by the restriction of two-dimensional Lebesgue measure $\lambda^2$ to $S$. That is, the probability of an event $E \subseteq S$ is simply the area of $E$. We assume $E$ is a measurable subset of $S$.

Consider a one-dimensional subset of $S$ such as the line segment $L_x = \{x\} \times [0,1]$. $L_x$ has $\mu$-measure zero; every subset of $L_x$ is a $\mu$-null set; since the Lebesgue measure space is a complete measure space,
$$E \subseteq L_x \implies \mu(E) = 0.$$

While true, this is somewhat unsatisfying. It would be nice to say that $\mu$ "restricted to" $L_x$ is the one-dimensional Lebesgue measure $\lambda^1$, rather than the zero measure. The probability of a "two-dimensional" event $E$ could then be obtained as an integral of the one-dimensional probabilities of the vertical "slices" $E \cap L_x$: more formally, if $\mu_x$ denotes one-dimensional Lebesgue measure on $L_x$, then
$$\mu(E) = \int_{[0,1]} \mu_x(E \cap L_x) \, \mathrm{d}x$$
for any "nice" $E \subseteq S$. The disintegration theorem makes this argument rigorous in the context of measures on metric spaces.
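The slicing identity can be checked numerically. The following is a minimal sketch under stated assumptions: the disc $E$ of radius $1/4$ centred at $(1/2, 1/2)$ is an illustrative choice that does not appear in the text, and the helper names `slice_measure` and `integrate_slices` are hypothetical. Each vertical slice $E \cap L_x$ is a chord whose length is known in closed form, so integrating the chord lengths over $x \in [0,1]$ with a midpoint rule should approximately recover the area of the disc.

```python
import math

RADIUS = 0.25   # illustrative choice: disc of radius 1/4 ...
CENTRE = 0.5    # ... centred at (1/2, 1/2), so it lies inside the unit square


def slice_measure(x: float) -> float:
    """One-dimensional Lebesgue measure of the vertical slice E ∩ L_x."""
    half_chord_sq = RADIUS ** 2 - (x - CENTRE) ** 2
    if half_chord_sq <= 0.0:
        return 0.0  # the line L_x misses the disc (or only touches it)
    return 2.0 * math.sqrt(half_chord_sq)


def integrate_slices(n: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of slice_measure over [0, 1]."""
    h = 1.0 / n
    return h * sum(slice_measure((i + 0.5) * h) for i in range(n))


area_from_slices = integrate_slices()
exact_area = math.pi * RADIUS ** 2  # area of the disc, computed directly

print(area_from_slices, exact_area)
```

The two printed numbers should agree up to the discretisation error of the midpoint rule, which illustrates (but of course does not prove) that the two-dimensional measure of $E$ is recovered from the one-dimensional measures of its slices.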

Statement of the theorem

(Hereafter, $\mathcal{P}(X)$ will denote the collection of Borel probability measures on a topological space $(X, T)$.) The assumptions of the theorem are as follows:

Let $Y$ and $X$ be two Radon spaces (i.e. topological spaces on which every Borel probability measure is inner regular, e.g. Polish spaces, that is, separable completely metrizable spaces; in particular, every Borel probability measure on such a space is a Radon measure).

Let $\mu \in \mathcal{P}(Y)$.

Let $\pi : Y \to X$ be a Borel-measurable function. Here one should think of $\pi$ as a function to "disintegrate" $Y$, in the sense of partitioning $Y$ into $\{\pi^{-1}(x) \mid x \in X\}$. For example, for the motivating example above, one can define $\pi((a,b)) = a$ for $(a,b) \in [0,1] \times [0,1]$, which gives $\pi^{-1}(a) = \{a\} \times [0,1]$, a slice we want to capture.

Let $\nu \in \mathcal{P}(X)$ be the pushforward measure $\nu = \pi_*(\mu) = \mu \circ \pi^{-1}$. This measure provides the distribution of $x$ (which corresponds to the events $\pi^{-1}(x)$).

The conclusion of the theorem: There exists a $\nu$-almost everywhere uniquely determined family of probability measures $\{\mu_x\}_{x \in X} \subseteq \mathcal{P}(Y)$, which provides a "disintegration" of $\mu$ into $\{\mu_x\}_{x \in X}$, such that:

the function $x \mapsto \mu_x$ is Borel measurable, in the sense that $x \mapsto \mu_x(B)$ is a Borel-measurable function for each Borel-measurable set $B \subseteq Y$;

$\mu_x$ "lives on" the fiber $\pi^{-1}(x)$: for $\nu$-almost all $x \in X$,
$$\mu_x\left(Y \setminus \pi^{-1}(x)\right) = 0,$$
and so $\mu_x(E) = \mu_x(E \cap \pi^{-1}(x))$;

for every Borel-measurable function $f : Y \to [0, \infty]$,
$$\int_Y f(y) \, \mathrm{d}\mu(y) = \int_X \int_{\pi^{-1}(x)} f(y) \, \mathrm{d}\mu_x(y) \, \mathrm{d}\nu(x).$$
In particular, for any event $E \subseteq Y$, taking $f$ to be the indicator function of $E$,[1]
$$\mu(E) = \int_X \mu_x(E) \, \mathrm{d}\nu(x).$$
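On a finite space the theorem reduces to elementary conditional probability, which makes its conclusions easy to check directly. The following sketch assumes a made-up six-point space $Y$ with weights chosen purely for illustration; the names `mu`, `pi`, `nu`, `mu_x` and `f` mirror the notation above and do not come from any library. Measurability is automatic in a finite setting, so only the support property and the integration formula are verified, using exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical probability measure mu on a six-point space Y = {a, b, c} x {0, 1}.
mu = {
    ("a", 0): Fraction(1, 12), ("a", 1): Fraction(3, 12),
    ("b", 0): Fraction(2, 12), ("b", 1): Fraction(2, 12),
    ("c", 0): Fraction(4, 12), ("c", 1): Fraction(0, 12),
}


def pi(y):
    """Projection onto the first coordinate; its fibres partition Y."""
    return y[0]


# Pushforward measure nu = mu o pi^{-1} on X = {a, b, c}.
nu = defaultdict(Fraction)
for y, mass in mu.items():
    nu[pi(y)] += mass

# Disintegration: mu_x is mu restricted to the fibre over x, renormalised by nu(x).
# Points x with nu(x) = 0 are skipped; mu_x may be chosen arbitrarily there,
# which is the finite analogue of "nu-almost everywhere uniquely determined".
mu_x = {
    x: {y: (mass / nu[x] if pi(y) == x else Fraction(0)) for y, mass in mu.items()}
    for x in nu
    if nu[x] > 0
}


def f(y):
    """An arbitrary non-negative test function on Y."""
    return 3 * y[1] + (ord(y[0]) - ord("a"))


# Each mu_x is a probability measure giving no mass to Y minus the fibre over x.
for x, m in mu_x.items():
    assert sum(m.values()) == 1
    assert sum(mass for y, mass in m.items() if pi(y) != x) == 0

# Integration formula: the integral of f against mu equals the iterated integral.
lhs = sum(f(y) * mass for y, mass in mu.items())
rhs = sum(nu[x] * sum(f(y) * mass for y, mass in mu_x[x].items()) for x in mu_x)
assert lhs == rhs
print(lhs, rhs)
```

Using `Fraction` rather than floating point keeps both sides of the integration formula exactly equal, so the final assertion tests the identity itself rather than a numerical tolerance.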

Applications

Product spaces

The original example was a special case of the problem of product spaces, to which the disintegration theorem applies.

When $Y$ is written as a Cartesian product $Y = X_1 \times X_2$ and $\pi_i : Y \to X_i$ is the natural projection, then each fibre $\pi_1^{-1}(x_1)$ can be canonically identified with $X_2$ and there exists a Borel family of probability measures $\{\mu_{x_1}\}_{x_1 \in X_1}$ in $\mathcal{P}(X_2)$ (which is $(\pi_1)_*(\mu)$-almost everywhere uniquely determined) such that
$$\mu = \int_{X_1} \mu_{x_1} \, \mu\left(\pi_1^{-1}(\mathrm{d}x_1)\right) = \int_{X_1} \mu_{x_1} \, \mathrm{d}(\pi_1)_*(\mu)(x_1),$$
which is in particular
$$\int_{X_1 \times X_2} f(x_1, x_2) \, \mu(\mathrm{d}x_1, \mathrm{d}x_2) = \int_{X_1} \left( \int_{X_2} f(x_1, x_2) \, \mu(\mathrm{d}x_2 \mid x_1) \right) \mu\left(\pi_1^{-1}(\mathrm{d}x_1)\right)$$
and
$$\mu(A \times B) = \int_A \mu\left(B \mid x_1\right) \, \mu\left(\pi_1^{-1}(\mathrm{d}x_1)\right).$$

The relation to conditional expectation is given by the identities
$$\operatorname{E}(f \mid \pi_1)(x_1) = \int_{X_2} f(x_1, x_2) \, \mu(\mathrm{d}x_2 \mid x_1),$$
$$\mu(A \times B \mid \pi_1)(x_1) = 1_A(x_1) \cdot \mu(B \mid x_1).$$
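As a worked instance of the product-space formulas, suppose (an added assumption for illustration, not part of the statement above) that $X_1 = X_2 = [0,1]$ and that $\mu$ has a density $p$ with respect to two-dimensional Lebesgue measure. Writing $p_1$ for the marginal density, the disintegration is then the familiar conditional density, defined wherever $p_1(x_1) > 0$:

$$p_1(x_1) = \int_0^1 p(x_1, x_2) \, \mathrm{d}x_2, \qquad \mu\left(\pi_1^{-1}(\mathrm{d}x_1)\right) = p_1(x_1) \, \mathrm{d}x_1, \qquad \mu(\mathrm{d}x_2 \mid x_1) = \frac{p(x_1, x_2)}{p_1(x_1)} \, \mathrm{d}x_2.$$

Substituting into the right-hand side of the formula for $\mu(A \times B)$, the factor $p_1(x_1)$ cancels:

$$\int_A \left( \int_B \frac{p(x_1, x_2)}{p_1(x_1)} \, \mathrm{d}x_2 \right) p_1(x_1) \, \mathrm{d}x_1 = \int_A \int_B p(x_1, x_2) \, \mathrm{d}x_2 \, \mathrm{d}x_1 = \mu(A \times B).$$

For the hypothetical density $p(x_1, x_2) = x_1 + x_2$, this gives $p_1(x_1) = x_1 + \tfrac{1}{2}$ and, for $B = [0, \tfrac{1}{2}]$,

$$\mu(B \mid x_1) = \frac{\int_0^{1/2} (x_1 + x_2) \, \mathrm{d}x_2}{x_1 + \tfrac{1}{2}} = \frac{\tfrac{1}{2} x_1 + \tfrac{1}{8}}{x_1 + \tfrac{1}{2}} = \frac{4 x_1 + 1}{4 (2 x_1 + 1)}.$$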

Vector calculus

The disintegration theorem can also be seen as justifying the use of a "restricted" measure in vector calculus. For instance, in Stokes' theorem as applied to a vector field flowing through a compact surface $\Sigma \subset \mathbb{R}^3$, it is implicit that the "correct" measure on $\Sigma$ is the disintegration of three-dimensional Lebesgue measure $\lambda^3$ on $\Sigma$, and that the disintegration of this measure on $\partial\Sigma$ is the same as the disintegration of $\lambda^3$ on $\partial\Sigma$.[2]

Conditional distributions

The disintegration theorem can be applied to give a rigorous treatment of conditional probability distributions in statistics, while avoiding purely abstract formulations of conditional probability.[3] The theorem is related to the Borel–Kolmogorov paradox, for example.

Notes and References

  1. Dellacherie, C.; Meyer, P.-A. (1978). Probabilities and Potential. North-Holland Mathematics Studies. Amsterdam: North-Holland. ISBN 0-7204-0701-X.
  2. Ambrosio, L.; Gigli, N.; Savaré, G. (2005). Gradient Flows in Metric Spaces and in the Space of Probability Measures. ETH Zürich, Birkhäuser Verlag, Basel. ISBN 978-3-7643-2428-5.
  3. Chang, J.T.; Pollard, D. (1997). "Conditioning as disintegration". Statistica Neerlandica. 51 (3): 287. doi:10.1111/1467-9574.00056. CiteSeerX 10.1.1.55.7544. S2CID 16749932.