Variation of information
In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality.[1][2][3]
Definition
Suppose we have two partitions X and Y of a set A into disjoint subsets, namely X = \{X_1, X_2, \ldots, X_k\} and Y = \{Y_1, Y_2, \ldots, Y_l\}.
Let:
n = \sum_i |X_i| = \sum_j |Y_j| = |A|,
p_i = |X_i| / n, \qquad q_j = |Y_j| / n,
and
r_{ij} = |X_i \cap Y_j| / n.
Then the variation of information between the two partitions is:
VI(X; Y) = -\sum_{i,j} r_{ij} \left[ \log(r_{ij}/p_i) + \log(r_{ij}/q_j) \right].
This is equivalent to the shared information distance between the random variables i and j with respect to the uniform probability measure on A defined by \mu(B) := |B|/n for B \subseteq A.
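For concreteness, here is a minimal Python sketch of this definition (the function name and the example partitions are ours, purely for illustration), using natural logarithms:

    from math import log

    def variation_of_information(X, Y):
        """Compute VI(X; Y) for two partitions of the same finite set.

        X and Y are given as lists of disjoint sets (the blocks).
        """
        n = sum(len(block) for block in X)  # n = sum_i |X_i| = |A|
        vi = 0.0
        for Xi in X:
            p = len(Xi) / n                 # p_i = |X_i| / n
            for Yj in Y:
                r = len(Xi & Yj) / n        # r_ij = |X_i ∩ Y_j| / n
                if r > 0:                   # non-intersecting blocks contribute nothing
                    q = len(Yj) / n         # q_j = |Y_j| / n
                    vi -= r * (log(r / p) + log(r / q))
        return vi

    X = [{1, 2, 3}, {4, 5}]
    Y = [{1, 2}, {3, 4, 5}]
    print(variation_of_information(X, Y))   # ≈ 0.7638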
Explicit information content
We can rewrite this definition in terms that explicitly highlight the information content of this metric.
The set of all partitions of a set forms a compact lattice in which the partial order induces two operations, the meet \wedge and the join \vee. The maximum \overline{1} is the partition with only one block, i.e., all elements grouped together, and the minimum is \overline{0}, the partition consisting of all elements as singletons. The meet of two partitions X and Y is easy to understand as the partition formed by all pairwise intersections of one block X_i of X and one block Y_j of Y. It then follows that X \wedge Y \subseteq X and X \wedge Y \subseteq Y.
Let us define the entropy of a partition X as
H(X) = -\sum_i p_i \log p_i,
where p_i = |X_i|/n. Clearly, H(\overline{1}) = 0 and H(\overline{0}) = \log n. The entropy of a partition is a monotone function on the lattice of partitions, in the sense that
X \subseteq Y \Rightarrow H(X) \geq H(Y).
Then the VI distance between X and Y is given by
VI(X, Y) = 2H(X \wedge Y) - H(X) - H(Y).
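Continuing the sketch above, the meet is simply the collection of non-empty pairwise block intersections, and the lattice formula reproduces the value obtained from the definition (meet and partition_entropy are again hypothetical helper names of ours):

    from math import log

    def meet(X, Y):
        # X ∧ Y: all non-empty intersections of a block of X with a block of Y.
        return [Xi & Yj for Xi in X for Yj in Y if Xi & Yj]

    def partition_entropy(X):
        # H(X) = -sum_i p_i log p_i, with p_i = |X_i| / n.
        n = sum(len(block) for block in X)
        return -sum((len(b) / n) * log(len(b) / n) for b in X)

    X = [{1, 2, 3}, {4, 5}]
    Y = [{1, 2}, {3, 4, 5}]
    XY = meet(X, Y)   # [{1, 2}, {3}, {4, 5}]
    print(2 * partition_entropy(XY) - partition_entropy(X) - partition_entropy(Y))  # ≈ 0.7638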
The difference d(X, Y) \equiv |H(X) - H(Y)| is a pseudo-metric, as d(X, Y) = 0 doesn't necessarily imply that X = Y. From the definition of \overline{1}, it is VI(X, \overline{1}) = H(X).
If in the Hasse diagram we draw an edge from every partition to the maximum \overline{1} and assign it a weight equal to the VI distance between the given partition and \overline{1}, we can interpret the VI distance as a sum of differences of edge weights to the maximum:
VI(X, Y) = |VI(X, \overline{1}) - VI(X \wedge Y, \overline{1})| + |VI(Y, \overline{1}) - VI(X \wedge Y, \overline{1})| = d(X, X \wedge Y) + d(Y, X \wedge Y).
For H(X) as defined above, the joint information of two partitions coincides with the entropy of the meet,
H(X, Y) = H(X \wedge Y),
and we also have that d(X, X \wedge Y) = H(X \wedge Y | X) coincides with the conditional entropy of the meet (intersection) X \wedge Y relative to X.
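Both facts follow in one line from monotonicity and the identity H(X, Y) = H(X \wedge Y) just stated; sketched out:
d(X, X \wedge Y) = |H(X) - H(X \wedge Y)| = H(X \wedge Y) - H(X) = H(X, Y) - H(X) = H(X \wedge Y | X),
where the absolute value can be dropped because X \wedge Y \subseteq X implies H(X \wedge Y) \geq H(X).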
Identities
The variation of information satisfies
VI(X; Y) = H(X) + H(Y) - 2I(X, Y),
where H(X) is the entropy of X, and I(X, Y) is the mutual information between X and Y with respect to the uniform probability measure on A. This can be rewritten as
VI(X; Y) = H(X, Y) - I(X, Y),
where H(X, Y) is the joint entropy of X and Y, or
VI(X; Y) = H(X | Y) + H(Y | X),
where H(X | Y) and H(Y | X) are the respective conditional entropies.
The variation of information can also be bounded, either in terms of the number of elements n:
VI(X; Y) \leq \log n,
or with respect to a maximum number of clusters K^*:
VI(X; Y) \leq 2 \log K^*.
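These identities and bounds are straightforward to verify numerically; a minimal sketch reusing the hypothetical meet and partition_entropy helpers from above (all names ours):

    from math import log

    X = [{1, 2, 3}, {4, 5}]
    Y = [{1, 2}, {3, 4, 5}]
    n = 5                                   # number of elements
    K = max(len(X), len(Y))                 # upper bound on the number of clusters

    H_X, H_Y = partition_entropy(X), partition_entropy(Y)
    H_XY = partition_entropy(meet(X, Y))    # joint entropy H(X, Y) = H(X ∧ Y)
    I_XY = H_X + H_Y - H_XY                 # mutual information I(X, Y)

    vi = H_X + H_Y - 2 * I_XY
    assert abs(vi - (H_XY - I_XY)) < 1e-12                   # VI = H(X,Y) - I(X,Y)
    assert abs(vi - ((H_XY - H_Y) + (H_XY - H_X))) < 1e-12   # VI = H(X|Y) + H(Y|X)
    assert vi <= log(n) and vi <= 2 * log(K)                 # both bounds hold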
Triangle inequality
To verify the triangle inequality VI(X; Z) \leq VI(X; Y) + VI(Y; Z), expand using the identity VI(X; Y) = H(X | Y) + H(Y | X). It suffices to prove
H(X | Z) \leq H(X | Y) + H(Y | Z).
The right side has the lower bound H(X | Y, Z) + H(Y | Z) = H(X, Y | Z), which is no less than the left side.
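Written out as a single chain, using the chain rule H(X, Y | Z) = H(X | Y, Z) + H(Y | Z) and the fact that conditioning cannot increase entropy:
H(X | Z) \leq H(X, Y | Z) = H(X | Y, Z) + H(Y | Z) \leq H(X | Y) + H(Y | Z).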
Further reading
- Arabie, P.; Boorman, S. A. (1973). "Multidimensional scaling of measures of distance between partitions". Journal of Mathematical Psychology. 10 (2): 148–203. doi:10.1016/0022-2496(73)90012-6.
- Meila, Marina (2003). "Comparing Clusterings by the Variation of Information". Learning Theory and Kernel Machines. Lecture Notes in Computer Science. Vol. 2777. pp. 173–187. doi:10.1007/978-3-540-45167-9_14. ISBN 978-3-540-40720-1. S2CID 4341039.
- Meila, M. (2007). "Comparing clusterings—an information based distance". Journal of Multivariate Analysis. 98 (5): 873–895. doi:10.1016/j.jmva.2006.11.013.
- Kingsford, Carl (2009). "Information Theory Notes". Retrieved 22 September 2009.
- Kraskov, Alexander; Stögbauer, Harald; Andrzejak, Ralph G.; Grassberger, Peter (2003). "Hierarchical Clustering Based on Mutual Information". arXiv:q-bio/0311039.
Notes and References
- P. Arabie, S. A. Boorman, "Multidimensional scaling of measures of distance between partitions", Journal of Mathematical Psychology (1973), vol. 10, no. 2, pp. 148–203, doi:10.1016/0022-2496(73)90012-6.
- W. H. Zurek, Nature, vol. 341, p. 119 (1989); W. H. Zurek, Physical Review A, vol. 40, p. 4731 (1989).
- Marina Meila, "Comparing Clusterings by the Variation of Information", Learning Theory and Kernel Machines (2003), Lecture Notes in Computer Science, vol. 2777, pp. 173–187, doi:10.1007/978-3-540-45167-9_14.