In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality.[1] [2]
Suppose we have two partitions X and Y of a set A into disjoint subsets, namely

X = \{X_1, X_2, \ldots, X_k\} and Y = \{Y_1, Y_2, \ldots, Y_l\}.
Let:
n = \sum_i |X_i| = \sum_j |Y_j| = |A|
p_i = |X_i| / n
q_j = |Y_j| / n
r_{ij} = |X_i \cap Y_j| / n
Then the variation of information between the two partitions is:
VI(X; Y) = -\sum_{i,j} r_{ij} \left[ \log(r_{ij}/p_i) + \log(r_{ij}/q_j) \right]
This is equivalent to the shared information distance between the random variables i and j with respect to the uniform probability measure on A defined by \mu(B) := |B|/n for B \subseteq A.
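As a concrete illustration, here is a minimal Python sketch of this definition; the function name and the block-list input representation are illustrative assumptions, not from the article. Natural logarithms are used, so the result is in nats (base-2 logarithms would give bits).

```python
import math

def variation_of_information(X, Y):
    """VI between two partitions of the same set, given as lists of blocks."""
    n = float(sum(len(block) for block in X))   # n = |A|
    vi = 0.0
    for x in X:
        p = len(x) / n                          # p_i = |X_i| / n
        for y in Y:
            q = len(y) / n                      # q_j = |Y_j| / n
            r = len(set(x) & set(y)) / n        # r_ij = |X_i ∩ Y_j| / n
            if r > 0.0:                         # 0 log 0 = 0 by convention
                vi -= r * (math.log(r / p) + math.log(r / q))
    return vi

# Two partitions of A = {1, ..., 6}:
X = [{1, 2, 3}, {4, 5, 6}]
Y = [{1, 2}, {3, 4}, {5, 6}]
print(variation_of_information(X, Y))           # ≈ 0.8676 nats
```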
We can rewrite this definition in terms that explicitly highlight the information content of this metric.
The set of all partitions of a set forms a compact lattice in which the partial order induces two operations, the meet \wedge and the join \vee. The maximum \overline{1} is the partition with only one block, i.e., all elements grouped together, and the minimum is \overline{0}, the partition consisting of all elements as singletons. The meet of two partitions X and Y is easy to understand as the partition formed by all pairwise intersections of one block X_i of X with one block Y_j of Y. It then follows that X \wedge Y \subseteq X and X \wedge Y \subseteq Y.
Let's define the entropy of a partition X as

H(X) = -\sum_i p_i \log p_i,

where p_i = |X_i|/n.
Clearly, H(\overline{1}) = 0 and H(\overline{0}) = \log n. The entropy of a partition is a monotone function on the lattice of partitions, in the sense that

X \subseteq Y \Rightarrow H(X) \geq H(Y).
The VI distance between two partitions X and Y is then given by

VI(X, Y) = 2 H(X \wedge Y) - H(X) - H(Y).
The difference

d(X, Y) \equiv |H(X) - H(Y)|

is a pseudo-metric, since d(X, Y) = 0 does not necessarily imply that X = Y. From the definition of \overline{1}, we have VI(X, \overline{1}) = H(X).
If in the Hasse diagram we draw an edge from every partition to the maximum \overline{1} and assign it a weight equal to the VI distance of the given partition to \overline{1}, we can interpret the VI distance as an average of differences of edge weights to the maximum:

VI(X, Y) = |VI(X, \overline{1}) - VI(X \wedge Y, \overline{1})| + |VI(Y, \overline{1}) - VI(X \wedge Y, \overline{1})|.
For H(X) as defined above, it holds that the joint entropy satisfies H(X, Y) = H(X \wedge Y), and we also have that

d(X, X \wedge Y) = H(X \wedge Y | X),

where H(X \wedge Y | X) is the conditional entropy of the meet X \wedge Y given X.
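These lattice identities are easy to check numerically. The sketch below (the helper names meet and entropy are my own, not from the article) builds the meet of the two example partitions and confirms that VI(X, Y) = 2H(X \wedge Y) - H(X) - H(Y) and d(X, X \wedge Y) = H(X \wedge Y | X).

```python
import math

def entropy(partition, n):
    """H(X) = -sum_i p_i log p_i with p_i = |X_i| / n."""
    return -sum((len(b) / n) * math.log(len(b) / n) for b in partition)

def meet(X, Y):
    """X ∧ Y: all nonempty pairwise intersections X_i ∩ Y_j."""
    return [b for b in (set(x) & set(y) for x in X for y in Y) if b]

X = [{1, 2, 3}, {4, 5, 6}]
Y = [{1, 2}, {3, 4}, {5, 6}]
n = 6
XmY = meet(X, Y)                        # blocks {1,2}, {3}, {4}, {5,6}

# VI(X, Y) = 2 H(X ∧ Y) - H(X) - H(Y) ≈ 0.8676, matching the direct formula
print(2 * entropy(XmY, n) - entropy(X, n) - entropy(Y, n))

# d(X, X ∧ Y) = |H(X) - H(X ∧ Y)| = H(X ∧ Y) - H(X) = H(X ∧ Y | X),
# since X is a coarsening of the meet.
print(abs(entropy(X, n) - entropy(XmY, n)))
```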
The variation of information satisfies

VI(X; Y) = H(X) + H(Y) - 2 I(X, Y),

where H(X) is the entropy of X, and I(X, Y) is the mutual information between X and Y with respect to the uniform probability measure on A.
This can be rewritten as

VI(X; Y) = H(X, Y) - I(X, Y),

where H(X, Y) is the joint entropy of X and Y, or as

VI(X; Y) = H(X | Y) + H(Y | X),

where H(X | Y) and H(Y | X) are the two conditional entropies.
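A quick way to check these identities numerically is with scikit-learn's mutual_info_score, which returns the mutual information in nats; the label vectors below encode the running example and are my own illustration, not from the article.

```python
import math
from collections import Counter
from sklearn.metrics import mutual_info_score   # returns MI in nats

# Label vectors for the six elements under the two example partitions.
x = [0, 0, 0, 1, 1, 1]   # X = {1,2,3}, {4,5,6}
y = [0, 0, 1, 1, 2, 2]   # Y = {1,2}, {3,4}, {5,6}
n = len(x)

def H(labels):
    """Entropy of a label vector (i.e., of the partition it encodes)."""
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

mi = mutual_info_score(x, y)             # I(X, Y)
Hxy = H(list(zip(x, y)))                 # joint entropy H(X, Y)

print(H(x) + H(y) - 2 * mi)              # VI = H(X) + H(Y) - 2 I(X, Y)
print(Hxy - mi)                          # VI = H(X, Y) - I(X, Y)
print((Hxy - H(y)) + (Hxy - H(x)))       # VI = H(X|Y) + H(Y|X)
# All three print ≈ 0.8676 nats.
```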
The variation of information can also be bounded, either in terms of the number of elements:

VI(X; Y) \leq \log(n),

or with respect to a maximum number of clusters K^*:

VI(X; Y) \leq 2 \log(K^*).
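As a numeric spot check (not a proof), both bounds hold comfortably for the running example, where n = 6 and each partition has at most K^* = 3 blocks:

```python
import math

n, k_star = 6, 3
vi = 0.8676                          # VI(X; Y) from the sketches above, in nats
assert vi <= math.log(n)             # log 6   ≈ 1.7918
assert vi <= 2 * math.log(k_star)    # 2 log 3 ≈ 2.1972
```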
To verify the triangle inequality

VI(X; Z) \leq VI(X; Y) + VI(Y; Z),

apply the identity VI(X; Y) = H(X | Y) + H(Y | X); it then suffices to show that H(X | Z) \leq H(X | Y) + H(Y | Z), which follows from

H(X | Z) \leq H(X, Y | Z) = H(X | Y, Z) + H(Y | Z) \leq H(X | Y) + H(Y | Z).
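The inequality can also be exercised empirically. The following randomized sanity check (my own sketch, using the label-vector representation from above) computes VI via the conditional-entropy identity and tests the triangle inequality on random triples of partitions:

```python
import math
import random
from collections import Counter

def vi(x, y):
    """VI(X; Y) = H(X|Y) + H(Y|X), computed from label vectors."""
    n = len(x)
    def H(labels):
        return -sum((c / n) * math.log(c / n)
                    for c in Counter(labels).values())
    Hxy = H(list(zip(x, y)))
    return (Hxy - H(y)) + (Hxy - H(x))   # H(X|Y) + H(Y|X)

random.seed(0)
for _ in range(1000):
    x, y, z = ([random.randrange(3) for _ in range(8)] for _ in range(3))
    assert vi(x, z) <= vi(x, y) + vi(y, z) + 1e-12   # triangle inequality
```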