In probability theory and information theory, adjusted mutual information (AMI), a variation of mutual information, may be used for comparing clusterings.[1] It corrects for agreement that arises solely by chance between clusterings, similar to the way the adjusted Rand index corrects the Rand index. It is closely related to variation of information (VI):[2] when a similar adjustment is made to the VI index, it becomes equivalent to the AMI. The adjusted measure, however, is no longer a metric.
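Given a set $S$ of $N$ elements, $S=\{s_1,s_2,\ldots,s_N\}$, consider two partitions of $S$: $U=\{U_1,U_2,\ldots,U_R\}$ with $R$ clusters and $V=\{V_1,V_2,\ldots,V_C\}$ with $C$ clusters. The clusters within each partition are pairwise disjoint, $U_i\cap U_j=\varnothing=V_i\cap V_j$ for all $i\ne j$, and together they cover the whole set:
$$\bigcup_{i=1}^{R}U_i=\bigcup_{j=1}^{C}V_j=S$$
The overlap between $U$ and $V$ can be summarized in an $R\times C$ contingency table $M=[n_{ij}]^{i=1\ldots R}_{j=1\ldots C}$, where $n_{ij}$ denotes the number of objects common to clusters $U_i$ and $V_j$:
$$n_{ij}=\left|U_i\cap V_j\right|$$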
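The contingency table can be built directly from two label sequences. The following is a minimal sketch, assuming each partition is given as a list of cluster labels, one per element of $S$; the function name `contingency_table` and the example labels are illustrative and not part of the original definition.

```python
from collections import Counter


def contingency_table(labels_u, labels_v):
    """Return (M, u_ids, v_ids) with M[i][j] = |U_i ∩ V_j|.

    labels_u[k] and labels_v[k] are the cluster labels assigned to the
    k-th element of S by partitions U and V respectively.
    """
    if len(labels_u) != len(labels_v):
        raise ValueError("both labelings must cover the same N elements")
    u_ids = sorted(set(labels_u))  # the R clusters of U
    v_ids = sorted(set(labels_v))  # the C clusters of V
    # Count how many elements carry each (U label, V label) pair.
    counts = Counter(zip(labels_u, labels_v))
    table = [[counts[(u, v)] for v in v_ids] for u in u_ids]
    return table, u_ids, v_ids


# Hypothetical example: N = 6 elements, R = 2 clusters in U, C = 3 in V.
labels_u = [0, 0, 0, 1, 1, 1]
labels_v = [0, 0, 1, 1, 2, 2]
table, u_ids, v_ids = contingency_table(labels_u, labels_v)
print(table)  # [[2, 1, 0], [0, 1, 2]]
```

Because the clusters are disjoint and cover $S$, the entries of the table sum to $N$.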
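Suppose an object is picked at random from $S$; the probability that the object falls into cluster $U_i$ is:
$$P_U(i)=\frac{|U_i|}{N}$$
The entropy associated with the partition $U$ is:
$$H(U)=-\sum_{i=1}^{R}P_U(i)\log P_U(i)$$
Similarly, the entropy of the partition $V$ is:
$$H(V)=-\sum_{j=1}^{C}P_V(j)\log P_V(j)$$
where $P_V(j)=|V_j|/N$. The mutual information (MI) between the two partitions is:
$$MI(U,V)=\sum_{i=1}^{R}\sum_{j=1}^{C}P_{UV}(i,j)\log\frac{P_{UV}(i,j)}{P_U(i)P_V(j)}$$
where $P_{UV}(i,j)$ denotes the probability that an object belongs to both cluster $U_i$ and cluster $V_j$:
$$P_{UV}(i,j)=\frac{|U_i\cap V_j|}{N}$$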
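The entropy and mutual information formulas can be evaluated from the contingency table alone, since the row sums give $|U_i|$ and the column sums give $|V_j|$. The sketch below assumes the table is a list of rows of non-negative integers and uses natural logarithms; the function names and the example table are illustrative.

```python
import math


def entropy(cluster_sizes):
    """H = -sum_i P(i) log P(i), with P(i) = size_i / N (natural log)."""
    n = sum(cluster_sizes)
    return -sum((s / n) * math.log(s / n) for s in cluster_sizes if s > 0)


def mutual_information(table):
    """MI(U, V) computed from an R x C contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]        # |U_i|
    col_sums = [sum(col) for col in zip(*table)]  # |V_j|
    mi = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # empty cells contribute nothing (0 log 0 = 0)
                # P_UV / (P_U * P_V) simplifies to N * n_ij / (|U_i| |V_j|).
                ratio = n * n_ij / (row_sums[i] * col_sums[j])
                mi += (n_ij / n) * math.log(ratio)
    return mi


# Hypothetical table: N = 6, cluster sizes (3, 3) in U and (2, 2, 2) in V.
table = [[2, 1, 0], [0, 1, 2]]
h_u = entropy([sum(row) for row in table])
h_v = entropy([sum(col) for col in zip(*table)])
mi = mutual_information(table)
print(h_u, h_v, mi)
```

For this table the two non-zero contributions are each $(1/3)\log 2$, so the mutual information is $(2/3)\log 2$.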
Like the Rand index, the baseline value of mutual information between two random clusterings does not take on a constant value, and tends to be larger when the two partitions have a larger number of clusters (with a fixed number of set elements N). By adopting a hypergeometric model of randomness, it can be shown that the expected mutual information between two random clusterings is:
\begin{align}E\{MI(U,V)\}={}
& \sum_{i=1}^{R}\sum_{j=1}^{C}\sum_{n_{ij}=(a_i+b_j-N)^+}^{\min(a_i,b_j)}\frac{n_{ij}}{N}\log\left(\frac{N\cdot n_{ij}}{a_i b_j}\right)\times \\ &
\frac{a_i!\,b_j!\,(N-a_i)!\,(N-b_j)!}{N!\,n_{ij}!\,(a_i-n_{ij})!\,(b_j-n_{ij})!\,(N-a_i-b_j+n_{ij})!}
\end{align}
where $(a_i+b_j-N)^+$ denotes $\max(0,a_i+b_j-N)$. The variables $a_i$ and $b_j$ are partial sums of the contingency table:
$$a_i=\sum_{j=1}^{C}n_{ij},\qquad b_j=\sum_{i=1}^{R}n_{ij}$$
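The expected mutual information depends only on the row sums $a_i$, the column sums $b_j$, and $N$. The following sketch evaluates the triple sum directly, assuming the marginals are given as lists of positive integers. Two choices here are implementation assumptions rather than part of the formula: factorials are handled through `math.lgamma` to avoid overflow, and the inner sum starts at $\max(1, a_i+b_j-N)$ because a term with $n_{ij}=0$ contributes nothing under the convention $0\log 0=0$.

```python
import math


def log_factorial(k):
    """log(k!) computed via the log-gamma function."""
    return math.lgamma(k + 1)


def expected_mutual_information(a, b):
    """E{MI} under the hypergeometric model, from marginals a_i and b_j."""
    n = sum(a)
    if sum(b) != n:
        raise ValueError("row and column sums must both add up to N")
    emi = 0.0
    for a_i in a:
        for b_j in b:
            # Start at 1: the n_ij = 0 term is zero by convention.
            lower = max(1, a_i + b_j - n)
            for n_ij in range(lower, min(a_i, b_j) + 1):
                # Log of the hypergeometric probability of observing n_ij.
                log_prob = (
                    log_factorial(a_i) + log_factorial(b_j)
                    + log_factorial(n - a_i) + log_factorial(n - b_j)
                    - log_factorial(n) - log_factorial(n_ij)
                    - log_factorial(a_i - n_ij) - log_factorial(b_j - n_ij)
                    - log_factorial(n - a_i - b_j + n_ij)
                )
                term = (n_ij / n) * math.log(n * n_ij / (a_i * b_j))
                emi += term * math.exp(log_prob)
    return emi


# Hypothetical marginals: N = 6, sizes (3, 3) in U and (2, 2, 2) in V.
emi = expected_mutual_information([3, 3], [2, 2, 2])
print(emi)
```

The direct evaluation costs on the order of $R\cdot C\cdot\max_{i,j}\min(a_i,b_j)$ operations, which is adequate for small examples.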
The adjusted measure[1] for the mutual information may then be defined to be:
$$AMI(U,V)=\frac{MI(U,V)-E\{MI(U,V)\}}{\max\{H(U),H(V)\}-E\{MI(U,V)\}}$$
The AMI takes a value of 1 when the two partitions are identical and 0 when the MI between two partitions equals the value expected due to chance alone.
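The two boundary cases can be checked numerically. The sketch below assumes the adjustment is normalized by $\max\{H(U),H(V)\}-E\{MI(U,V)\}$ and takes the four quantities as already computed numbers; the function name and the sample values are illustrative only.

```python
import math


def adjusted_mutual_information(mi, emi, h_u, h_v):
    """AMI from precomputed MI, expected MI, and the two entropies.

    Assumes the max{H(U), H(V)} normalizer; other normalizers exist.
    """
    denominator = max(h_u, h_v) - emi
    if denominator == 0:
        raise ValueError("normalizer is zero; AMI is undefined here")
    return (mi - emi) / denominator


# Identical partitions: MI equals both entropies, so AMI is 1.
h = math.log(2)
ami_identical = adjusted_mutual_information(mi=h, emi=0.2, h_u=h, h_v=h)

# MI equal to its chance-level expectation: AMI is 0.
ami_chance = adjusted_mutual_information(mi=0.2, emi=0.2, h_u=h, h_v=h)

print(ami_identical, ami_chance)  # 1.0 0.0
```

Values below 0 are possible when the observed MI is smaller than the chance-level expectation.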