In computational linguistics, second-order co-occurrence pointwise mutual information is a semantic similarity measure. To assess the degree of association between two given words, it uses pointwise mutual information (PMI) to sort lists of important neighbor words of the two target words from a large corpus.
The PMI-IR method used AltaVista's Advanced Search query syntax to calculate probabilities. Note that the "NEAR" search operator of AltaVista is an essential operator in the PMI-IR method; however, AltaVista no longer supports this operator, so from the implementation point of view it is not possible to use the PMI-IR method in the same form in new systems. From the algorithmic point of view, the advantage of SOC-PMI is that it can calculate the similarity between two words that do not co-occur frequently, because they co-occur with the same neighboring words. For example, the British National Corpus (BNC) has been used as a source of frequencies and contexts.
The method considers the words that are common in both lists and aggregates their PMI values (from the opposite list) to calculate the relative semantic similarity. We define the pointwise mutual information function only for those words having f^b(t_i, w) > 0:

f^{pmi}(t_i, w) = \log_2 \frac{f^b(t_i, w) \times m}{f^t(t_i)\, f^t(w)},

where f^t(t_i) tells us how many times the type t_i appeared in the entire corpus, f^b(t_i, w) tells us how many times word t_i appeared with word w in a context window, and m is the total number of tokens in the corpus. Now, for word w, we define a set of words, X^w, sorted in descending order by their PMI values with w, and take the top-most \beta words having f^{pmi}(t_i, w) > 0.
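As an illustration, the PMI function above can be computed directly from corpus counts. The following sketch uses hypothetical counts and names (f_pmi and its arguments are not from the original text):

```python
import math

def f_pmi(f_b, f_t_ti, f_t_w, m):
    """f^pmi(t_i, w) = log2( f^b(t_i, w) * m / (f^t(t_i) * f^t(w)) ).

    f_b    -- co-occurrence count of t_i and w within the context window
    f_t_ti -- corpus frequency of type t_i
    f_t_w  -- corpus frequency of type w
    m      -- total number of tokens in the corpus
    """
    if f_b <= 0:
        raise ValueError("f^pmi is defined only for f^b(t_i, w) > 0")
    return math.log2(f_b * m / (f_t_ti * f_t_w))

# Hypothetical counts: two words co-occur 50 times in a 1,000,000-token corpus.
print(f_pmi(f_b=50, f_t_ti=1000, f_t_w=500, m=1_000_000))  # log2(100) ≈ 6.64
```

A positive value indicates that the two words co-occur more often than their individual frequencies would predict by chance.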
The set X^w contains words X_i^w:

X^w = \{X_i^w\}, \quad i = 1, 2, \ldots, \beta,

where

f^{pmi}(X_1^w, w) \geq f^{pmi}(X_2^w, w) \geq \cdots \geq f^{pmi}(X_{\beta-1}^w, w) \geq f^{pmi}(X_\beta^w, w).
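Building the sorted neighbour set X^w might be sketched as follows, assuming simple dictionaries of counts (the function name, data layout, and toy numbers are illustrative, not from the original method's implementation):

```python
import math

def top_beta_neighbours(w, cooc, freq, m, beta):
    """Return X^w: the beta words with the highest positive f^pmi(t, w), descending.

    cooc -- dict mapping (t, w) pairs to window co-occurrence counts f^b(t, w)
    freq -- dict mapping each type to its corpus frequency f^t
    m    -- total number of tokens in the corpus
    """
    scored = []
    for (t, w2), f_b in cooc.items():
        if w2 != w or f_b <= 0:
            continue
        pmi = math.log2(f_b * m / (freq[t] * freq[w]))
        if pmi > 0:                # keep only positive associations
            scored.append((pmi, t))
    scored.sort(reverse=True)      # descending by PMI value
    return [t for _, t in scored[:beta]]

# Toy corpus of m = 1000 tokens: "the" co-occurs with "bank" but carries no
# information (PMI = 0), so only the content words survive the filter.
cooc = {('money', 'bank'): 8, ('river', 'bank'): 4, ('the', 'bank'): 3}
freq = {'bank': 10, 'money': 20, 'river': 5, 'the': 300}
print(top_beta_neighbours('bank', cooc, freq, 1000, beta=2))  # ['river', 'money']
```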
A rule of thumb is used to choose the value of \beta. The \beta-PMI summation function of a word is defined with respect to another word; for word w_1 with respect to word w_2, it is

f(w_1, w_2, \beta) = \sum_{i=1}^{\beta} \left(f^{pmi}(X_i^{w_1}, w_2)\right)^\gamma,
where f^{pmi}(X_i^{w_1}, w_2) > 0, which sums all the positive PMI values of words in the set X^{w_2} that are also common to the words in the set X^{w_1}. In other words, this function aggregates the positive PMI values of all the semantically close words of w_2 that are also common in w_1's list; \gamma should have a value greater than 1. So, the \beta-PMI summation function for word w_1 with respect to word w_2, with \beta = \beta_1, and the \beta-PMI summation function for word w_2 with respect to word w_1, with \beta = \beta_2, are
f(w_1, w_2, \beta_1) = \sum_{i=1}^{\beta_1} \left(f^{pmi}(X_i^{w_1}, w_2)\right)^\gamma

and

f(w_2, w_1, \beta_2) = \sum_{i=1}^{\beta_2} \left(f^{pmi}(X_i^{w_2}, w_1)\right)^\gamma,

respectively.
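The summation might be sketched as below, given a precomputed neighbour list and PMI lookup (the function name, arguments, and toy values are assumptions for illustration):

```python
def beta_pmi_sum(x_w1, pmi_with_w2, beta, gamma):
    """f(w1, w2, beta) = sum over i = 1..beta of (f^pmi(X_i^{w1}, w2))^gamma.

    x_w1        -- list X^{w1}: neighbours of w1, sorted by descending PMI with w1
    pmi_with_w2 -- dict mapping a word t to f^pmi(t, w2); missing or non-positive
                   entries contribute nothing
    """
    total = 0.0
    for t in x_w1[:beta]:              # only the top-beta neighbours of w1
        pmi = pmi_with_w2.get(t, 0.0)
        if pmi > 0:                    # only positive PMI values are aggregated
            total += pmi ** gamma      # gamma > 1 emphasises strong associations
    return total

# 'a' and 'b' are in both lists; 'd' is not a top neighbour of w1, so it is ignored.
print(beta_pmi_sum(['a', 'b', 'c'], {'a': 2.0, 'b': 1.0, 'd': 5.0}, beta=2, gamma=3))
# → 2.0**3 + 1.0**3 = 9.0
```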
Finally, the semantic PMI similarity function between the two words, w_1 and w_2, is defined as

\mathrm{Sim}(w_1, w_2) = \frac{f(w_1, w_2, \beta_1)}{\beta_1} + \frac{f(w_2, w_1, \beta_2)}{\beta_2}.
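Combining the two averaged summations gives the final score; a minimal sketch of the formula, with hypothetical summation values plugged in:

```python
def soc_pmi_similarity(f_w1_w2, beta1, f_w2_w1, beta2):
    """Sim(w1, w2) = f(w1, w2, beta1)/beta1 + f(w2, w1, beta2)/beta2.

    f_w1_w2 -- beta-PMI summation of w1 with respect to w2
    f_w2_w1 -- beta-PMI summation of w2 with respect to w1
    """
    return f_w1_w2 / beta1 + f_w2_w1 / beta2

# Hypothetical summation values: each direction averages to 3.0.
print(soc_pmi_similarity(9.0, 3, 6.0, 2))  # 9/3 + 6/2 = 6.0
```

Each term is an average over that word's own neighbour list, so neither word's list length dominates the score.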
The semantic word similarity is normalized, so that it provides a similarity score between 0 and 1 inclusively.