The Kendall tau rank distance is a metric (distance function) that counts the number of pairwise disagreements between two ranking lists. The larger the distance, the more dissimilar the two lists are. Kendall tau distance is also called bubble-sort distance since it is equivalent to the number of swaps that the bubble sort algorithm would take to place one list in the same order as the other list. The Kendall tau distance was created by Maurice Kendall.
The Kendall tau ranking distance between two lists
\tau1
\tau2
\tau1(i)
\tau2(i)
i
\tau1
\tau2
Kd(\tau1,\tau2)
n
Kendall tau distance may also be defined as
where
\tau1
\tau2
\bar{K}i,j(\tau1,\tau2)
\tau1
\tau2
\bar{K}i,j(\tau1,\tau2)
\tau1
\tau2.
Kendall tau distance can also be defined as the total number of discordant pairs.
Kendall tau distance in Rankings: A permutation (or ranking) is an array of N integers where each of the integers between 0 and N-1 appears exactly once.
The Kendall tau distance between two rankings is the number of pairs that are in different order in the two rankings. For example, the Kendall tau distance between 0 3 1 6 2 5 4 and 1 0 3 6 4 2 5 is four because the pairs 0-1, 3-1, 2-4, 5-4 are in different order in the two rankings, but all other pairs are in the same order.[1]
The normalized Kendall tau distance
Kn
Kd | ||||
|
=
2Kd | |
n(n-1) |
If Kendall tau distance function is performed as
K(L1,L2)
K(\tau1,\tau2)
\tau1
\tau2
L1
L2
Generalised versions of Kendall tau distance have been proposed to give weights to different items and different positions in the ranking.[2]
The Kendall tau distance (
Kd
Kc
They are related by
Kc=1-4Kd/(n(n-1))
Kd=(1-Kc)(n(n-1))/4
Or simpler by
Kc=1-2Kn,Kn=(1-Kc)/2
Kn
2Kd/(n(n-1))
The distance is a value between 0 and
n(n-1)/2
The correlation is between -1 and 1.
The distance between equals is 0, the correlation between equals is 1.
The distance between reversals is
n(n-1)/2
For example comparing the rankings A>B>C>D and A>B>C>D the distance is 0 the correlation is 1.
Comparing the rankings A>B>C>D and D>C>B>A the distance is 6 the correlation is -1
Comparing the rankings A>B>C>D and B>D>A>C the distance is 3 the correlation is 0
Suppose one ranks a group of five people by height and by weight:
Person | A | B | C | D | E | ranking | |
---|---|---|---|---|---|---|---|
Rank by height | 1 | 2 | 3 | 4 | 5 | A>B>C>D>E | |
Rank by weight | 3 | 4 | 1 | 2 | 5 | C>D>A>B>E |
Here person A is tallest and third-heaviest, B is the second -tallest and fourth-heaviest and so on.
In order to calculate the Kendall tau distance between these two rankings, pair each person with every other person and count the number of times the values in list 1 are in the opposite order of the values in list 2.
Pair | Height | Weight | Count | |
---|---|---|---|---|
(A,B) | 1 < 2 | 3 < 4 | ||
(A,C) | 1 < 3 | 3 > 1 | X | |
(A,D) | 1 < 4 | 3 > 2 | X | |
(A,E) | 1 < 5 | 3 < 5 | ||
(B,C) | 2 < 3 | 4 > 1 | X | |
(B,D) | 2 < 4 | 4 > 2 | X | |
(B,E) | 2 < 5 | 4 < 5 | ||
(C,D) | 3 < 4 | 1 < 2 | ||
(C,E) | 3 < 5 | 1 < 5 | ||
(D,E) | 4 < 5 | 2 < 5 |
Since there are four pairs whose values are in opposite order, the Kendall tau distance is 4. The normalized Kendall tau distance is
4 | |
5(5-1)/2 |
=0.4.
A value of 0.4 indicates that 40% of pairs differ in ordering between the two lists.
A naive implementation in Python (using NumPy) is:
def normalised_kendall_tau_distance(values1, values2): """Compute the Kendall tau distance.""" n = len(values1) assert len(values2)
However, this requires
n2
Given two rankings
\tau1,\tau2
\tau1=(1,2,3,...)
\tau2
i,j
i<j
\tau2(i)>\tau2(j)
O(nlogn)
O(n\sqrt{log{n}})
Here is a basic C implementation.
int kendallTau(short x[], short y[], int len)
float normalize(int kt, int len)