Kendall's W (also known as Kendall's coefficient of concordance) is a non-parametric statistic for rank correlation. It is a normalization of the statistic of the Friedman test, and can be used for assessing agreement among raters and in particular inter-rater reliability. Kendall's W ranges from 0 (no agreement) to 1 (complete agreement).
Suppose, for instance, that a number of people have been asked to rank a list of political concerns, from the most important to the least important. Kendall's W can be calculated from these data. If the test statistic W is 1, then all the survey respondents have been unanimous, and each respondent has assigned the same order to the list of concerns. If W is 0, then there is no overall trend of agreement among the respondents, and their responses may be regarded as essentially random. Intermediate values of W indicate a greater or lesser degree of unanimity among the various responses.
While tests using the standard Pearson correlation coefficient assume normally distributed values and compare only two sequences of outcomes at a time, Kendall's W makes no assumptions about the underlying probability distribution and can handle any number of distinct outcomes.
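Suppose that object $i$ is given the rank $r_{i,j}$ by judge number $j$, where there are in total $n$ objects and $m$ judges. Then the total rank given to object $i$ is
$$R_i = \sum_{j=1}^{m} r_{i,j},$$
and the mean value of these total ranks is
$$\bar{R} = \frac{1}{n} \sum_{i=1}^{n} R_i.$$
The sum of squared deviations, $S$, is defined as
$$S = \sum_{i=1}^{n} (R_i - \bar{R})^2,$$
and then Kendall's W is defined as
$$W = \frac{12S}{m^2(n^3 - n)}.$$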
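The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `kendalls_w`, the row-per-judge layout of `ranks`, and the example data are assumptions made here, and the code presumes complete rankings without ties.

```python
from typing import Sequence


def kendalls_w(ranks: Sequence[Sequence[float]]) -> float:
    """Kendall's W for complete, untied rankings.

    ranks[j][i] is the rank that judge j gives to object i
    (a hypothetical layout chosen for this sketch).
    """
    m = len(ranks)      # number of judges
    n = len(ranks[0])   # number of objects
    # Total rank R_i received by each object across all judges.
    totals = [sum(judge[i] for judge in ranks) for i in range(n)]
    mean_total = sum(totals) / n
    # Sum of squared deviations S of the totals from their mean.
    s = sum((r - mean_total) ** 2 for r in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Illustrative data: three judges rank four objects.
ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
w = kendalls_w(ranks)  # rank totals 4, 6, 8, 12 give S = 35 and W = 7/9
```

For these data the judges largely agree, and W is correspondingly close to 1; two judges with exactly reversed rankings would give equal rank totals and W = 0.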
In terms of these quantities, W is 1 when all the judges or survey respondents have been unanimous, so that each has assigned the same order to the list of objects or concerns. W is 0 when all the rank totals $R_i$ are equal, so that there is no overall trend of agreement and the responses may be regarded as essentially random. Intermediate values of W indicate a greater or lesser degree of unanimity among the various judges or respondents.
Kendall and Gibbons (1990) also show W is linearly related to the mean value of the Spearman's rank correlation coefficients between all $\binom{m}{2}$ possible pairs of rankings between judges:
$$\bar{r}_s = \frac{mW - 1}{m - 1}.$$
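The linear relation to the mean Spearman coefficient can be checked numerically. The sketch below assumes complete rankings without ties, for which Spearman's coefficient has the closed form $1 - 6\sum d^2 / (n(n^2-1))$; the helper names and the example data are illustrative assumptions.

```python
from itertools import combinations
from typing import Sequence


def kendalls_w(ranks: Sequence[Sequence[float]]) -> float:
    """Kendall's W for complete, untied rankings (one row per judge)."""
    m, n = len(ranks), len(ranks[0])
    totals = [sum(judge[i] for judge in ranks) for i in range(n)]
    mean_total = sum(totals) / n
    s = sum((r - mean_total) ** 2 for r in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rank correlation for two untied rankings of the same objects."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
m = len(ranks)

# Average Spearman coefficient over all C(m, 2) pairs of judges.
pairs = list(combinations(ranks, 2))
mean_rs = sum(spearman(a, b) for a, b in pairs) / len(pairs)

# The same quantity obtained from W via (mW - 1) / (m - 1).
from_w = (m * kendalls_w(ranks) - 1) / (m - 1)
```

Here the three pairwise coefficients are 0.8, 0.8 and 0.4, so both `mean_rs` and `from_w` equal 2/3 up to floating-point rounding.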
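Kendall's W can also be defined when the judges evaluate only some subset of the $n$ objects, provided that the corresponding block design is an $(n, m, r, p, \lambda)$-design (note the different notation). In other words, when (i) each judge ranks the same number $p$ of objects, for some $p < n$; (ii) every object is ranked exactly the same total number $r$ of times; and (iii) each pair of objects is presented together to some judge exactly $\lambda$ times, where $\lambda \ge 1$ is a constant for all pairs.
Then Kendall's W is defined as [2]
$$W = \frac{12 \sum_{i=1}^{n} R_i^2 - 3 r^2 n (p+1)^2}{\lambda^2 n (n^2 - 1)}.$$
If $p = n$ and $\lambda = r = m$, so that each judge ranks all $n$ objects, the formula above is equivalent to the original one.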
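As a rough illustration of the incomplete-ranking case, the sketch below evaluates the formula for a small balanced design with $n = 3$ objects, $m = 3$ judges, $p = 2$ objects per judge, $r = 2$ rankings per object and $\lambda = 1$. The function name, the dictionary-per-judge layout and the data are assumptions made for this example, and the code does not verify that the design is actually balanced.

```python
from typing import Dict, Sequence


def kendalls_w_incomplete(rankings: Sequence[Dict[str, float]], lam: int) -> float:
    """Kendall's W for a balanced incomplete design.

    rankings[j] maps each object seen by judge j to its rank (1..p).
    lam is the number of judges who see any given pair of objects.
    Balance of the design is assumed, not checked.
    """
    objects = sorted({obj for judge in rankings for obj in judge})
    n = len(objects)
    p = len(rankings[0])                                  # objects ranked per judge
    r = sum(objects[0] in judge for judge in rankings)    # times each object is ranked
    # Total rank R_i of each object over the judges who ranked it.
    totals = [sum(judge.get(obj, 0) for judge in rankings) for obj in objects]
    numerator = 12 * sum(t ** 2 for t in totals) - 3 * r ** 2 * n * (p + 1) ** 2
    return numerator / (lam ** 2 * n * (n ** 2 - 1))


# Every judge sees one pair; the pairwise preferences are consistent (A, B, C).
consistent = [{"A": 1, "B": 2}, {"A": 1, "C": 2}, {"B": 1, "C": 2}]
# The pairwise preferences form a cycle, so no overall order emerges.
cyclic = [{"A": 1, "B": 2}, {"B": 1, "C": 2}, {"C": 1, "A": 2}]

w_consistent = kendalls_w_incomplete(consistent, lam=1)
w_cyclic = kendalls_w_incomplete(cyclic, lam=1)
```

The consistent preferences give rank totals 2, 3, 4 and W = 1, while the cyclic preferences give equal totals and W = 0.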
When tied values occur, they are each given the average of the ranks that would have been given had no ties occurred. For example, if a data set has three values of 80 tied for 4th, 5th, and 6th place, then since the mean of 4, 5, and 6 is 5, each of the three values is assigned the rank 5.
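The averaging rule for ties can be sketched as follows. The function name `average_ranks` and the data are illustrative assumptions; the values are chosen so that three scores of 80 tie for 4th, 5th and 6th place when ranking in ascending order.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` to cover the whole run of values tied with order[pos].
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Mean of the 1-based positions pos+1 .. end+1 occupied by the run.
        avg = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


scores = [80, 76, 34, 80, 73, 80]   # illustrative data
ranks = average_ranks(scores)       # [5.0, 3.0, 1.0, 5.0, 2.0, 5.0]
```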
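The effect of ties is to reduce the value of W; however, this effect is small unless there are a large number of ties. To correct for ties, assign ranks to tied values as above and compute the correction factors
$$T_j = \sum_{i=1}^{g_j} (t_i^3 - t_i),$$
where $t_i$ is the number of tied ranks in the $i$th group of tied ranks (a group being a set of values sharing the same rank) and $g_j$ is the number of groups of ties in the set of ranks for judge $j$. If judge $j$ has no tied ranks, $T_j = 0$.
With the correction for ties, the formula for W becomes
$$W = \frac{12 \sum_{i=1}^{n} R_i^2 - 3 m^2 n (n+1)^2}{m^2 n (n^2 - 1) - m \sum_{j=1}^{m} T_j},$$
where $R_i$ is the sum of the ranks for object $i$ and $\sum_{j=1}^{m} T_j$ is the sum of the correction factors over all $m$ sets of ranks.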
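A short sketch of the tie correction, assuming each judge's ranks have already been assigned with the averaging rule for ties. The function names and the data are illustrative assumptions.

```python
from collections import Counter
from typing import Sequence


def tie_correction(judge_ranks: Sequence[float]) -> float:
    """T_j: sum of t^3 - t over groups of tied ranks (untied ranks contribute 0)."""
    return sum(t ** 3 - t for t in Counter(judge_ranks).values())


def kendalls_w_ties(ranks: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kendall's W; ranks[j][i] is judge j's rank for object i."""
    m, n = len(ranks), len(ranks[0])
    totals = [sum(judge[i] for judge in ranks) for i in range(n)]
    numerator = 12 * sum(r ** 2 for r in totals) - 3 * m ** 2 * n * (n + 1) ** 2
    correction = sum(tie_correction(judge) for judge in ranks)
    return numerator / (m ** 2 * n * (n ** 2 - 1) - m * correction)


# Two judges give identical rankings in which the first two objects tie.
ranks = [
    [1.5, 1.5, 3],
    [1.5, 1.5, 3],
]
w_corrected = kendalls_w_ties(ranks)

# The uncorrected formula applied to the same data, for comparison.
m, n = len(ranks), len(ranks[0])
totals = [sum(judge[i] for judge in ranks) for i in range(n)]
mean_total = sum(totals) / n
s = sum((r - mean_total) ** 2 for r in totals)
w_uncorrected = 12 * s / (m ** 2 * (n ** 3 - n))
```

In this example the judges agree completely, yet the uncorrected value is 0.75 because of the ties; the corrected value is 1.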
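In some cases, the raters (experts) are not all equally important. In this case, the Weighted Kendall's W should be used.[4] Suppose that object $i$ is given the rank $r_{ij}$ by judge number $j$, where there are in total $n$ objects and $m$ judges, and that the weight of judge $j$ is $\vartheta_j$ $(j = 1, 2, \ldots, m)$. Then the total rank given to object $i$ is
$$R_i = \sum_{j=1}^{m} \vartheta_j r_{ij},$$
and the mean value of these total ranks is
$$\bar{R} = \frac{1}{n} \sum_{i=1}^{n} R_i.$$
The sum of squared deviations, $S$, is defined as
$$S = \sum_{i=1}^{n} (R_i - \bar{R})^2,$$
and then the Weighted Kendall's W is defined as
$$W_w = \frac{12S}{n^3 - n}.$$
With equal weights $\vartheta_j = 1/m$ this reduces to the unweighted W, so the formula presupposes weights that sum to 1.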
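The weighted statistic for untied rankings can be sketched as below. The function name and data are illustrative assumptions, and the code rescales the weights to sum to 1, which is the normalization under which equal weights reproduce the unweighted W.

```python
from typing import Sequence


def weighted_kendalls_w(ranks: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> float:
    """Weighted Kendall's W for complete, untied rankings.

    ranks[j][i] is judge j's rank for object i; weights[j] is judge j's weight.
    Weights are rescaled to sum to 1 (an assumption of this sketch).
    """
    n = len(ranks[0])
    total_weight = sum(weights)
    w_norm = [w / total_weight for w in weights]
    # Weighted total rank R_i of each object.
    totals = [sum(wj * judge[i] for wj, judge in zip(w_norm, ranks))
              for i in range(n)]
    mean_total = sum(totals) / n
    s = sum((r - mean_total) ** 2 for r in totals)
    return 12 * s / (n ** 3 - n)


ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
w_equal = weighted_kendalls_w(ranks, [1, 1, 1])           # same as unweighted W, 7/9
w_expert = weighted_kendalls_w(ranks, [0.5, 0.25, 0.25])  # first judge counts double
```

With the first judge weighted twice as heavily as the others, the weighted totals are 1.25, 2.0, 2.75 and 4.0, giving $S = 4.125$ and $W_w = 0.825$.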
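When tied ranks occur, the above formula must be adjusted. To correct for ties, compute the correction factors
$$T_j = \sum_{i=1}^{n} (t_{ij}^3 - t_{ij}) \quad \forall j,$$
where $t_{ij}$ represents the number of tied ranks in judge $j$ for object $i$, and $T_j$ reflects the total number of ties in judge $j$. With the correction for ties, the formula for the Weighted Kendall's W becomes
$$W_w = \frac{12S}{(n^3 - n) - \sum_{j=1}^{m} \vartheta_j T_j}.$$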
In the case of complete ranks, a commonly used significance test for W against a null hypothesis of no agreement (i.e. random rankings) is given by Kendall and Gibbons (1990)[5]
$$\chi^2 = m(n-1)W,$$
where the test statistic takes a chi-squared distribution with $df = n - 1$ degrees of freedom.
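The test statistic is simple to compute once W is known. The sketch below returns only the statistic and its degrees of freedom; obtaining a p-value would require a chi-squared distribution function from a statistics library, which is deliberately left out here. The function name and the input values are illustrative assumptions.

```python
from typing import Tuple


def chi_square_statistic(w: float, m: int, n: int) -> Tuple[float, int]:
    """Chi-squared statistic m(n-1)W and its degrees of freedom n-1.

    w is Kendall's W for m judges ranking all n objects.
    """
    return m * (n - 1) * w, n - 1


# Illustrative values: W = 7/9 from three judges ranking four objects.
chi2, df = chi_square_statistic(7 / 9, m=3, n=4)
```

Here the statistic is 7 with 3 degrees of freedom, which falls below 7.815, the 5% critical value of the chi-squared distribution with 3 degrees of freedom, so the agreement would not be declared significant at that level by this test.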
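In the case of incomplete rankings (see above), this becomes
$$\chi^2 = \frac{\lambda(n^2 - 1)}{k + 1} W,$$
where $k$ denotes the number of objects ranked by each judge ($p$ in the notation above) and there are again $df = n - 1$ degrees of freedom. For complete rankings, with $k = n$ and $\lambda = m$, this reduces to $m(n-1)W$.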
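Legendre[6] compared via simulation the power of the chi-square and permutation testing approaches to determining significance for Kendall's W. Results indicated the chi-square method was overly conservative compared to a permutation test when $m < 20$. An alternative is the F test,
$$F = \frac{W(m-1)}{1 - W},$$
where the test statistic follows an F distribution with $v_1 = n - 1 - (2/m)$ and $v_2 = (m-1)v_1$ degrees of freedom. The F test has been reported to perform approximately as well as the permutation test, and may be preferred when $m$ is small because it is computationally simpler.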
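Both approaches can be sketched with the standard library. The permutation scheme used here, shuffling each judge's ranking independently and recomputing W, is one common way to simulate the null hypothesis of random rankings and is an assumption of this sketch rather than a description of the procedure in the cited study. All function names, the number of permutations and the seed are likewise illustrative.

```python
import random
from typing import Sequence, Tuple


def kendalls_w(ranks: Sequence[Sequence[float]]) -> float:
    """Kendall's W for complete, untied rankings (one row per judge)."""
    m, n = len(ranks), len(ranks[0])
    totals = [sum(judge[i] for judge in ranks) for i in range(n)]
    mean_total = sum(totals) / n
    s = sum((r - mean_total) ** 2 for r in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def f_statistic(w: float, m: int, n: int) -> Tuple[float, float, float]:
    """F statistic with its two degrees of freedom (undefined when w == 1)."""
    f = w * (m - 1) / (1 - w)
    v1 = n - 1 - 2 / m
    v2 = (m - 1) * v1
    return f, v1, v2


def permutation_p_value(ranks: Sequence[Sequence[float]],
                        n_perm: int = 999, seed: int = 0) -> float:
    """Share of random re-rankings whose W is at least the observed W."""
    rng = random.Random(seed)
    observed = kendalls_w(ranks)
    hits = 0
    for _ in range(n_perm):
        # Shuffle every judge's ranks independently to mimic random ranking.
        shuffled = [rng.sample(list(judge), len(judge)) for judge in ranks]
        if kendalls_w(shuffled) >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


ranks = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
w = kendalls_w(ranks)
f, v1, v2 = f_statistic(w, m=len(ranks), n=len(ranks[0]))
p_perm = permutation_p_value(ranks)
```

For these data $F = 7$ with $v_1 = 7/3$ and $v_2 = 14/3$; turning $F$ into a p-value requires an F distribution function from a statistics library, whereas the permutation estimate needs only repeated evaluation of W.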
Kendall's W and Weighted Kendall's W are implemented in MATLAB,[8] SPSS, R,[9] and other statistical software packages.