In statistics, Duncan's new multiple range test (MRT) is a multiple comparison procedure developed by David B. Duncan in 1955. Duncan's MRT belongs to the general class of multiple comparison procedures that use the studentized range statistic $q_r$ to compare sets of means.
David B. Duncan developed this test as a modification of the Student–Newman–Keuls method that would have greater power. Duncan's MRT is especially protective against false negative (Type II) error at the expense of having a greater risk of making false positive (Type I) errors. Duncan's test is commonly used in agronomy and other agricultural research.
The result of the test is a set of subsets of means, where in each subset means have been found not to be significantly different from one another.
This test is often followed by the Compact Letter Display (CLD) methodology, which makes the output of such a test much more accessible to non-statistician audiences.
Assumptions:
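1. A sample of observed means $m_1, m_2, \ldots, m_n$, drawn independently from $n$ normal populations with "true" means $\mu_1, \mu_2, \ldots, \mu_n$, respectively.
2. A common standard error $\sigma_m$. This standard error is unknown, but the usual estimate $s_m$ is available; it is independent of the observed means and is based on $n_2$ degrees of freedom. (More precisely, $n_2 s_m^2 / \sigma_m^2$ is distributed as $\chi^2$ with $n_2$ degrees of freedom, independently of the sample means.)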
The exact definition of the test is:
The difference between any two means in a set of $n$ means is significant provided the range of each and every subset which contains the given means is significant according to an $\alpha_p$-level range test, where $\alpha_p = 1 - \gamma_p$, $\gamma_p = (1-\alpha)^{p-1}$, and $p$ is the number of means in the subset concerned.
Exception: The sole exception to this rule is that no difference between two means can be declared significant if the two means concerned are both contained in a subset of the means which has a non-significant range.
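The procedure consists of a series of pairwise comparisons between means. Each comparison is performed at a significance level $\alpha_p$ determined by the number of means separating the two means compared ($\alpha_p$ for $p-2$ separating means).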
The tests are performed in the following order: the largest minus the smallest, the largest minus the second smallest, up to the largest minus the second largest; then the second largest minus the smallest, the second largest minus the second smallest, and so on, finishing with the second smallest minus the smallest.
With only one exception, stated above, each difference is significant if it exceeds the corresponding shortest significant range; otherwise it is not significant. The shortest significant range is the significant studentized range multiplied by the standard error. It is designated $R(p,\alpha)$, where $p$ is the number of means in the subset.
An algorithm for performing the test is as follows:
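1. Rank the sample means and index them so that $m_1$ is the smallest and $m_n$ the largest.
2. For each sample mean $m_i$, from the largest to the smallest, do the following:
2.1. For each sample mean $m_j$, from the smallest up to $m_{i-1}$:
2.1.1. Compare $m_i - m_j$ to the shortest significant range $R(p,\alpha)$ at level $\alpha = \alpha_p$, where $p = i - j + 1$ is the number of means in the subset $(m_j, m_{j+1}, \ldots, m_i)$.
2.1.2. If $m_i - m_j$ does not exceed the critical value, the subset $(m_j, m_{j+1}, \ldots, m_i)$ is declared not significantly different; go to the next iteration of loop 2.
2.1.3. Otherwise, continue with loop 2.1.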
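The steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name, the `shortest_ranges` dictionary (mapping subset size $p$ to an already computed shortest significant range) and the toy numbers are all assumptions made for the example. The sketch also applies the exception rule by skipping any pair that already lies inside a subset with a non-significant range.

```python
def duncan_nonsignificant_subsets(means, shortest_ranges):
    """Return the subsets of ranked means declared not significantly different.

    means           -- observed means, in any order
    shortest_ranges -- dict mapping subset size p to the shortest significant
                       range for p means (hypothetical input, e.g. from a table)
    """
    ranked = sorted(means)              # ascending: ranked[0] is the smallest
    n = len(ranked)
    subsets = []                        # (lo, hi) index pairs into `ranked`
    for hi in range(n - 1, 0, -1):      # start from the largest mean
        for lo in range(hi):            # compare against the smallest first
            # Exception rule: a pair inside a non-significant subset is not
            # tested; every later `lo` would be inside it too, so stop here.
            if any(a <= lo and hi <= b for a, b in subsets):
                break
            p = hi - lo + 1             # number of means spanned by the pair
            if ranked[hi] - ranked[lo] <= shortest_ranges[p]:
                subsets.append((lo, hi))
                break                   # move on to the next largest mean
    return [tuple(ranked[a:b + 1]) for a, b in subsets]


# Toy values chosen only to trigger the exception rule (not real data).
toy_means = [1.0, 2.0, 3.0, 10.0]
toy_ranges = {2: 0.5, 3: 2.5, 4: 3.0}
groups = duncan_nonsignificant_subsets(toy_means, toy_ranges)
print(groups)
```

In the toy run the range of (1.0, 2.0, 3.0) is not significant, so the pair (1.0, 2.0) is never declared significant even though its difference exceeds the two-mean range.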
Duncan's multiple range test makes use of the studentized range distribution in order to determine critical values for comparisons between means. Note that different comparisons between means may be made at different significance levels, since the significance level depends on the size of the subset of means in question.
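Let us denote by $Q(p,\nu,\gamma_{p,\alpha})$ the $\gamma_{p,\alpha}$ quantile of the studentized range distribution for $p$ means and $\nu$ degrees of freedom. Let us denote by $r(p,\nu,\alpha)$ the standardized critical value, given by the rule:
If $p = 2$:
$r(p,\nu,\alpha) = Q(p,\nu,\gamma_{p,\alpha})$
Else:
$r(p,\nu,\alpha) = \max\left(Q(p,\nu,\gamma_{p,\alpha}),\ r(p-1,\nu,\alpha)\right)$
The shortest critical range (the actual critical value of the test) is computed as:
$R(p,\nu,\alpha) = \sigma_m \cdot r(p,\nu,\alpha)$
Here $\nu$ is the number of degrees of freedom with which the standard error is estimated.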
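The recursive rule for the standardized critical value can be written as a short Python sketch. The quantile function `q_quantile` is a caller-supplied placeholder (an assumption of this example, not part of the article); in practice it could wrap a studentized range quantile routine such as `scipy.stats.studentized_range.ppf`, if SciPy is available. The lookup table used below contains made-up numbers whose only purpose is to show the effect of the maximum.

```python
def standardized_critical_values(q_quantile, max_p, nu, alpha):
    """Return {p: r} for p = 2..max_p using the max rule.

    q_quantile(gamma, p, nu) is a hypothetical callable returning the gamma
    quantile of the studentized range distribution for p means and nu
    degrees of freedom.
    """
    r = {}
    for p in range(2, max_p + 1):
        gamma_p = (1 - alpha) ** (p - 1)       # protection level for p means
        q = q_quantile(gamma_p, p, nu)
        # For p > 2 the value must not fall below the value for p - 1.
        r[p] = q if p == 2 else max(q, r[p - 1])
    return r


def shortest_ranges(r, std_error):
    """Scale the standardized values by the standard error."""
    return {p: std_error * value for p, value in r.items()}


# Made-up quantiles (not from any table), deliberately non-monotone in p.
toy_q = {2: 3.0, 3: 2.9, 4: 3.2}
r_values = standardized_critical_values(
    lambda gamma, p, nu: toy_q[p], max_p=4, nu=20, alpha=0.05
)
R_values = shortest_ranges(r_values, std_error=2.0)
```

With these toy inputs the value for $p = 3$ is lifted from 2.9 to 3.0, so the critical values never decrease as the subset grows.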
Let us look at the example of 5 treatment means:
Treatments | T1 | T2 | T3 | T4 | T5 |
---|---|---|---|---|---|
Treatment Means | 9.8 | 15.4 | 17.6 | 21.6 | 10.8 |
Rank | 5 | 3 | 2 | 1 | 4 |
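With a standard error of $s_m = 1.796$ and $\nu = 20$ degrees of freedom for estimating the standard error, a tabulation of $Q$ gives the following values of $r(p,\nu,\alpha)$:
$r(2,20,0.05) = 2.95$
$r(3,20,0.05) = 3.10$
$r(4,20,0.05) = 3.18$
$r(5,20,0.05) = 3.25$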
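Now we may obtain the values of the shortest significant range from the formula
$R(p,\nu,\alpha) = \sigma_m \cdot r(p,\nu,\alpha)$,
with $\sigma_m$ estimated by $s_m$, reaching:
$R(2,20,0.05) = 3.75$
$R(3,20,0.05) = 3.94$
$R(4,20,0.05) = 4.04$
$R(5,20,0.05) = 4.13$
Note that these values correspond to a multiplier of about 1.27 (that is, $1.796/\sqrt{2}$) rather than 1.796 itself, which suggests that the figure of 1.796 is the standard error of a difference between two means rather than of a single mean.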
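The arithmetic behind these ranges can be checked with a few lines of Python. The multiplier $1.796/\sqrt{2}$ used here is an assumption inferred from the listed values (it is the factor that reproduces them), not a figure stated in the text.

```python
import math

r = {2: 2.95, 3: 3.10, 4: 3.18, 5: 3.25}          # standardized values from the text
listed_R = {2: 3.75, 3: 3.94, 4: 4.04, 5: 4.13}   # ranges as listed in the text

# Assumption: the effective standard error is 1.796 / sqrt(2), about 1.27.
multiplier = 1.796 / math.sqrt(2)

# Shortest significant ranges, rounded to two decimals as in the text.
computed_R = {p: round(multiplier * value, 2) for p, value in r.items()}

for p in sorted(r):
    print(f"p={p}: computed {computed_R[p]:.2f}, listed {listed_R[p]:.2f}")
```

Multiplying by 1.796 directly would instead give ranges between roughly 5.3 and 5.8, which do not match the listed values.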
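Then, the observed differences between means are tested, beginning with the largest versus the smallest, which is compared with the shortest significant range $R(5,20,0.05) = 4.13$. If this difference is significant, the difference between the largest and the second smallest is computed and compared with $R(4,20,0.05) = 4.04$, and so on through the remaining pairs:
T4 vs. T1: $21.6 - 9.8 = 11.8 > 4.13$ ($R_5$)
T4 vs. T5: $21.6 - 10.8 = 10.8 > 4.04$ ($R_4$)
T4 vs. T2: $21.6 - 15.4 = 6.2 > 3.94$ ($R_3$)
T4 vs. T3: $21.6 - 17.6 = 4.0 > 3.75$ ($R_2$)
T3 vs. T1: $17.6 - 9.8 = 7.8 > 4.04$ ($R_4$)
T3 vs. T5: $17.6 - 10.8 = 6.8 > 3.94$ ($R_3$)
T3 vs. T2: $17.6 - 15.4 = 2.2 < 3.75$ ($R_2$)
T2 vs. T1: $15.4 - 9.8 = 5.6 > 3.94$ ($R_3$)
T2 vs. T5: $15.4 - 10.8 = 4.6 > 3.75$ ($R_2$)
T5 vs. T1: $10.8 - 9.8 = 1.0 < 3.75$ ($R_2$)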
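The ten comparisons can be reproduced with a short Python sketch that uses the treatment means and shortest significant ranges given in the example. The variable names are illustrative only. The sketch compares every pair directly and does not implement the exception rule, which plays no role here because no subset of three or more means has a non-significant range.

```python
means = {"T1": 9.8, "T2": 15.4, "T3": 17.6, "T4": 21.6, "T5": 10.8}
R = {2: 3.75, 3: 3.94, 4: 4.04, 5: 4.13}   # shortest significant ranges from the text

# Rank treatments from the largest to the smallest mean.
ranked = sorted(means, key=means.get, reverse=True)

not_significant = []
n = len(ranked)
for i in range(n):                      # larger mean of the pair
    for j in range(n - 1, i, -1):       # smaller mean, smallest first
        a, b = ranked[i], ranked[j]
        p = j - i + 1                   # number of means spanned by the pair
        diff = means[a] - means[b]
        significant = diff > R[p]
        print(f"{a} vs. {b}: {diff:.1f} vs R{p}={R[p]} ->",
              "significant" if significant else "not significant")
        if not significant:
            not_significant.append((a, b))
```

Only the pairs (T3, T2) and (T5, T1) fail to exceed their ranges, so the means fall into the subsets {T4}, {T3, T2} and {T5, T1}.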
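The new multiple range test proposed by Duncan makes use of special protection levels based upon degrees of freedom. Let $\gamma_{2,\alpha} = 1-\alpha$ be the protection level for testing the significance of a difference between two means, that is, the probability of not finding a significant difference between two means whose population means are equal. With $p$ means there are $p-1$ degrees of freedom, so $p-1$ independent tests can be made, each at protection level $\gamma_{2,\alpha}$. The joint protection level is therefore
$\gamma_{p,\alpha} = \gamma_{2,\alpha}^{\,p-1} = (1-\alpha)^{p-1}$, with $\alpha_p = 1 - \gamma_{p,\alpha}$;
that is, the probability that one finds no significant differences in making $p-1$ independent tests, each at protection level $\gamma_{2,\alpha}$, is $\gamma_{2,\alpha}^{\,p-1}$, and the corresponding significance level is $\alpha_p$.
For $\alpha = 0.05$, the protection levels and the corresponding significance levels are: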
Subset size $p$ | Protection level $\gamma_{p,\alpha}$ | Probability of falsely rejecting $H_0$: $\alpha_p$ |
---|---|---|
p=2 | 0.95 | 0.05 |
p=3 | 0.903 | 0.097 |
p=4 | 0.857 | 0.143 |
p=5 | 0.815 | 0.185 |
p=6 | 0.774 | 0.226 |
p=7 | 0.735 | 0.265 |
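The entries of this table follow directly from $\gamma_{p,\alpha} = (1-\alpha)^{p-1}$ and $\alpha_p = 1 - \gamma_{p,\alpha}$. A minimal Python sketch (the function name is illustrative) that recomputes them:

```python
def protection_levels(alpha, max_p):
    """Return {p: (gamma_p, alpha_p)} for p = 2..max_p."""
    levels = {}
    for p in range(2, max_p + 1):
        gamma_p = (1 - alpha) ** (p - 1)    # joint protection level for p means
        levels[p] = (gamma_p, 1 - gamma_p)  # alpha_p is the complement
    return levels


levels = protection_levels(0.05, 7)
for p, (gamma_p, alpha_p) in levels.items():
    print(f"p={p}: gamma_p={gamma_p:.4f}, alpha_p={alpha_p:.4f}")
```

The printed values agree with the table up to rounding in the third decimal place.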
Note that although this procedure makes use of the studentized range, its error rate is neither on an experiment-wise basis (as with Tukey's) nor on a per-comparison basis. Duncan's multiple range test does not control the family-wise error rate. See the Criticism section for further details.
Duncan (1965) also gave the first Bayesian multiple comparison procedure, for the pairwise comparisons among the means in a one-way layout. This multiple comparison procedure is different from the one discussed above.
Duncan's Bayesian MCP concerns the differences between ordered group means, where the statistics in question are pairwise comparisons (no equivalent is defined for the property of a subset being "significantly different").
Duncan modeled the consequences of two or more means being equal using additive loss functions within and across the pairwise comparisons. If one assumes the same loss function across the pairwise comparisons, one needs to specify only one constant K, and this indicates the relative seriousness of type I to type II errors in each pairwise comparison.
A study by Juliet Popper Shaffer (1998) has shown that the method proposed by Duncan, modified to provide weak control of FWE and using an empirical estimate of the variance of the population means, has good properties both from the Bayesian point of view, as a minimum-risk method, and from the frequentist point of view, with good average power.
In addition, results indicate considerable similarity in both risk and average power between Duncan's modified procedure and the Benjamini and Hochberg (1995) false discovery rate-controlling procedure, with the same weak family-wise error control.
Duncan's test has been criticised as being too liberal by many statisticians, including Henry Scheffé and John W. Tukey. Duncan argued that a more liberal procedure was appropriate because in real-world practice the global null hypothesis $H_0$ = "All means are equal" is often false, and thus traditional statisticians overprotect a probably false null hypothesis against type I errors. According to Duncan, one should adjust the protection levels for different p-mean comparisons according to the problem discussed. The example discussed by Duncan in his 1955 paper is a comparison of many means (e.g. 100), when one is interested only in two-mean and three-mean comparisons, and general p-mean comparisons (deciding whether there is some difference between p means) are of no special interest (if p is 15 or more, for example).
Duncan's multiple range test is very "liberal" in terms of type I errors. The following example illustrates why:
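Let us assume one is truly interested, as Duncan suggested, only in the correct ranking of subsets of size 4 or below. Let us also assume that one performs the simple pairwise comparison with a protection level $\gamma_2 = 0.95$. Given an overall set of 100 means, consider the null hypotheses of the test:
There are $\binom{100}{2}$ null hypotheses for the correct ranking of each pair of means. The significance level of each is $1 - 0.95 = 0.05$.
There are $\binom{100}{3}$ null hypotheses for the correct ranking of each triplet of means. The significance level of each is $1 - (0.95)^2 = 0.097$.
There are $\binom{100}{4}$ null hypotheses for the correct ranking of each subset of 4 means. The significance level of each is $1 - (0.95)^3 = 0.143$.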
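The size of this example is easier to appreciate with the binomial coefficients evaluated. The sketch below (variable names are illustrative) counts the subsets of each size among 100 means and pairs each count with its significance level $1 - \gamma_2^{\,p-1}$:

```python
import math

n_means = 100
gamma_2 = 0.95            # pairwise protection level used in the example

summary = {}
for p in (2, 3, 4):
    n_hypotheses = math.comb(n_means, p)     # number of subsets of size p
    alpha_p = 1 - gamma_2 ** (p - 1)         # significance level for size p
    summary[p] = (n_hypotheses, alpha_p)
    print(f"p={p}: {n_hypotheses} hypotheses, each at level {alpha_p:.4f}")
```

There are 4,950 pairs, 161,700 triplets and 3,921,225 subsets of four means, and the significance level rises with the subset size.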
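As we can see, the test has two main problems regarding type I errors:
1. It does not control the family-wise error rate.
2. It deliberately raises the significance level with the size of the subset, using levels $\alpha_p \geq \alpha$.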
Therefore, it is advised not to use the procedure discussed.
Duncan later developed the Duncan–Waller test which is based on Bayesian principles. It uses the obtained value of F to estimate the prior probability of the null hypothesis being true.
If one still wishes to address the problem of finding similar subsets of group means, other solutions are found in the literature.
Tukey's range test is commonly used to compare pairs of means; this procedure controls the family-wise error rate in the strong sense.
Another solution is to perform Student's t-test on all pairs of means and then to use an FDR-controlling procedure (to control the expected proportion of incorrectly rejected null hypotheses).
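As an illustration of the FDR-controlling step, the following Python sketch applies the Benjamini and Hochberg step-up rule to a list of p-values, such as those obtained from pairwise t-tests. The function name and the p-values are assumptions made for the example (the numbers are made up, not taken from any data set), and the sketch presumes the conditions under which the step-up rule is valid.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is at most k * q / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values for four pairwise comparisons (illustration only).
toy_p_values = [0.001, 0.04, 0.03, 0.5]
decisions = benjamini_hochberg(toy_p_values, q=0.05)
print(decisions)
```

With these toy inputs only the smallest p-value passes its threshold, so a single comparison is declared significant.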
Other possible solutions, which do not involve hypothesis testing but still result in a partition into subsets, include clustering and hierarchical clustering. These solutions differ from the approach presented in this method: