Tukey's range test should not be confused with the Tukey mean-difference test.
Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSD (honestly significant difference) test,[1] is a single-step multiple comparison procedure and statistical test. It can be used to correctly interpret the statistical significance of the difference between means that have been selected for comparison because of their extreme values.
The method was initially developed and introduced by John Tukey for use in analysis of variance (ANOVA), and has usually been taught only in connection with ANOVA. However, the studentized range distribution used to determine the level of significance of the differences considered in Tukey's test has a much broader application. It is useful for researchers who have searched their collected data for remarkable differences between groups, but who then cannot validly determine how significant their discovered stand-out difference is using the standard statistical distributions of other conventional tests, for which the data must have been selected at random. Because stand-out data were by definition not selected at random, but specifically chosen for being extreme, they need a different, stricter interpretation, which is provided by the likely frequency and size of the studentized range. The modern practice of "data mining" is an example where it is used.
The test is named after John Tukey.[2] It compares all possible pairs of means and is based on a studentized range distribution $q$ (this distribution is similar to the distribution of $t$ from the $t$-test; see below).[3]
Tukey's test compares the means of every treatment to the means of every other treatment; that is, it applies simultaneously to the set of all pairwise comparisons
$$\mu_i - \mu_j,$$
and identifies any difference between two means that is greater than the expected standard error. The confidence coefficient for the set, when all sample sizes are equal, is exactly $1-\alpha$ for any $\alpha$ with $0 \le \alpha \le 1$. For unequal sample sizes, the confidence coefficient is greater than $1-\alpha$.
This test is often followed by the Compact Letter Display (CLD) statistical procedure to render the output of this test more transparent to non-statistician audiences.
Tukey's test is based on a formula very similar to that of the $t$-test. In fact, Tukey's test is essentially a $t$-test, except that it corrects for the family-wise error rate.
The formula for Tukey's test is
$$q_s = \frac{\left| Y_A - Y_B \right|}{\mathrm{SE}},$$
where $Y_A$ and $Y_B$ are the two means being compared, and SE is the standard error for the sum of the means. The value $q_s$ is the sample's test statistic. (The notation $|x|$ means the absolute value of $x$: the magnitude of $x$ with the sign set to $+$, regardless of the original sign of $x$.)
This $q_s$ test statistic can then be compared to a $q$ value for the chosen significance level $\alpha$ from a table of the studentized range distribution. If the $q_s$ value is larger than the critical value $q_\alpha$ obtained from the distribution, the two means are said to be significantly different at level $\alpha$.
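The comparison of $q_s$ against a critical value can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the data are hypothetical, the groups are assumed to have equal sizes, and SE is assumed to be $\sqrt{MS_{within}/n}$ (the convention that matches studentized range tables printed without the $\sqrt{2}$ factor). The critical value is not computed here; the caller is expected to look it up.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pairwise_q_statistics(groups):
    """Return {(name_a, name_b): q_s} for equal-sized groups.

    Assumption (not stated in the text): SE = sqrt(MS_within / n), where
    MS_within is the pooled within-group variance and n the common group size.
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()
    # With equal sizes the pooled variance is the average of the group variances.
    ms_within = mean(variance(values) for values in groups.values())
    se = sqrt(ms_within / n)
    return {
        (a, b): abs(mean(groups[a]) - mean(groups[b])) / se
        for a, b in combinations(groups, 2)
    }


def significant_pairs(groups, q_critical):
    """Pairs whose q_s exceeds a critical value supplied by the caller."""
    stats = pairwise_q_statistics(groups)
    return [pair for pair, q_s in stats.items() if q_s > q_critical]


# Hypothetical data: three groups of three observations each.
data = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [7, 8, 9]}
# 4.34 is the tabulated 5% critical value for k = 3 groups and 6 error
# degrees of freedom, taken from a standard studentized range table.
print(significant_pairs(data, q_critical=4.34))  # [('A', 'C'), ('B', 'C')]
```

If the table at hand includes the $\sqrt{2}$ factor, SE would instead need to be $\sqrt{2\,MS_{within}/n}$ so that the statistic and the critical value are on the same scale.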
Since the null hypothesis for Tukey's test states that all means being compared are from the same population (i.e. $\mu_1 = \mu_2 = \cdots = \mu_k$), the means should be normally distributed (according to the central limit theorem) with the same model standard deviation $\sigma$, estimated by the merged standard error, SE.
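The Tukey method uses the studentized range distribution. Suppose that we take a sample of size $n$ from each of $k$ populations with the same normal distribution, and suppose that $\bar{y}_{\min}$ is the smallest of these sample means, $\bar{y}_{\max}$ is the largest, and $S^2$ is the pooled sample variance of these samples. Then the following random variable has a studentized range distribution:
$$q \equiv \frac{\bar{y}_{\max} - \bar{y}_{\min}}{S \sqrt{2/n}}$$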
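The definition above can be explored by simulation. The following sketch draws $k$ samples of size $n$ from a standard normal population, computes $q$ in the scaled form given here (with $S\sqrt{2/n}$ in the denominator), and estimates a quantile empirically. The function names, the number of draws, and the seed are illustrative choices, not part of the method.

```python
import random
from math import sqrt


def simulate_q(k, n, rng):
    """One draw of q = (ybar_max - ybar_min) / (S * sqrt(2 / n)) under the null."""
    means, variances = [], []
    for _ in range(k):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(sample) / n
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in sample) / (n - 1))
    # Pooled standard deviation S; sizes are equal, so average the variances.
    s_pooled = sqrt(sum(variances) / k)
    return (max(means) - min(means)) / (s_pooled * sqrt(2.0 / n))


def simulated_quantile(k, n, prob, draws=20000, seed=0):
    """Empirical quantile of q from `draws` simulated experiments."""
    rng = random.Random(seed)
    values = sorted(simulate_q(k, n, rng) for _ in range(draws))
    return values[min(int(prob * draws), draws - 1)]


# For k = 2 the statistic reduces to |t| with 2(n - 1) degrees of freedom,
# so with n = 5 the simulated 95% quantile should land close to 2.306, the
# two-sided 5% critical value of Student's t with 8 degrees of freedom.
print(simulated_quantile(k=2, n=5, prob=0.95))
```

The $k = 2$ check is also a practical way to tell which scaling a given table uses: a table without the $\sqrt{2}$ factor would list a value larger by that factor for the same $\alpha$ and degrees of freedom.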
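This definition of the statistic $q$ is the basis of the critical value $q_\alpha$ discussed below, and is based on these three factors:
$\alpha$, the Type I error rate, that is, the probability of rejecting a true null hypothesis;
$k$, the number of populations being compared;
$df$, the number of degrees of freedom ($df = N - k$, where $N$ is the total number of observations).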
The distribution of $q$ has been tabulated and appears in many textbooks on statistics. In some tables the distribution of $q$ has been tabulated without the $\sqrt{2}$ factor. The R language offers a cumulative distribution function (ptukey) and a quantile function (qtukey) for $q$.
The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least $1-\alpha$ are
$$\bar{y}_{i\bullet} - \bar{y}_{j\bullet} \pm \frac{q_{\alpha;k;N-k}}{\sqrt{2}} \, \widehat{\sigma}_\varepsilon \sqrt{\frac{2}{n}}, \qquad i, j = 1, \ldots, k, \quad i \neq j.$$
Notice that the point estimator and the estimated variance are the same as those for a single pairwise comparison. The only difference between the confidence limits for simultaneous comparisons and those for a single comparison is the multiple of the estimated standard deviation.
Also note that the sample sizes must be equal when using the studentized range approach.
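As a purely illustrative calculation with hypothetical numbers, take $k = 3$ groups of $n = 5$ observations each (so $N - k = 12$), an estimated standard deviation $\widehat{\sigma}_\varepsilon = 2$, and the tabulated critical value $q_{0.05;3;12} \approx 3.77$. The half-width of each simultaneous interval under the equal-size formula is then

$$\frac{q_{0.05;3;12}}{\sqrt{2}} \, \widehat{\sigma}_\varepsilon \sqrt{\frac{2}{n}} = \frac{3.77}{\sqrt{2}} \cdot 2 \cdot \sqrt{\frac{2}{5}} = \frac{3.77 \cdot 2}{\sqrt{5}} \approx 3.37,$$

so, in this example, the interval for any pair of sample means that differ by more than about 3.37 would exclude zero.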
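The quantity $\widehat{\sigma}_\varepsilon$ is the estimated standard deviation of the entire design, not just that of the two groups being compared. When the sample sizes are unequal, the Tukey–Kramer method calculates the estimated standard deviation separately for each pairwise comparison:
$$\bar{y}_{i\bullet} - \bar{y}_{j\bullet} \pm \frac{q_{\alpha;k;N-k}}{\sqrt{2}} \, \widehat{\sigma}_\varepsilon \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$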
where $n_i$ and $n_j$ are the sizes of groups $i$ and $j$ respectively. The degrees of freedom for the whole design are also applied.
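The Tukey–Kramer limits can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name and data are hypothetical, $\widehat{\sigma}_\varepsilon$ is taken to be the square root of the pooled within-group variance on $N - k$ degrees of freedom, and the critical value $q_{\alpha;k;N-k}$ is passed in by the caller (for example from a table or from R's qtukey) instead of being computed.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer_intervals(groups, q_critical):
    """Simultaneous confidence limits for all pairwise mean differences.

    `q_critical` stands for q_{alpha; k; N-k} and is supplied by the caller.
    """
    k = len(groups)
    total = sum(len(values) for values in groups.values())
    means = {name: sum(values) / len(values) for name, values in groups.items()}
    # Pooled error variance of the whole design on N - k degrees of freedom.
    ss_within = sum(
        (x - means[name]) ** 2 for name, values in groups.items() for x in values
    )
    sigma_hat = sqrt(ss_within / (total - k))
    intervals = {}
    for a, b in combinations(groups, 2):
        diff = means[a] - means[b]
        spread = sqrt(1 / len(groups[a]) + 1 / len(groups[b]))
        half_width = (q_critical / sqrt(2)) * sigma_hat * spread
        intervals[(a, b)] = (diff - half_width, diff + half_width)
    return intervals


# Hypothetical data with unequal group sizes; q_critical is a placeholder
# that would normally be looked up for the chosen alpha, k and N - k.
data = {"A": [1, 2, 3], "B": [2, 4], "C": [7, 8, 9, 8]}
for pair, (low, high) in tukey_kramer_intervals(data, q_critical=3.0).items():
    print(pair, round(low, 3), round(high, 3))
```

With equal group sizes the term $\sqrt{1/n_i + 1/n_j}$ becomes $\sqrt{2/n}$, so the same function reproduces the equal-size confidence limits.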
Both ANOVA and Tukey–Kramer tests are based on the same assumptions. However, these two tests for $k$ groups (i.e. $\mu_1 = \mu_2 = \cdots = \mu_k$) may result in logical contradictions when $k > 2$, even if the assumptions do hold.
It is possible to generate a set of pseudorandom samples of strictly positive measure such that hypothesis $\mu_1 = \mu_2$ is rejected at significance level $1-\alpha > 0.95$ while $\mu_1 = \mu_2 = \mu_3$ is not rejected even at $1-\alpha = 0.975$.
Also occasionally described as "honestly", see e.g. Morrison, S.; Sosnoff, J.J.; Heffernan, K.S.; Jae, S.Y.; Fernhall, B. (2013). "Aging, hypertension and physiological tremor: The contribution of the cardioballistic impulse to tremorgenesis in older adults". 326 (1–2): 68–74. doi:10.1016/j.jns.2013.01.016. PMID 23385002.