The Newman–Keuls or Student–Newman–Keuls (SNK) method is a stepwise multiple comparisons procedure used to identify sample means that are significantly different from each other.[1] It is named after Student (1927),[2] D. Newman,[3] and M. Keuls.[4] The procedure is often used as a post-hoc test whenever an analysis of variance (ANOVA) has revealed a significant difference among three or more sample means. The Newman–Keuls method is similar to Tukey's range test in that both procedures use studentized range statistics.[5][6] Unlike Tukey's range test, however, the Newman–Keuls method uses different critical values for different pairs of mean comparisons. The procedure is therefore more likely to reveal significant differences between group means, but also more likely to commit type I errors by rejecting null hypotheses that are true. In other words, the Newman–Keuls procedure is more powerful but less conservative than Tukey's range test.[6][7]
The Newman–Keuls method was introduced by Newman in 1939 and developed further by Keuls in 1952, before Tukey presented his various definitions of error rates (1952a,[8] 1952b,[9] 1953[10]). The Newman–Keuls method controls the family-wise error rate (FWER) in the weak sense but not in the strong sense:[11][12] it controls the risk of rejecting the null hypothesis when all means are equal (the global null hypothesis), but it does not control the risk of rejecting partial null hypotheses. For instance, when four means are compared under the partial null hypothesis that μ1 = μ2 and μ3 = μ4 = μ + δ with a non-zero δ, the Newman–Keuls procedure has a probability greater than α of rejecting μ1 = μ2, μ3 = μ4, or both. When δ is very large, the procedure is almost equivalent to two Student t-tests of μ1 = μ2 and μ3 = μ4, each at nominal type I error rate α with no multiplicity adjustment; the FWER is therefore almost doubled. In the worst case, the FWER of the Newman–Keuls procedure is 1 − (1 − α)^⌊J/2⌋, where ⌊J/2⌋ is the integer part of the number of groups J divided by 2. With two or three groups the Newman–Keuls procedure thus controls the FWER in the strong sense, but not with four or more groups. In 1995, Benjamini and Hochberg presented a new, more liberal, and more powerful criterion for such problems: control of the false discovery rate (FDR).[13] In 2006, Shaffer showed by extensive simulation that the Newman–Keuls method controls the FDR under some constraints.[14]
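As a rough illustration of the worst-case bound above, the expression 1 − (1 − α)^⌊J/2⌋ can be evaluated directly. The helper name below is invented for this sketch:

```python
# Worst-case family-wise error rate of the Newman-Keuls procedure:
# FWER = 1 - (1 - alpha) ** floor(J / 2), for J groups at nominal level alpha.

def snk_worst_case_fwer(alpha: float, n_groups: int) -> float:
    """Worst-case FWER bound for the Newman-Keuls procedure."""
    return 1.0 - (1.0 - alpha) ** (n_groups // 2)

# With two or three groups the bound equals alpha (strong control);
# from four groups on it exceeds alpha.
for j in (2, 3, 4, 6, 10):
    print(j, round(snk_worst_case_fwer(0.05, j), 4))
```

At α = 0.05, the bound stays at 0.05 for J = 2 or 3 but rises to 0.0975 for J = 4 and keeps growing with J.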
The assumptions of the Newman–Keuls test are essentially the same as for an independent groups t-test: normality, homogeneity of variance, and independent observations. The test is quite robust to violations of normality. Violating homogeneity of variance can be more problematic than in the two-sample case since the MSE is based on data from all groups. The assumption of independence of observations is important and should not be violated.
The Newman–Keuls method employs a stepwise approach when comparing sample means.[15] Before any comparison is made, all sample means are rank-ordered in ascending or descending order, producing an ordered range of p sample means. The largest and smallest sample means within the largest range are compared first. Assuming the largest range spans four means (p = 4), a significant difference between the largest and smallest means, as revealed by the Newman–Keuls method, results in a rejection of the null hypothesis for that range of means. The next comparison is then made between two sample means within a smaller range of three means (p = 3). This stepwise comparison continues through successively smaller ranges until a final comparison is made within the smallest range of just two means, unless a comparison within some range is found non-significant first. Whenever two sample means within a given range are not significantly different, all the null hypotheses within that range are retained and no further comparisons within its sub-ranges are necessary.
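The stepwise logic described above can be sketched in code. This is a minimal illustration under stated assumptions, not a validated implementation: it assumes equal group sizes, the caller must supply the critical values (e.g., read from a studentized range table), and the function name `newman_keuls` and the example critical values are invented for this sketch.

```python
from math import sqrt

def newman_keuls(means, n, mse, q_crit):
    """Stepwise Newman-Keuls comparisons for equal group sizes.

    means  -- sample means, in any order
    n      -- common sample size per group
    mse    -- mean square error from the ANOVA
    q_crit -- dict mapping range size p to the critical q value
    Returns the set of pairs (i, j) of original group indices whose
    means are judged significantly different.
    """
    order = sorted(range(len(means)), key=lambda i: means[i])
    m = [means[i] for i in order]  # means in ascending order
    k = len(m)
    significant = set()
    retained = []  # ranges found non-significant; their sub-ranges are not tested

    for p in range(k, 1, -1):            # widest range first
        for lo in range(k - p + 1):
            hi = lo + p - 1
            # Skip any range nested inside a retained (non-significant) range.
            if any(rlo <= lo and hi <= rhi for rlo, rhi in retained):
                continue
            q = (m[hi] - m[lo]) / sqrt(mse / n)
            if q > q_crit[p]:
                significant.add((order[lo], order[hi]))
            else:
                retained.append((lo, hi))
    return significant

# Example with the article's means 2, 4, 6, 8, taking n = 5 and MSE = 5, and
# illustrative critical values roughly matching alpha = .05 with nu = 16.
print(newman_keuls([2, 4, 6, 8], 5, 5.0, {2: 3.00, 3: 3.65, 4: 4.05}))
```

With these numbers, the widest range and both three-mean ranges are significant, while all adjacent pairs (difference 2) fall below the p = 2 critical value and are retained.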
| | \bar{X}_1 | \bar{X}_2 | \bar{X}_3 | \bar{X}_4 |
|---|---|---|---|---|
| Mean values | 2 | 4 | 6 | 8 |
| Difference from \bar{X}_1 = 2 | | 2 | 4 | 6 |
| Difference from \bar{X}_2 = 4 | | | 2 | 4 |
| Difference from \bar{X}_3 = 6 | | | | 2 |
When comparing two means within a range, the studentized range statistic is

q = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\dfrac{MSE}{n}}},

where q denotes the studentized range value, \bar{X}_A and \bar{X}_B are the largest and smallest sample means within the range being tested, MSE is the mean square error taken from the ANOVA, and n is the sample size (number of observations per group). If the sample sizes are unequal (n_A \neq n_B), then
q = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\dfrac{MSE}{2}\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}},

where n_A and n_B are the sample sizes of the two groups being compared.
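When the group sizes are equal, the second form reduces to the first, since (MSE/2)(1/n + 1/n) = MSE/n; both cases therefore fit in one small helper. The function below is an illustrative sketch with an invented name:

```python
from math import sqrt

def q_statistic(mean_a: float, mean_b: float, mse: float, n_a: int, n_b: int) -> float:
    """Studentized range statistic for a pair of group means.

    Equal group sizes use the MSE/n standard error; unequal sizes
    use the (MSE/2)(1/n_a + 1/n_b) form.
    """
    if n_a == n_b:
        se = sqrt(mse / n_a)
    else:
        se = sqrt((mse / 2.0) * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / se
```

For example, with means 8 and 2, MSE = 5, and n = 5 per group, the standard error is 1 and q = 6; unbalancing the sizes to 4 and 6 inflates the standard error slightly and shrinks q.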
Once calculated, the computed q value can be compared with a critical value q_{\alpha,\nu,p}, obtained from a studentized range distribution table for a given significance level \alpha, error degrees of freedom \nu, and range size p (the number of means spanned by the comparison being tested). If the computed q value equals or exceeds the critical value, the difference between the two sample means is declared significant at level \alpha.
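When a printed q table is not at hand, the critical value q_{\alpha,\nu,p} can be approximated by Monte-Carlo simulation under the global null hypothesis. The sketch below uses only the standard library and invented names; it assumes p groups of equal size n, so that \nu = p(n − 1):

```python
import random
from math import sqrt
from statistics import mean

def simulate_q_crit(p, n, alpha=0.05, reps=20000, seed=0):
    """Monte-Carlo estimate of the critical value q_{alpha, nu, p} for
    p group means of size n each (so nu = p * (n - 1)), under the
    global null hypothesis that all population means are equal."""
    rng = random.Random(seed)
    qs = []
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
        means = [mean(g) for g in groups]
        # The pooled within-group variance plays the role of the MSE.
        ss = sum(sum((x - mg) ** 2 for x in g) for g, mg in zip(groups, means))
        mse = ss / (p * (n - 1))
        qs.append((max(means) - min(means)) / sqrt(mse / n))
    qs.sort()
    return qs[int((1 - alpha) * reps)]
```

For p = 3 and n = 10 (\nu = 27) at \alpha = .05, the estimate should land near the tabulated value of roughly 3.5, up to Monte-Carlo error.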
Because of its sequential nature, the Newman–Keuls procedure cannot produce a confidence interval for each mean difference, nor multiplicity-adjusted exact p-values. Its results can also be somewhat difficult to interpret, since it is hard to articulate exactly which null hypotheses were tested.