In statistics, Barnard’s test is an exact test used in the analysis of contingency tables with one margin fixed. Barnard’s tests are really a class of hypothesis tests, also known as unconditional exact tests for two independent binomials.[1][2][3] These tests examine the association of two categorical variables and are often a more powerful alternative to Fisher's exact test for 2×2 contingency tables. Although first published in 1945 by G.A. Barnard,[4][5] the test did not gain popularity, owing to the computational difficulty of calculating the p-value and to Fisher’s specious disapproval. Nowadays, even for sample sizes n ~ 1 million, computers can often compute Barnard’s test in a few seconds or less.
Barnard’s test is used to test the independence of rows and columns in a contingency table. The test assumes that each response is independent. Under independence, there are three types of study design that yield a 2×2 table, and Barnard's test applies to the second type.
To distinguish the different types of designs, suppose a researcher is interested in testing whether a treatment quickly heals an infection.
Although the results of each experimental design can be laid out in nearly identical-looking tables, their statistics differ, and hence the criteria for a "significant" result differ for each. The design in which both margins are fixed in advance is complicated to manage and is almost unknown in practical experiments.
The operational difference between Barnard’s exact test and Fisher’s ‘exact’ test is how they handle the nuisance parameter, the common success probability, when calculating the p-value. Fisher's exact test avoids estimating the nuisance parameter by conditioning on both margins, an approximately ancillary statistic that constrains the possible outcomes. The problem with Fisher's procedure is that it falsely excludes some outcomes which are genuinely possible under almost all types of experiments. Barnard’s test instead considers all legitimate values of the nuisance parameter and chooses the value that maximizes the p-value. The theoretical difference between the tests is that Barnard’s test uses the double-binomial distribution, whereas Fisher’s test, because of the (usually false) conditioning, uses the hypergeometric distribution; as a result, the p-values Fisher's test produces are generally too large, making it too 'conservative': prone to unnecessary type II errors (excessive numbers of false negatives). However, even when the data come from a double-binomial distribution, the conditioning (which leads to using the hypergeometric distribution for calculating Fisher's exact p-value) produces a valid test, if one accepts that Fisher's test will necessarily miss some positive results.[3] Barnard's test is not biased in this way, and is suitable for a broader range of experiment types, including the most common ones, in which there is no experimental constraint on either the row sums or the column sums of the table.
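The maximization described above can be sketched in a short brute-force implementation. This is an illustrative sketch, not a reference implementation: it assumes a 2×2 table with fixed column totals n1 and n2, uses a pooled two-sample z (Wald) statistic to rank tables by extremeness (other statistics are possible and give slightly different results), and maximizes the double-binomial probability of the ‘as or more extreme’ set over a grid of nuisance-parameter values. The function name and the grid resolution are choices of this sketch.

```python
from math import comb

def barnard_p_value(x1, n1, x2, n2, grid=200):
    """Brute-force unconditional (Barnard-style) p-value for a 2x2 table
    with column totals n1, n2 fixed and x1, x2 successes observed."""
    def wald(a, b):
        # pooled two-sample z statistic; zero when the pooled proportion is 0 or 1
        p1, p2, p = a / n1, b / n2, (a + b) / (n1 + n2)
        denom = (p * (1 - p) * (1 / n1 + 1 / n2)) ** 0.5
        return 0.0 if denom == 0 else abs(p1 - p2) / denom

    t_obs = wald(x1, x2)
    # every table at least as extreme as the observed one
    extreme = [(a, b) for a in range(n1 + 1) for b in range(n2 + 1)
               if wald(a, b) >= t_obs - 1e-12]
    # maximize the double-binomial probability of that set over a grid of
    # values of the nuisance parameter pi (the common success probability)
    best = 0.0
    for k in range(1, grid):
        pi = k / grid
        prob = sum(comb(n1, a) * comb(n2, b)
                   * pi ** (a + b) * (1 - pi) ** (n1 + n2 - a - b)
                   for a, b in extreme)
        best = max(best, prob)
    return best
```

For a perfectly balanced table (e.g. 5/10 successes in both groups) the statistic is zero, every table counts as ‘as extreme’, and the p-value is 1; for a lopsided table such as 8/10 versus 2/10 the p-value is small. A dedicated implementation such as `scipy.stats.barnard_exact` refines the maximization and supports alternative test statistics.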
Both tests bound the type I error rate at the significance level α, and hence are technically 'valid'. However, for almost all experiments as actually designed, Barnard’s test is much more powerful than Fisher’s test, because it considers more ‘as or more extreme’ tables by not imposing a false constraint ('conditioning') on the second margin, which the procedure for Fisher’s test requires (incorrectly so, except for the few rarely used experimental designs in which the conditioning is valid). In fact, a variant of Barnard’s test, called Boschloo's test, is uniformly more powerful than Fisher’s test.[6] Barnard’s test has been used alongside Fisher's exact test in project management research.[7]
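The power comparison can be checked directly with SciPy, which (from version 1.7) implements all three tests. The table below is an arbitrary example, and one-sided alternatives are used because Boschloo's one-sided p-value is guaranteed never to exceed Fisher's.

```python
from scipy.stats import fisher_exact, barnard_exact, boschloo_exact

# an arbitrary 2x2 example table
table = [[7, 17], [8, 5]]

# one-sided tests in the same direction for all three procedures
_, p_fisher = fisher_exact(table, alternative="less")
p_barnard = barnard_exact(table, alternative="less").pvalue
p_boschloo = boschloo_exact(table, alternative="less").pvalue

# Boschloo's test is uniformly more powerful than Fisher's, so its
# p-value cannot exceed Fisher's; Barnard's is typically smaller too.
print(p_fisher, p_barnard, p_boschloo)
```

Boschloo's test uses Fisher's one-sided p-value itself as the test statistic and then computes the p-value unconditionally, which is why its p-value is bounded above by Fisher's for every table.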
Under specious pressure from Fisher, Barnard retracted his test in a published paper.[8] Nevertheless, many researchers prefer Barnard’s exact test over Fisher's exact test for analyzing 2×2 contingency tables,[9] since its statistics are more powerful for the vast majority of experimental designs. Fisher’s exact test statistics are conservative: its p-values are too large, leading the experimenter to dismiss as insignificant results that would be statistically significant under the correct (and less conservative) double-binomial statistics of Barnard's tests rather than the almost-always-invalid (and excessively conservative) hypergeometric statistics of Fisher's 'exact' test. Barnard's tests are not appropriate in the rare case of an experimental design that constrains both marginal totals (e.g. ‘taste tests’): although rare, experimentally imposed constraints on both marginal totals make the true sampling distribution of the table hypergeometric.
Barnard's test can be applied to larger tables, but the computation time increases and the power advantage quickly decreases.[10] It remains unclear which test statistic is preferred when implementing Barnard's test; however, most test statistics yield uniformly more powerful tests than Fisher's exact test.[11]