Lindley's paradox is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give different results for certain choices of the prior distribution. The problem of the disagreement between the two approaches was discussed in Harold Jeffreys' 1939 textbook;[1] it became known as Lindley's paradox after Dennis Lindley called the disagreement a paradox in a 1957 paper.[2]
Although referred to as a paradox, the differing results from the Bayesian and frequentist approaches can be explained by the fact that the two approaches answer fundamentally different questions, rather than by any actual disagreement between the two methods.
Nevertheless, for a large class of priors the differences between the frequentist and Bayesian approaches are caused by keeping the significance level fixed: as even Lindley recognized, "the theory does not justify the practice of keeping the significance level fixed" and even "some computations by Prof. Pearson in the discussion to that paper emphasized how the significance level would have to change with the sample size, if the losses and prior probabilities were kept fixed". In fact, if the critical value increases suitably fast with the sample size, then the disagreement between the frequentist and Bayesian approaches becomes negligible as the sample size increases.[3]
The paradox continues to be a source of active discussion.[4][5][6]
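The result $x$ of some experiment has two possible explanations, hypotheses $H_0$ and $H_1$, and some prior distribution $\pi$ representing uncertainty as to which hypothesis is more accurate before taking $x$ into account.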
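Lindley's paradox occurs when (1) the result $x$ is "significant" by a frequentist test of $H_0$, indicating sufficient evidence to reject $H_0$, say, at the 5% level, and (2) the posterior probability of $H_0$ given $x$ is high, indicating strong evidence that $H_0$ is in better agreement with $x$ than $H_1$.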
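These results can occur at the same time when $H_0$ is very specific, $H_1$ is more diffuse, and the prior distribution does not strongly favor one or the other.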
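The following numerical example illustrates Lindley's paradox. In a certain city 49,581 boys and 48,870 girls have been born over a certain time period. The observed proportion $x$ of male births is thus $49581/98451 \approx 0.5036$. We assume the number of male births is a binomial variable with parameter $\theta$. We are interested in testing whether $\theta$ is 0.5 or some other value. That is, our null hypothesis is $H_0: \theta = 0.5$, and the alternative is $H_1: \theta \neq 0.5$.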
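The frequentist approach to testing $H_0$ is to compute a p-value, the probability of observing a number of boys at least as large as the one observed, assuming $H_0$ is true. Because the number of births is very large, we can use a normal approximation for the number of male births, $X \sim N(\mu, \sigma^2)$, with $\mu = np = n\theta = 98451 \times 0.5 = 49225.5$ and $\sigma^2 = n\theta(1-\theta) = 98451 \times 0.5 \times 0.5 = 24612.75$, to compute
\begin{align} P(X \geq 49581 \mid \mu = 49225.5) = \int_{49581}^{98451} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(u-\mu)^2}{2\sigma^2}} \, du \approx 0.0117. \end{align}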
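We would have been equally surprised if we had seen 49,581 female births, i.e. $x \approx 0.4964$, so a frequentist would usually perform a two-sided test, for which the p-value would be $p \approx 2 \times 0.0117 \approx 0.0235$. In both cases, the p-value is lower than the significance level of 5%, so the frequentist approach rejects $H_0$, as it disagrees with the observed data.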
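The normal-approximation p-value can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `normal_upper_tail` is an assumption introduced for illustration, and the upper tail is taken to infinity rather than truncated at 98451, which makes no practical difference here.

```python
import math

def normal_upper_tail(x, mu, sigma):
    """P(X >= x) for X ~ N(mu, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

n, k = 98451, 49581      # total births and male births from the example
theta0 = 0.5             # value of theta under H0

mu = n * theta0                                   # 49225.5
sigma = math.sqrt(n * theta0 * (1.0 - theta0))    # sqrt(24612.75)

p_one_sided = normal_upper_tail(k, mu, sigma)
p_two_sided = 2.0 * p_one_sided                   # symmetric two-sided test

print(f"one-sided p = {p_one_sided:.4f}")
print(f"two-sided p = {p_two_sided:.4f}")
print("reject H0 at the 5% level:", p_two_sided < 0.05)
```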
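Assuming no reason to favor one hypothesis over the other, the Bayesian approach would be to assign prior probabilities $\pi(H_0) = \pi(H_1) = 0.5$ and a uniform distribution to $\theta$ under $H_1$, and then to compute the posterior probability of $H_0$ using Bayes' theorem:
\begin{align} P(H_0 \mid k) = \frac{P(k \mid H_0)\,\pi(H_0)}{P(k \mid H_0)\,\pi(H_0) + P(k \mid H_1)\,\pi(H_1)}. \end{align}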
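After observing $k = 49581$ boys out of $n = 98451$ births, we can compute the likelihood of the data under each hypothesis using the probability mass function for a binomial variable:
\begin{align} P(k \mid H_0) &= {n \choose k} (0.5)^k (1-0.5)^{n-k} \approx 1.95 \times 10^{-4}, \\ P(k \mid H_1) &= \int_0^1 {n \choose k} \theta^k (1-\theta)^{n-k} \, d\theta = {n \choose k} \operatorname{B}(k+1, n-k+1) = \frac{1}{n+1} \approx 1.02 \times 10^{-5}, \end{align}
where $\operatorname{B}(a, b)$ is the beta function.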
From these values, we find the posterior probability $P(H_0 \mid k) \approx 0.95$, which strongly favors $H_0$ over $H_1$.
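The Bayesian calculation can be reproduced in a few lines. This is a sketch under the assumptions stated in the text (equal prior probabilities and a uniform prior on $\theta$ under $H_1$); the function name `log_binom_pmf` is introduced here for illustration, and the binomial coefficient is evaluated in log space with `math.lgamma` to avoid overflow.

```python
import math

def log_binom_pmf(k, n, theta):
    """Log of the binomial probability mass function, computed via lgamma."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(theta) + (n - k) * math.log(1.0 - theta)

n, k = 98451, 49581
prior_h0 = prior_h1 = 0.5

# Likelihood under H0: theta fixed at 0.5.
like_h0 = math.exp(log_binom_pmf(k, n, 0.5))

# Marginal likelihood under H1: integrating the binomial pmf against a
# uniform prior on theta gives exactly 1 / (n + 1).
like_h1 = 1.0 / (n + 1)

posterior_h0 = like_h0 * prior_h0 / (like_h0 * prior_h0 + like_h1 * prior_h1)

print(f"P(k | H0) = {like_h0:.3e}")
print(f"P(k | H1) = {like_h1:.3e}")
print(f"P(H0 | k) = {posterior_h0:.3f}")
```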
The two approaches—the Bayesian and the frequentist—appear to be in conflict, and this is the "paradox".
Naaman proposed adapting the significance level to the sample size in order to control false positives: $\alpha_n$, such that $\alpha_n = n^{-r}$ with $r > 1/2$. At least in the numerical example, taking $r = 1/2$ results in a significance level of approximately 0.0032, so the frequentist would not reject the null hypothesis, which is in agreement with the Bayesian approach.
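As a rough illustration of a sample-size-dependent significance level, the sketch below assumes the rule has the form $\alpha_n = n^{-r}$ and evaluates it at $r = 1/2$ for the example's sample size. The functional form and the variable names are assumptions made for this illustration; the two-sided p-value is the one quoted in the example.

```python
n = 98451            # sample size from the numerical example
r = 0.5              # assumed exponent in alpha_n = n ** (-r)

alpha_n = n ** (-r)  # significance level that shrinks with sample size

p_two_sided = 0.0235 # two-sided p-value from the frequentist calculation
reject_h0 = p_two_sided < alpha_n

print(f"alpha_n = {alpha_n:.5f}")
print("reject H0 at level alpha_n:", reject_h0)
```

With a threshold this small, the p-value of 0.0235 no longer counts as significant.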
If we use an uninformative prior and test a hypothesis more similar to that in the frequentist approach, the paradox disappears.
For example, if we calculate the posterior distribution $P(\theta \mid x, n)$ using a uniform prior distribution on $\theta$ (i.e. $\pi(\theta \in [0, 1]) = 1$), we find
\begin{align} P(\theta \mid k, n) = \operatorname{Beta}(k+1, n-k+1). \end{align}
If we use this to check the probability that a newborn is more likely to be a boy than a girl, i.e. $P(\theta > 0.5 \mid k, n)$, we find
\begin{align} \int_{0.5}^{1} \operatorname{Beta}(\theta; 49582, 48871) \, d\theta \approx 0.988. \end{align}
In other words, it is very likely that the proportion of male births is above 0.5.
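The posterior probability $P(\theta > 0.5 \mid k, n)$ can be estimated by integrating the Beta(k+1, n−k+1) density numerically. The sketch below uses a plain trapezoidal rule from the standard library only; the helper names, the step count, and the choice to stop the integration 12 posterior standard deviations above the mean (where the density is negligible) are all assumptions made for illustration.

```python
import math

def beta_log_pdf(theta, a, b):
    """Log density of the Beta(a, b) distribution at theta."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta)

def beta_prob_between(lo, hi, a, b, steps=20000):
    """Trapezoidal estimate of the Beta(a, b) probability mass on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (math.exp(beta_log_pdf(lo, a, b)) + math.exp(beta_log_pdf(hi, a, b)))
    for i in range(1, steps):
        total += math.exp(beta_log_pdf(lo + i * h, a, b))
    return total * h

n, k = 98451, 49581
a, b = k + 1, n - k + 1          # posterior parameters under a uniform prior

# The posterior is sharply peaked, so integrate only where it has mass.
mean = a / (a + b)
sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
upper = min(mean + 12.0 * sd, 1.0 - 1e-12)

prob_theta_above_half = beta_prob_between(0.5, upper, a, b)
print(f"P(theta > 0.5 | k, n) = {prob_theta_above_half:.3f}")
```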
Neither analysis directly gives an estimate of the effect size, but both could be used to determine, for instance, whether the fraction of boy births is likely to be above some particular threshold.
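The apparent disagreement between the two approaches is caused by a combination of factors. First, the frequentist approach above tests $H_0$ without reference to $H_1$. The Bayesian approach evaluates $H_0$ as an alternative to $H_1$ and finds the first to be in better agreement with the observations. This is because the latter hypothesis is much more diffuse, as $\theta$ can be anywhere in $[0, 1]$, which results in it having a very low posterior probability. To understand why, it is helpful to consider the two hypotheses as generators of the observations:
Under $H_0$, we choose $\theta \approx 0.500$ and ask how likely it is to see 49,581 boys in 98,451 births.
Under $H_1$, we choose $\theta$ randomly from anywhere within 0 to 1 and ask the same question.
Most of the possible values for $\theta$ under $H_1$ are very poorly supported by the observations. In essence, the apparent disagreement between the methods is not a disagreement at all, but rather two different statements about how the hypotheses relate to the data:
The frequentist finds that $H_0$ is a poor explanation for the observation.
The Bayesian finds that $H_0$ is a far better explanation for the observation than $H_1$.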
According to the frequentist test, a 50/50 male/female sex ratio among newborns is improbable. Yet 50/50 is a better approximation than most, but not all, other ratios. The hypothesis $\theta \approx 0.504$ would have fit the observation much better than almost all other ratios, including $\theta \approx 0.500$.
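To make "better than most, but not all, other ratios" concrete, one can scan candidate values of $\theta$ and count how many explain the data at least as well as $\theta = 0.5$. The sketch below does this on an evenly spaced grid; the grid resolution and the function name `log_likelihood` are assumptions made for illustration, and the binomial coefficient is omitted because it is the same for every $\theta$.

```python
import math

n, k = 98451, 49581

def log_likelihood(theta):
    """Binomial log-likelihood up to an additive constant (the binomial
    coefficient is the same for every theta, so it is left out)."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

reference = log_likelihood(0.5)

# Evenly spaced candidate values strictly inside (0, 1).
grid = [i / 10000 for i in range(1, 10000)]
at_least_as_good = [t for t in grid if log_likelihood(t) >= reference]

fraction = len(at_least_as_good) / len(grid)
print(f"fraction of grid fitting at least as well as 0.5: {fraction:.4f}")
print(f"range: {min(at_least_as_good)} to {max(at_least_as_good)}")
```

Only a narrow band of values just above 0.5 fits at least as well as $\theta = 0.5$. Nearly all of the interval $[0, 1]$ fits far worse, which is why averaging the likelihood over a uniform prior penalizes $H_1$ so heavily.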
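For example, this choice of hypotheses and prior probabilities implies the statement "if $\theta > 0.49$ and $\theta < 0.51$, then the prior probability of $\theta$ being exactly 0.5 is $0.50/0.51 \approx 98\%$". Given such a strong preference for $\theta = 0.5$, it is easy to see why the Bayesian approach favors $H_0$ in the face of $x \approx 0.5036$, even though the observed value of $x$ lies about $2.27\sigma$ away from 0.5. A deviation of over $2\sigma$ from $H_0$ is considered significant in the frequentist approach, but its significance is overruled by the prior in the Bayesian approach.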
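The strength of the prior preference for $\theta = 0.5$ can be made explicit with a short calculation. The sketch below assumes the prior described in the text (probability 0.5 on the point $\theta = 0.5$ and probability 0.5 spread uniformly over $[0, 1]$) and conditions on $\theta$ lying in a narrow window around 0.5; the window 0.49 to 0.51 is chosen purely for illustration.

```python
prior_h0 = 0.5           # point mass at theta = 0.5
prior_h1 = 0.5           # total mass spread uniformly over [0, 1]

lo, hi = 0.49, 0.51      # illustrative window around 0.5

# Under H1 the density is uniform, so the mass inside the window is
# proportional to the window width.
h1_mass_in_window = prior_h1 * (hi - lo)

# Prior probability that theta is exactly 0.5, given it lies in the window.
prob_exactly_half = prior_h0 / (prior_h0 + h1_mass_in_window)

print(f"P(theta = 0.5 | {lo} < theta < {hi}) = {prob_exactly_half:.3f}")
```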
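Looking at it another way, we can see that the prior distribution is essentially flat with a delta function at $\theta = 0.5$. This is arguably dubious: if $\theta$ is regarded as a continuous quantity, it would be more natural to assume that no single value can be exactly the parameter value, i.e., to assume $P(\theta = 0.5) = 0$.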
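A more realistic distribution for $\theta$ in the alternative hypothesis produces a less surprising result for the posterior of $H_0$. For example, if we replace $H_1$ with $H_2: \theta = x$, i.e., the maximum likelihood estimate for $\theta$, the posterior probability of $H_0$ would be only 0.07, compared to 0.93 for $H_2$ (of course, one cannot actually use the maximum likelihood estimate as part of a prior distribution).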
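The comparison of $H_0$ against a point alternative at the maximum likelihood estimate reduces to a likelihood ratio, because the binomial coefficient cancels. The sketch below assumes equal prior probabilities for the two point hypotheses; the variable names are introduced here for illustration.

```python
import math

n, k = 98451, 49581
theta_h0 = 0.5
theta_h2 = k / n         # maximum likelihood estimate, used as a point alternative

# Log likelihood ratio L(theta_h2) / L(theta_h0); the binomial coefficient
# is identical in numerator and denominator and cancels.
log_ratio = (k * math.log(theta_h2 / theta_h0)
             + (n - k) * math.log((1.0 - theta_h2) / (1.0 - theta_h0)))
likelihood_ratio = math.exp(log_ratio)

# With equal priors, the posterior odds equal the likelihood ratio.
posterior_h0 = 1.0 / (1.0 + likelihood_ratio)
posterior_h2 = 1.0 - posterior_h0

print(f"P(H0 | k) = {posterior_h0:.2f}, P(H2 | k) = {posterior_h2:.2f}")
```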
[1] Harold Jeffreys. Theory of Probability. Oxford University Press. 1939. 924.