Statistical proof is the rational demonstration of the degree of certainty for a proposition, hypothesis or theory, used to convince others following a statistical test of the supporting evidence and of the types of inferences that can be drawn from the test scores. Statistical methods are used to increase the understanding of the facts, and the proof demonstrates the validity and logic of inference with explicit reference to a hypothesis, the experimental data, the facts, the test, and the odds. Proof has two essential aims: the first is to convince and the second is to explain the proposition through peer and public review.[1]
The burden of proof rests on the demonstrable application of the statistical method, the disclosure of the assumptions, and the relevance of the test to a genuine understanding of the data relative to the external world. There are adherents to several different statistical philosophies of inference, such as Bayes' theorem versus the likelihood function, or positivism versus critical rationalism. These methods of reasoning have a direct bearing on statistical proof and its interpretations in the broader philosophy of science.[2]
A common demarcation between science and non-science is the hypothetico-deductive proof of falsification developed by Karl Popper, which is a well-established practice in the tradition of statistics. Other modes of inference, however, may include the inductive and abductive modes of proof.[3] Scientists do not use statistical proof as a means to attain certainty, but to falsify claims and explain theory. Science cannot achieve absolute certainty, nor is it a continuous march toward an objective truth, as the vernacular (as opposed to the scientific) meaning of the term "proof" might imply. Statistical proof offers a kind of proof of a theory's falsity and the means to learn heuristically through repeated statistical trials and experimental error. Statistical proof also has applications in legal matters, with implications for the legal burden of proof.[4]
There are two kinds of axioms: (1) conventions that are taken as true but should be avoided because they cannot be tested, and (2) hypotheses.[5] Proof in the theory of probability was built on four axioms developed in the late 17th century:
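1. $\Pr(h) \geq 0$: the probability of any hypothesis $h$ is a non-negative real number.
2. $\Pr(t) = 1$: the probability of a necessary truth (tautology) $t$ equals one.
3. $\Pr(h_1) + \Pr(h_2) = \Pr(h_1 \text{ or } h_2)$ if $h_1$ and $h_2$ are mutually exclusive: the probability of their disjunction is the sum of their probabilities.
4. $\Pr(h_1 \mid h_2) = \dfrac{\Pr(h_1 \,\&\, h_2)}{\Pr(h_2)}$, where $\Pr(h_2) > 0$: the conditional probability of $h_1$ given $h_2$ is the probability that both hold, divided by the probability of $h_2$.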
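As an illustration (not part of the historical development), the four probability axioms can be checked numerically on a small finite probability space. The sketch below assumes a fair six-sided die with equally likely outcomes, models each hypothesis as a set of outcomes, and lets the whole sample space play the role of the necessary truth t; the helper names `pr`, `pr_given` and `all_events` are hypothetical and introduced only for this example.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical sample space: one roll of a fair six-sided die.
# A hypothesis is modelled as an event, i.e. a frozenset of outcomes,
# and the whole space plays the role of the necessary truth t.
OMEGA = frozenset(range(1, 7))


def pr(event):
    """Pr(h), assuming all outcomes are equally likely."""
    return Fraction(len(event), len(OMEGA))


def pr_given(h1, h2):
    """Pr(h1 | h2) = Pr(h1 & h2) / Pr(h2), defined only when Pr(h2) > 0."""
    if pr(h2) == 0:
        raise ValueError("conditional probability requires Pr(h2) > 0")
    return pr(h1 & h2) / pr(h2)


def all_events(space):
    """Every subset of the sample space (its power set)."""
    items = sorted(space)
    return [
        frozenset(subset)
        for subset in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    ]


events = all_events(OMEGA)  # 2**6 = 64 events

# Axiom 1: every probability is non-negative.
assert all(pr(h) >= 0 for h in events)
# Axiom 2: the necessary truth t (the whole sample space) has probability 1.
assert pr(OMEGA) == 1
# Axiom 3: for mutually exclusive h1, h2 (empty intersection), probabilities add.
# Note: in Python, `h1 | h2` is set union ("h1 or h2"), not the conditional bar.
assert all(
    pr(h1) + pr(h2) == pr(h1 | h2)
    for h1 in events
    for h2 in events
    if not h1 & h2
)

# Axiom 4: conditional probability, e.g. Pr(even | roll >= 4).
even = frozenset({2, 4, 6})
high = frozenset({4, 5, 6})
print(pr_given(even, high))  # 2/3
```

Exact `Fraction` arithmetic is used so that the additivity check compares rational numbers exactly rather than relying on floating-point tolerance.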
The preceding axioms provide the statistical proof and basis for the laws of randomness, or objective chance, from which modern statistical theory has advanced. Experimental data, however, can never prove that a hypothesis (h) is true; proof instead relies on an inductive inference that measures the probability of the hypothesis relative to the empirical data. The proof lies in the rational demonstration of the logic of inference, mathematics, testing, and deductive reasoning about significance.[6]
See main article: Statistical tests.
The term proof derives from the Latin probare, meaning "to test", the same root as provable and probable.[7] [8] Hence, proof is a form of inference by means of a statistical test. Statistical tests are formulated on models that generate probability distributions. Examples of probability distributions include the binomial, normal, and Poisson distributions, which give exact descriptions of variables that behave according to natural laws of random chance. When a statistical test is applied to samples of a population, the test determines whether the sample statistics are significantly different from the assumed null model. The true values of a population, which are unknowable in practice, are called parameters of the population. Researchers draw samples from populations to calculate statistics, such as the mean or standard deviation, that provide estimates of the parameters. If the entire population is sampled, the sample mean and distribution coincide with the population parameters and distribution.[9]
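The distinction between parameters and statistics can be illustrated with a small simulation. The sketch below assumes a hypothetical, normally distributed population of 10,000 values generated with Python's standard `random` module; in real research the population and its parameters would be unknown, and only the samples would be observed.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: 10,000 normally distributed measurements.
# It is simulated here only so that its parameters can be computed
# and compared with the sample statistics.
population = [random.gauss(170.0, 10.0) for _ in range(10_000)]

# Parameters: true values computed from the whole population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

# Statistics: estimates from random samples. They usually get closer to the
# parameters as the sample size n grows, although any single sample can miss.
for n in (10, 100, 1_000):
    sample = random.sample(population, n)
    print(
        f"n={n:>5}: mean={statistics.fmean(sample):7.2f} (mu={mu:.2f}), "
        f"sd={statistics.stdev(sample):6.2f} (sigma={sigma:.2f})"
    )

# Sampling the entire population (every member once) recovers the parameters.
census = random.sample(population, len(population))
census_mean = statistics.fmean(census)
census_sd = statistics.pstdev(census)
print(math.isclose(census_mean, mu) and math.isclose(census_sd, sigma))  # True
```

The samples use `statistics.stdev` (the n − 1 estimator), while the population and the full census use `statistics.pstdev`, because in the latter case every member of the population has been observed.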
Using the scientific method of falsification, the significance level, that is, the probability threshold below which a difference between the sample statistic and the null model is considered too large to be explained by chance alone, is set before the test. Most statisticians set this significance level at 0.05 or 0.1: if a discrepancy at least as large as the one observed would arise by chance alone fewer than 5 (or 10) times out of 100 under the null model, the discrepancy is considered unlikely to be explained by chance and the null hypothesis is rejected. Statistical models give exact outcomes for the parameters, whereas sample statistics provide estimates of them. Hence, the burden of proof rests on the sample statistics that provide estimates of a statistical model. Statistical models contain the mathematical proof of the parametric values and their probability distributions.[10] [11]
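To make the decision rule concrete, here is a minimal sketch of an exact one-sided binomial test. It assumes a hypothetical experiment in which a coin is tossed 100 times, a null hypothesis that heads has probability 0.5, and a significance level α = 0.05 fixed before the data are examined; the function names are illustrative and not part of any library.

```python
from math import comb


def binomial_tail_p_value(successes, trials, p_null=0.5):
    """Exact one-sided p-value: Pr(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def significance_test(successes, trials, alpha=0.05, p_null=0.5):
    """Return (p_value, reject_null). alpha must be fixed before seeing the data."""
    p_value = binomial_tail_p_value(successes, trials, p_null)
    return p_value, p_value < alpha


# Hypothetical experiment: 100 tosses of a coin; H0 says Pr(heads) = 0.5.
alpha = 0.05  # significance level chosen in advance
for heads in (55, 61):
    p_value, reject = significance_test(heads, 100, alpha)
    print(f"{heads} heads: p = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject}")
```

With these made-up counts, 55 heads gives a p-value of roughly 0.18, so the null hypothesis is retained, whereas 61 heads gives roughly 0.018, below 0.05, so it is rejected. The one-sided alternative (a bias toward heads) is itself an assumption; a two-sided test would double these p-values here, since the null distribution is symmetric.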
See also: Evidence under Bayes theorem.
Bayesian statistics are based on a different philosophical approach to proof of inference. The mathematical formula for Bayes' theorem is:
$$\Pr[\text{Parameter} \mid \text{Data}] = \frac{\Pr[\text{Data} \mid \text{Parameter}] \times \Pr[\text{Parameter}]}{\Pr[\text{Data}]}$$
The formula is read as the probability of the parameter (or hypothesis h, as used in the notation of the axioms) "given" the data (or empirical observation), where the vertical bar denotes "given". The right-hand side of the formula combines the prior probability of a statistical model (Pr[Parameter]) with the likelihood (Pr[Data | Parameter]), normalized by the probability of the data (Pr[Data]), to produce a posterior probability distribution of the parameter (Pr[Parameter | Data]). The posterior probability is the probability that the parameter is correct given the observed data or sample statistics.[12] Hypotheses can be compared using Bayesian inference by means of the Bayes factor, which is the ratio of the posterior odds to the prior odds. It measures whether, and by how much, the data have increased or decreased the odds of one hypothesis relative to another.[13]
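A short worked example may help connect the formula to the Bayes factor. The sketch below assumes two hypothetical parameter values for a coin's probability of heads (0.5 for a fair coin, 0.6 for a biased one), a hypothetical skeptical prior of 3/4 on the fair coin, and hypothetical data of 61 heads in 100 tosses; the helper names `likelihood` and `posterior` are illustrative only.

```python
from fractions import Fraction
from math import comb


def likelihood(heads, tosses, p):
    """Pr[Data | Parameter]: binomial probability of `heads` in `tosses` given Pr(heads) = p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)


def posterior(priors, heads, tosses):
    """Bayes' theorem over a finite set of parameter values.

    `priors` maps each parameter value to Pr[Parameter];
    the result maps it to Pr[Parameter | Data].
    """
    joint = {p: likelihood(heads, tosses, p) * prior for p, prior in priors.items()}
    evidence = sum(joint.values())  # Pr[Data], the normalizing constant
    return {p: j / evidence for p, j in joint.items()}


fair, biased = Fraction(1, 2), Fraction(3, 5)
priors = {fair: Fraction(3, 4), biased: Fraction(1, 4)}  # hypothetical skeptical prior
post = posterior(priors, heads=61, tosses=100)

prior_odds = priors[biased] / priors[fair]
posterior_odds = post[biased] / post[fair]
bayes_factor = posterior_odds / prior_odds

# The Bayes factor does not depend on the prior: it equals the likelihood ratio.
assert bayes_factor == likelihood(61, 100, biased) / likelihood(61, 100, fair)
print(f"Pr[biased | data] = {float(post[biased]):.3f}, Bayes factor = {float(bayes_factor):.2f}")
```

Exact fractions make the identity "Bayes factor = posterior odds / prior odds = likelihood ratio" hold exactly. With these hypothetical numbers the Bayes factor is greater than 1, so the data favor the biased-coin hypothesis relative to the fair one, even though the prior did not.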
The statistical proof is the Bayesian demonstration that one hypothesis has a higher (weak, strong, positive) likelihood. There is considerable debate about whether the Bayesian method aligns with Karl Popper's method of proof of falsification, where some have suggested that "...there is no such thing as "accepting" hypotheses at all. All that one does in science is assign degrees of belief..."[14] According to Popper, hypotheses that have withstood testing and have yet to be falsified are not verified but corroborated. Some researchers have suggested that Popper's quest to define corroboration on the premise of probability put his philosophy in line with the Bayesian approach. In this context, the likelihood of one hypothesis relative to another may be an index of corroboration, not confirmation, and thus statistically proven through rigorous objective standing.[15]
See main article: Legal burden of proof.
Statistical proof in a legal proceeding can be sorted into three categories of evidence:
Statistical proof was not regularly applied in decisions concerning United States legal proceedings until the mid-1970s, following a landmark jury discrimination case, Castaneda v. Partida. The US Supreme Court ruled that gross statistical disparities constitute "prima facie proof" of discrimination, shifting the burden of proof from plaintiff to defendant. Since that ruling, statistical proof has been used in many other cases on inequality, discrimination, and DNA evidence.[17] [18] However, there is not a one-to-one correspondence between statistical proof and the legal burden of proof: "The Supreme Court has stated that the degrees of rigor required in the fact finding processes of law and science do not necessarily correspond."
In a death-penalty case concerning racial discrimination (McCleskey v. Kemp), the petitioner, McCleskey, a black man, was charged with the murder of a white police officer during a robbery. Expert testimony for McCleskey introduced a statistical proof showing that "defendants charged with killing white victims were 4.3 times as likely to receive a death sentence as defendants charged with killing blacks."[19] Nonetheless, the statistics were insufficient "to prove that the decisionmakers in his case acted with discriminatory purpose." It was further argued that there were "inherent limitations of the statistical proof", because it did not refer to the specifics of the individual. Despite the statistical demonstration of an increased probability of discrimination, the legal burden of proof (it was argued) had to be examined on a case-by-case basis.