"All models are wrong" is a common aphorism and anapodoton in statistics; it is often expanded as "All models are wrong, but some are useful". The aphorism acknowledges that statistical models always fall short of the complexities of reality but can nonetheless be useful. It originally referred only to statistical models, but it is now sometimes applied to scientific models in general.[1]
The aphorism is generally attributed to George E. P. Box, a British statistician, although the underlying concept predates Box's writings.
The first record of Box saying "all models are wrong" is in a 1976 paper published in the Journal of the American Statistical Association.[2] The 1976 paper contains the aphorism twice. The two sections of the paper that contain the aphorism state:
Box repeated the aphorism in a paper that was published in the proceedings of a 1978 statistics workshop. The paper contains a section entitled "All models are wrong but some are useful". The section states (pp. 202-203):
Box repeated the aphorism twice more in his 1987 book, Empirical Model-Building and Response Surfaces (which was co-authored with Norman Draper).[3] The first repetition is on p. 74: "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." The second repetition is on p. 424, which is excerpted below. A second edition of the book was published in 2007, under the title Response Surfaces, Mixtures, and Ridge Analyses. The second edition also repeats the aphorism twice, in contexts identical to those of the first edition (on p. 63 and p. 414).[4]
Box repeated the aphorism two more times in his 1997 book, Statistical Control: By Monitoring and Feedback Adjustment (which was co-authored with Alberto Luceño).[5] The first repetition is on p. 6, which is excerpted below. The second repetition is on p. 9: "So since all models are wrong, it is very important to know what to worry about; or, to put it in another way, what models are likely to produce procedures that work in practice (where exact assumptions are never true)".
A second edition of the book was published in 2009, under the title Statistical Control By Monitoring and Adjustment (co-authored with Alberto Luceño and María del Carmen Paniagua-Quiñones). The second edition also repeats the aphorism two times.[6] The first repetition is on p. 61, which is excerpted below. The second repetition is on p. 63; its context is essentially the same as that of the second repetition in the first edition.
Box's widely cited book Statistics for Experimenters (co-authored with William Hunter) does not include the aphorism in its first edition (published in 1978).[7] The second edition (published in 2005; co-authored with William Hunter and J. Stuart Hunter) includes the aphorism three times: on p. 208, p. 384, and p. 440.[8] On p. 440, the relevant sentence is this: "The most that can be expected from any model is that it can supply a useful approximation to reality: All models are wrong; some models are useful".
In addition to stating the aphorism verbatim, Box sometimes stated the essence of the aphorism with different words. One example is from 1978, while Box was President of the American Statistical Association. At the annual meeting of the Association, Box delivered his Presidential Address, wherein he stated this: "Models, of course, are never true, but fortunately it is only necessary that they be useful".[9]
There have been varied discussions about the aphorism. A selection from those discussions is presented below.
In 1983, the statisticians Peter McCullagh and John Nelder published their much-cited book on generalized linear models. The book includes a brief discussion of the aphorism (though without citing Box).[10] A second edition of the book, published in 1989, contains a very similar discussion of the aphorism.[11] The discussion from the first edition is as follows.
In 1995, the statistician David Cox commented as follows.[12]
In 1996, an Applied Statistician's Creed was proposed by M. R. Nester.[13] The creed includes, in its core part, the aphorism.
In 2002, K. P. Burnham and D. R. Anderson published their much-cited book on statistical model selection. The book states the following.[14]
The statistician J. Michael Steele has commented on the aphorism as follows.[15]
In 2008, the statistician Andrew Gelman responded to that, saying in particular the following.[16]
In 2013, the philosopher of science Peter Truran published an essay related to the aphorism.[17] The essay notes, in particular, the following.
Truran's essay further notes that Newton's theory of gravitation has been supplanted by Einstein's theory of relativity and yet Newton's theory remains generally "empirically adequate". Indeed, Newton's theory generally has excellent predictive power. Yet Newton's theory is not an approximation of Einstein's theory. For illustration, consider an apple falling from a tree. Under Newton's theory, the apple falls because Earth exerts a force on the apple, called "the force of gravity". Under Einstein's theory, Earth does not exert any force on the apple.[18] Hence, Newton's theory might be regarded as being, in some sense, completely wrong but extremely useful. (The usefulness of Newton's theory comes partly from its being vastly simpler, both mathematically and computationally, than Einstein's theory.)
In 2014, the statistician David Hand made the following statement.[19]
In 2016, P. J. Bickel and K. A. Doksum published the second volume of their book on mathematical statistics. The volume includes the quote from Box's Presidential Address, given above. It states that the quote is the best formulation of the "guiding principle of modern statistics".[20]
Although the aphorism seems to have originated with George Box, the underlying concept goes back decades, perhaps centuries. Some examples are given below.
In 1960, Georg Rasch said the following.
In 1947, the mathematician John von Neumann said that "truth ... is much too complicated to allow anything but approximations".[21]
In 1942, the French philosopher-poet Paul Valéry said the following.[22]
In 1939, the founder of statistical process control, Walter Shewhart, said the following.[23]
In 1923, a related idea was articulated by the artist Pablo Picasso.