Bayesian statistics explained

Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials.[1] More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.

Bayesian statistical methods use Bayes' theorem to compute and update probabilities after obtaining new data. Bayes' theorem describes the conditional probability of an event based on data as well as prior information or beliefs about the event or conditions related to the event.[2] [3] For example, in Bayesian inference, Bayes' theorem can be used to estimate the parameters of a probability distribution or statistical model. Since Bayesian statistics treats probability as a degree of belief, Bayes' theorem can directly assign a probability distribution that quantifies the belief to the parameter or set of parameters.

Bayesian statistics is named after Thomas Bayes, who formulated a specific case of Bayes' theorem in a paper published in 1763. In several papers spanning the late 18th and early 19th centuries, Pierre-Simon Laplace developed the Bayesian interpretation of probability.[4] Laplace used methods that would now be considered Bayesian to solve a number of statistical problems. Many Bayesian methods were developed by later authors, but the term was not commonly used to describe such methods until the 1950s. During much of the 20th century, Bayesian methods were viewed unfavorably by many statisticians due to philosophical and practical considerations: many Bayesian methods required a great deal of computation, and most of the methods that were widely used during the century were based on the frequentist interpretation. However, with the advent of powerful computers and new algorithms like Markov chain Monte Carlo, Bayesian methods have seen increasing use within statistics in the 21st century.[5]

Bayes' theorem

See main article: Bayes' theorem. Bayes' theorem is used in Bayesian methods to update probabilities, which are degrees of belief, after obtaining new data. Given two events A and B, the conditional probability of A given that B is true is expressed as follows:[6]

P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}

where P(B) \neq 0. Although Bayes' theorem is a fundamental result of probability theory, it has a specific interpretation in Bayesian statistics. In the above equation, A usually represents a proposition (such as the statement that a coin lands on heads fifty percent of the time) and B represents the evidence, or new data that is to be taken into account (such as the result of a series of coin flips).

P(A) is the prior probability of A, which expresses one's beliefs about A before evidence is taken into account. The prior probability may also quantify prior knowledge or information about A.

P(B \mid A) is the likelihood function, which can be interpreted as the probability of the evidence B given that A is true. The likelihood quantifies the extent to which the evidence B supports the proposition A.

P(A \mid B) is the posterior probability, the probability of the proposition A after taking the evidence B into account. Essentially, Bayes' theorem updates one's prior beliefs P(A) after considering the new evidence B.
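As a concrete illustration, Bayes' theorem can be applied to a pair of competing propositions about a coin. The following Python snippet is a minimal sketch; the two hypotheses, their prior probabilities, and the observed flip counts are assumptions chosen only for this example.

    from math import comb

    # Proposition A: the coin is fair (probability of heads 0.5).
    # Competing proposition: the coin is biased with probability of heads 0.8.
    # Assumed prior beliefs P(A) over the two hypotheses.
    prior = {"fair": 0.5, "biased": 0.5}
    heads_prob = {"fair": 0.5, "biased": 0.8}

    # Evidence B: 7 heads observed in 10 flips (assumed data).
    n_flips, n_heads = 10, 7

    # Likelihood P(B | A) of the evidence under each hypothesis (binomial model).
    likelihood = {
        h: comb(n_flips, n_heads) * p**n_heads * (1 - p) ** (n_flips - n_heads)
        for h, p in heads_prob.items()
    }

    # Evidence P(B) from the law of total probability over the two hypotheses.
    evidence = sum(likelihood[h] * prior[h] for h in prior)

    # Posterior P(A | B) = P(B | A) P(A) / P(B).
    posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
    print(posterior)  # roughly 0.37 for "fair" and 0.63 for "biased"

After seeing seven heads in ten flips, belief shifts from an even split toward the biased hypothesis, which is exactly the updating that Bayes' theorem formalizes.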

The probability of the evidence P(B) can be calculated using the law of total probability. If \{A_1, A_2, \dots, A_n\} is a partition of the sample space, which is the set of all outcomes of an experiment, then

P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \dots + P(B \mid A_n)P(A_n) = \sum_i P(B \mid A_i)P(A_i)

When there are an infinite number of outcomes, it is necessary to integrate over all outcomes to calculate P(B) using the law of total probability. P(B) is often difficult to calculate, as the sums or integrals involved can be time-consuming to evaluate, so often only the product of the prior and likelihood is considered; since the evidence does not change within the same analysis, the posterior is proportional to this product:

P(A \mid B) \propto P(B \mid A)P(A)

The maximum a posteriori, which is the mode of the posterior and is often computed in Bayesian statistics using mathematical optimization methods, remains the same. The posterior can be approximated even without computing the exact value of P(B), with methods such as Markov chain Monte Carlo or variational Bayesian methods.
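For a continuous parameter, the role of the unnormalized product of prior and likelihood can be sketched with a simple grid approximation in Python. This is a minimal sketch; the Beta(2, 2) prior and the observed flip counts are assumptions chosen for illustration. Normalizing over the grid recovers P(B) numerically, and the maximum a posteriori estimate is the same whether or not the normalization is carried out.

    import numpy as np

    # Grid of candidate values for theta, the probability of heads.
    theta = np.linspace(0.001, 0.999, 999)

    # Assumed prior: Beta(2, 2), which mildly favors values near 0.5
    # (an unnormalized density is sufficient here).
    prior = theta * (1 - theta)

    # Assumed evidence: 7 heads and 3 tails, with a binomial likelihood P(B | theta).
    heads, tails = 7, 3
    likelihood = theta**heads * (1 - theta) ** tails

    # Unnormalized posterior: P(theta | B) is proportional to P(B | theta) P(theta).
    unnormalized = likelihood * prior

    # Normalizing constant P(B), approximated by numerical integration on the grid.
    evidence = np.trapz(unnormalized, theta)
    posterior = unnormalized / evidence

    # The MAP estimate does not depend on the normalization.
    print(theta[np.argmax(unnormalized)], theta[np.argmax(posterior)])  # both near 0.67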

Bayesian methods

The general set of statistical techniques can be divided into a number of activities, many of which have special Bayesian versions.

Bayesian inference

See main article: Bayesian inference. Bayesian inference refers to statistical inference where uncertainty in inferences is quantified using probability.[7] In classical frequentist inference, model parameters and hypotheses are considered to be fixed. Probabilities are not assigned to parameters or hypotheses in frequentist inference. For example, it would not make sense in frequentist inference to directly assign a probability to an event that can only happen once, such as the result of the next flip of a fair coin. However, it would make sense to state that the proportion of heads approaches one-half as the number of coin flips increases.[8]

Statistical models specify a set of statistical assumptions and processes that represent how the sample data are generated. Statistical models have a number of parameters that can be modified. For example, the flips of a coin can be represented as samples from a Bernoulli distribution, which models two possible outcomes. The Bernoulli distribution has a single parameter equal to the probability of one outcome, which in most cases is the probability of landing on heads. Devising a good model for the data is central in Bayesian inference. In most cases, models only approximate the true process and may not take into account certain factors influencing the data. In Bayesian inference, probabilities can be assigned to model parameters; parameters can be represented as random variables. Bayesian inference uses Bayes' theorem to update probabilities after more evidence is obtained or known.[9]
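For the coin example, assigning a Beta prior to the Bernoulli parameter makes the Bayesian update available in closed form. The sketch below assumes a uniform Beta(1, 1) prior and made-up flip data, and uses SciPy only to summarize the resulting posterior.

    from scipy import stats

    # Assumed prior over the heads probability theta: Beta(1, 1), i.e. uniform on [0, 1].
    alpha, beta = 1.0, 1.0

    # Assumed observed flips: 1 = heads, 0 = tails.
    flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

    # Conjugate update: each head increments alpha, each tail increments beta, so the
    # posterior after all flips is Beta(alpha + number of heads, beta + number of tails).
    for flip in flips:
        alpha += flip
        beta += 1 - flip

    posterior = stats.beta(alpha, beta)
    print(posterior.mean())          # posterior mean of theta, here 8/12, about 0.67
    print(posterior.interval(0.94))  # a 94% credible interval for theta

Each flip updates the distribution over the parameter itself, which is what it means to treat the parameter as a random variable.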

Statistical modeling

The formulation of statistical models using Bayesian statistics has the identifying feature of requiring the specification of prior distributions for any unknown parameters. Indeed, parameters of prior distributions may themselves have prior distributions, leading to Bayesian hierarchical modeling,[10] [11] [12] also known as multi-level modeling. A special case is Bayesian networks.
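As an illustration of the hierarchical idea, the sketch below uses the PyMC probabilistic-programming library (any similar tool would do); the group estimates, their standard errors, and the prior scales are assumptions invented for the example. The parameters of the group-level prior receive priors of their own, so information is shared across groups.

    import numpy as np
    import pymc as pm

    # Hypothetical data: an estimated effect and its standard error for each of 4 groups.
    y = np.array([28.0, 8.0, -3.0, 7.0])
    sigma_y = np.array([15.0, 10.0, 16.0, 11.0])

    with pm.Model() as hierarchical_model:
        # Hyperpriors: the group-level prior has unknown parameters with priors of their own.
        mu = pm.Normal("mu", mu=0.0, sigma=10.0)
        tau = pm.HalfNormal("tau", sigma=10.0)
        # Group-level effects drawn from the shared population distribution.
        theta = pm.Normal("theta", mu=mu, sigma=tau, shape=len(y))
        # Likelihood of the observed estimates.
        pm.Normal("y_obs", mu=theta, sigma=sigma_y, observed=y)
        # Posterior sampling with Markov chain Monte Carlo.
        idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)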

For conducting a Bayesian statistical analysis, best practices are discussed by van de Schoot et al.[13]

For reporting the results of a Bayesian statistical analysis, Bayesian analysis reporting guidelines (BARG) are provided in an open-access article by John K. Kruschke.[14]

Design of experiments

The Bayesian design of experiments includes a concept called the 'influence of prior beliefs'. This approach uses sequential analysis techniques to include the outcome of earlier experiments in the design of the next experiment. This is achieved by updating 'beliefs' through the use of prior and posterior distributions. This allows the design of experiments to make good use of resources of all types. An example of this is the multi-armed bandit problem.
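A common concrete instance of this sequential idea is Thompson sampling for a Bernoulli multi-armed bandit, sketched below. The true success rates of the arms are assumptions invented so the simulation can run; in practice they are unknown, which is exactly why a posterior is maintained for each arm.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical true success probabilities of three arms (hidden from the algorithm).
    true_rates = [0.3, 0.5, 0.7]

    # Beta(1, 1) prior over each arm's success rate.
    alpha = np.ones(3)
    beta = np.ones(3)

    for trial in range(1000):
        # Thompson sampling: draw one plausible rate per arm from its current posterior
        # and play the arm whose draw is largest.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        # Observe a reward and update that arm's posterior, which becomes the prior
        # for the next trial.
        reward = rng.random() < true_rates[arm]
        alpha[arm] += reward
        beta[arm] += 1 - reward

    print(alpha / (alpha + beta))  # posterior means; the best arm ends up played most often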

Exploratory analysis of Bayesian models

Exploratory analysis of Bayesian models is an adaptation or extension of the exploratory data analysis approach described by Persi Diaconis[15] to the needs and peculiarities of Bayesian modeling.

The inference process generates a posterior distribution, which has a central role in Bayesian statistics, together with other distributions like the posterior predictive distribution and the prior predictive distribution. The correct visualization, analysis, and interpretation of these distributions is key to properly answer the questions that motivate the inference process.[16]

When working with Bayesian models, a series of related tasks need to be addressed besides inference itself, such as diagnosing the quality of the inference, criticizing the model, comparing candidate models, and preparing the results for a particular audience.

All these tasks are part of the Exploratory analysis of Bayesian models approach and successfully performing them is central to the iterative and interactive modeling process. These tasks require both numerical and visual summaries.[17] [18] [19]
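The sketch below illustrates a few of these numerical and visual summaries with the ArviZ library, assuming the idata object produced by the earlier hierarchical-model sketch; the particular summaries chosen here are only one possible workflow.

    import arviz as az

    # Numerical summary: posterior means, credible intervals, effective sample sizes,
    # and R-hat convergence diagnostics for each parameter.
    print(az.summary(idata))

    # Visual diagnostics: trace plots help reveal poor mixing of the MCMC chains.
    az.plot_trace(idata)

    # Posterior visualization: marginal posterior distributions with credible intervals.
    az.plot_posterior(idata)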


Further reading

Bernardo, José M.; Smith, Adrian F. M. (2000). Bayesian Theory. New York: Wiley. ISBN 0-471-92416-4.

Downey, Allen B. (2021). Think Bayes: Bayesian Statistics in Python (2nd ed.). O'Reilly. ISBN 978-1-4920-8946-9.

Robert, Christian (2007). The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation (2nd ed.). New York: Springer. ISBN 978-0-387-71598-8.


Notes and References

  1. Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki; Rubin, Donald B. (2013). Bayesian Data Analysis (3rd ed.). Chapman and Hall/CRC. ISBN 978-1-4398-4095-5.
  2. McElreath, Richard (2020). Statistical Rethinking: A Bayesian Course with Examples in R and Stan (2nd ed.). Chapman and Hall/CRC. ISBN 978-0-367-13991-9.
  3. Kruschke, John K. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (2nd ed.). Academic Press. ISBN 978-0-12-405888-0.
  4. McGrayne, Sharon Bertsch (2012). The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (1st ed.). Yale University Press. ISBN 978-0-300-18822-6.
  5. Fienberg, Stephen E. (2006). "When Did Bayesian Inference Become 'Bayesian'?". Bayesian Analysis. 1 (1): 1–40. doi:10.1214/06-BA101.
  6. Grinstead, Charles M.; Snell, J. Laurie (2006). Introduction to Probability (2nd ed.). Providence, RI: American Mathematical Society. ISBN 978-0-8218-9414-9.
  7. Lee, Se Yoon (2021). "Gibbs sampler and coordinate ascent variational inference: A set-theoretical review". Communications in Statistics - Theory and Methods. 51 (6): 1549–1568. doi:10.1080/03610926.2021.1921214. arXiv:2008.01006.
  8. Wakefield, Jon (2013). Bayesian and Frequentist Regression Methods. New York, NY: Springer. ISBN 978-1-4419-0924-4.
  9. Congdon, Peter (2014). Applied Bayesian Modelling (2nd ed.). Wiley. ISBN 978-1119951513.
  10. Kruschke, J. K.; Vanpaemel, W. (2015). "Bayesian Estimation in Hierarchical Models". In Busemeyer, J. R.; Wang, Z.; Townsend, J. T.; Eidels, A. (eds.). The Oxford Handbook of Computational and Mathematical Psychology. Oxford University Press. pp. 279–299.
  11. Hajiramezanali, E.; Dadaneh, S. Z.; Karbalayghareh, A.; Zhou, Z.; Qian, X. (2018). "Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data". 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada.
  12. Lee, Se Yoon; Mallick, Bani (2021). "Bayesian Hierarchical Modeling: Application Towards Production Results in the Eagle Ford Shale of South Texas". Sankhya B. 84: 1–43. doi:10.1007/s13571-020-00245-8.
  13. van de Schoot, Rens; Depaoli, Sarah; King, Ruth; Kramer, Bianca; Märtens, Kaspar; Tadesse, Mahlet G.; Vannucci, Marina; Gelman, Andrew; Veen, Duco; Willemsen, Joukje; Yau, Christopher (2021). "Bayesian statistics and modelling". Nature Reviews Methods Primers. 1 (1): 1–26. doi:10.1038/s43586-020-00001-2.
  14. Kruschke, J. K. (2021). "Bayesian Analysis Reporting Guidelines". Nature Human Behaviour. 5 (10): 1282–1291. doi:10.1038/s41562-021-01177-7. PMID 34400814.
  15. Diaconis, Persi (2011). "Theories of Data Analysis: From Magical Thinking Through Classical Statistics". John Wiley & Sons, Ltd.
  16. Kumar, Ravin; Carroll, Colin; Hartikainen, Ari; Martin, Osvaldo (2019). "ArviZ a unified library for exploratory analysis of Bayesian models in Python". Journal of Open Source Software. 4 (33): 1143. doi:10.21105/joss.01143.
  17. Gabry, Jonah; Simpson, Daniel; Vehtari, Aki; Betancourt, Michael; Gelman, Andrew (2019). "Visualization in Bayesian workflow". Journal of the Royal Statistical Society, Series A (Statistics in Society). 182 (2): 389–402. doi:10.1111/rssa.12378. arXiv:1709.01449.
  18. Vehtari, Aki; Gelman, Andrew; Simpson, Daniel; Carpenter, Bob; Bürkner, Paul-Christian (2021). "Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (With Discussion)". Bayesian Analysis. 16 (2). doi:10.1214/20-BA1221. arXiv:1903.08008.
  19. Martin, Osvaldo (2018). Bayesian Analysis with Python: Introduction to Statistical Modeling and Probabilistic Programming Using PyMC3 and ArviZ. Packt Publishing Ltd. ISBN 9781789341652.