Bayesian hierarchical modelling is a statistical model written in multiple levels (hierarchical form) that estimates the posterior distribution of the model's parameters using the Bayesian method.[1] The sub-models combine to form the hierarchical model, and Bayes' theorem is used to integrate them with the observed data and account for all the uncertainty that is present. The result of this integration is the posterior distribution, also known as the updated probability estimate, as additional evidence on the prior distribution is acquired.
Frequentist statistics may yield conclusions seemingly incompatible with those offered by Bayesian statistics due to the Bayesian treatment of the parameters as random variables and its use of subjective information in establishing assumptions on these parameters.[2] As the approaches answer different questions, the formal results are not technically contradictory, but the two approaches disagree over which answer is relevant to particular applications. Bayesians argue that relevant information regarding decision-making and updating beliefs cannot be ignored and that hierarchical modeling has the potential to overrule classical methods in applications where respondents provide multiple observations. Moreover, the model has proven to be robust, with the posterior distribution less sensitive to the more flexible hierarchical priors.
Hierarchical modeling is used when information is available on several different levels of observational units. For example, in epidemiological modeling to describe infection trajectories for multiple countries, observational units are countries, and each country has its own temporal profile of daily infected cases.[3] In decline curve analysis to describe the oil or gas production decline for multiple wells, observational units are oil or gas wells in a reservoir region, and each well has its own temporal profile of oil or gas production rates (usually, barrels per month).[4] The data structure for hierarchical modeling retains its nested form. The hierarchical form of analysis and organization helps in the understanding of multiparameter problems and also plays an important role in developing computational strategies.
Statistical methods and models commonly involve multiple parameters that can be regarded as related or connected in such a way that the problem implies a dependence of the joint probability model for these parameters. Individual degrees of belief, expressed in the form of probabilities, come with uncertainty.[5] Amidst this is the change of the degrees of belief over time. As was stated by Professor José M. Bernardo and Professor Adrian F. Smith, “The actuality of the learning process consists in the evolution of individual and subjective beliefs about the reality.” These subjective probabilities are more directly involved in the mind rather than the physical probabilities.[5] Hence, it is with this need of updating beliefs that Bayesians have formulated an alternative statistical model which takes into account the prior occurrence of a particular event.[6]
The assumed occurrence of a real-world event will typically modify preferences between certain options. This is done by modifying the degrees of belief attached, by an individual, to the events defining the options.
Suppose in a study of the effectiveness of cardiac treatments, the patients in hospital $j$ have survival probability $\theta_j$, and the survival probability will be updated given $y$, the occurrence of an event believed to affect survival in cardiac patients.

In order to make updated probability statements about $\theta_j$, given the occurrence of event $y$, we must begin with a model providing a joint probability distribution for $\theta_j$ and $y$. This can be written as a product of the two distributions that are often referred to as the prior distribution $P(\theta)$ and the sampling distribution $P(y \mid \theta)$, respectively:

$$P(\theta, y) = P(\theta)\,P(y \mid \theta)$$
Using the basic property of conditional probability, the posterior distribution will yield:

$$P(\theta \mid y) = \frac{P(\theta, y)}{P(y)} = \frac{P(y \mid \theta)\,P(\theta)}{P(y)}$$

This equation, showing the relationship between the conditional probability and the individual events, is known as Bayes' theorem. This simple expression encapsulates the technical core of Bayesian inference, which aims to incorporate the updated belief, $P(\theta \mid y)$, in appropriate and solvable ways.
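Bayes' theorem can be checked numerically. The sketch below (an illustration, not from the source) treats the survival probability $\theta$ on a discrete grid, with a uniform prior and a binomial sampling distribution for a hypothetical count of surviving patients; with a uniform prior the posterior matches the conjugate Beta(8, 4) result.

```python
import numpy as np
from math import comb

# Hypothetical data: 7 of 10 cardiac patients in a hospital survive.
n_patients, n_survived = 10, 7

# Discretize theta (survival probability) and place a uniform prior on it.
theta = np.linspace(0.001, 0.999, 999)
prior = np.full_like(theta, 1.0 / theta.size)      # P(theta)

# Sampling distribution (likelihood): P(y | theta) is Binomial(10, theta).
likelihood = (comb(n_patients, n_survived)
              * theta**n_survived
              * (1 - theta)**(n_patients - n_survived))

# Bayes' theorem: P(theta | y) = P(y | theta) P(theta) / P(y).
evidence = np.sum(likelihood * prior)              # P(y), summed over the grid
posterior = likelihood * prior / evidence

# With a uniform prior the posterior is Beta(8, 4), whose mean is 8/12.
posterior_mean = np.sum(theta * posterior)
print(round(posterior_mean, 3))                    # close to 0.667
```

The normalizing sum plays the role of $P(y)$; in higher-dimensional models this integral is exactly what makes sampling methods such as MCMC necessary.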
The usual starting point of a statistical analysis is the assumption that the $n$ values $y_1, y_2, \ldots, y_n$ are exchangeable. If no information other than the data is available to distinguish any of the $\theta_j$'s from any others, and no ordering or grouping of the parameters can be made, one must assume symmetry among the parameters in their prior distribution. This symmetry is represented probabilistically by exchangeability. Generally, it is useful and appropriate to model data from an exchangeable distribution as independently and identically distributed, given some unknown parameter $\theta$ with distribution $P(\theta)$.

For a fixed number $n$, the set $y_1, y_2, \ldots, y_n$ is exchangeable if the joint probability $P(y_1, y_2, \ldots, y_n)$ is invariant under permutations of the indices. That is, for every permutation $\pi$, or $(\pi_1, \pi_2, \ldots, \pi_n)$ of $(1, 2, \ldots, n)$,

$$P(y_1, y_2, \ldots, y_n) = P(y_{\pi_1}, y_{\pi_2}, \ldots, y_{\pi_n}).$$
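The permutation-invariance definition can be verified mechanically in a toy case. The sketch below (illustrative only, not from the source) evaluates the joint pmf of three iid Bernoulli draws under every ordering of the indices and confirms that all orderings agree.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

# Joint pmf of three iid Bernoulli(3/10) draws, computed exactly.
p = Fraction(3, 10)

def joint_pmf(ys):
    return prod((p if y == 1 else 1 - p) for y in ys)

# Exchangeability: P(y_1, ..., y_n) equals P(y_pi1, ..., y_pin)
# for every permutation (pi_1, ..., pi_n) of the indices.
ys = (1, 0, 1)
probs = {joint_pmf(tuple(ys[i] for i in perm))
         for perm in permutations(range(len(ys)))}
print(probs)   # a single value: all 3! orderings give the same probability
```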
Following is an exchangeable, but not independent and identically distributed (iid), example. Consider an urn with a red ball and a blue ball inside, with probability $\frac{1}{2}$ of drawing either, where balls are drawn without replacement. Let

$$Y_i = \begin{cases} 1, & \text{if the } i\text{th ball is red},\\ 0, & \text{otherwise}. \end{cases}$$

Since the probability of selecting a red ball in the first draw and a blue ball in the second draw is equal to the probability of selecting a blue ball on the first draw and a red on the second draw, both of which are equal to $1/2$, i.e.

$$P(y_1 = 1, y_2 = 0) = P(y_1 = 0, y_2 = 1) = \frac{1}{2},$$

$y_1$ and $y_2$ are exchangeable.

But the probability of selecting a red ball on the second draw, given that the red ball has already been selected in the first draw, is 0, and is not equal to the probability that the red ball is selected in the second draw, which is equal to $1/2$, i.e.

$$P(y_2 = 1 \mid y_1 = 1) = 0 \ne P(y_2 = 1) = \frac{1}{2}.$$

Thus, $y_1$ and $y_2$ are not independent.
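The urn argument can be confirmed by enumerating the two equally likely draw orders; this small check (not part of the source) computes the joint, conditional, and marginal probabilities exactly.

```python
from fractions import Fraction
from itertools import permutations

# The two equally likely orders in which the red (1) and blue (0) ball
# can be drawn without replacement.
outcomes = list(permutations([1, 0]))   # [(1, 0), (0, 1)]
p = Fraction(1, len(outcomes))          # each order has probability 1/2

# Exchangeability: P(y1=1, y2=0) == P(y1=0, y2=1) == 1/2.
p_red_then_blue = sum(p for o in outcomes if o == (1, 0))
p_blue_then_red = sum(p for o in outcomes if o == (0, 1))

# Independence fails: P(y2=1 | y1=1) = 0, while P(y2=1) = 1/2.
p_y1_red = sum(p for o in outcomes if o[0] == 1)
p_both_red = sum(p for o in outcomes if o == (1, 1))
p_y2_red_given_y1_red = p_both_red / p_y1_red
p_y2_red = sum(p for o in outcomes if o[1] == 1)

print(p_red_then_blue, p_blue_then_red)   # equal -> exchangeable
print(p_y2_red_given_y1_red, p_y2_red)    # unequal -> not independent
```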
If $x_1, \ldots, x_n$ are independent and identically distributed, then they are exchangeable, but the converse is not necessarily true.
Infinite exchangeability is the property that every finite subset of an infinite sequence $y_1, y_2, \ldots$ is exchangeable. That is, for any $n$, the sequence $y_1, y_2, \ldots, y_n$ is exchangeable.
Bayesian hierarchical modeling makes use of two important concepts in deriving the posterior distribution,[1] namely:

1. Hyperparameters: parameters of the prior distribution;
2. Hyperpriors: distributions of hyperparameters.
Suppose a random variable $Y$ follows a normal distribution with parameter $\theta$ as the mean and 1 as the variance, that is $Y \mid \theta \sim N(\theta, 1)$. The tilde relation $\sim$ can be read as "has the distribution of" or "is distributed as". Suppose also that the parameter $\theta$ has a distribution given by a normal distribution with mean $\mu$ and variance 1, i.e. $\theta \mid \mu \sim N(\mu, 1)$. Furthermore, $\mu$ follows another distribution given, for example, by the standard normal distribution, $N(0, 1)$. The parameter $\mu$ is called the hyperparameter, while its distribution, given by $N(0, 1)$, is an example of a hyperprior distribution. The notation of the distribution of $Y$ changes as another parameter is added, i.e. $Y \mid \theta, \mu \sim N(\theta, 1)$. If there is another stage, say, $\mu$ follows another normal distribution with mean $\beta$ and variance $\epsilon$, meaning $\mu \sim N(\beta, \epsilon)$, then $\beta$ and $\epsilon$ can also be called hyperparameters, and their distributions are hyperprior distributions as well.
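The three levels above can be simulated top-down. Marginally, $Y$ then has mean 0 and variance $1 + 1 + 1 = 3$, since each level contributes one unit of variance; the sketch below (an illustration, not from the source) checks this by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 1_000_000

# Hyperprior:  mu ~ N(0, 1)
mu = rng.normal(0.0, 1.0, size=n)
# Prior:       theta | mu ~ N(mu, 1)
theta = rng.normal(mu, 1.0)
# Likelihood:  Y | theta ~ N(theta, 1)
y = rng.normal(theta, 1.0)

# Marginally Y ~ N(0, 3): the variances of the three stages add.
print(y.mean(), y.var())   # close to 0 and 3
```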
Let $y_j$ be an observation and $\theta_j$ a parameter governing the data-generating process for $y_j$. Assume further that the parameters $\theta_1, \theta_2, \ldots, \theta_j$ are generated exchangeably from a common population, with distribution governed by a hyperparameter $\phi$. The Bayesian hierarchical model contains the following stages:

$$\text{Stage I: } y_j \mid \theta_j, \phi \sim P(y_j \mid \theta_j, \phi)$$
$$\text{Stage II: } \theta_j \mid \phi \sim P(\theta_j \mid \phi)$$
$$\text{Stage III: } \phi \sim P(\phi)$$

The likelihood, as seen in stage I, is $P(y_j \mid \theta_j, \phi)$, with $P(\theta_j, \phi)$ as its prior distribution. Note that the likelihood depends on $\phi$ only through $\theta_j$.

The prior distribution from stage I can be broken down into:

$$P(\theta_j, \phi) = P(\theta_j \mid \phi)\,P(\phi),$$

with $\phi$ as its hyperparameter with hyperprior distribution $P(\phi)$.

Thus, the posterior distribution is proportional to:

$$P(\phi, \theta_j \mid y) \propto P(y_j \mid \theta_j, \phi)\,P(\theta_j, \phi)$$
$$P(\phi, \theta_j \mid y) \propto P(y_j \mid \theta_j)\,P(\theta_j \mid \phi)\,P(\phi)$$
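With all three stages normal, for instance $\phi \sim N(0,1)$, $\theta \mid \phi \sim N(\phi, 1)$, and $y \mid \theta \sim N(\theta, 1)$ (choices invented purely for illustration), the proportionality above can be evaluated on a grid. Marginalizing $\phi$ gives $\theta \sim N(0, 2)$, so conjugate-normal algebra predicts $E[\theta \mid y] = 2y/3$; the sketch below checks this numerically.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

y_obs = 1.5                                   # a single made-up observation

# Grids for the parameter theta and the hyperparameter phi.
grid = np.linspace(-9.0, 9.0, 1201)
theta, phi = np.meshgrid(grid, grid, indexing="ij")

# Unnormalized joint posterior: P(y|theta) P(theta|phi) P(phi).
joint = (normal_pdf(y_obs, theta, 1.0)        # Stage I likelihood
         * normal_pdf(theta, phi, 1.0)        # Stage II prior
         * normal_pdf(phi, 0.0, 1.0))         # Stage III hyperprior
joint /= joint.sum()                          # normalize over the grid

# Marginal posterior of theta and its mean; theory gives 2 * y_obs / 3 = 1.0.
post_theta = joint.sum(axis=1)
posterior_mean = (grid * post_theta).sum()
print(round(posterior_mean, 3))               # close to 1.0
```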
To further illustrate this, consider the example: a teacher wants to estimate how well a student did on the SAT. The teacher uses information on the student's high school grades and current grade point average (GPA) to come up with an estimate. The student's current GPA, denoted by $Y$, has a likelihood given by some probability function with parameter $\theta$, i.e. $Y \mid \theta \sim P(Y \mid \theta)$. This parameter $\theta$ is the SAT score of the student. The SAT score is viewed as a sample coming from a common population distribution indexed by another parameter $\phi$, the student's high school grade, i.e. $\theta \mid \phi \sim P(\theta \mid \phi)$. Moreover, the hyperparameter $\phi$ follows its own distribution given by $P(\phi)$, a hyperprior.

To solve for the SAT score given information on the GPA,

$$P(\theta, \phi \mid Y) \propto P(Y \mid \theta, \phi)\,P(\theta, \phi)$$
$$P(\theta, \phi \mid Y) \propto P(Y \mid \theta)\,P(\theta \mid \phi)\,P(\phi)$$

All information in the problem will be used to solve for the posterior distribution. Instead of solving only with the prior distribution and the likelihood function, the use of hyperpriors gives more information to make more accurate beliefs about the behavior of a parameter.[10]
In general, the joint posterior distribution of interest in 2-stage hierarchical models is:

$$P(\theta, \phi \mid Y) = \frac{P(Y \mid \theta, \phi)\,P(\theta, \phi)}{P(Y)} = \frac{P(Y \mid \theta)\,P(\theta \mid \phi)\,P(\phi)}{P(Y)}$$
$$P(\theta, \phi \mid Y) \propto P(Y \mid \theta)\,P(\theta \mid \phi)\,P(\phi)$$

For 3-stage hierarchical models, the posterior distribution is given by:

$$P(\theta, \phi, X \mid Y) = \frac{P(Y \mid \theta)\,P(\theta \mid \phi)\,P(\phi \mid X)\,P(X)}{P(Y)}$$
$$P(\theta, \phi, X \mid Y) \propto P(Y \mid \theta)\,P(\theta \mid \phi)\,P(\phi \mid X)\,P(X)$$
The framework of Bayesian hierarchical modeling is frequently used in diverse applications. In particular, Bayesian nonlinear mixed-effects models have recently received significant attention. A basic version of the Bayesian nonlinear mixed-effects model is represented as the following three stages:
Stage 1: Individual-Level Model

$$y_{ij} = f(t_{ij}; \theta_{1i}, \theta_{2i}, \ldots, \theta_{li}, \ldots, \theta_{Ki}) + \epsilon_{ij}, \quad \epsilon_{ij} \sim N(0, \sigma^2), \quad i = 1, \ldots, N,\; j = 1, \ldots, M_i.$$

Stage 2: Population Model

$$\theta_{li} = \alpha_l + \sum_{b=1}^{P} \beta_{lb} x_{ib} + \eta_{li}, \quad \eta_{li} \sim N(0, \omega_l^2), \quad i = 1, \ldots, N,\; l = 1, \ldots, K.$$

Stage 3: Prior

$$\sigma^2 \sim \pi(\sigma^2), \quad \alpha_l \sim \pi(\alpha_l), \quad (\beta_{l1}, \ldots, \beta_{lb}, \ldots, \beta_{lP}) \sim \pi(\beta_{l1}, \ldots, \beta_{lb}, \ldots, \beta_{lP}), \quad \omega_l^2 \sim \pi(\omega_l^2), \quad l = 1, \ldots, K.$$
Here, $y_{ij}$ denotes the continuous response of the $i$-th subject at the time point $t_{ij}$, and $x_{ib}$ is the $b$-th covariate of the $i$-th subject. Parameters involved in the model are written in Greek letters. $f(t; \theta_1, \ldots, \theta_K)$ is a known function parameterized by the $K$-dimensional vector $(\theta_1, \ldots, \theta_K)$. Typically, $f$ is a nonlinear function that describes the temporal trajectory of individuals. In the model, $\epsilon_{ij}$ and $\eta_{li}$ describe within-individual variability and between-individual variability, respectively.
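As a concrete generative sketch of these three stages, the code below simulates data top-down with a hypothetical exponential-decay $f$, a single covariate, and distributional choices invented purely for illustration (it is not a fitted model from the source): population-level quantities play the role of Stage 3 draws, individual parameters $\theta_{li}$ come from the Stage 2 population model, and observations $y_{ij}$ from the Stage 1 individual-level model.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N, M, K, P = 50, 12, 2, 1          # subjects, time points, parameters, covariates

def f(t, theta1, theta2):
    # A hypothetical nonlinear trajectory: exponential decline, as in
    # decline-curve or infection-trajectory applications.
    return theta1 * np.exp(-theta2 * t)

# Stage 3 (prior level): population quantities, fixed here for simplicity
# rather than drawn from their hyperpriors pi(.).
sigma2 = 0.05                                    # residual variance
alpha = np.array([5.0, 0.3])                     # alpha_l, l = 1..K
beta = rng.normal(0.0, 0.1, size=(K, P))         # beta_lb
omega2 = np.array([0.5, 0.01])                   # omega_l^2

# Stage 2 (population model): individual parameters theta_li.
x = rng.normal(0.0, 1.0, size=(N, P))            # covariates x_ib
eta = rng.normal(0.0, np.sqrt(omega2), size=(N, K))
theta = alpha + x @ beta.T + eta                 # shape (N, K)

# Stage 1 (individual-level model): noisy observations y_ij.
t = np.linspace(0.0, 10.0, M)                    # common time grid t_ij = t_j
mean = f(t[None, :], theta[:, [0]], theta[:, [1]])
y = mean + rng.normal(0.0, np.sqrt(sigma2), size=(N, M))

print(y.shape)                                   # (50, 12): one row per subject
```

Fitting this model would invert the simulation, evaluating the posterior of all Greek-letter quantities given $y$; if the Stage 3 priors are dropped, the model reduces to a frequentist nonlinear mixed-effects model.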
A central task in the application of Bayesian nonlinear mixed-effects models is to evaluate the posterior density:

$$\pi\left(\{\theta_{li}\}_{i=1,l=1}^{N,K}, \sigma^2, \{\alpha_l\}_{l=1}^{K}, \{\beta_{lb}\}_{l=1,b=1}^{K,P}, \{\omega_l\}_{l=1}^{K} \,\middle|\, \{y_{ij}\}_{i=1,j=1}^{N,M_i}\right)$$

$$\propto \pi\left(\{y_{ij}\}_{i=1,j=1}^{N,M_i}, \{\theta_{li}\}_{i=1,l=1}^{N,K}, \sigma^2, \{\alpha_l\}_{l=1}^{K}, \{\beta_{lb}\}_{l=1,b=1}^{K,P}, \{\omega_l\}_{l=1}^{K}\right)$$

$$= \underbrace{\pi\left(\{y_{ij}\}_{i=1,j=1}^{N,M_i} \,\middle|\, \{\theta_{li}\}_{i=1,l=1}^{N,K}, \sigma^2\right)}_{\text{Stage 1: Individual-Level Model}} \times \underbrace{\pi\left(\{\theta_{li}\}_{i=1,l=1}^{N,K} \,\middle|\, \{\alpha_l\}_{l=1}^{K}, \{\beta_{lb}\}_{l=1,b=1}^{K,P}, \{\omega_l\}_{l=1}^{K}\right)}_{\text{Stage 2: Population Model}} \times \underbrace{p\left(\sigma^2, \{\alpha_l\}_{l=1}^{K}, \{\beta_{lb}\}_{l=1,b=1}^{K,P}, \{\omega_l\}_{l=1}^{K}\right)}_{\text{Stage 3: Prior}}$$
The panel on the right displays the Bayesian research cycle using the Bayesian nonlinear mixed-effects model. A research cycle using the Bayesian nonlinear mixed-effects model comprises two steps: (a) a standard research cycle and (b) a Bayesian-specific workflow. The standard research cycle involves literature review, defining a problem, and specifying the research question and hypothesis. The Bayesian-specific workflow comprises three sub-steps: (b)–(i) formalizing prior distributions based on background knowledge and prior elicitation; (b)–(ii) determining the likelihood function based on a nonlinear function $f$; and (b)–(iii) making a posterior inference.