The deviance information criterion (DIC) is a hierarchical modeling generalization of the Akaike information criterion (AIC). It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. Like AIC, DIC is an asymptotic approximation that holds as the sample size becomes large. It is valid only when the posterior distribution is approximately multivariate normal.
Define the deviance as
D(\theta)=-2\log(p(y|\theta))+C,
where y are the data, \theta are the unknown parameters of the model, and p(y|\theta) is the likelihood function. C is a constant that cancels out in all calculations that compare different models, and which therefore does not need to be known.
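For concreteness, here is a minimal Python sketch of the deviance for a hypothetical i.i.d. normal model with \theta=(\mu,\sigma), taking C=0 since the constant cancels in all comparisons:

```python
import numpy as np
from scipy.stats import norm

def deviance(y, mu, sigma):
    """D(theta) = -2 log p(y|theta) for an i.i.d. normal model, with C = 0."""
    return -2.0 * norm.logpdf(y, loc=mu, scale=sigma).sum()

print(deviance(np.array([0.5, 1.2, 0.8]), mu=1.0, sigma=1.0))
```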
There are two calculations in common usage for the effective number of parameters of the model. The first, as described in Spiegelhalter et al. (2002), is
p_D=\overline{D(\theta)}-D(\bar{\theta}),
where \bar{\theta} is the expectation of \theta. The second, as described in Gelman et al. (2004), is
p_D=p_V=\frac{1}{2}\overline{\operatorname{var}\left(D(\theta)\right)}.
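A sketch of both estimates from MCMC output, assuming the i.i.d. normal model above (the function name and the draw arrays `mu_draws` and `sigma_draws` are illustrative, not a standard API):

```python
import numpy as np
from scipy.stats import norm

def effective_parameters(y, mu_draws, sigma_draws):
    # Deviance D(theta) = -2 log p(y|theta) at each posterior draw (C = 0).
    D = np.array([-2.0 * norm.logpdf(y, m, s).sum()
                  for m, s in zip(mu_draws, sigma_draws)])
    D_bar = D.mean()                                   # \overline{D(theta)}
    D_at_mean = -2.0 * norm.logpdf(
        y, mu_draws.mean(), sigma_draws.mean()).sum()  # D(\bar{theta})
    p_d = D_bar - D_at_mean                            # Spiegelhalter et al. (2002)
    p_v = 0.5 * D.var(ddof=1)                          # Gelman et al. (2004)
    return p_d, p_v
```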
The deviance information criterion is calculated as
DIC=p_D+\overline{D(\theta)},
or equivalently as
DIC=D(\bar{\theta})+2p_D.
From this latter form, the connection with AIC is more evident.
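Substituting the first definition p_D=\overline{D(\theta)}-D(\bar{\theta}) into either expression gives 2\overline{D(\theta)}-D(\bar{\theta}), which confirms that the two forms are equal.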
The idea is that models with smaller DIC should be preferred to models with larger DIC. Models are penalized both by the value of \bar{D}, which favors a good fit, but also (in common with AIC) by the effective number of parameters p_D. Since \bar{D} will decrease as the number of parameters in a model increases, the p_D term compensates for this effect by favoring models with a smaller number of parameters.
An advantage of DIC over other criteria in the case of Bayesian model selection is that the DIC is easily calculated from the samples generated by a Markov chain Monte Carlo simulation. AIC requires calculating the likelihood at its maximum over \theta, which is not readily available from the MCMC simulation. But to calculate DIC, simply compute \bar{D} as the average of D(\theta) over the samples of \theta, and D(\bar{\theta}) as the value of D evaluated at the average of the samples of \theta. Then the DIC follows directly from these approximations.
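A minimal, self-contained sketch of this recipe in Python (the normal-mean model with known unit variance and a flat prior is a hypothetical choice made here so the posterior is available in closed form and the "MCMC" draws can be simulated directly; with one free parameter, p_D should come out near 1):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical setup: 50 observations from N(mu, 1) with a flat prior on mu,
# so the posterior of mu is N(ybar, 1/n) and draws can be simulated directly.
y = rng.normal(loc=2.0, scale=1.0, size=50)
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=5000)

# Deviance D(theta) = -2 log p(y|theta) at each posterior draw (C = 0).
D = np.array([-2.0 * norm.logpdf(y, mu, 1.0).sum() for mu in mu_draws])

D_bar = D.mean()                                   # average of D(theta) over draws
D_at_mean = -2.0 * norm.logpdf(y, mu_draws.mean(), 1.0).sum()  # D(theta_bar)
p_d = D_bar - D_at_mean                            # effective number of parameters
dic = p_d + D_bar                                  # equivalently D_at_mean + 2 * p_d

print(f"p_D = {p_d:.2f} (expect about 1), DIC = {dic:.2f}")
```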
In the derivation of DIC, it is assumed that the specified parametric family of probability distributions that generates future observations encompasses the true model. This assumption does not always hold, and it is desirable to consider model assessment procedures in that scenario. Also, the observed data are used both to construct the posterior distribution and to evaluate the estimated models. Therefore, DIC tends to select over-fitted models.
A resolution to the issues above was suggested by Ando (2007), with the proposal of the Bayesian predictive information criterion (BPIC). Ando (2010, Ch. 8) provided a discussion of various Bayesian model selection criteria. To avoid the over-fitting problems of DIC, Ando (2011) developed Bayesian model selection criteria from a predictive viewpoint. The criterion is calculated as
IC=\bar{D}+2p_D=-2\operatorname{E}^{\theta}\left[\log(p(y|\theta))\right]+2p_D.
The first term is a measure of how well the model fits the data, while the second term is a penalty on the model complexity. Note that the p in this expression is the predictive distribution rather than the likelihood above.
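Continuing the numeric sketch above, and ignoring for illustration the distinction just noted (the plug-in estimate below uses the likelihood-based deviance draws rather than the predictive distribution), the criterion reduces to quantities already computed, since \bar{D} estimates -2\operatorname{E}^{\theta}[\log(p(y|\theta))] when C=0:

```python
# Naive plug-in estimate from the earlier MCMC summaries; a full BPIC
# computation would use the predictive distribution instead.
ic = D_bar + 2 * p_d
```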