In statistics, the observed information, or observed Fisher information, is the negative of the second derivative (the Hessian matrix) of the "log-likelihood" (the logarithm of the likelihood function). It is a sample-based version of the Fisher information.
Suppose we observe random variables X_1,\ldots,X_n, independent and identically distributed with density f(X; \theta), where \theta is a (possibly unknown) vector. Then the log-likelihood of the parameters \theta given the data X_1,\ldots,X_n is

\ell(\theta \mid X_1,\ldots,X_n) = \sum_{i=1}^{n} \log f(X_i \mid \theta).
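As a concrete case (an illustrative assumption, not part of the definition itself), suppose the X_i are i.i.d. normal with unknown mean \theta and known variance \sigma^2. Then

\ell(\theta \mid X_1,\ldots,X_n) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i - \theta)^2,

whose second derivative in \theta is the constant -n/\sigma^2.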
We define the observed information matrix at \theta^{*} as

\mathcal{J}(\theta^{*}) = - \left. \nabla \nabla^{\top} \ell(\theta) \right|_{\theta=\theta^{*}}

= - \left. \left(\begin{array}{cccc}
\tfrac{\partial^2}{\partial \theta_1^2} & \tfrac{\partial^2}{\partial \theta_1 \partial \theta_2} & \cdots & \tfrac{\partial^2}{\partial \theta_1 \partial \theta_p} \\
\tfrac{\partial^2}{\partial \theta_2 \partial \theta_1} & \tfrac{\partial^2}{\partial \theta_2^2} & \cdots & \tfrac{\partial^2}{\partial \theta_2 \partial \theta_p} \\
\vdots & \vdots & \ddots & \vdots \\
\tfrac{\partial^2}{\partial \theta_p \partial \theta_1} & \tfrac{\partial^2}{\partial \theta_p \partial \theta_2} & \cdots & \tfrac{\partial^2}{\partial \theta_p^2} \\
\end{array}\right) \ell(\theta) \right|_{\theta=\theta^{*}}
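In practice the observed information is often evaluated at the maximum-likelihood estimate by differentiating the log-likelihood numerically. The sketch below is a minimal illustration of this: the normal model, simulated sample, and step size are assumptions of the example, not prescribed by the definition. It approximates \mathcal{J}(\theta^{*}) as the negative Hessian of the log-likelihood via central finite differences.

```python
import numpy as np

def log_likelihood(theta, x):
    """Log-likelihood of an i.i.d. N(mu, sigma^2) sample (example model chosen for illustration)."""
    mu, sigma = theta
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2))

def observed_information(theta_star, x, h=1e-4):
    """Approximate J(theta*) = minus the Hessian of the log-likelihood at theta*,
    using central finite differences for the second partial derivatives."""
    theta_star = np.asarray(theta_star, dtype=float)
    p = theta_star.size
    hess = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.eye(p)[i] * h
            ej = np.eye(p)[j] * h
            hess[i, j] = (
                log_likelihood(theta_star + ei + ej, x)
                - log_likelihood(theta_star + ei - ej, x)
                - log_likelihood(theta_star - ei + ej, x)
                + log_likelihood(theta_star - ei - ej, x)
            ) / (4.0 * h**2)
    return -hess

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)       # simulated data (assumed example)
theta_hat = np.array([x.mean(), x.std(ddof=0)])    # MLE of (mu, sigma) for the normal model
J = observed_information(theta_hat, x)
print(J)                  # observed information matrix at the MLE
print(np.linalg.inv(J))   # its inverse estimates the asymptotic covariance of the MLE
```

For this normal model the exact entries are available in closed form (for instance, the entry for the mean is n/\hat{\sigma}^2), which provides a quick check on the numerical approximation.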
Andrew Gelman, David Dunson and Donald Rubin[2] define observed information instead in terms of the parameters' posterior probability, p(\theta \mid y):

I(\theta) = - \frac{d^2}{d\theta^2} \log p(\theta \mid y)
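As a minimal illustration of this posterior-based version (an assumed example, not taken from Gelman et al.), let a single observation y be N(\theta, \sigma^2) with known \sigma, and give \theta a N(0, \tau^2) prior. Then, up to an additive constant,

\log p(\theta \mid y) = -\frac{(y-\theta)^2}{2\sigma^2} - \frac{\theta^2}{2\tau^2},
\qquad
I(\theta) = -\frac{d^2}{d\theta^2}\log p(\theta \mid y) = \frac{1}{\sigma^2} + \frac{1}{\tau^2},

so the posterior curvature adds the prior precision to the likelihood contribution; with a flat prior the two definitions coincide.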
The Fisher information \mathcal{I}(\theta) is the expected value of the observed information given a single observation X distributed according to the hypothetical model with parameter \theta:

\mathcal{I}(\theta) = \mathrm{E}(\mathcal{J}(\theta))
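For example (a standard textbook case, not drawn from the cited references), if a single observation X is Poisson with mean \theta, then

\ell(\theta \mid X) = X \log\theta - \theta - \log X!,
\qquad
\mathcal{J}(\theta) = -\frac{d^2 \ell}{d\theta^2} = \frac{X}{\theta^2},
\qquad
\mathcal{I}(\theta) = \mathrm{E}\!\left(\frac{X}{\theta^2}\right) = \frac{1}{\theta},

so the observed information depends on the realized data X, while the expected (Fisher) information does not.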
The comparison between the observed information and the expected information remains an active area of research and debate. Efron and Hinkley[3] provided a frequentist justification for preferring the observed information to the expected information when employing normal approximations to the distribution of the maximum-likelihood estimator in one-parameter families, in the presence of an ancillary statistic that affects the precision of the MLE. Lindsay and Li showed that the observed information matrix gives the minimum mean squared error as an approximation of the true information if an error term of O(n^{-3/2}) is ignored.[4]
However, when the construction of confidence intervals is the primary focus, the expected information has been reported to outperform the observed counterpart. Yuan and Spall showed that the expected information outperforms the observed information for confidence-interval construction of scalar parameters in the mean-squared-error sense.[5] This finding was later generalized to multiparameter cases, although with the weaker claim that the expected information matrix performs at least as well as the observed information matrix.[6]