In statistical theory, a pseudolikelihood is an approximation to the joint probability distribution of a collection of random variables. In practice, it can serve as an approximation to the likelihood function of a set of observed data, either yielding a computationally simpler estimation problem or providing a way of obtaining explicit estimates of model parameters.
The pseudolikelihood approach was introduced by Julian Besag in the context of analysing data having spatial dependence.
Given a set of random variables X = X_1, X_2, \ldots, X_n, the pseudolikelihood of X = x = (x_1, x_2, \ldots, x_n) is

L(\theta) := \prod_i \Pr_\theta(X_i = x_i \mid X_j = x_j \text{ for } j \neq i) = \prod_i \Pr_\theta(X_i = x_i \mid X_{-i} = x_{-i})

in the discrete case and

L(\theta) := \prod_i p_\theta(x_i \mid x_j \text{ for } j \neq i) = \prod_i p_\theta(x_i \mid x_{-i}) = \prod_i p_\theta(x_i \mid x_1, \ldots, \hat{x}_i, \ldots, x_n)

in the continuous case.
Here X is the vector of variables, x is the vector of values, p_\theta(\cdot \mid \cdot) is the conditional density, and \theta = (\theta_1, \ldots, \theta_p) is the vector of parameters. The expression X = x means that each variable X_i in the vector X takes the corresponding value x_i in the vector x, and x_{-i} = (x_1, \ldots, \hat{x}_i, \ldots, x_n) denotes the vector x with its ith component x_i omitted. The expression \Pr_\theta(X = x) is the probability of the vector of values x, that is, the probability of simultaneously observing the variables X taking the values x; the subscript indicates its dependence on the parameters \theta. Because situations can often be described using state variables ranging over a set of possible values, \Pr_\theta(X = x) can then be read as the probability of a particular state among all states allowed by the state variables.
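As a concrete illustration of the discrete-case product above, the following Python sketch evaluates the pseudolikelihood of one observed binary configuration under a toy Ising-style chain model with a coupling parameter and an external field. The model, the helper names (ising_conditional, pseudolikelihood, theta_J, theta_h), and the numbers are illustrative assumptions, not part of the original text.

import numpy as np

def ising_conditional(x, i, theta_J, theta_h):
    """Full conditional P(X_i = x_i | X_{-i} = x_{-i}) for x_i in {-1, +1} on a chain."""
    # Neighbours of site i on a simple chain graph: i-1 and i+1, where they exist.
    neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(x)]
    # Local field felt by site i: external field plus coupling to its neighbours.
    local_field = theta_h + theta_J * sum(x[j] for j in neighbours)
    # The full conditional of an Ising spin is a logistic function of 2 * x_i * field.
    return 1.0 / (1.0 + np.exp(-2.0 * x[i] * local_field))

def pseudolikelihood(x, theta_J, theta_h):
    """Product over i of the full conditionals, matching the discrete definition above."""
    return np.prod([ising_conditional(x, i, theta_J, theta_h) for i in range(len(x))])

x_obs = np.array([+1, +1, -1, +1])   # a single observed configuration (illustrative)
print(pseudolikelihood(x_obs, theta_J=0.5, theta_h=0.1))

Only the chosen variable is random in each factor; the remaining variables are fixed at their observed values, which is what makes every factor cheap to evaluate.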
The pseudo-log-likelihood is a similar measure derived from the above expression, namely (in the discrete case)
\ell(\theta) := \log L(\theta) = \sum_i \log \Pr_\theta(X_i = x_i \mid X_j = x_j \text{ for } j \neq i).
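Continuing the hypothetical snippet above (and reusing its ising_conditional helper), a minimal sketch of the pseudo-log-likelihood simply sums the logarithms of the same full conditionals; working on the log scale avoids underflow when the product runs over many variables.

def pseudo_log_likelihood(x, theta_J, theta_h):
    # Sum of log full conditionals, one term per variable, as in the formula above.
    return sum(np.log(ising_conditional(x, i, theta_J, theta_h))
               for i in range(len(x)))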
One use of the pseudolikelihood measure is as an approximation for inference about a Markov or Bayesian network, as the pseudolikelihood of an assignment to the variables X_i can often be computed more efficiently than the exact likelihood, which may require marginalization over a large number of configurations.
Use of the pseudolikelihood in place of the true likelihood function in a maximum likelihood analysis can lead to good estimates, but a straightforward application of the usual likelihood techniques to derive information about estimation uncertainty, or for significance testing, would in general be incorrect.[1]
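To make this concrete, the sketch below performs maximum pseudolikelihood estimation for the toy chain model used in the earlier snippets: the parameters are chosen to maximise the pseudo-log-likelihood summed over a handful of made-up configurations. The data and helper names are illustrative assumptions; in particular, as the text cautions, the curvature of this objective at the optimum should not be reused as if it were the observed Fisher information when quantifying uncertainty.

import numpy as np
from scipy.optimize import minimize

def neg_total_pll(theta, samples):
    # Negative pseudo-log-likelihood summed over the sample, to be minimised.
    theta_J, theta_h = theta
    return -sum(pseudo_log_likelihood(x, theta_J, theta_h) for x in samples)

# A few made-up configurations standing in for observed data.
samples = [np.array([+1, +1, -1, +1]),
           np.array([-1, +1, +1, +1]),
           np.array([+1, -1, -1, +1])]

result = minimize(neg_total_pll, x0=np.zeros(2), args=(samples,))
theta_J_hat, theta_h_hat = result.x
print(theta_J_hat, theta_h_hat)
# Note: the inverse Hessian of this objective describes the curvature of the
# pseudo-log-likelihood, not of the true log-likelihood, so it is not a valid
# basis for standard errors or significance tests here.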