Monotone likelihood ratio explained

Figure: A monotone likelihood ratio in distributions $f(x)$ and $g(x)$. The ratio of the density functions is monotone in the argument $x$, so $f(x)/g(x)$ satisfies the monotone likelihood ratio property.

In statistics, the monotone likelihood ratio property is a property of the ratio of two probability density functions (PDFs). Formally, distributions $f(x)$ and $g(x)$ bear the property if, for every $x_2 > x_1$,

$$\frac{f(x_2)}{g(x_2)} \geq \frac{f(x_1)}{g(x_1)},$$

that is, if the ratio is nondecreasing in the argument $x$.

If the functions are first-differentiable, the property may sometimes be stated

$$\frac{\partial}{\partial x}\left(\frac{f(x)}{g(x)}\right) \geq 0$$
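The definition can be checked numerically on a grid. The sketch below is illustrative only: the helper name `has_mlrp`, the grid, and the choice of normal densities are assumptions made for this example, and a grid check can only suggest, not prove, that the ratio is nondecreasing.

```python
from statistics import NormalDist

def has_mlrp(f, g, xs, tol=1e-9):
    """Return True if f(x)/g(x) is nondecreasing over the sorted grid xs.

    Hypothetical helper for illustration; it only inspects the grid points.
    """
    ratios = [f(x) / g(x) for x in xs]
    return all(later >= earlier * (1 - tol)
               for earlier, later in zip(ratios, ratios[1:]))

xs = [i / 10 for i in range(-30, 31)]          # grid on [-3, 3]
g = NormalDist(mu=0.0, sigma=1.0).pdf           # baseline density
shifted = NormalDist(mu=1.0, sigma=1.0).pdf     # same spread, higher mean
wider = NormalDist(mu=0.0, sigma=2.0).pdf       # same mean, larger spread

print(has_mlrp(shifted, g, xs))  # True: the ratio exp(x - 1/2) is increasing
print(has_mlrp(wider, g, xs))    # False: the ratio is U-shaped in x
```

The second case shows that two ordinary densities need not have the property: changing only the spread makes the ratio fall and then rise.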

For two distributions that satisfy the definition with respect to some argument $x$, we say they "have the MLRP in $x$". For a family of distributions that all satisfy the definition with respect to some statistic $T(X)$, we say they "have the MLR in $T(X)$".

Intuition

The MLRP is used to represent a data-generating process that enjoys a straightforward relationship between the magnitude of some observed variable and the distribution it draws from. If $f(x)$ satisfies the MLRP with respect to $g(x)$, the higher the observed value $x$, the more likely it was drawn from distribution $f$ rather than $g$.

As usual for monotonic relationships, the likelihood ratio's monotonicity comes in handy in statistics, particularly when using maximum-likelihood estimation. Also, distribution families with MLR have a number of well-behaved stochastic properties, such as first-order stochastic dominance and monotone hazard rates. Unfortunately, as is also usual, the strength of this assumption comes at the price of realism: many processes in the world do not exhibit a monotonic correspondence between input and output.

Example: Working hard or slacking off

Suppose you are working on a project, and you can either work hard or slack off. Call your choice of effort $e$ and the quality of the resulting project $q$. If the MLRP holds for the distribution of $q$ conditional on your effort $e$, the higher the quality the more likely you worked hard. Conversely, the lower the quality the more likely you slacked off.

1: Choose effort $e \in \{H, L\}$ where $H$ means high effort, and $L$ means low effort.

2: Observe $q$ drawn from $f(q \mid e)$. By Bayes' law with a uniform prior,

$$\operatorname{P}\bigl[e = H \mid q\bigr] = \frac{f(q \mid H)}{f(q \mid H) + f(q \mid L)}$$

3: Suppose $f(q \mid e)$ satisfies the MLRP. Rearranging, the probability the worker worked hard is

$$\frac{1}{1 + f(q \mid L) / f(q \mid H)}$$

which, thanks to the MLRP, is monotonically increasing in $q$ (because $\frac{f(q \mid L)}{f(q \mid H)}$ is decreasing in $q$).

Hence if some employer is doing a "performance review" he can infer his employee's behavior from the merits of his work.
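The three steps can be traced numerically. This is a minimal sketch under assumed distributions: the normal densities for high and low effort, and the names `f_high`, `f_low` and `posterior_high`, are hypothetical choices for illustration, not part of the example itself.

```python
from statistics import NormalDist

# Assumed quality distributions: high effort shifts the mean of q upward,
# so f(q|H)/f(q|L) is increasing in q and the MLRP holds.
f_high = NormalDist(mu=1.0, sigma=1.0).pdf
f_low = NormalDist(mu=0.0, sigma=1.0).pdf

def posterior_high(q):
    """P[e = H | q] by Bayes' law with a uniform prior over {H, L}."""
    return f_high(q) / (f_high(q) + f_low(q))

qualities = [-2.0, -1.0, 0.0, 1.0, 2.0]
posteriors = [posterior_high(q) for q in qualities]
for q, p in zip(qualities, posteriors):
    print(f"q = {q:+.1f}  P[e = H | q] = {p:.3f}")
```

Under these assumptions the posterior rises with every increase in observed quality, which is the inference the employer relies on.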

Families of distributions satisfying MLR

Statistical models often assume that data are generated by a distribution from some family of distributions and seek to determine that distribution. This task is simplified if the family has the monotone likelihood ratio property (MLRP).

A family of density functions $\{ f_\theta(x) \mid \theta \in \Theta \}$ indexed by a parameter $\theta$ taking values in an ordered set $\Theta$ is said to have a monotone likelihood ratio (MLR) in the statistic $T(X)$ if for any $\theta_1 < \theta_2$,

$$\frac{f_{\theta_2}(X = x_1, x_2, x_3, \ldots)}{f_{\theta_1}(X = x_1, x_2, x_3, \ldots)}$$

is a non-decreasing function of $T(X)$.

Then we say the family of distributions "has MLR in $T(X)$".

List of families

| Family | $T(X)$ in which $f_\theta(X)$ has the MLR |
|---|---|
| Exponential $[\lambda]$ | $\sum x_i$ observations |
| Binomial $[n, p]$ | $\sum x_i$ observations |
| Poisson $[\lambda]$ | $\sum x_i$ observations |
| Normal $[\mu, \sigma]$ | if $\sigma$ known, $\sum x_i$ observations |
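The Poisson entry can be illustrated directly: for an i.i.d. sample of size $n$, the likelihood ratio equals $e^{-n(\lambda_2 - \lambda_1)} (\lambda_2 / \lambda_1)^{\sum x_i}$, so it depends on the data only through the sum. The sketch below checks this on small made-up samples; the function names and sample values are assumptions for illustration.

```python
import math

def poisson_likelihood(sample, lam):
    """Joint probability of an i.i.d. Poisson(lam) sample."""
    return math.prod(math.exp(-lam) * lam**x / math.factorial(x) for x in sample)

def likelihood_ratio(sample, lam_hi, lam_lo):
    """Ratio of the joint likelihood under lam_hi to that under lam_lo."""
    return poisson_likelihood(sample, lam_hi) / poisson_likelihood(sample, lam_lo)

lam_lo, lam_hi = 1.0, 2.0
sample_a = [0, 3, 3]   # sum 6
sample_b = [2, 2, 2]   # sum 6, different individual values
sample_c = [2, 3, 4]   # sum 9

ratio_a = likelihood_ratio(sample_a, lam_hi, lam_lo)
ratio_b = likelihood_ratio(sample_b, lam_hi, lam_lo)
ratio_c = likelihood_ratio(sample_c, lam_hi, lam_lo)

print(math.isclose(ratio_a, ratio_b))  # same sum, same ratio
print(ratio_c > ratio_a)               # larger sum, larger ratio
```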

Hypothesis testing

If the family of random variables has the MLRP in $T(X)$, a uniformly most powerful test can easily be determined for the hypothesis $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$.

Example: Effort and output

Let $e$ be an input into a stochastic technology (worker's effort, for instance) and $y$ its output, the likelihood of which is described by a probability density function $f(y; e)$. Then the monotone likelihood ratio property (MLRP) of the family $f$ is expressed as follows: for any $e_1, e_2$, the fact that $e_2 > e_1$ implies that the ratio

$$\frac{f(y; e_2)}{f(y; e_1)}$$

is increasing in $y$.
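As a worked instance, assume purely for illustration that output is exponentially distributed with mean equal to effort, $f(y; e) = \tfrac{1}{e} \exp(-y/e)$ for $y \geq 0$ and $e > 0$. The source does not specify a distribution, so this choice is hypothetical. The ratio then works out as:

```latex
\begin{align}
\frac{f(y; e_2)}{f(y; e_1)}
  &= \frac{\tfrac{1}{e_2} \exp(-y / e_2)}{\tfrac{1}{e_1} \exp(-y / e_1)} \\
  &= \frac{e_1}{e_2} \exp\!\left( y \left( \frac{1}{e_1} - \frac{1}{e_2} \right) \right)
\end{align}
```

Since $e_2 > e_1 > 0$ gives $\tfrac{1}{e_1} - \tfrac{1}{e_2} > 0$, the ratio is increasing in $y$, so this assumed family has the MLRP.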

Relation to other statistical properties

Monotone likelihood ratios are used in several areas of statistical theory, including point estimation and hypothesis testing, as well as in probability models.

Exponential families

One-parameter exponential families have monotone likelihood ratios. In particular, the one-dimensional exponential family of probability density functions or probability mass functions with

$$f_\theta(x) = c(\theta)\, h(x)\, \exp\bigl(\pi(\theta)\, T(x)\bigr)$$

has a monotone non-decreasing likelihood ratio in the sufficient statistic $T(x)$, provided that $\pi(\theta)$ is non-decreasing.
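The claim follows in one step from the form of the density. For $\theta_2 > \theta_1$ and any $x$ with $h(x) > 0$, the factor $h(x)$ cancels in the ratio:

```latex
\begin{align}
\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)}
  &= \frac{c(\theta_2)\, h(x)\, \exp\bigl(\pi(\theta_2)\, T(x)\bigr)}
          {c(\theta_1)\, h(x)\, \exp\bigl(\pi(\theta_1)\, T(x)\bigr)} \\
  &= \frac{c(\theta_2)}{c(\theta_1)}\,
     \exp\Bigl( \bigl(\pi(\theta_2) - \pi(\theta_1)\bigr)\, T(x) \Bigr)
\end{align}
```

When $\pi$ is non-decreasing, $\pi(\theta_2) - \pi(\theta_1) \geq 0$, so the right-hand side is a non-decreasing function of $T(x)$.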

Uniformly most powerful tests: The Karlin–Rubin theorem

Monotone likelihood ratios are used to construct uniformly most powerful tests, according to the Karlin–Rubin theorem.[1] Consider a scalar measurement having a probability density function parameterized by a scalar parameter $\theta$, and define the likelihood ratio

$$\ell(x) = \frac{f_{\theta_1}(x)}{f_{\theta_0}(x)}.$$

If $\ell(x)$ is monotone non-decreasing in $x$ for any pair $\theta_1 \geq \theta_0$ (meaning that the greater $x$ is, the more likely $H_1$ is), then the threshold test

$$\varphi(x) = \begin{cases} 1 & \text{if } x > x_0 \\ 0 & \text{if } x < x_0 \end{cases}$$

where $x_0$ is chosen so that $\operatorname{E}\bigl\{ \varphi(X) \mid \theta_0 \bigr\} = \alpha$, is the UMP test of size $\alpha$ for testing $H_0 : \theta \leq \theta_0$ vs. $H_1 : \theta > \theta_0$.

Note that exactly the same test is also UMP for testing $H_0 : \theta = \theta_0$ vs. $H_1 : \theta > \theta_0$.
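A threshold test of this kind can be sketched for one concrete case. The example assumes a single observation $X \sim \mathrm{Normal}(\theta, \sigma)$ with known $\sigma$, a family with a monotone likelihood ratio in $x$; the function names and numeric values are hypothetical, chosen only to illustrate how $x_0$ is set from $\alpha$.

```python
from statistics import NormalDist

def cutoff(theta0, sigma, alpha):
    """Threshold x0 with P[X > x0 | theta0] = alpha, X ~ Normal(theta0, sigma)."""
    return NormalDist(mu=theta0, sigma=sigma).inv_cdf(1 - alpha)

def rejection_probability(theta, sigma, x0):
    """E[phi(X) | theta] for the test phi(x) = 1 if x > x0, else 0."""
    return 1 - NormalDist(mu=theta, sigma=sigma).cdf(x0)

theta0, sigma, alpha = 0.0, 1.0, 0.05   # assumed values for illustration
x0 = cutoff(theta0, sigma, alpha)
print(f"x0 = {x0:.4f}")

thetas = [-0.5, 0.0, 0.5, 1.0, 2.0]
powers = [rejection_probability(t, sigma, x0) for t in thetas]
for t, p in zip(thetas, powers):
    print(f"theta = {t:+.1f}  P[reject] = {p:.4f}")
```

The rejection probability equals $\alpha$ at $\theta_0$, stays below $\alpha$ for $\theta < \theta_0$, and grows with $\theta$ above it, which is why the same threshold serves both the composite and the simple null hypothesis.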

Median unbiased estimation

Monotone likelihood ratios are used to construct median-unbiased estimators, using methods specified by Johann Pfanzagl and others.[2] [3] One such procedure is an analogue of the Rao–Blackwell procedure for mean-unbiased estimators: the procedure holds for a smaller class of probability distributions than does the Rao–Blackwell procedure for mean-unbiased estimation, but for a larger class of loss functions.[3]

Lifetime analysis: Survival analysis and reliability

If a family of distributions $f_\theta(x)$ has the monotone likelihood ratio property in $T(X)$,

  1. the family has monotone decreasing hazard rates in $\theta$ (but not necessarily in $T(X)$)
  2. the family exhibits the first-order (and hence second-order) stochastic dominance in $x$, and the best Bayesian update of $\theta$ is increasing in $T(X)$.

But not conversely: neither monotone hazard rates nor stochastic dominance implies the MLRP.

Proofs

Let distribution family $f_\theta$ satisfy MLR in $x$, so that for $\theta_1 > \theta_0$ and $x_1 > x_0$:

$$\frac{f_{\theta_1}(x_1)}{f_{\theta_0}(x_1)} \geq \frac{f_{\theta_1}(x_0)}{f_{\theta_0}(x_0)},$$

or equivalently:

$$f_{\theta_1}(x_1)\, f_{\theta_0}(x_0) \geq f_{\theta_1}(x_0)\, f_{\theta_0}(x_1).$$

Integrating this expression twice, we obtain:

1. With respect to $x_0$, from $\min X$ up to $x_1$:

\begin{align} & \int_{\min X}^{x_1} f_{\theta_1}(x_1)\, f_{\theta_0}(x_0)\, dx_0 \\[6pt] \geq{} & \int_{\min X}^{x_1} f_{\theta_1}(x_0)\, f_{\theta_0}(x_1)\, dx_0 \end{align}

integrate and rearrange to obtain

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \geq \frac{F_{\theta_1}(x)}{F_{\theta_0}(x)}$$

2. With respect to $x_1$, from $x_0$ up to $\max X$:

\begin{align} & \int_{x_0}^{\max X} f_{\theta_1}(x_1)\, f_{\theta_0}(x_0)\, dx_1 \\[6pt] \geq{} & \int_{x_0}^{\max X} f_{\theta_1}(x_0)\, f_{\theta_0}(x_1)\, dx_1 \end{align}

integrate and rearrange to obtain

$$\frac{1 - F_{\theta_1}(x)}{1 - F_{\theta_0}(x)} \geq \frac{f_{\theta_1}(x)}{f_{\theta_0}(x)}$$

First-order stochastic dominance

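Combine the two inequalities above to get first-order dominance. Chaining them gives $\frac{1 - F_{\theta_1}(x)}{1 - F_{\theta_0}(x)} \geq \frac{F_{\theta_1}(x)}{F_{\theta_0}(x)}$, and cross-multiplying yields

$$F_{\theta_1}(x) \leq F_{\theta_0}(x) \quad \forall x$$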

Monotone hazard rate

Use only the second inequality above to get a monotone hazard rate:

$$\frac{f_{\theta_1}(x)}{1 - F_{\theta_1}(x)} \leq \frac{f_{\theta_0}(x)}{1 - F_{\theta_0}(x)} \quad \forall x$$
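Both conclusions can be spot-checked numerically. The sketch below assumes a normal location family with unit variance, which has MLR in $x$, and takes $\theta_0 = 0$ and $\theta_1 = 1$. The grid and the helper `hazard` are hypothetical choices for illustration, and a grid check is evidence rather than proof.

```python
from statistics import NormalDist

low = NormalDist(mu=0.0, sigma=1.0)    # plays the role of theta_0
high = NormalDist(mu=1.0, sigma=1.0)   # plays the role of theta_1 > theta_0

def hazard(dist, x):
    """Hazard rate f(x) / (1 - F(x))."""
    return dist.pdf(x) / (1 - dist.cdf(x))

xs = [i / 10 for i in range(-30, 31)]  # grid on [-3, 3]

# First-order stochastic dominance: F_theta1(x) <= F_theta0(x) at every x.
fosd = all(high.cdf(x) <= low.cdf(x) for x in xs)

# Monotone hazard rate: the hazard under theta_1 never exceeds that under theta_0.
hazard_ordered = all(hazard(high, x) <= hazard(low, x) for x in xs)

print(fosd, hazard_ordered)
```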

Uses

Economics

The MLR is an important condition on the type distribution of agents in mechanism design and economics of information, where Paul Milgrom defined "favorableness" of signals (in terms of stochastic dominance) as a consequence of MLR.[4] Most solutions to mechanism design models assume type distributions that satisfy the MLR to take advantage of solution methods that may be easier to apply and interpret.

Notes and References

  1. Casella, G.; Berger, R.L. (2008). Statistical Inference. Brooks/Cole. ISBN 0-495-39187-5. Theorem 8.3.17.
  2. Pfanzagl, Johann (1979). "On optimal median unbiased estimators in the presence of nuisance parameters". 7 (1): 187–193. doi:10.1214/aos/1176344563.
  3. Brown, L.D.; Cohen, Arthur; Strawderman, W.E. (1976). "A complete class theorem for strict monotone likelihood ratio with applications". 4 (4): 712–722. doi:10.1214/aos/1176343543.
  4. Milgrom, P.R. (1981). "Good news and bad news: Representation theorems and applications". 12 (2): 380–391. doi:10.2307/3003562.