Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference (in particular, to James–Stein estimation and empirical Bayes methods) and to portfolio choice theory.[1] The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.
Note that the name "Stein's lemma" is also commonly used[2] to refer to a different result in the area of statistical hypothesis testing, which connects the error exponents in hypothesis testing with the Kullback–Leibler divergence. This result is also known as the Chernoff–Stein lemma[3] and is not related to the lemma discussed in this article.
Suppose X is a normally distributed random variable with expectation μ and variance σ². Further suppose g is a differentiable function for which the two expectations E(g(X)(X − μ)) and E(g′(X)) both exist. (The existence of the expectation of any random variable is equivalent to the finiteness of the expectation of its absolute value.) Then
E\left(g(X)(X-\mu)\right)=\sigma^2E\left(g'(X)\right).
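A minimal Monte Carlo sketch of this identity using NumPy; the test function g(x) = sin(x) and the parameter values are arbitrary choices for illustration:

```python
# Numerical sketch of E(g(X)(X - mu)) = sigma^2 E(g'(X)) with g(x) = sin(x).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0
x = rng.normal(mu, sigma, size=2_000_000)

lhs = np.mean(np.sin(x) * (x - mu))   # estimate of E(g(X)(X - mu))
rhs = sigma**2 * np.mean(np.cos(x))   # estimate of sigma^2 E(g'(X))
print(lhs, rhs)                       # the two estimates agree up to Monte Carlo error
```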
In general, suppose X and Y are jointly normally distributed. Then
\operatorname{Cov}(g(X),Y)=\operatorname{Cov}(X,Y)E(g'(X)).
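The bivariate form can be checked the same way; the covariance matrix, the means, and the choice g = tanh below are arbitrary illustrations:

```python
# Numerical sketch of Cov(g(X), Y) = Cov(X, Y) E(g'(X)) for jointly normal (X, Y).
import numpy as np

rng = np.random.default_rng(1)
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 3.0], cov=cov, size=2_000_000)
x, y = xy[:, 0], xy[:, 1]

lhs = np.cov(np.tanh(x), y)[0, 1]                  # Cov(g(X), Y)
rhs = cov[0, 1] * np.mean(1.0 / np.cosh(x) ** 2)   # Cov(X, Y) E(g'(X)), since (tanh)' = sech^2
print(lhs, rhs)
```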
For a general multivariate Gaussian random vector (X_1,\ldots,X_n)\sim N(\mu,\Sigma) it holds that
E\left(g(X)(X-\mu)\right)=\Sigma\cdot E\left(\nabla g(X)\right).
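A similar numerical sketch in three dimensions, with an arbitrary positive-definite covariance matrix and the arbitrary choice g(x) = sin(x₁) + x₂x₃:

```python
# Numerical sketch of E(g(X)(X - mu)) = Sigma . E(grad g(X)) in dimension 3.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.5, 0.4],
                  [0.2, 0.4, 0.8]])
X = rng.multivariate_normal(mu, Sigma, size=2_000_000)

g = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]                       # g(x) = sin(x1) + x2*x3
grad_g = np.stack([np.cos(X[:, 0]), X[:, 2], X[:, 1]], axis=1)  # its gradient

lhs = np.mean(g[:, None] * (X - mu), axis=0)   # E(g(X)(X - mu)), a length-3 vector
rhs = Sigma @ np.mean(grad_g, axis=0)          # Sigma . E(grad g(X))
print(lhs, rhs)
```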
To prove the lemma, first consider the standard case: the probability density function of the normal distribution with expectation 0 and variance 1 is
\varphi(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}.
Since
\int x\exp(-x^2/2)\,dx=-\exp(-x^2/2),
we get from integration by parts:
E(g(X)X)=\frac{1}{\sqrt{2\pi}}\int g(x)\,x\exp(-x^2/2)\,dx=\frac{1}{\sqrt{2\pi}}\int g'(x)\exp(-x^2/2)\,dx=E(g'(X)).
The case of general variance σ² follows by substitution.
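The integration-by-parts step can also be verified symbolically; the following sketch assumes SymPy is available and uses the arbitrary test function g(x) = x³:

```python
# Symbolic sketch: for the standard normal density, E(g(X) X) = E(g'(X)).
import sympy as sp

x = sp.symbols('x', real=True)
phi = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)   # standard normal density
g = x**3                                        # arbitrary differentiable test function

lhs = sp.integrate(g * x * phi, (x, -sp.oo, sp.oo))          # E(g(X) X) = E(X^4) = 3
rhs = sp.integrate(sp.diff(g, x) * phi, (x, -sp.oo, sp.oo))  # E(g'(X)) = 3 E(X^2) = 3
print(lhs, rhs)                                               # both print 3
```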
Isserlis' theorem is equivalently stated as
\operatorname{E}\left(X_1g(X_1,\ldots,X_n)\right)=\sum_{i=1}^{n}\operatorname{Cov}(X_1,X_i)\operatorname{E}\left(\partial_{X_i}g(X_1,\ldots,X_n)\right),
where (X_1,\ldots,X_n) is a zero-mean multivariate normal random vector.
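This form follows from the multivariate statement above: taking μ = 0 and reading off the first component of E(g(X)(X − μ)) = Σ · E(∇g(X)) gives
\operatorname{E}\left(X_1g(X_1,\ldots,X_n)\right)=\sum_{i=1}^{n}\Sigma_{1i}\operatorname{E}\left(\partial_{X_i}g(X_1,\ldots,X_n)\right)=\sum_{i=1}^{n}\operatorname{Cov}(X_1,X_i)\operatorname{E}\left(\partial_{X_i}g(X_1,\ldots,X_n)\right).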
More generally, suppose X belongs to an exponential family, that is, X has the density
f_\eta(x)=\exp(\eta'T(x)-\Psi(\eta))h(x).
Suppose this density has support (a, b), where a and b may be −∞ and ∞. Suppose that, as x → a or x → b, either exp(η′T(x))h(x)g(x) → 0, where g is a differentiable function with E|g′(X)| < ∞, or, if a and b are finite, exp(η′T(x))h(x) → 0. Then
E\left[\left(\frac{h'(X)}{h(X)}+\sum_i\eta_iT_i'(X)\right)\cdot g(X)\right]=-E\left[g'(X)\right].
The derivation is the same as in the special case above, namely, integration by parts.
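As an illustration, the exponential distribution with rate λ is an exponential family with T(x) = x, η = −λ and h(x) = 1 on (0, ∞), so the identity reduces to λE(g(X)) = E(g′(X)) for test functions with g(0) = 0 (which makes the boundary term at 0 vanish). A minimal Monte Carlo sketch, with an arbitrary rate and g(x) = x²:

```python
# Numerical sketch of the exponential-family identity for Exponential(rate):
# here h'/h = 0 and eta T'(x) = -rate, so it reads rate * E(g(X)) = E(g'(X)).
import numpy as np

rng = np.random.default_rng(3)
rate = 1.7
x = rng.exponential(scale=1.0 / rate, size=2_000_000)

lhs = rate * np.mean(x**2)   # rate * E(g(X)) with g(x) = x^2, g(0) = 0
rhs = np.mean(2 * x)         # E(g'(X))
print(lhs, rhs)              # both close to 2 / rate
```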
If we only know that X has support ℝ, then it can happen that E|g(X)| < ∞ and E|g′(X)| < ∞, and yet f_η(x)g(x) does not tend to 0 as x → ∞, so the boundary condition above is not automatic. To see this, take g(x) = 1 and let f_η(x) have infinitely many spikes towards infinity while remaining integrable. One such example can be adapted from
f(x)=\begin{cases}1 & x\in[n,\,n+2^{-n})\\0 & \text{otherwise}\end{cases}
by smoothing it, so that f is smooth but still does not vanish at infinity.
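For instance, if the spikes are indexed by the positive integers n (one way of reading the construction), the total mass is finite even though f does not decay:
\int_{-\infty}^{\infty}f(x)\,dx=\sum_{n=1}^{\infty}2^{-n}=1,
while f(x) = 1 for every x in [n, n + 2^{-n}), so f(x) does not tend to 0 as x → ∞.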
Extensions to elliptically-contoured distributions also exist.[4] [5] [6]