In statistics, Stein's unbiased risk estimate (SURE) is an unbiased estimator of the mean-squared error of "a nearly arbitrary, nonlinear biased estimator." In other words, it provides an indication of the accuracy of a given estimator. This is important since the true mean-squared error of an estimator is a function of the unknown parameter to be estimated, and thus cannot be determined exactly. The technique is named after its discoverer, Charles Stein.[1]
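Let $\mu \in \mathbb{R}^d$ be an unknown parameter and let $x \in \mathbb{R}^d$ be a measurement vector whose components are independent and normally distributed with means $\mu_i$, $i = 1, \ldots, d$, and common variance $\sigma^2$. Suppose $h(x)$ is an estimator of $\mu$ from $x$ that can be written as $h(x) = x + g(x)$, where $g$ is weakly differentiable. Then Stein's unbiased risk estimate is given by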
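$$\operatorname{SURE}(h) = d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} g_i(x) = -d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} h_i(x),$$
where $g_i(x)$ is the $i$th component of the function $g(x)$, and $\|\cdot\|$ is the Euclidean norm. The two forms agree because $h_i(x) = x_i + g_i(x)$ implies $\partial h_i / \partial x_i = 1 + \partial g_i / \partial x_i$.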
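As a worked instance of the formula, consider the linear shrinkage estimator $h(x) = (1 - a)x$ for a fixed constant $a$. This estimator and the symbol $a$ are illustrative assumptions, chosen only because every term can be computed by hand.

```latex
\begin{align}
g(x) &= h(x) - x = -a x, \\
\|g(x)\|^2 &= a^2 \|x\|^2, \qquad
\sum_{i=1}^d \frac{\partial g_i}{\partial x_i}(x) = -a d, \\
\operatorname{SURE}(h) &= d\sigma^2 + a^2 \|x\|^2 - 2 a d \sigma^2 .
\end{align}
```

Every term depends only on $x$, $\sigma^2$, and $a$, so the estimate can be computed from the observed data alone.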
The importance of SURE is that it is an unbiased estimate of the mean-squared error (or squared error risk) of $h(x)$, i.e.
$$\operatorname{E}_\mu\{\operatorname{SURE}(h)\} = \operatorname{MSE}(h),$$
with
$$\operatorname{MSE}(h) = \operatorname{E}_\mu \|h(x) - \mu\|^2.$$
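Thus, minimizing SURE can act as a surrogate for minimizing the MSE. Note that the expression for SURE does not depend on the unknown parameter $\mu$, so it can be evaluated and minimized (for example, to choose the settings of an estimator) without knowledge of $\mu$.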
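The unbiasedness property can be checked numerically. The following is a minimal sketch, assuming the linear shrinkage estimator $h(x) = (1 - a)x$; the function names, the value of `a`, and the choice of `mu` are illustrative assumptions. It averages SURE and the true squared error over simulated measurements, and the two averages should be close.

```python
import random


def sure_linear_shrinkage(x, a, sigma):
    """SURE for the assumed estimator h(x) = (1 - a) * x, i.e. g(x) = -a * x."""
    d = len(x)
    g_norm_sq = a * a * sum(xi * xi for xi in x)  # ||g(x)||^2
    divergence = -a * d                           # sum_i d g_i / d x_i
    return d * sigma**2 + g_norm_sq + 2 * sigma**2 * divergence


def monte_carlo_check(mu, a, sigma, trials=5000, seed=0):
    """Return (average SURE, average squared error) over simulated draws of x."""
    rng = random.Random(seed)
    sure_total = 0.0
    loss_total = 0.0
    for _ in range(trials):
        # Each component of x is normal with mean mu_i and standard deviation sigma.
        x = [rng.gauss(m, sigma) for m in mu]
        h = [(1 - a) * xi for xi in x]
        sure_total += sure_linear_shrinkage(x, a, sigma)
        # The true loss needs mu, which is only available in a simulation.
        loss_total += sum((hi - m) ** 2 for hi, m in zip(h, mu))
    return sure_total / trials, loss_total / trials


if __name__ == "__main__":
    mu = [0.5 * i for i in range(10)]  # hypothetical parameter vector
    avg_sure, avg_loss = monte_carlo_check(mu, a=0.3, sigma=1.0)
    print(f"average SURE: {avg_sure:.3f}, average squared error: {avg_loss:.3f}")
```

Only the loss computation uses `mu`; the SURE value is computed from `x`, `a`, and `sigma` alone, which is what makes it usable in practice.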
We wish to show that
$$\operatorname{E}_\mu \|h(x) - \mu\|^2 = \operatorname{E}_\mu\{\operatorname{SURE}(h)\}.$$
We start by expanding the MSE, using $\operatorname{E}_\mu \|x - \mu\|^2 = d\sigma^2$:
\begin{align}
\operatorname{E}_\mu \|h(x) - \mu\|^2 &= \operatorname{E}_\mu \|g(x) + x - \mu\|^2 \\
&= \operatorname{E}_\mu \|g(x)\|^2 + \operatorname{E}_\mu \|x - \mu\|^2 + 2 \operatorname{E}_\mu g(x)^T (x - \mu) \\
&= \operatorname{E}_\mu \|g(x)\|^2 + d\sigma^2 + 2 \operatorname{E}_\mu g(x)^T (x - \mu).
\end{align}
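The last term can be rewritten using integration by parts in each coordinate, based on the identity $(x_i - \mu_i) \exp\left(-\frac{\|x - \mu\|^2}{2\sigma^2}\right) = -\sigma^2 \frac{\partial}{\partial x_i} \exp\left(-\frac{\|x - \mu\|^2}{2\sigma^2}\right)$ and assuming the boundary terms vanish:
\begin{align}
\operatorname{E}_\mu g(x)^T (x - \mu) &= \int_{\mathbb{R}^d} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{\|x - \mu\|^2}{2\sigma^2}\right) \sum_{i=1}^d g_i(x) (x_i - \mu_i) \, dx \\
&= \sigma^2 \sum_{i=1}^d \int_{\mathbb{R}^d} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{\|x - \mu\|^2}{2\sigma^2}\right) \frac{\partial g_i}{\partial x_i}(x) \, dx \\
&= \sigma^2 \sum_{i=1}^d \operatorname{E}_\mu \frac{\partial g_i}{\partial x_i}(x).
\end{align}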
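Substituting this expression for the cross term into the expansion of the MSE gives
$$\operatorname{E}_\mu \|h(x) - \mu\|^2 = \operatorname{E}_\mu \left( d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial g_i}{\partial x_i}(x) \right).$$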
A standard application of SURE is to choose a parametric form for an estimator, and then optimize the values of the parameters to minimize the risk estimate. This technique has been applied in several settings. For example, a variant of the James–Stein estimator can be derived by finding the optimal shrinkage estimator. The technique has also been used by Donoho and Johnstone to determine the optimal shrinkage factor in a wavelet denoising setting.[3]
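The parameter-selection idea can be sketched for a soft-thresholding estimator, in the spirit of the wavelet-denoising application. This is a minimal illustration under stated assumptions, not a reproduction of the published procedure: the function names and the sparse test signal are hypothetical, and the divergence counts coordinates with $|x_i| \le t$ (the kink at $|x_i| = t$ has probability zero). With that convention SURE is piecewise increasing in $t$, so it is enough to search over $0$ and the observed values $|x_i|$.

```python
import math
import random


def soft_threshold(x, t):
    """Shrink each coordinate toward zero by t, setting small ones to zero."""
    return [math.copysign(max(abs(xi) - t, 0.0), xi) for xi in x]


def sure_soft_threshold(x, t, sigma):
    """SURE for soft thresholding at level t.

    Here g_i(x) = -sign(x_i) * min(|x_i|, t), so
    ||g(x)||^2 = sum_i min(x_i^2, t^2), and
    d g_i / d x_i = -1 when |x_i| <= t and 0 otherwise.
    """
    d = len(x)
    g_norm_sq = sum(min(xi * xi, t * t) for xi in x)
    divergence = -sum(1 for xi in x if abs(xi) <= t)
    return d * sigma**2 + g_norm_sq + 2 * sigma**2 * divergence


def select_threshold(x, sigma):
    """Return the candidate threshold (0 or some |x_i|) with the smallest SURE."""
    candidates = [0.0] + sorted(abs(xi) for xi in x)
    return min(candidates, key=lambda t: sure_soft_threshold(x, t, sigma))


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = 1.0
    mu = [5.0] * 5 + [0.0] * 45  # hypothetical sparse signal
    x = [rng.gauss(m, sigma) for m in mu]
    t = select_threshold(x, sigma)
    estimate = soft_threshold(x, t)
    print(f"selected threshold: {t:.3f}")
    print(f"nonzero coordinates kept: {sum(1 for e in estimate if e != 0.0)}")
```

The same pattern applies to other parametric families: write the estimator as $x + g(x)$, compute $\|g(x)\|^2$ and the divergence of $g$, and minimize the resulting expression over the parameter.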