In statistics, inverse-variance weighting is a method of aggregating two or more random variables to minimize the variance of the weighted average. Each random variable is weighted in inverse proportion to its variance (i.e., proportional to its precision).
Given a sequence of independent observations y_i with variances \sigma_i^2, the inverse-variance weighted average is given by[1]

\hat{y} = \frac{\sum_i y_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}.
Among all weighted averages, the inverse-variance weighted average has the least variance, given by

Var(\hat{y}) = \frac{1}{\sum_i 1/\sigma_i^2}.
If the variances of the measurements are all equal, then the inverse-variance weighted average becomes the simple average.
Inverse-variance weighting is typically used in statistical meta-analysis or sensor fusion to combine the results from independent measurements.
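As a concrete illustration, here is a minimal NumPy sketch of both formulas; the helper name inverse_variance_mean and the example numbers are illustrative assumptions, not from the source:

```python
import numpy as np

def inverse_variance_mean(y, var):
    """Combine independent estimates y with variances var.

    Returns the inverse-variance weighted average and its variance.
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(var, dtype=float)  # weights proportional to precision
    y_hat = np.sum(w * y) / np.sum(w)       # sum(y_i/sigma_i^2) / sum(1/sigma_i^2)
    var_hat = 1.0 / np.sum(w)               # 1 / sum(1/sigma_i^2)
    return y_hat, var_hat

# Three measurements of the same quantity with different uncertainties:
y_hat, var_hat = inverse_variance_mean([9.8, 9.9, 9.6], [0.01, 0.04, 0.25])
print(y_hat, var_hat)  # the combined variance is smaller than any input variance
```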
Suppose an experimenter wishes to measure the value of a quantity, say the acceleration due to gravity of Earth, whose true value happens to be \mu. A careful experimenter makes multiple measurements, which we denote with n random variables X_1, X_2, \ldots, X_n. If the measurements are noisy but unbiased, i.e., the measuring device does not systematically over- or underestimate the true value and the errors are scattered symmetrically, then the expectation value E[X_i] = \mu for all i. The scatter in the measurements is characterised by the variance Var(X_i) := \sigma_i^2, and, if the measurements are performed under identical conditions, all the \sigma_i are the same, which we shall denote by \sigma.

Given the n measurements, a typical estimator for \mu, denoted \hat{\mu}, is the simple average

\overline{X} = \frac{1}{n} \sum_i X_i.

Note that this empirical average is itself a random variable: its expectation value E[\overline{X}] is \mu, but it also has a scatter. If the individual measurements are uncorrelated, the square of the error in the estimate is given by

Var(\overline{X}) = \frac{1}{n^2} \sum_i \sigma_i^2 = \left(\frac{\sigma}{\sqrt{n}}\right)^2.

Hence, if all the \sigma_i are equal, the error in the estimate decreases with increasing n as 1/\sqrt{n}, making more observations preferable.
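The 1/\sqrt{n} behaviour is easy to check numerically. The following is a small Monte Carlo sketch; the true value, per-measurement scatter, and sample sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 9.81, 0.3  # illustrative true value and per-measurement scatter

for n in (10, 100, 1000):
    # scatter of the simple average across 4000 repeated experiments
    means = rng.normal(mu, sigma, size=(4000, n)).mean(axis=1)
    print(n, means.std(), sigma / np.sqrt(n))  # empirical vs theoretical scatter
```

The empirical standard deviation of the simple average tracks \sigma/\sqrt{n} closely.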
Instead of n repeated measurements with one instrument, suppose the experimenter makes n measurements of the same quantity with n different instruments of varying quality. Then there is no reason to expect the different \sigma_i to be the same: some instruments may be noisier than others. In the example of measuring the acceleration due to gravity g, the different "instruments" could be a simple pendulum, an analysis of projectile motion, and so on. The simple average \overline{X} is then no longer an optimal estimator, since the error in \overline{X} may actually exceed the error in the least noisy measurement if the measurement errors differ widely. Instead of discarding the noisy measurements that increase the final error, the experimenter can combine all the measurements with weights that give more importance to the least noisy measurements and vice versa. Given the knowledge of \sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2, an improved estimator for \mu can be obtained by minimising the error of the weighted mean

\hat{\mu} = \frac{\sum_i w_i X_i}{\sum_i w_i},

whose error is minimised for w_i = 1/\sigma_i^2. The variance of the estimator is

Var(\hat{\mu}) = \frac{\sum_i w_i^2 \sigma_i^2}{\left(\sum_i w_i\right)^2},

which for the optimal choice of weights becomes

Var(\hat{\mu}_{opt}) = \left(\sum_i \sigma_i^{-2}\right)^{-1}.

Note that since Var(\hat{\mu}_{opt}) < \min_j \sigma_j^2, the optimal estimator \hat{\mu}_{opt} has less error than any of the individual measurements; even a noisy measurement improves the error of the final estimator.
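A quick numerical comparison makes both claims concrete; the variances below are illustrative assumptions:

```python
import numpy as np

var = np.array([0.01, 0.04, 1.0])  # one instrument is much noisier than the others
n = len(var)

var_simple = var.sum() / n**2      # Var of simple average: (1/n^2) * sum(sigma_i^2)
var_opt = 1.0 / (1.0 / var).sum()  # Var of optimal estimator: (sum(sigma_i^-2))^-1

print(var_simple)  # ~0.117: worse than the best single instrument (0.01)
print(var_opt)     # ~0.0079: better than every individual measurement
```

The simple average is dragged down by the noisy instrument, while the inverse-variance weighted estimator beats even the best single measurement.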
Consider a generic weighted sum Y = \sum_i w_i X_i, where the weights w_i are normalised such that \sum_i w_i = 1. If the X_i are all independent, the variance of Y is given by

Var(Y) = \sum_i w_i^2 \sigma_i^2.
Hence, Var(Y) can be minimised over the weights subject to the constraint \sum_i w_i = 1, using a Lagrange multiplier w_0:

Var(Y) = \sum_i w_i^2 \sigma_i^2 - w_0 \left(\sum_i w_i - 1\right).
For k > 0,

0 = \frac{\partial}{\partial w_k} Var(Y) = 2 w_k \sigma_k^2 - w_0,

which implies that:

w_k = \frac{w_0/2}{\sigma_k^2}.
The main takeaway here is that w_k \propto 1/\sigma_k^2. Since \sum_i w_i = 1,

\frac{2}{w_0} = \sum_i \frac{1}{\sigma_i^2} := \frac{1}{\sigma_0^2}.
The individual normalised weights are:
w_k = \frac{1}{\sigma_k^2} \left(\sum_i \frac{1}{\sigma_i^2}\right)^{-1} = \frac{\sigma_0^2}{\sigma_k^2}.
It is easy to see that this extremum solution corresponds to the minimum from the second partial derivative test by noting that the variance is a quadratic function of the weights. Thus, the minimum variance of the estimator is then given by:
Var(Y) = \sum_i \frac{\sigma_0^4}{\sigma_i^4} \sigma_i^2 = \sigma_0^4 \sum_i \frac{1}{\sigma_i^2} = \frac{\sigma_0^4}{\sigma_0^2} = \sigma_0^2 = \frac{1}{\sum_i 1/\sigma_i^2}.
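As a sanity check of the derivation, the short sketch below (with illustrative variances) confirms that the normalised weights attain \sigma_0^2 and that other normalised weight vectors never do better:

```python
import numpy as np

rng = np.random.default_rng(1)
var = np.array([0.5, 0.2, 1.0, 0.1])     # illustrative sigma_i^2

w_opt = (1.0 / var) / (1.0 / var).sum()  # w_k = (1/sigma_k^2) / sum(1/sigma_i^2)
var_opt = (w_opt**2 * var).sum()         # should equal sigma_0^2

for _ in range(1000):
    w = rng.random(var.size)
    w /= w.sum()                         # any other normalised weight vector...
    assert (w**2 * var).sum() >= var_opt # ...has variance at least sigma_0^2

print(var_opt, 1.0 / (1.0 / var).sum())  # both print the same number
```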
For normally distributed random variables, the inverse-variance weighted average can also be derived as the maximum likelihood estimate for the true value. Furthermore, from a Bayesian perspective, the posterior distribution for the true value given normally distributed observations y_i and a flat prior is a normal distribution with the inverse-variance weighted average as its mean and Var(Y) = \sigma_0^2 as its variance.
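To make the maximum-likelihood connection explicit, here is the standard one-line argument, sketched for completeness. For independent X_i \sim N(\mu, \sigma_i^2), the log-likelihood is

\ln L(\mu) = -\sum_i \frac{(X_i - \mu)^2}{2\sigma_i^2} + const,

and setting \frac{d \ln L}{d\mu} = \sum_i \frac{X_i - \mu}{\sigma_i^2} = 0 yields

\hat{\mu} = \frac{\sum_i X_i/\sigma_i^2}{\sum_i 1/\sigma_i^2},

which is exactly the inverse-variance weighted average.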
For potentially correlated multivariate distributions, an equivalent argument leads to an optimal weighting based on the covariance matrices C_i of the individual vector-valued estimates x_i:

\hat{x} = \left(\sum_i C_i^{-1}\right)^{-1} \sum_i C_i^{-1} x_i,

\hat{C} = \left(\sum_i C_i^{-1}\right)^{-1}.
For multivariate distributions the term "precision-weighted" average is more commonly used.
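The following is a minimal NumPy sketch of this precision-weighted fusion; the helper name precision_weighted_fusion and the two-sensor numbers are illustrative assumptions:

```python
import numpy as np

def precision_weighted_fusion(xs, Cs):
    """Fuse independent vector estimates xs with covariance matrices Cs.

    C_hat = inv(sum_i inv(C_i)); x_hat = C_hat @ sum_i inv(C_i) @ x_i.
    """
    precisions = [np.linalg.inv(C) for C in Cs]          # precision matrices
    C_hat = np.linalg.inv(np.sum(precisions, axis=0))    # fused covariance
    x_hat = C_hat @ np.sum([P @ x for P, x in zip(precisions, xs)], axis=0)
    return x_hat, C_hat

# Two 2-D position estimates, e.g. from two sensors (numbers illustrative):
x1, C1 = np.array([1.0, 2.0]), np.diag([0.1, 0.4])
x2, C2 = np.array([1.2, 1.8]), np.diag([0.3, 0.1])
x_hat, C_hat = precision_weighted_fusion([x1, x2], [C1, C2])
print(x_hat)   # each component leans toward the more precise sensor
print(C_hat)   # smaller, in the matrix sense, than either C1 or C2
```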