Richardson–Lucy deconvolution explained

The Richardson–Lucy algorithm, also known as Lucy–Richardson deconvolution, is an iterative procedure for recovering an underlying image that has been blurred by a known point spread function. It was named after William Richardson and Leon B. Lucy, who described it independently.^[1] ^[2]

Description

When an image is produced using an optical system and detected using photographic film or a charge-coupled device, for example, it is inevitably blurred, with an ideal point source not appearing as a point but being spread out into what is known as the point spread function. Extended sources can be decomposed into the sum of many individual point sources, thus the observed image can be represented in terms of a transition matrix p operating on an underlying image:

d_i=\sum_jp_i,ju_j

where

u_j

is the intensity of the underlying image at pixel

and

d_i

is the detected intensity at pixel

. In general, a matrix whose elements are

p_i,j

describes the portion of light from source pixel j that is detected in pixel i. In most good optical systems (or in general, linear systems that are described as shift invariant) the transfer function p can be expressed simply in terms of the spatial offset between the source pixel j and the observation pixel i:

p_i,j=P(i-j)

where

P(\Deltai)

is called a point spread function. In that case the above equation becomes a convolution. This has been written for one spatial dimension, but most imaging systems are two dimensional, with the source, detected image, and point spread function all having two indices. So a two dimensional detected image is a convolution of the underlying image with a two dimensional point spread function

P(\Deltax,\Deltay)

plus added detection noise.

In order to estimate

u_j

given the observed

d_i

and a known

P(\Deltai_x,\Deltaj_y)

, the following iterative procedure is employed in which the estimate of

u_j

(called

	(t)
\hat{u}
	j

) for iteration number t is updated as follows:

	(t+1)
\hat{u}
	j

	(t)
\hat{u}
	j

\sum_i

	d_i
	c_i

p_ij

where

c_i=\sum_jp_ij

	(t)
\hat{u}
	j

and

\sum_jp_ij=1

is assumed. It has been shown empirically that if this iteration converges, it converges to the maximum likelihood solution for

u_j

Writing this more generally for two (or more) dimensions in terms of convolution with a point spread function P:

\hat{u}^(t+1)=\hat{u}^(t) ⋅ \left(

	d
	\hat{u

^(t) ⊗ P} ⊗ P^*\right)

where the division and multiplication are element wise,

⊗

indicates a 2D convolution, and

P^*

is the mirrored point spread function, or the inverse Fourier transform of the Hermitian transpose of the optical transfer function.

In problems where the point spread function

p_ij

is not known a priori, a modification of the Richardson–Lucy algorithm has been proposed, in order to accomplish blind deconvolution.

Derivation

In the context of fluorescence microscopy, the probability of measuring a set of number of photons (or digitalization counts proportional to detected light)

m=[m_0,...,m_K]

for expected values

E=[E_0,...,E_K]

for a detector with

K+1

pixels is given by

P(m\vertE)=

	K
\prod
	i

Poisson(E_i)=

	K
\prod
	i

{E_i

	m_i

	-E_i
e

}

it is convenient to work with

ln(P)

since in the context of maximum likelihood estimation the aim is to locate the maximum of the likelihood function without concern for its absolute value.

ln(P(m\vertE))=

	K
\sum
	i

\left[(m_ilnE_i-E_i)-ln(m_i!)\right]

Again since

ln(m_i!)

is a constant, it will not give any additional information regarding the position of the maximum, so consider

\alpha(m\vertE)=

	K
\sum
	i

\left[m_ilnE_i-E_i\right]

where

\alpha

is something that shares the same maximum position as

P(m\vertE)

. Now consider that

comes from a ground truth

and a measurement

which is assumed to be linear. Then

E=Hx

where a matrix multiplication is implied. This can also be written in the form

E_m=

	K
\sum
	n

H_mnx_n

where it can be seen how

, mixes or blurs the ground truth.

It can also be shown that the derivative of an element of

(E_i)

with respect to some other element of

can be written as:

Tip: it is easy to see this by writing a matrix

of say (5 x 5) and two arrays

and

of 5 elements and check it. This last equation can interpreted as how much one element of

, say element

influences the other elements

j ≠ i

(and of course the case

i=j

is also taken into account). For example in a typical case an element of the ground truth

will influence nearby elements in

but not the very distant ones (a value of

is expected on those matrix elements).

Now, the key and arbitrary step:

is not known but may be estimated by

\hat{x

}, let's call

\hat{x_old

} and

\hat{x_new

} the estimated ground truths while using the RL algorithm, where the hat symbol is used to distinguish ground truth from estimator of the ground truth

Where

	\partial
	\partialx

stands for a

-dimensional gradient. Performing the partial derivative of

\alpha(m\vertE(x))

yields the following expression

	\partial \alpha(m\vertE(x))
	\partialx_j

	\partial
	\partialx_j

	K
\sum
	i

\left[m_ilnE_i-E_i\right]=

	K
\sum		\left[
	i

	m_i
	E_i

	\partial
	\partialx_j

E_i-

	\partial
	\partialx_j

E_i\right]=

	K
\sum
	i

	\partialE_i	\left[
	\partialx_j

	m_i
	E_i

-1\right]

By substituting it follows that

	\partial \alpha(m\vertE(x))
	\partialx_j

	K
\sum
	i

H_ij\left[

	m_i
	E_i

-1\right]

Note that

	T
{H}
	ji

=H_ij

by the definition of a matrix transpose. And hence

Since this equation is true for all

spanning all the elements from

, these

equations may be compactly rewritten as a single vectorial equation

	\partial \alpha(m\vertE(x))
	\partialx

={H^T}\left[

	m
	E

-1\right]

where

H^T

is a matrix and

and

are vectors. Now as a seemingly arbitrary but key step, let

where

is a vector of ones of size

(same as

and

) and the division is element-wise. By using and, may be rewritten as

\hat{x

}_ = \hat_ + \lambda \frac = \hat_ + \frac \left[\frac{\mathbf{m}}{\mathbf{E}} - \mathbf{1} \right] = \hat_ + \frac \mathbf^T \frac - \hat_

which yields

Where division refers to element-wise matrix division and

H^T

operates as a matrix but the division and the product (implicit after

\hat{x

}_) are element-wise. Also,

E=E(\hat{x

}_) = \mathbf \hat_ can be calculated because since it is assumed that

- The initial guess

\hat{x

}_ is known (and is typically set to be the experimental data) - The measurement function

is known

On the other hand

is the experimental data. Therefore, equation applied successively, provides an algorithm to estimate our ground truth

x_new

by ascending (since it moves in the direction of the gradient of the likelihood) in the likelihood landscape. It has not been demonstrated in this derivation that it converges and no dependence on the initial choice is shown. Note that equation provides a way of following the direction that increases the likelihood but the choice of the log-derivative is arbitrary. On the other hand equation introduces a way of weighting the movement from the previous step in the iteration. Note that if this term was not present in then the algorithm would output a movement in the estimation even if

m=E(\hat{x

}_). It's worth noting that the only strategy used here is to maximize the likelihood at all cost, so artifacts on the image can be introduced. It is worth noting that no prior knowledge on the shape of the ground truth

is used in this derivation.

Software

RawTherapee (since v.2.3)

Notes and References

Richardson, William Hadley . Bayesian-Based Iterative Method of Image Restoration . 1972 . . 62 . 1 . 55–59 . 10.1364/JOSA.62.000055. 1972JOSA...62...55R .
Lucy, L. B. . 1974 . An iterative technique for the rectification of observed distributions . Astronomical Journal . 79 . 6 . 745–754 . 10.1086/111605. 1974AJ.....79..745L .

Richardson–Lucy deconvolution explained

Description

Derivation

Software

See also

Notes and References