The auxiliary particle filter is a particle filtering algorithm introduced by Pitt and Shephard in 1999 to improve some deficiencies of the sequential importance resampling (SIR) algorithm when dealing with tailed observation densities.
Particle filters approximate a continuous random variable by M particles with discrete probability mass \pi_t^j, say 1/M for basic sampling; the discrete approximation approaches the continuous density as M → ∞.
The empirical prediction density is produced as the weighted summation of these particles:[1]
\widehat{f}(\alpha_{t+1}|Y_t) = \sum_{j=1}^{M} f(\alpha_{t+1}|\alpha_t^j)\,\pi_t^j
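As a concrete illustration, the prediction density is just a finite mixture with one transition-density component per particle. The sketch below evaluates it for a hypothetical scalar Gaussian random-walk transition; the model, function name, and parameters are assumptions for illustration, not part of the original formulation.

```python
import numpy as np

def predictive_density(alpha_next, particles, weights, sigma=1.0):
    """Evaluate the empirical prediction density f_hat(alpha_{t+1} | Y_t) for a
    hypothetical Gaussian random-walk transition
    f(alpha_{t+1} | alpha_t^j) = N(alpha_{t+1}; alpha_t^j, sigma^2)."""
    comps = np.exp(-0.5 * ((alpha_next - particles) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(np.sum(weights * comps))  # weighted sum over the M particle components

# M equally weighted particles, pi_t^j = 1/M
M = 100
particles = np.random.default_rng(0).standard_normal(M)
weights = np.full(M, 1.0 / M)
print(predictive_density(0.5, particles, weights))
```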
Combining the prior density \widehat{f}(\alpha_{t+1}|Y_t) and the likelihood f(y_{t+1}|\alpha_{t+1}), the empirical filtering density can be produced as
\widehat{f}(\alpha_{t+1}|Y_{t+1}) = \frac{f(y_{t+1}|\alpha_{t+1})\,\widehat{f}(\alpha_{t+1}|Y_t)}{f(y_{t+1}|Y_t)} \propto f(y_{t+1}|\alpha_{t+1}) \sum_{j=1}^{M} f(\alpha_{t+1}|\alpha_t^j)\,\pi_t^j,

where f(y_{t+1}|Y_t) = \int f(y_{t+1}|\alpha_{t+1})\,dF(\alpha_{t+1}|Y_t).
On the other hand, the true filtering density which we want to estimate is
f(\alpha_{t+1}|Y_{t+1}) = \frac{f(y_{t+1}|\alpha_{t+1})\,f(\alpha_{t+1}|Y_t)}{f(y_{t+1}|Y_t)}.
The prior density \widehat{f}(\alpha_{t+1}|Y_t) can be used to approximate the true filtering density f(\alpha_{t+1}|Y_{t+1}):

- The particle filter draws R samples \alpha_{t+1}^j from the prior density \widehat{f}(\alpha_{t+1}|Y_t).
- Each sample is assigned the weight \pi^j = \frac{\omega_j}{\sum_{i=1}^{R}\omega_i}, with \omega_j = f(y_{t+1}|\alpha_{t+1}^j), so the weights represent the likelihood f(y_{t+1}|\alpha_{t+1}) evaluated at the sample.
- As R → ∞, the weighted samples approximate the true filtering density, and the M particles at time t+1 are resampled from the R samples with probabilities \pi^j.
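These steps translate directly into code. The following is a minimal sketch of one such update, assuming a hypothetical scalar Gaussian random-walk model; the function name, noise parameters, and model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_step(particles, weights, y_next, R, sigma_trans=1.0, sigma_obs=1.0):
    """One SIR update for a hypothetical scalar model:
    transition f(a_{t+1}|a_t) = N(a_t, sigma_trans^2),
    likelihood f(y_{t+1}|a_{t+1}) = N(a_{t+1}, sigma_obs^2)."""
    M = len(particles)
    # Draw R samples from the empirical prediction density: pick a parent j with
    # probability pi_t^j, then propagate it through the transition density.
    parents = rng.choice(M, size=R, p=weights)
    samples = particles[parents] + sigma_trans * rng.standard_normal(R)
    # Weight each sample by the likelihood f(y_{t+1}|alpha_{t+1}^j).
    omega = np.exp(-0.5 * ((y_next - samples) / sigma_obs) ** 2)
    pi = omega / omega.sum()
    # Resample M particles from the R samples with probabilities pi^j.
    keep = rng.choice(R, size=M, p=pi)
    return samples[keep], np.full(M, 1.0 / M)
```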
The weakness of the particle filters includes:

- When the observation y_{t+1} lies in the tail of the prior density \widehat{f}(\alpha_{t+1}|Y_t), or the likelihood is sharply peaked, most of the R drawn samples receive negligible weight, so a very large R is required to approximate the filtering density adequately.
Therefore, the auxiliary particle filter is proposed to solve this problem.
Comparing with the empirical filtering density, which has

\widehat{f}(\alpha_{t+1}|Y_{t+1}) \propto f(y_{t+1}|\alpha_{t+1}) \sum_{j=1}^{M} f(\alpha_{t+1}|\alpha_t^j)\,\pi_t^j,
we now define

\widehat{f}(\alpha_{t+1}, k|Y_{t+1}) \propto f(y_{t+1}|\alpha_{t+1})\,f(\alpha_{t+1}|\alpha_t^k)\,\pi_t^k,

where k is the auxiliary variable, the index of the particle at time t (k = 1, ..., M).
Being aware that \widehat{f}(\alpha_{t+1}|Y_{t+1}) is the marginal density of this joint density over the M values of the auxiliary variable k, we can draw pairs (\alpha_{t+1}, k) from a density g(\alpha_{t+1}, k|Y_{t+1}) and discard the index k. If g(\alpha_{t+1}, k|Y_{t+1}) approximates \widehat{f}(\alpha_{t+1}, k|Y_{t+1}) well, the resulting samples approximate \widehat{f}(\alpha_{t+1}|Y_{t+1}) and hence f(\alpha_{t+1}|Y_{t+1}).
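Explicitly, summing the joint density over the auxiliary index k recovers the empirical filtering density:

\sum_{k=1}^{M} \widehat{f}(\alpha_{t+1}, k|Y_{t+1}) \propto f(y_{t+1}|\alpha_{t+1}) \sum_{k=1}^{M} f(\alpha_{t+1}|\alpha_t^k)\,\pi_t^k \propto \widehat{f}(\alpha_{t+1}|Y_{t+1}).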
Take the SIR method for example:

- The SIR method draws R samples (\alpha_{t+1}^j, k^j) from g(\alpha_{t+1}, k|Y_{t+1}).
- Each sample is assigned the weight \pi^j = \frac{\omega_j}{\sum_{i=1}^{R}\omega_i}, where \omega_j is the ratio of the target \widehat{f}(\alpha_{t+1}^j, k^j|Y_{t+1}) to the proposal g(\alpha_{t+1}^j, k^j|Y_{t+1}); it depends on y_{t+1} and the parent particle \alpha_t^{k^j}.
- The M particles at time t+1 are then resampled from the R samples with probabilities \pi^j.
The original particle filters draw samples from the prior density, while the auxiliary particle filters draw from the joint distribution of the prior density and the likelihood. In other words, the auxiliary particle filters avoid generating particles in regions of low likelihood. As a result, the samples can approximate f(\alpha_{t+1}|Y_{t+1}) more efficiently.
The selection of the auxiliary variable affects g(\alpha_{t+1}, k|Y_{t+1}) and determines the efficiency of the auxiliary particle filter. Usually g(\alpha_{t+1}, k|Y_{t+1}) is chosen as

g(\alpha_{t+1}, k|Y_{t+1}) \propto f(y_{t+1}|\mu_{t+1}^k)\,f(\alpha_{t+1}|\alpha_t^k)\,\pi_t^k, \quad k = 1, ..., M,

where \mu_{t+1}^k is the mean, the mode, or some other likely value associated with the transition density f(\alpha_{t+1}|\alpha_t^k).
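For instance, under the hypothetical Gaussian random-walk transition used in the sketches above, the conditional mean of f(\alpha_{t+1}|\alpha_t^k) is the parent particle itself, so a convenient choice is \mu_{t+1}^k = \alpha_t^k (the helper name below is an assumption):

```python
def reference_points(particles):
    """mu_{t+1}^k for each particle k. For the hypothetical random-walk transition used
    in the sketches above, the conditional mean equals the parent particle itself;
    for other models one might use the mode or a draw from f(alpha_{t+1}|alpha_t^k)."""
    return particles.copy()
```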
We sample from g(\alpha_{t+1}, k|Y_{t+1}) to approximate f(\alpha_{t+1}|Y_{t+1}) by the following procedure:

- First, we draw the auxiliary index k R times from g(k|Y_{t+1}) \propto \pi_t^k\,f(y_{t+1}|\mu_{t+1}^k), i.e. with first-stage probabilities \lambda_k.
- Second, for each drawn index k^j we draw a sample \alpha_{t+1}^j from the transition density f(\alpha_{t+1}|\alpha_t^{k^j}); together this yields R samples from g(\alpha_{t+1}, k|Y_{t+1}).
- Third, we assign each sample the weight \pi^j = \frac{\omega_j}{\sum_{i=1}^{R}\omega_i}, with \omega_j = \frac{f(y_{t+1}|\alpha_{t+1}^j)}{f(y_{t+1}|\mu_{t+1}^{k^j})}, which corrects for evaluating the likelihood at \mu_{t+1}^{k^j} instead of at the sample.
- Finally, the M particles at time t+1 are resampled from the R samples with probabilities \pi^j.
Following the procedure, we draw the R samples from g(\alpha_{t+1}, k|Y_{t+1}). Because g(\alpha_{t+1}, k|Y_{t+1}) incorporates the likelihood through \mu_{t+1}^k, the drawn samples tend to fall in regions of high likelihood, which reduces the number of samples R required compared with drawing from the prior density alone.
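A minimal sketch of this two-stage procedure, continuing the same hypothetical scalar Gaussian model as in the earlier sketches (names and parameters are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def apf_step(particles, weights, y_next, R, sigma_trans=1.0, sigma_obs=1.0):
    """One auxiliary-particle-filter update for a hypothetical scalar model with
    Gaussian random-walk transition and Gaussian likelihood."""
    M = len(particles)
    mu = particles  # mu_{t+1}^k: mean of the transition density for a random walk
    # First stage: draw R auxiliary indexes k with probability
    # proportional to pi_t^k * f(y_{t+1}|mu_{t+1}^k).
    lam = weights * np.exp(-0.5 * ((y_next - mu) / sigma_obs) ** 2)
    lam /= lam.sum()
    k = rng.choice(M, size=R, p=lam)
    # Second stage: propagate each selected parent through the transition density.
    samples = particles[k] + sigma_trans * rng.standard_normal(R)
    # Second-stage weights correct for evaluating the likelihood at mu instead of at the sample.
    omega = np.exp(-0.5 * ((y_next - samples) / sigma_obs) ** 2) \
          / np.exp(-0.5 * ((y_next - mu[k]) / sigma_obs) ** 2)
    pi = omega / omega.sum()
    # Resample M particles from the R samples with probabilities pi^j.
    keep = rng.choice(R, size=M, p=pi)
    return samples[keep], np.full(M, 1.0 / M)
```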
Assume that the filtered posterior is described by the following M weighted samples:
p(x_t|z_{1:t}) \approx \sum_{i=1}^{M} \omega_t^{(i)}\,\delta\!\left(x_t - x_t^{(i)}\right).
Then, each step in the algorithm consists of first drawing a sample of the particle index k which will be propagated from t-1 into the new step t. These indexes are drawn according to the likelihood of some reference point \mu_t^{(i)}, which in some way is related to the transition model x_t|x_{t-1} (for example, the mean or a sample from it):

k(i) \sim P(i = k|z_t) \propto \omega_t^{(i)}\,p\!\left(z_t|\mu_t^{(i)}\right)
This is repeated for i = 1, 2, ..., M, and using these indexes we can now draw the conditional samples:

x_t^{(i)} \sim p\!\left(x\,|\,x_{t-1}^{k(i)}\right).
Finally, the weights are updated to account for the mismatch between the likelihood at the actual sample and the predicted point \mu_t^{k(i)}:

\omega_t^{(i)} \propto \frac{p\!\left(z_t|x_t^{(i)}\right)}{p\!\left(z_t|\mu_t^{k(i)}\right)}.
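Putting the three steps together, a compact sketch of one update in this notation might look as follows, assuming a hypothetical 1-D Gaussian model p(x_t|x_{t-1}) = N(x_{t-1}, q) and p(z_t|x_t) = N(x_t, r); all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def apf_update(x_prev, w_prev, z_t, q=1.0, r=1.0):
    """One auxiliary particle filter step for M weighted samples (x_prev, w_prev),
    assuming the hypothetical model p(x_t|x_{t-1}) = N(x_{t-1}, q), p(z_t|x_t) = N(x_t, r)."""
    M = len(x_prev)
    mu = x_prev  # reference points mu_t^(i): here the transition mean
    # Draw the auxiliary indexes k(i) with probability proportional to w^(i) p(z_t|mu^(i)).
    lam = w_prev * np.exp(-0.5 * (z_t - mu) ** 2 / r)
    k = rng.choice(M, size=M, p=lam / lam.sum())
    # Draw the conditional samples x_t^(i) ~ p(x|x_{t-1}^{k(i)}).
    x_t = x_prev[k] + np.sqrt(q) * rng.standard_normal(M)
    # Update the weights: likelihood at the actual sample over likelihood at the predicted point.
    w_t = np.exp(-0.5 * (z_t - x_t) ** 2 / r) / np.exp(-0.5 * (z_t - mu[k]) ** 2 / r)
    return x_t, w_t / w_t.sum()
```

If the incoming weights are uniform (for example, right after a resampling step), the index probabilities reduce to p(z_t|\mu_t^{(i)}) alone.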