Particle filters, or sequential Monte Carlo methods, are a set of Monte Carlo algorithms used to find approximate solutions for filtering problems for nonlinear state-space systems, such as signal processing and Bayesian statistical inference.[1] The filtering problem consists of estimating the internal states in dynamical systems when partial observations are made and random perturbations are present in the sensors as well as in the dynamical system. The objective is to compute the posterior distributions of the states of a Markov process, given the noisy and partial observations. The term "particle filters" was first coined in 1996 by Pierre Del Moral about mean-field interacting particle methods used in fluid mechanics since the beginning of the 1960s.[2] The term "Sequential Monte Carlo" was coined by Jun S. Liu and Rong Chen in 1998.[3]
Particle filtering uses a set of particles (also called samples) to represent the posterior distribution of a stochastic process given the noisy and/or partial observations. The state-space model can be nonlinear and the initial state and noise distributions can take any form required. Particle filter techniques provide a well-established methodology[4] [5] for generating samples from the required distribution without requiring assumptions about the state-space model or the state distributions. However, these methods do not perform well when applied to very high-dimensional systems.
Particle filters update their prediction in an approximate (statistical) manner. The samples from the distribution are represented by a set of particles; each particle has a likelihood weight assigned to it that represents the probability of that particle being sampled from the probability density function. Weight disparity leading to weight collapse is a common issue encountered in these filtering algorithms. However, it can be mitigated by including a resampling step before the weights become uneven. Several adaptive resampling criteria can be used including the variance of the weights and the relative entropy concerning the uniform distribution.[6] In the resampling step, the particles with negligible weights are replaced by new particles in the proximity of the particles with higher weights.
From the statistical and probabilistic point of view, particle filters may be interpreted as mean-field particle interpretations of Feynman-Kac probability measures.[7] [8] [9] [10] These particle integration techniques were developed in molecular chemistry and computational physics by Theodore E. Harris and Herman Kahn in 1951, Marshall N. Rosenbluth and Arianna W. Rosenbluth in 1955,[11] and more recently by Jack H. Hetherington in 1984. In computational physics, these Feynman-Kac type path particle integration methods are also used in Quantum Monte Carlo, and more specifically Diffusion Monte Carlo methods.[12] [13] [14] Feynman-Kac interacting particle methods are also strongly related to mutation-selection genetic algorithms currently used in evolutionary computation to solve complex optimization problems.
The particle filter methodology is used to solve Hidden Markov Model (HMM) and nonlinear filtering problems. With the notable exception of linear-Gaussian signal-observation models (Kalman filter) or wider classes of models (Benes filter[15]), Mireille Chaleyat-Maurel and Dominique Michel proved in 1984 that the sequence of posterior distributions of the random states of a signal, given the observations (a.k.a. optimal filter), has no finite recursion.[16] Various other numerical methods based on fixed grid approximations, Markov Chain Monte Carlo techniques, conventional linearization, extended Kalman filters, or determining the best linear system (in the expected cost-error sense) are unable to cope with large-scale systems, unstable processes, or insufficiently smooth nonlinearities.
Particle filters and Feynman-Kac particle methodologies find application in signal and image processing, Bayesian inference, machine learning, risk analysis and rare event sampling, engineering and robotics, artificial intelligence, bioinformatics,[17] phylogenetics, computational science, economics and mathematical finance, molecular chemistry, computational physics, pharmacokinetics, quantitative risk and insurance[18] [19] and other fields.
From a statistical and probabilistic viewpoint, particle filters belong to the class of branching/genetic type algorithms, and mean-field type interacting particle methodologies. The interpretation of these particle methods depends on the scientific discipline. In Evolutionary Computing, mean-field genetic type particle methodologies are often used as heuristic and natural search algorithms (a.k.a. Metaheuristic). In computational physics and molecular chemistry, they are used to solve Feynman-Kac path integration problems or to compute Boltzmann-Gibbs measures, top eigenvalues, and ground states of Schrödinger operators. In Biology and Genetics, they represent the evolution of a population of individuals or genes in some environment.
The origins of mean-field type evolutionary computational techniques can be traced back to 1950 and 1954 with Alan Turing's work on genetic type mutation-selection learning machines[20] and the articles by Nils Aall Barricelli at the Institute for Advanced Study in Princeton, New Jersey.[21] [22] The first trace of particle filters in statistical methodology dates back to the mid-1950s; the 'Poor Man's Monte Carlo',[23] that was proposed by Hammersley et al., in 1954, contained hints of the genetic type particle filtering methods used today. In 1963, Nils Aall Barricelli simulated a genetic type algorithm to mimic the ability of individuals to play a simple game.[24] In evolutionary computing literature, genetic-type mutation-selection algorithms became popular through the seminal work of John Holland in the early 1970s, particularly his book[25] published in 1975.
In Biology and Genetics, the Australian geneticist Alex Fraser also published in 1957 a series of papers on the genetic type simulation of artificial selection of organisms.[26] The computer simulation of the evolution by biologists became more common in the early 1960s, and the methods were described in books by Fraser and Burnell (1970)[27] and Crosby (1973).[28] Fraser's simulations included all of the essential elements of modern mutation-selection genetic particle algorithms.
From the mathematical viewpoint, the conditional distribution of the random states of a signal given some partial and noisy observations is described by a Feynman-Kac probability on the random trajectories of the signal weighted by a sequence of likelihood potential functions. Quantum Monte Carlo, and more specifically Diffusion Monte Carlo methods can also be interpreted as a mean-field genetic type particle approximation of Feynman-Kac path integrals.[29] [30] [31] The origins of Quantum Monte Carlo methods are often attributed to Enrico Fermi and Robert Richtmyer who developed in 1948 a mean-field particle interpretation of neutron-chain reactions,[32] but the first heuristic-like and genetic type particle algorithm (a.k.a. Resampled or Reconfiguration Monte Carlo methods) for estimating ground state energies of quantum systems (in reduced matrix models) is due to Jack H. Hetherington in 1984. One can also quote the earlier seminal works of Theodore E. Harris and Herman Kahn in particle physics, published in 1951, using mean-field but heuristic-like genetic methods for estimating particle transmission energies.[33] In molecular chemistry, the use of genetic heuristic-like particle methodologies (a.k.a. pruning and enrichment strategies) can be traced back to 1955 with the seminal work of Marshall N. Rosenbluth and Arianna W. Rosenbluth.
The use of genetic particle algorithms in advanced signal processing and Bayesian inference is more recent. In January 1993, Genshiro Kitagawa developed a "Monte Carlo filter",[34] a slightly modified version of this article appeared in 1996.[35] In April 1993, Gordon et al., published in their seminal work[36] an application of genetic type algorithm in Bayesian statistical inference. The authors named their algorithm 'the bootstrap filter', and demonstrated that compared to other filtering methods, their bootstrap algorithm does not require any assumption about that state space or the noise of the system. Independently, the ones by Pierre Del Moral and Himilcon Carvalho, Pierre Del Moral, André Monin, and Gérard Salut[37] on particle filters published in the mid-1990s. Particle filters were also developed in signal processing in early 1989-1992 by P. Del Moral, J.C. Noyer, G. Rigal, and G. Salut in the LAAS-CNRS in a series of restricted and classified research reports with STCAN (Service Technique des Constructions et Armes Navales), the IT company DIGILOG, and the LAAS-CNRS (the Laboratory for Analysis and Architecture of Systems) on RADAR/SONAR and GPS signal processing problems.[38] [39] [40] [41] [42] [43]
From 1950 to 1996, all the publications on particle filters, and genetic algorithms, including the pruning and resample Monte Carlo methods introduced in computational physics and molecular chemistry, present natural and heuristic-like algorithms applied to different situations without a single proof of their consistency, nor a discussion on the bias of the estimates and genealogical and ancestral tree-based algorithms.
The mathematical foundations and the first rigorous analysis of these particle algorithms are due to Pierre Del Moral in 1996. The article also contains proof of the unbiased properties of a particle approximation of likelihood functions and unnormalized conditional probability measures. The unbiased particle estimator of the likelihood functions presented in this article is used today in Bayesian statistical inference.
Dan Crisan, Jessica Gaines, and Terry Lyons,[44] [45] [46] as well as Pierre Del Moral, and Terry Lyons,[47] created branching-type particle techniques with various population sizes around the end of the 1990s. P. Del Moral, A. Guionnet, and L. Miclo made more advances in this subject in 2000. Pierre Del Moral and Alice Guionnet[48] proved the first central limit theorems in 1999, and Pierre Del Moral and Laurent Miclo proved them in 2000. The first uniform convergence results concerning the time parameter for particle filters were developed at the end of the 1990s by Pierre Del Moral and Alice Guionnet.[49] [50] The first rigorous analysis of genealogical tree-ased particle filter smoothers is due to P. Del Moral and L. Miclo in 2001[51]
The theory on Feynman-Kac particle methodologies and related particle filter algorithms was developed in 2000 and 2004 in the books.[8] These abstract probabilistic models encapsulate genetic type algorithms, particle, and bootstrap filters, interacting Kalman filters (a.k.a. Rao–Blackwellized particle filter[52]), importance sampling and resampling style particle filter techniques, including genealogical tree-based and particle backward methodologies for solving filtering and smoothing problems. Other classes of particle filtering methodologies include genealogical tree-based models,[53] [54] backward Markov particle models,[55] adaptive mean-field particle models, island-type particle models,[56] [57] particle Markov chain Monte Carlo methodologies,[58] [59] Sequential Monte Carlo samplers [60] [61] [62] and Sequential Monte Carlo Approximate Bayesian Computation methods[63] and Sequential Monte Carlo ABC based Bayesian Bootstrap.[64]
A particle filter's goal is to estimate the posterior density of state variables given observation variables. The particle filter is intended for use with a hidden Markov Model, in which the system includes both hidden and observable variables. The observable variables (observation process) are linked to the hidden variables (state-process) via a known functional form. Similarly, the probabilistic description of the dynamical system defining the evolution of the state variables is known.
A generic particle filter estimates the posterior distribution of the hidden states using the observation measurement process. With respect to a state-space such as the one below:
\begin{array}{cccccccccc} X0&\to&X1&\to&X2&\to&X3&\to& … &signal\\ \downarrow&&\downarrow&&\downarrow&&\downarrow&& … &\\ Y0&&Y1&&Y2&&Y3&& … &observation \end{array}
Xk
Y0, … ,Yk,
All Bayesian estimates of
Xk
p(xk|y0,y1,...,yk)
p(x0,x1,...,xk|y0,y1,...,yk)
Particle methods often assume
Xk
Yk
X0,X1, …
dx | |
R |
dx\geqslant1
p(xk|xk-1)
Xk|Xk-1=xk\simp(xk|xk-1)
with an initial probability density
p(x0)
Y0,Y1, …
dy | |
R |
dy\geqslant1
X0,X1, …
Yk
Xk
Yk
Xk=xk
Yk|Xk=yk\simp(yk|xk)
Xk=g(Xk-1)+Wk-1
Yk=h(Xk)+Vk
where both
Wk
Vk
Wk
Vk
The assumption that the initial distribution and the transitions of the Markov chain are continuous for the Lebesgue measure can be relaxed. To design a particle filter we simply need to assume that we can sample the transitions
Xk-1\toXk
Xk,
xk\mapstop(yk|xk)
Xk
See main article: Approximate Bayesian computation. In certain problems, the conditional distribution of observations, given the random states of the signal, may fail to have a density; the latter may be impossible or too complex to compute.[17] In this situation, an additional level of approximation is necessitated. One strategy is to replace the signal
Xk
lXk=\left(Xk,Yk\right)
lYk=Yk+\epsilonlVk forsomeparameter \epsilon\in[0,1]
for some sequence of independent random variables
lVk
Law\left(Xk|lY0=y0, … ,lYk=yk\right) ≈ \epsilon\downarrowLaw\left(Xk|Y0=y0, … ,Yk=yk\right)
The particle filter associated with the Markov process
lXk=\left(Xk,Yk\right)
lY0=y0, … ,lYk=yk,
dx+dy | |
R |
p(lYk|lXk)
Bayes' rule for conditional probability gives:
p(x0, … ,xk|y0, … ,yk)=
p(y0, … ,yk|x0, … ,xk)p(x0, … ,xk) | |
p(y0, … ,yk) |
where
\begin{align} p(y0, … ,yk)&=\intp(y0, … ,yk|x0, … ,xk)p(x0, … ,xk)dx0 … dxk\\ p(y0, … ,yk|x0, … ,xk)
k | |
&=\prod | |
l=0 |
p(yl|xl)\\ p(x0, … ,xk)&=p0(x0)\prod
k | |
l=1 |
p(xl|xl-1) \end{align}
Particle filters are also an approximation, but with enough particles they can be much more accurate. The nonlinear filtering equation is given by the recursion
with the convention
p(x0|y0, … ,yk-1)=p(x0)
See main article: Feynman–Kac formula. We fix a time horizon n and a sequence of observations
Y0=y0, … ,Yn=yn
Gk(xk)=p(yk|xk).
In this notation, for any bounded function F on the set of trajectories of
Xk
\begin{align} \intF(x0, … ,xn)p(x0, … ,xn|y0, … ,yn)dx0 … dxn&=
| |||||||||
p(x |
0, … ,xn)dx0 … dxn}{\int
n | |
\left\{\prod\limits | |
k=0 |
p(yk|xk)\right\}p(x0, … ,xn)dx0 …
dx | |||||||||||||||||||||||||
|
\end{align}
Feynman-Kac path integration models arise in a variety of scientific disciplines, including in computational physics, biology, information theory and computer sciences. Their interpretations are dependent on the application domain. For instance, if we choose the indicator function
Gn(xn)=1A(xn)
E\left(F(X0, … ,Xn)|X0\inA, … ,Xn\inA\right)=
| |||||||||||||
|
P\left(X0\inA, … ,Xn\in
n | |
A\right)=E\left(\prod\limits | |
k=0 |
Gk(Xk)\right)
as soon as the normalizing constant is strictly positive.
Initially, such an algorithm starts with N independent random variables
i | |
\left(\xi | |
0\right) |
1\leqslant
p(x0)
i | |
\xi | |
k |
\right)1\leqslant\stackrel{selection
mimic/approximate the updating-prediction transitions of the optimal filter evolution :
i | |
\widehat{\xi} | |
k |
\right)1\leqslant
N | |
\sum | |
i=1 |
| |||||||
|
\delta | |||||||
|
(dxk)
where
\deltaa
i | |
\widehat{\xi} | |
k |
i | |
\widehat{\xi} | |
k |
i | |
\longrightarrow\xi | |
k+1 |
\simp(xk+1
i | |
|\widehat{\xi} | |
k), |
i=1, … ,N.
i | |
p(y | |
k) |
xk\mapstop(yk|xk)
i | |
x | |
k |
p(xk+1
i | |
|\widehat{\xi} | |
k) |
p(xk+1|xk)
i | |
x | |
k |
At each time k, we have the particle approximations
\widehat{p}(dxk|y0, … ,y
|
N | |
\sum | |
i=1 |
i | |
\delta | |
k} |
(dxk) ≈ N\uparrowinftyp(dxk|y0, … ,yk) ≈ N\uparrowinfty
N | |
\sum | |
i=1 |
| |||||||||||||||
|
\delta | |||||||
|
(dxk)
and
\widehat{p}(dxk|y0, … ,yk-1):=
1 | |
N |
N | |
\sum | |
i=1 |
\delta | |||||||
|
(dxk) ≈ N\uparrowinftyp(dxk|y0, … ,yk-1)
In Genetic algorithms and Evolutionary computing community, the mutation-selection Markov chain described above is often called the genetic algorithm with proportional selection. Several branching variants, including with random population sizes have also been proposed in the articles.
Particle methods, like all sampling-based approaches (e.g., Markov Chain Monte Carlo), generate a set of samples that approximate the filtering density
p(xk|y0, … ,yk).
For example, we may have N samples from the approximate posterior distribution of
Xk
1, | |
\widehat{\xi} | |
k |
… ,
N | |
\widehat{\xi} | |
k |
.
Then, expectations with respect to the filtering distribution are approximated by
with
\widehat{p}(dxk|y0, … ,y
|
N | |
\sum | |
i=1 |
i | |
\delta | |
k}(dx |
k)
where
\deltaa
p(dxk|y0, … ,yk):=p(xk|y0, … ,yk)dxk ≈ N\uparrowinfty\widehat{p}(dxk|y0, … ,y
|
N | |
\sum | |
i=1 |
i | |
\delta | |
k}(dx |
k)
Particle filters can be interpreted as a genetic type particle algorithm evolving with mutation and selection transitions. We can keep track of the ancestral lines
i | |
\left(\widehat{\xi} | |
0,k |
,
i | |
\widehat{\xi} | |
1,k |
i | |
, … ,\widehat{\xi} | |
k-1,k |
i | |
,\widehat{\xi} | |
k,k |
\right)
of the particles
i=1, … ,N
i | |
\widehat{\xi} | |
l,k |
i | |
\widehat{\xi} | |
k,k |
i | |
=\widehat{\xi} | |
k |
with the empirical measure
\widehat{p}(d(x0, … ,xk)|y0, … ,y
|
N | |
\sum | |
i=1 |
i | |
\delta | |
0,k |
i | |
,\widehat{\xi} | |
1,k |
i | |
, … ,\widehat{\xi} | |
k,k |
\right)}(d(x0, … ,xk))
Here F stands for any founded function on the path space of the signal. In a more synthetic form is equivalent to
\begin{align} p(d(x0, … ,xk)|y0, … ,yk)&:=p(x0, … ,xk|y0, … ,yk)dx0 … dxk\\ & ≈ N\uparrowinfty\widehat{p}(d(x0, … ,xk)|y0, … ,yk)\\ &:=
1 | |
N |
N | |
\sum | |
i=1 |
i | |
\delta | |
0,k |
,
i | |
… ,\widehat{\xi} | |
k,k |
\right)}(d(x0, … ,xk)) \end{align}
Particle filters can be interpreted in many different ways. From the probabilistic point of view they coincide with a mean-field particle interpretation of the nonlinear filtering equation. The updating-prediction transitions of the optimal filter evolution can also be interpreted as the classical genetic type selection-mutation transitions of individuals. The sequential importance resampling technique provides another interpretation of the filtering transitions coupling importance sampling with the bootstrap resampling step. Last, but not least, particle filters can be seen as an acceptance-rejection methodology equipped with a recycling mechanism.
The nonlinear filtering evolution can be interpreted as a dynamical system in the set of probability measures of the form
ηn+1=\Phin+1\left(ηn\right)
\Phin+1
ηn(dxn)=p(xn|y0, … ,yn-1)dxn
satisfies a nonlinear evolution starting with the probability distribution
η0(dx0)=p(x0)dx0
i | |
\left(\xi | |
0\right) |
1\leqslant
η0(dx0)=p(x0)dx0
i | |
\left(\xi | |
n\right) |
1\leqslant
1 | |
N |
N | |
\sum | |
i=1 |
\delta | |||||||
|
(dxn) ≈ N\uparrowinftyηn(dxn)
At the next step we sample N (conditionally) independent random variables
\xin+1
i | |
:=\left(\xi | |
n+1 |
\right)1\leqslant
\Phin+1\left(
1 | |
N |
N | |
\sum | |
i=1 |
\delta | |||||||
|
\right) ≈ N\uparrowinfty\Phin+1\left(ηn\right)=ηn+1
We illustrate this mean-field particle principle in the context of the evolution of the one step optimal predictors
For k = 0 we use the convention
p(x0|y0, … ,y-1):=p(x0)
By the law of large numbers, we have
\widehat{p}(dx | ||||
|
N | |
\sum | |
i=1 |
\delta | |||||||
|
(dx0) ≈ N\uparrowinftyp(x0)dx0
in the sense that
\intf(x0)\widehat{p}(dx
|
N | |
\sum | |
i=1 |
i | |
f(\xi | |
0) ≈ |
N\uparrowinfty\intf(x0)p(dx0)dx0
for any bounded function
f
i | |
\left(\xi | |
k\right) |
1\leqslant
\widehat{p}(dxk|y0, … ,yk-1):=
1 | |
N |
N | |
\sum | |
i=1 |
\delta | |||||||
|
(dxk) ≈ N\uparrowinfty~p(xk~|~y0, … ,yk-1)dxk
in the sense that for any bounded function
f
\intf(xk)\widehat{p}(dxk|y0, … ,yk-1)=
1 | |
N |
N | |
\sum | |
i=1 |
i | |
f(\xi | |
k) ≈ |
N\uparrowinfty\intf(xk)p(dxk|y0, … ,yk-1)dxk
In this situation, replacing by the empirical measure in the evolution equation of the one-step optimal filter stated in we find that
p(xk+1|y0, … ,yk) ≈ N\uparrowinfty\intp(xk+1|x'k)
p(yk|xk')\widehat{p | |
(dx' |
k|y0, … ,yk-1)}{\intp(yk|x''k)\widehat{p}(dx''k|y0, … ,yk-1)}
Notice that the right hand side in the above formula is a weighted probability mixture
\intp(xk+1|x'k)
p(yk|xk')\widehat{p | |
(dx' |
k|y0, … ,yk-1)}{\intp(yk|x''k)\widehat{p}(dx''k|y0, … ,yk-1
N | |
)}=\sum | |
i=1 |
| |||||||||||||||
|
p(xk+1
i | |
|\xi | |
k)=:\widehat{q}(x |
k+1|y0, … ,yk)
where
i | |
p(y | |
k) |
p(yk|xk)
i | |
x | |
k |
p(xk+1
i | |
|\xi | |
k) |
p(xk+1|xk)
i | |
x | |
k |
i=1, … ,N.
Then, we sample N independent random variable
i | |
\left(\xi | |
k+1 |
\right)1\leqslant
\widehat{q}(xk+1|y0, … ,yk)
\widehat{p}(dxk+1|y0, … ,yk):=
1 | |
N |
N | |
\sum | |
i=1 |
\delta | |||||||
|
(dxk+1) ≈ N\uparrowinfty\widehat{q}(xk+1|y0, … ,yk)dxk+1 ≈ N\uparrowinftyp(xk+1|y0, … ,yk)dxk+1
Iterating this procedure, we design a Markov chain such that
\widehat{p}(dxk|y0, … ,yk-1):=
1 | |
N |
N | |
\sum | |
i=1 |
\delta | |||||||
|
(dxk) ≈ N\uparrowinftyp(dxk|y0, … ,yk-1):=p(xk|y0, … ,yk-1)dxk
Notice that the optimal filter is approximated at each time step k using the Bayes' formulae
p(dxk|y0, … ,yk) ≈ N\uparrowinfty
p(yk|xk)\widehat{p | |
(dx |
k|y0, … ,yk-1)}{\intp(yk|x'k)\widehat{p}(dx'k|y0, … ,yk-1
N | |
)}=\sum | |
i=1 |
| |||||||
|
~\delta | |||||||
|
(dxk)
The terminology "mean-field approximation" comes from the fact that we replace at each time step the probability measure
p(dxk|y0, … ,yk-1)
\widehat{p}(dxk|y0, … ,yk-1)
The analysis of the convergence of particle filters was started in 1996 and in 2000 in the book and the series of articles.[68] [69] More recent developments can be found in the books, When the filtering equation is stable (in the sense that it corrects any erroneous initial condition), the bias and the variance of the particle particle estimates
Ik(f):=\intf(xk)p(dxk|y0, … ,yk-1) ≈ N\uparrowinfty\widehat{I}k(f):=\intf(xk)\widehat{p}(dxk|y0, … ,yk-1)
are controlled by the non asymptotic uniform estimates
\supk\geqslant\left\vertE\left(\widehat{I}k(f)\right)-Ik(f)\right\vert\leqslant
c1 | |
N |
\supk\geqslantE\left(\left[\widehat{I}k(f)-I
2\right)\leqslant | |
k(f)\right] |
c2 | |
N |
for any function f bounded by 1, and for some finite constants
c1,c2.
x\geqslant0
P\left(\left|\widehat{I}k(f)-Ik(f)\right|\leqslantc1
x | |
N |
+c2\sqrt{
x | |
N |
for some finite constants
c1,c2
Tracing back in time the ancestral lines
i | |
\left(\widehat{\xi} | |
0,k |
i | |
,\widehat{\xi} | |
1,k |
i | |
, … ,\widehat{\xi} | |
k-1,k |
i | |
,\widehat{\xi} | |
k,k |
\right),
i | |
\left(\xi | |
0,k |
i | |
,\xi | |
1,k |
i | |
, … ,\xi | |
k-1,k |
i | |
,\xi | |
k,k |
\right)
of the individuals
i | |
\widehat{\xi} | |
k |
i | |
\left(=\widehat{\xi} | |
k,k |
\right)
i | |
\xi | |
k |
i | |
\left(={\xi} | |
k,k |
\right)
\begin{align} \widehat{p}(d(x0, … ,xk)|y0, … ,yk)&:=
1 | |
N |
N | |
\sum | |
i=1 |
i | |
\delta | |
0,k |
i | |
, … ,\widehat{\xi} | |
0,k |
\right)}(d(x0, … ,xk))\\ & ≈ N\uparrowinftyp(d(x0, … ,xk)|y0, … ,yk)\\ & ≈ N\uparrowinfty
N | |
\sum | |
i=1 |
| ||||||||||
|
\delta | ||||||||||||||||
|
(d(x0, … ,xk))\\ & \\ \widehat{p}(d(x0, … ,xk)|y0, … ,yk-1)&:=
1 | |
N |
N | |
\sum | |
i=1 |
\delta | ||||||||||||||||
|
(d(x0, … ,xk))\\ & ≈ N\uparrowinftyp(d(x0, … ,xk)|y0, … ,yk-1)\\ &:=p(x0, … ,xk|y0, … ,yk-1)dx0, … ,dxk \end{align}
These empirical approximations are equivalent to the particle integral approximations
\begin{align} \intF(x0, … ,xn)\widehat{p}(d(x0, … ,xk)|y0, … ,yk)&:=
1 | |
N |
N | |
\sum | |
i=1 |
i | |
F\left(\widehat{\xi} | |
0,k |
i | |
, … ,\widehat{\xi} | |
0,k |
\right)\\ & ≈ N\uparrowinfty\intF(x0, … ,xn)p(d(x0, … ,xk)|y0, … ,yk)\\ & ≈ N\uparrowinfty
N | |
\sum | |
i=1 |
| |||||||||||||||
|
i | |
F\left(\xi | |
0,k |
,
i | |
… ,\xi | |
k,k |
\right)\\ & \\ \intF(x0, … ,xn)\widehat{p}(d(x0, … ,xk)|y0, … ,yk-1)&:=
1 | |
N |
N | |
\sum | |
i=1 |
i | |
F\left(\xi | |
0,k |
i | |
, … ,\xi | |
k,k |
\right)\\ & ≈ N\uparrowinfty\intF(x0, … ,xn)p(d(x0, … ,xk)|y0, … ,yk-1) \end{align}
for any bounded function F on the random trajectories of the signal. As shown in the evolution of the genealogical tree coincides with a mean-field particle interpretation of the evolution equations associated with the posterior densities of the signal trajectories. For more details on these path space models, we refer to the books.
We use the product formula
p(y0, … ,yn)=\prod
n | |
k=0 |
p(yk|y0, … ,yk-1)
with
p(yk|y0, … ,yk-1)=\intp(yk|xk)p(dxk|y0, … ,yk-1)
and the conventions
p(y0|y0, … ,y-1)=p(y0)
p(x0|y0, … ,y-1)=p(x0),
p(xk|y0, … ,yk-1)dxk
\widehat{p}(dxk|y0, … ,yk-1):=
1 | |
N |
N | |
\sum | |
i=1 |
\delta | |||||||
|
(dxk) ≈ N\uparrowinftyp(dxk|y0, … ,yk-1)
in the above displayed formula, we design the following unbiased particle approximation of the likelihood function
p(y0, … ,yn) ≈ N\uparrowinfty\widehat{p}(y0, … ,yn)=\prod
n | |
k=0 |
\widehat{p}(yk|y0, … ,yk-1)
with
\widehat{p}(yk|y0, … ,yk-1)=\intp(yk|xk)\widehat{p}(dxk|y0, … ,yk-1)=
1 | |
N |
N | |
\sum | |
i=1 |
i | |
p(y | |
k) |
where
i | |
p(y | |
k) |
p(yk|xk)
i | |
x | |
k |
Using Bayes' rule, we have the formula
p(x0, … ,xn|y0, … ,yn-1)=p(xn|y0, … ,yn-1)p(xn-1|xn,y0, … ,yn-1) … p(x1|x2,y0,y1)p(x0|x1,y0)
Notice that
\begin{align}p(xk-1|xk,(y0, … ,yk-1))&\proptop(xk|xk-1)p(xk-1|(y0, … ,yk-1))\\ p(xk-1|(y0, … ,yk-1)&\proptop(yk-1|xk-1)p(xk-1|(y0, … ,yk-2) \end{align}
This implies that
p(xk-1|xk,(y0, … ,yk-1))=
p(yk-1|xk-1)p(xk|xk-1)p(xk-1|y0, … ,yk-2) | |
\intp(yk-1|x'k-1)p(xk|x'k-1)p(x'k-1|y0, … ,yk-2)dx'k-1 |
Replacing the one-step optimal predictors
p(xk-1|(y0, … ,yk-2))dxk-1
\widehat{p}(dxk-1|(y0, … ,yk-2))=
1 | |
N |
N | |
\sum | |
i=1 |
\delta | |||||||
|
(dxk-1)\left( ≈ N\uparrowinftyp(dxk-1|(y0, … ,yk-2)):={p}(xk-1|(y0, … ,yk-2))dxk-1\right)
we find that
\begin{align} p(dxk-1|xk,(y0, … ,yk-1))& ≈ N\uparrowinfty\widehat{p}(dxk-1|xk,(y0, … ,yk-1))\\ &:=
p(yk-1|xk-1)p(xk|xk-1)\widehat{p | |
(dx |
k-1|y0, … ,yk-2)}{\intp(yk-1|x'k-1)~p(xk|x'k-1)\widehat{p}(dx'k-1|y0, … ,yk-2)}\\ &=
N | |
\sum | |
i=1 |
| |||||||||||||||||||||
|
\delta | |||||||
|
(dxk-1) \end{align}
We conclude that
p(d(x0, … ,xn)|(y0, … ,yn-1)) ≈ N\uparrowinfty\widehat{p}backward(d(x0, … ,xn)|(y0, … ,yn-1))
with the backward particle approximation
\begin{align} \widehat{p}backward(d(x0, … ,xn)|(y0, … ,yn-1))=\widehat{p}(dxn|(y0, … ,yn-1))\widehat{p}(dxn-1|xn,(y0, … ,yn-1)) … \widehat{p}(dx1|x2,(y0,y1))\widehat{p}(dx0|x1,y0) \end{align}
The probability measure
\widehat{p}backward(d(x0, … ,xn)|(y0, … ,yn-1))
is the probability of the random paths of a Markov chain
\flat | |
\left(X | |
k,n |
\right)0\leqslant
i | |
\xi | |
k, |
i=1, … ,N.
\flat | |
X | |
n,n |
\widehat{p}(dxn|(y0, … ,yn-1))=
1 | |
N |
N | |
\sum | |
i=1 |
\delta | |||||||
|
(dxn)
\flat | |
X | |
k,n |
i | |
=\xi | |
k |
i=1, … ,N
\flat | |
X | |
k-1,n |
\widehat{p}(dxk-1
i | |
|\xi | |
k |
,(y0, … ,yk-1))=
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
\sum | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
j=1 |
~\delta | |||||||
|
(dxk-1)
In the above displayed formula,
\widehat{p}(dxk-1
i | |
|\xi | |
k |
,(y0, … ,yk-1))
\widehat{p}(dxk-1|xk,(y0, … ,yk-1))
i | |
x | |
k |
p(yk-1
j | |
|\xi | |
k-1 |
)
j | |
p(\xi | |
k-1 |
)
p(yk-1|xk-1)
p(xk|xk-1)
i | |
x | |
k |
xk-1
j | |
=\xi | |
k-1 |
.
p((x0, … ,xn)|(y0, … ,yn-1))
fk
\begin{align} \intp(d(x0, … ,xn)&|(y0, … ,yn-1))fk(xk)\\ & ≈ N\uparrowinfty\int\widehat{p}backward(d(x0, … ,xn)|(y0, … ,yn-1))fk(xk)\\ &=\int\widehat{p}(dxn|(y0, … ,yn-1))\widehat{p}(dxn-1|xn,(y0, … ,yn-1)) … \widehat{p}(dxk|xk+1,(y0, … ,yk))fk(xk)\\ &=\underbrace{\left[\tfrac{1}{N}, … ,\tfrac{1}{N}\right]}NMn-1 … Mk\begin{bmatrix}
1 | |
f | |
k)\\ \vdots\ f |
N | |
k) |
\end{bmatrix} \end{align}
where
Mk=(Mk(i,j))1\leqslant:
M | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This also shows that if
\overline{F}(x0, … ,x
|
n | |
\sum | |
k=0 |
fk(xk)
then
\begin{align}\int\overline{F}(x0, … ,xn)p(d(x0, … ,xn)|(y0, … ,yn-1))& ≈ N\uparrowinfty\int\overline{F}(x0, … ,xn)\widehat{p}backward(d(x0, … ,xn)|(y0, … ,yn-1))\\ &=
1 | |
n+1 |
n | |
\sum | |
k=0 |
\underbrace{\left[\tfrac{1}{N}, … ,\tfrac{1}{N}\right]}NMn-1Mn-2 … Mk\begin{bmatrix}
1 | |
f | |
k)\ \vdots\ f |
N | |
k) |
\end{bmatrix} \end{align}
We shall assume that filtering equation is stable, in the sense that it corrects any erroneous initial condition.
In this situation, the particle approximations of the likelihood functions are unbiased and the relative variance is controlled by
E\left(\widehat{p}(y0, … ,yn)\right)=p(y0, … ,yn), E\left(\left[
\widehat{p | |
(y |
0, … ,yn)}{p(y0, … ,y
2\right)\leqslant | |
n)}-1\right] |
cn | |
N |
,
for some finite constant c. In addition, for any
x\geqslant0
P\left(\left\vert
1 | |
n |
log{\widehat{p}(y0, … ,y
|
log{p(y0, … ,yn)}\right\vert\leqslantc1
x | |
N |
+c2\sqrt{
x | |
N |
for some finite constants
c1,c2
The bias and the variance of the particle particle estimates based on the ancestral lines of the genealogical trees
path | |
\begin{align} I | |
k(F) |
&:=\intF(x0, … ,xk)p(d(x0, … ,xk)|y0, … ,yk-1)\\ & ≈ N\uparrowinfty
path | |
\widehat{I} | |
k(F) |
\\ &:=\intF(x0, … ,xk)\widehat{p}(d(x0, … ,xk)|y0, … ,yk-1)\\ &=
1 | |
N |
N | |
\sum | |
i=1 |
i | |
F\left(\xi | |
0,k |
i | |
, … ,\xi | |
k,k |
\right) \end{align}
are controlled by the non asymptotic uniform estimates
\left|
path | |
E\left(\widehat{I} | |
k(F)\right)-I |
path | |
k |
(F)\right|\leqslant
c1k | |
N |
,
path | |
E\left(\left[\widehat{I} | |
k(F)-I |
path | |
k |
(F)\right]2\right)\leqslant
c2k | |
N |
,
for any function F bounded by 1, and for some finite constants
c1,c2.
x\geqslant0
P\left(\left|
path | |
\widehat{I} | |
k(F)-I |
path | |
k |
(F)\right|\leqslantc1
kx | |
N |
+c2\sqrt{
kx | |
N |
for some finite constants
c1,c2
\overline{F}(x0, … ,x
|
\sum0\leqslantfk(xk)
with
path | |
I | |
n(\overline{F}) |
≈ N\uparrowinfty
\flat,path | |
I | |
n(\overline{F}):=\int |
\overline{F}(x0, … ,xn)\widehat{p}backward(d(x0, … ,xn)|(y0, … ,yn-1))
with functions
fk
\supn\geqslant{\left\vert
\flat,path | |
E\left(\widehat{I} | |
n(\overline{F})\right)-I |
path | |
n |
(\overline{F})\right\vert}\leqslant
c1 | |
N |
and
\flat,path | |
E\left(\left[\widehat{I} | |
n(F)-I |
path | |
n |
(F)\right]2\right)\leqslant
c2 | |
nN |
+
c3 | |
N2 |
for some finite constants
c1,c2,c3.
Sequential importance Resampling (SIR), Monte Carlo filtering (Kitagawa 1993[34]), bootstrap filtering algorithm (Gordon et al. 1993[36]) and single distribution resampling (Bejuri W.M.Y.B et al. 2017[70]), are also commonly applied filtering algorithms, which approximate the filtering probability density
p(xk|y0, … ,yk)
\left\{\left
(i) | |
(w | |
k |
\right) : i\in\{1, … ,N\}\right\}.
The importance weights
(i) | |
w | |
k |
N | |
\sum | |
i=1 |
(i) | |
w | |
k |
=1.
Sequential importance sampling (SIS) is a sequential (i.e., recursive) version of importance sampling. As in importance sampling, the expectation of a function f can be approximated as a weighted average
\intf(xk)p(xk|y0,...,yk)dxk ≈
N | |
\sum | |
i=1 |
(i) | |
w | |
k |
(i) | |
f(x | |
k |
).
For a finite set of samples, the algorithm performance is dependent on the choice of the proposal distribution
\pi(xk|x0:k-1,y0:k)
The "optimal" proposal distribution is given as the target distribution
\pi(xk|x0:k-1,y0:k)=p(xk|xk-1,yk)=
p(yk|xk) | |
\intp(yk|xk)p(xk|xk-1)dxk |
~p(xk|xk-1).
This particular choice of proposal transition has been proposed by P. Del Moral in 1996 and 1998.[4] When it is difficult to sample transitions according to the distribution
p(xk|xk-1,yk)
\begin{align}
p(yk|xk) | |
\intp(yk|xk)p(xk|xk-1)dxk |
p(xk|xk-1)dxk&\simeqN\uparrowinfty
p(yk|xk) | |
\intp(yk|xk)\widehat{p |
(dxk|xk-1)}\widehat{p}(dxk|xk-1)\\ &=
N | |
\sum | |
i=1 |
| |||||||||||||||
|
\delta | ||||||||||
|
(dxk) \end{align}
with the empirical approximation
\widehat{p}(dxk|xk-1)=
1 | |
N |
N | |
\sum | |
i=1 |
\delta | ||||||||||
|
(dxk)~\simeqN\uparrowinftyp(xk|xk-1)dxk
associated with N (or any other large number of samples) independent random samples
i | |
X | |
k(x |
k-1),i=1, … ,N
Xk
Xk-1=xk-1
\deltaa
However, the transition prior probability distribution is often used as importance function, since it is easier to draw particles (or samples) and perform subsequent importance weight calculations:
\pi(xk|x0:k-1,y0:k)=p(xk|xk-1).
Resampling is used to avoid the problem of the degeneracy of the algorithm, that is, avoiding the situation that all but one of the importance weights are close to zero. The performance of the algorithm can be also affected by proper choice of resampling method. The stratified sampling proposed by Kitagawa (1993[34]) is optimal in terms of variance.
A single step of sequential importance resampling is as follows:
1) For
i=1, … ,N
(i) | |
x | |
k |
\sim
(i) | |
\pi(x | |
0:k-1 |
,y0:k)
2) For
i=1, … ,N
(i) | |
\hat{w} | |
k |
=
(i) | |
w | |
k-1 |
| ||||||||||||||||
|
.
Note that when we use the transition prior probability distribution as the importance function,
(i) | |
\pi(x | |
k |
(i) | |
|x | |
0:k-1 |
,y0:k)=
(i) | |
p(x | |
k-1 |
),
this simplifies to the following :
(i) | |
\hat{w} | |
k |
=
(i) | |
w | |
k-1 |
(i) | |
p(y | |
k), |
3) For
i=1, … ,N
(i) | |
w | |
k |
=
\hat{w | |
(i) |
k}{\sum
N | |
j=1 |
(j) | |
\hat{w} | |
k} |
4) Compute an estimate of the effective number of particles as
\hat{N}eff=
1 | |||||||||||||||
|
This criterion reflects the variance of the weights. Other criteria can be found in the article,[6] including their rigorous analysis and central limit theorems.
5) If the effective number of particles is less than a given threshold
\hat{N}eff<Nthr
a) Draw N particles from the current particle set with probabilities proportional to their weights. Replace the current particle set with this new one.
b) For
i=1, … ,N
(i) | |
w | |
k |
=1/N.
The term "Sampling Importance Resampling" is also sometimes used when referring to SIR filters, but the term Importance Resampling is more accurate because the word "resampling" implies that the initial sampling has already been done.[71]
The "direct version" algorithm is rather simple (compared to other particle filtering algorithms) and it uses composition and rejection. To generate a single sample x at k from
p | |
xk|y1:k |
(x|y1:k)
1) Set n = 0 (This will count the number of particles generated so far)
2) Uniformly choose an index i from the range
\{1,...,N\}
3) Generate a test
\hat{x}
p(xk|xk-1)
xk-1
(i) | |
=x | |
k-1|k-1 |
4) Generate the probability of
\hat{y}
\hat{x}
p(yk|xk),~with~xk=\hat{x}
yk
5) Generate another uniform u from
[0,mk]
mk=
\sup | |
xk |
p(yk|xk)
6) Compare u and
p\left(\hat{y}\right)
6a) If u is larger then repeat from step 2
6b) If u is smaller then save
\hat{x}
(i) | |
x | |
k|k |
7) If n
The goal is to generate P "particles" at k using only the particles from
k-1
xk
xk-1
k-1
This can be more easily visualized if x is viewed as a two-dimensional array. One dimension is k and the other dimension is the particle number. For example,
x(k,i)
k
(i) | |
x | |
k |
xk
(i) | |
x | |
k-1 |
k-1
xk
xk-1
Particle filters and Feynman-Kac particle methodologies find application in several contexts, as an effective mean for tackling noisy observations or strong nonlinearities, such as:
in observer-based schemas a particle filter can forecast expected sensors output enabling fault isolation[74] [75] [76]
visual localization, tracking, feature recognition[81]