The condensation algorithm (Conditional Density Propagation) is a computer vision algorithm. The principal application is to detect and track the contour of objects moving in a cluttered environment. Object tracking is one of the more basic and difficult aspects of computer vision and is generally a prerequisite to object recognition. Being able to identify which pixels in an image make up the contour of an object is a non-trivial problem. Condensation is a probabilistic algorithm that attempts to solve this problem.
The algorithm itself is described in detail by Isard and Blake in a publication in the International Journal of Computer Vision in 1998.[1] One of the most interesting facets of the algorithm is that it does not compute on every pixel of the image. Rather, pixels to process are chosen at random, and only a subset of the pixels end up being processed. Multiple hypotheses about what is moving are supported naturally by the probabilistic nature of the approach. The evaluation functions come largely from previous work in the area and include many standard statistical approaches. The original part of this work is the application of particle filter estimation techniques.
The algorithm’s creation was inspired by the inability of Kalman filtering to perform object tracking well in the presence of significant background clutter. The presence of clutter tends to produce probability distributions for the object state which are multi-modal and therefore poorly modeled by the Kalman filter. The condensation algorithm in its most general form requires no assumptions about the probability distributions of the object or measurements.
xt |
t
z1,...,zt |
p(xt|z1,...,zt) |
p(xt|z1,...,zt) |
The conditional density of the object at the current time
p(xt|z1,...,zt) |
(n) | |
\{s | |
t |
,n=1,...,N\}
(n) | |
\pi | |
t |
p(xt|z1,...,zt) |
st
\pit
The assumptions that object dynamics form a temporal Markov chain and that observations are independent of each other and the dynamics facilitate the implementation of the condensation algorithm. The first assumption allows the dynamics of the object to be entirely determined by the conditional density
p(xt|xt-1) |
p(xt|xt-1) |
The algorithm can be summarized by initialization at time
t=0
Form the initial sample set and weights by sampling according to the prior distribution. For example, specify as Gaussian and set the weights equal to each other.
N
(n) | |
\{s | |
0 |
,n=1,...,N\}
(n) | |
\{\pi | |
0 |
,n=1,...,N\}
p(xt|z1,...,zt) |
p(xt|xt-1) |
(n) | |
\{s | |
t |
\}
zt |
(n) | |
\pi | |
t |
=
| |||||||||||||
|
(n) | |
\{s | |
t |
\}
This algorithm outputs the probability distribution
p(xt|z1,...,zt) |
Cumulative weights can instead be used to achieve a more efficient sampling.
Since object-tracking can be a real-time objective, consideration of algorithm efficiency becomes important. The condensation algorithm is relatively simple when compared to the computational intensity of the Ricatti equation required for Kalman filtering. The parameter
N
One way to increase efficiency of the algorithm is by selecting a low degree of freedom model for representing the shape of the object. The model used by Isard 1998 is a linear parameterization of B-splines in which the splines are limited to certain configurations. Suitable configurations were found by analytically determining combinations of contours from multiple views, of the object in different poses, and through principal component analysis (PCA) on the deforming object.
Isard and Blake model the object dynamics
p(xt|xt-1) |
p(xt|xt-1) |
\propto
| ||||||
e |
)-A(xt-1-\bar{x |
where
\bar{x
A
B
A
B
\bar{x
The observation model
p(z|x)
λ
\sigma
The basic condensation algorithm is used to track a single object in time. It is possible to extend the condensation algorithm using a single probability distribution to describe the likely states of multiple objects to track multiple objects in a scene at the same time.[4]
Since clutter can cause the object probability distribution to split into multiple peaks, each peak represents a hypothesis about the object configuration. Smoothing is a statistical technique of conditioning the distribution based on both past and future measurements once the tracking is complete in order to reduce the effects of multiple peaks.[5] Smoothing cannot be directly done in real-time since it requires information of future measurements.
The algorithm can be used for vision-based robot localization of mobile robots.[6] Instead of tracking the position of an object in the scene, however, the position of the camera platform is tracked. This allows the camera platform to be globally localized given a visual map of the environment.
Extensions of the condensation algorithm have also been used to recognize human gestures in image sequences. This application of the condensation algorithm impacts the range of human–computer interactions possible. It has been used to recognize simple gestures of a user at a whiteboard to control actions such as selecting regions of the boards to print or save them.[7] Other extensions have also been used for tracking multiple cars in the same scene.[8]
The condensation algorithm has also been used for face recognition in a video sequence.[9]
An implementation of the condensation algorithm in C can be found on Michael Isard’s website.
An implementation in MATLAB can be found on the Mathworks File Exchange.
An example of implementation using the OpenCV library can be found on the OpenCV forums.