The unscented transform (UT) is a mathematical function used to estimate the result of applying a given nonlinear transformation to a probability distribution that is characterized only in terms of a finite set of statistics. The most common use of the unscented transform is in the nonlinear projection of mean and covariance estimates in the context of nonlinear extensions of the Kalman filter. Its creator Jeffrey Uhlmann explained that "unscented" was an arbitrary name that he adopted to avoid it being referred to as the “Uhlmann filter.”[1]
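Many filtering and control methods represent estimates of the state of a system in the form of a mean vector and an associated error covariance matrix. As an example, the estimated 2-dimensional position of an object of interest might be represented by a mean position vector, $[x, y]$, with an uncertainty given in the form of a 2×2 covariance matrix giving the variance in $x$, the variance in $y$, and the cross covariance between the two.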
The mean and covariance representation only gives the first two moments of an underlying, but otherwise unknown, probability distribution. In the case of a moving object, the unknown probability distribution might represent the uncertainty of the object's position at a given time. The mean and covariance representation of uncertainty is mathematically convenient because any linear transformation $T$ can be applied to a mean vector $m$ and covariance matrix $M$ as $Tm$ and $TMT^{\mathrm T}$.
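The linear case can be illustrated with a minimal sketch in plain Python. The helper names and the numeric values of the transformation, mean, and covariance are assumptions chosen only for illustration.

```python
def mat_vec(T, v):
    """Multiply matrix T (a list of rows) by vector v."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]

def mat_mul(A, B):
    """Multiply matrices A and B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    """Return the transpose of matrix A."""
    return [list(col) for col in zip(*A)]

def linear_transform(T, m, M):
    """Return (T m, T M T^T), the exact mean and covariance after x -> T x."""
    return mat_vec(T, m), mat_mul(mat_mul(T, M), transpose(T))

# Illustrative values (assumed): swap the axes and double the original x.
T = [[0.0, 1.0], [2.0, 0.0]]
m = [12.3, 7.6]
M = [[1.44, 0.0], [0.0, 2.89]]

new_m, new_M = linear_transform(T, m, M)
print(new_m)  # transformed mean
print(new_M)  # transformed covariance
```

No assumption about the shape of the underlying distribution is needed here: for a linear map the first two moments transform exactly.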
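Although the covariance matrix is often treated as being the expected squared error associated with the mean, in practice the matrix is maintained as an upper bound on the actual squared error. Specifically, a mean and covariance estimate $(m, M)$ is conservatively maintained so that the covariance matrix $M$ is greater than or equal to the actual squared error associated with $m$. Mathematically, this means that the result of subtracting the expected squared error (which is not usually known) from $M$ is a positive semidefinite matrix.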
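Returning to the example above, when the covariance is zero it is trivial to determine the location of the object after it moves according to an arbitrary nonlinear function $f(x,y)$: just apply the function to the mean vector. When the covariance is not zero, the transformed mean will not generally be equal to $f(x,y)$, and it is not even possible to determine the mean of the transformed probability distribution from only its prior mean and covariance. Given this indeterminacy, the nonlinearly transformed mean and covariance can only be approximated. The earliest such approximation was to linearize the nonlinear function and apply the resulting Jacobian matrix to the given mean and covariance. This is the basis of the extended Kalman filter (EKF).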
In 1994 Jeffrey Uhlmann noted that the EKF takes a nonlinear function and partial distribution information (in the form of a mean and covariance estimate) of the state of a system but applies an approximation to the known function rather than to the imprecisely-known probability distribution. He suggested that a better approach would be to use the exact nonlinear function applied to an approximating probability distribution. The motivation for this approach is given in his doctoral dissertation, where the term unscented transform was first defined:[2]
Consider the following intuition: With a fixed number of parameters it should be easier to approximate a given distribution than it is to approximate an arbitrary nonlinear function/transformation. Following this intuition, the goal is to find a parameterization that captures the mean and covariance information while at the same time permitting the direct propagation of the information through an arbitrary set of nonlinear equations. This can be accomplished by generating a discrete distribution having the same first and second (and possibly higher) moments, where each point in the discrete approximation can be directly transformed. The mean and covariance of the transformed ensemble can then be computed as the estimate of the nonlinear transformation of the original distribution. More generally, the application of a given nonlinear transformation to a discrete distribution of points, computed so as to capture a set of known statistics of an unknown distribution, is referred to as an unscented transformation.
In other words, the given mean and covariance information can be exactly encoded in a set of points, referred to as sigma points, which, if treated as elements of a discrete probability distribution, have mean and covariance equal to the given mean and covariance. This distribution can be propagated exactly by applying the nonlinear function to each point. The mean and covariance of the transformed set of points then represent the desired transformed estimate. The principal advantage of the approach is that the nonlinear function is fully exploited, as opposed to the EKF, which replaces it with a linear one. Eliminating the need for linearization also provides advantages independent of any improvement in estimation quality. One immediate advantage is that the UT can be applied with any given function, whereas linearization may not be possible for functions that are not differentiable. A practical advantage is that the UT can be easier to implement because it avoids the need to derive and implement a linearizing Jacobian matrix.
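The propagation step described above can be sketched in a few lines of plain Python. The function name `unscented_transform` and the optional `weights` argument are assumptions made for this illustration; equal weights are used when none are given. The usage example applies the absolute value, which is not differentiable at zero, to show that no Jacobian is required.

```python
def unscented_transform(sigma_points, f, weights=None):
    """Propagate sigma points through f and return (mean, covariance).

    sigma_points: list of points, each a list of floats
    f: function mapping one point to a transformed point (list of floats)
    weights: optional weights summing to 1 (assumed equal if omitted)
    """
    k = len(sigma_points)
    if weights is None:
        weights = [1.0 / k] * k
    transformed = [f(p) for p in sigma_points]
    dim = len(transformed[0])
    # Weighted mean of the transformed points.
    mean = [sum(w * y[i] for w, y in zip(weights, transformed))
            for i in range(dim)]
    # Weighted sum of outer products of the deviations from that mean.
    cov = [[sum(w * (y[i] - mean[i]) * (y[j] - mean[j])
                for w, y in zip(weights, transformed))
            for j in range(dim)]
           for i in range(dim)]
    return mean, cov

# Two 1-dimensional sigma points with mean 0 and variance 1,
# pushed through the non-differentiable function |x|.
mean, cov = unscented_transform([[-1.0], [1.0]], lambda p: [abs(p[0])])
print(mean, cov)
```

Both sigma points map to 1, so the transformed estimate has mean 1 and zero variance, whereas a linearization about the mean is not defined at all because the derivative of |x| does not exist at 0.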
To compute the unscented transform, one first has to choose a set of sigma points. Since the seminal work of Uhlmann, many different sets of sigma points have been proposed in the literature. A thorough review of these variants can be found in the work of Menegaz et al.[3] In general, $n+1$ sigma points are necessary and sufficient to define a discrete distribution having a given mean and covariance in $n$ dimensions.
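A canonical set of sigma points is the symmetric set originally proposed by Uhlmann. Consider the vertices of an equilateral triangle centered on the origin in two dimensions:
$$s_1=\frac{1}{\sqrt{2}}\left[0,2\right]^{\mathrm T},\quad s_2=\frac{1}{\sqrt{2}}\left[-\sqrt{3},-1\right]^{\mathrm T},\quad s_3=\frac{1}{\sqrt{2}}\left[\sqrt{3},-1\right]^{\mathrm T}$$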
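It can be verified that the above set of points has mean $s=\left[0,0\right]^{\mathrm T}$ and covariance $S=I$ (the identity matrix). Given any 2-dimensional mean and covariance, $(x,X)$, the desired sigma points can be obtained by multiplying each point by the matrix square root of $X$ and adding $x$. A similar canonical set of sigma points can be generated in any number of dimensions $n$.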
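The two claims above (zero mean and identity covariance of the triangle, and recovery of a target mean and covariance after scaling and shifting) can be checked numerically. The following is a sketch under stated assumptions: the Cholesky factor is used as the matrix square root, which is one valid choice among several, and the target values `x` and `X` are hypothetical.

```python
import math

R2, R3 = math.sqrt(2), math.sqrt(3)
# Vertices of the equilateral triangle centered on the origin.
CANONICAL = [[0.0, 2.0 / R2], [-R3 / R2, -1.0 / R2], [R3 / R2, -1.0 / R2]]

def mean_and_cov(points):
    """Equally weighted mean and covariance of a list of points."""
    k, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / k for i in range(d)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / k
            for j in range(d)] for i in range(d)]
    return mean, cov

def cholesky(X):
    """Lower-triangular L with L L^T = X (assumed choice of square root)."""
    n = len(X)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(X[i][i] - s)
            else:
                L[i][j] = (X[i][j] - s) / L[j][j]
    return L

def sigma_points_from_canonical(x, X, canonical=CANONICAL):
    """Map each canonical point s to x + L s, where L L^T = X."""
    L = cholesky(X)
    d = len(x)
    return [[x[i] + sum(L[i][j] * s[j] for j in range(d)) for i in range(d)]
            for s in canonical]

# The canonical set itself: mean [0, 0] and identity covariance.
s_mean, s_cov = mean_and_cov(CANONICAL)

# Hypothetical target mean and covariance.
x = [1.0, -2.0]
X = [[4.0, 1.2], [1.2, 2.0]]
p_mean, p_cov = mean_and_cov(sigma_points_from_canonical(x, X))
print(s_mean, s_cov)
print(p_mean, p_cov)
```

Because the canonical set has identity covariance, the mapped set has covariance $L I L^{\mathrm T} = X$ for any factor $L$ satisfying $LL^{\mathrm T}=X$, so a symmetric square root would work equally well.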
Uhlmann showed that it is possible to conveniently generate a symmetric set of $2n+1$ sigma points from the columns of $\pm\sqrt{nX}$ and the zero vector, where $X$ is the given covariance matrix.
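A minimal sketch of the symmetric construction follows. It makes two assumptions that should be read as illustrative choices: the Cholesky factor serves as the matrix square root, and the center point is left out (equivalent to giving it zero weight) so that equal weights on the remaining $2n$ points reproduce the covariance exactly. The 3-dimensional values of `x` and `X` are hypothetical.

```python
import math

def cholesky(X):
    """Lower-triangular L with L L^T = X (assumed choice of square root)."""
    n = len(X)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(X[i][i] - s)
            else:
                L[i][j] = (X[i][j] - s) / L[j][j]
    return L

def symmetric_sigma_points(x, X):
    """Return the 2n points x +/- (columns of a square root of n X)."""
    n = len(x)
    L = cholesky([[n * X[i][j] for j in range(n)] for i in range(n)])
    points = []
    for j in range(n):
        column = [L[i][j] for i in range(n)]
        points.append([x[i] + column[i] for i in range(n)])
        points.append([x[i] - column[i] for i in range(n)])
    return points

# Hypothetical 3-dimensional mean and covariance.
x = [1.0, 0.0, -1.0]
X = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]]
points = symmetric_sigma_points(x, X)
for p in points:
    print(p)
```

The plus and minus points cancel in the mean, and the outer products of the columns sum to $nX$ twice over, so dividing by the $2n$ points leaves $X$.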
The unscented transform is defined for the application of a given function to any partial characterization of an otherwise unknown distribution, but its most common use is for the case in which only the mean and covariance are given. A common example is the conversion from one coordinate system to another, such as from a Cartesian coordinate frame to polar coordinates.[4]
Suppose a 2-dimensional mean and covariance estimate, $(m,M)$, is given in Cartesian coordinates with:
$$m=[12.3,7.6]^{\mathrm T},\quad M=\begin{bmatrix}1.44&0\\0&2.89\end{bmatrix}$$
and the transformation function to polar coordinates, $f(x,y)\to[r,\theta]$, is:
$$r=\sqrt{x^2+y^2},\quad \theta=\arctan\left(\frac{y}{x}\right)$$
Multiplying each of the canonical simplex sigma points (given above) by $M^{1/2}=\begin{bmatrix}1.2&0\\0&1.7\end{bmatrix}$ and adding the mean, $m$, gives:
\begin{align} m_1&=[0,2.40]+[12.3,7.6]=[12.3,10.0]\\ m_2&=[-1.47,-1.20]+[12.3,7.6]=[10.8,6.40]\\ m_3&=[1.47,-1.20]+[12.3,7.6]=[13.8,6.40] \end{align}
Applying the transformation function $f$ to each of the above points gives:
\begin{align} m^+_1&=f(12.3,10.0)=[15.85,0.68]\\ m^+_2&=f(10.8,6.40)=[12.58,0.53]\\ m^+_3&=f(13.8,6.40)=[15.18,0.44] \end{align}
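The mean of these three transformed points, $m_{UT}=\frac{1}{3}\sum_{i=1}^{3}m^+_i$, is the UT estimate of the mean in polar coordinates:
$$m_{UT}=[14.539,0.551]$$
The UT estimate of the covariance is:
$$M_{UT}=\frac{1}{3}\sum_{i=1}^{3}\left(m^+_i-m_{UT}\right)^2$$
where each squared term in the sum is a vector outer product, $\left(m^+_i-m_{UT}\right)\left(m^+_i-m_{UT}\right)^{\mathrm T}$. This gives:
$$M_{UT}=\begin{bmatrix}2.00&0.0443\\0.0443&0.0104\end{bmatrix}$$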
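The simplex calculation can be reproduced with a short script. This is a sketch, not a reference implementation: it uses `math.atan2`, which agrees with $\arctan(y/x)$ for $x>0$, and it keeps the sigma points unrounded, so the printed values agree with the figures quoted in the text only to about two or three significant digits.

```python
import math

def to_polar(p):
    """Cartesian [x, y] -> polar [r, theta]."""
    x, y = p
    return [math.hypot(x, y), math.atan2(y, x)]

def mean_and_cov(points):
    """Equally weighted mean and covariance (sum of outer products / count)."""
    k, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / k for i in range(d)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / k
            for j in range(d)] for i in range(d)]
    return mean, cov

m = [12.3, 7.6]
sqrt_M_diag = [1.2, 1.7]  # M is diagonal, so its square root is too

r2, r3 = math.sqrt(2), math.sqrt(3)
canonical = [[0.0, 2.0 / r2], [-r3 / r2, -1.0 / r2], [r3 / r2, -1.0 / r2]]

# Scale each canonical point by M^(1/2) and shift by the mean.
sigma = [[m[i] + sqrt_M_diag[i] * s[i] for i in range(2)] for s in canonical]

# Push each sigma point through the exact nonlinear function.
transformed = [to_polar(p) for p in sigma]

m_ut, M_ut = mean_and_cov(transformed)
print("UT mean:", m_ut)
print("UT covariance:", M_ut)
```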
This can be compared to the linearized mean and covariance:
\begin{align} m_\text{linear}&=f(12.3,7.6)=[14.46,0.554]^{\mathrm T}\\ M_\text{linear}&=\nabla_f M \nabla_f^{\mathrm T}=\begin{bmatrix}1.927&0.047\\0.047&0.011\end{bmatrix} \end{align}
The absolute difference between the UT and linearized estimates in this case is relatively small, but in filtering applications the cumulative effect of small errors can lead to unrecoverable divergence of the estimate. The effect of the errors is exacerbated when the covariance is underestimated because this causes the filter to be overconfident in the accuracy of the mean. In the above example it can be seen that the linearized covariance estimate is smaller than that of the UT estimate, suggesting that linearization has likely produced an underestimate of the actual error in its mean.
In this example there is no way to determine the absolute accuracy of the UT and linearized estimates without ground truth in the form of the actual probability distribution associated with the original estimate and the mean and covariance of that distribution after application of the nonlinear transformation (e.g., as determined analytically or through numerical integration). Such analyses have been performed for coordinate transformations under the assumption of Gaussianity for the underlying distributions, and the UT estimates tend to be significantly more accurate than those obtained from linearization.[6] [7]
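Empirical analysis has shown that the use of the minimal simplex set of $n+1$ sigma points is significantly less accurate than the use of the symmetric set of $2n$ points when the underlying distribution is Gaussian. This suggests that the use of the simplex set in the above example would not be the best choice if the underlying distribution associated with $(m,M)$ is symmetric.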
Returning to the example, the minimal symmetric set of sigma points can be obtained from the covariance matrix $M=\begin{bmatrix}1.44&0\\0&2.89\end{bmatrix}$ simply as the mean vector, $m=[12.3,7.6]$, plus and minus the columns of
$$(2M)^{1/2}=\sqrt{2}\begin{bmatrix}1.2&0\\0&1.7\end{bmatrix}=\begin{bmatrix}1.697&0\\0&2.404\end{bmatrix}$$
which gives:
\begin{align} m_1&=[12.3,7.6]+[1.697,0]=[13.997,7.6]\\ m_2&=[12.3,7.6]-[1.697,0]=[10.603,7.6]\\ m_3&=[12.3,7.6]+[0,2.404]=[12.3,10.004]\\ m_4&=[12.3,7.6]-[0,2.404]=[12.3,5.196] \end{align}
This construction guarantees that the mean and covariance of the above four sigma points are $(m,M)$, which is directly verifiable. Applying the nonlinear function $f$ to each of the sigma points gives:
\begin{align} m^+_1&=[15.927,0.497]\\ m^+_2&=[13.045,0.622]\\ m^+_3&=[15.854,0.683]\\ m^+_4&=[13.352,0.400] \end{align}
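The mean of these four transformed sigma points, $m_{UT}=\frac{1}{4}\sum_{i=1}^{4}m^+_i$, is the UT estimate of the mean in polar coordinates:
$$m_{UT}=[14.545,0.550]$$
The UT estimate of the covariance is:
$$M_{UT}=\frac{1}{4}\sum_{i=1}^{4}\left(m^+_i-m_{UT}\right)^2$$
where each squared term in the sum is a vector outer product, $\left(m^+_i-m_{UT}\right)\left(m^+_i-m_{UT}\right)^{\mathrm T}$. This gives:
$$M_{UT}=\begin{bmatrix}1.823&0.043\\0.043&0.012\end{bmatrix}$$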
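The four-point calculation can be checked in the same way. The sketch below first confirms that the symmetric sigma points reproduce the original mean and covariance and then propagates them through the polar conversion. As assumptions, `math.atan2` stands in for $\arctan(y/x)$ (the two agree for $x>0$) and the square root of $2M$ is taken elementwise on the diagonal, which is valid only because $M$ is diagonal in this example.

```python
import math

def to_polar(p):
    """Cartesian [x, y] -> polar [r, theta]."""
    x, y = p
    return [math.hypot(x, y), math.atan2(y, x)]

def mean_and_cov(points):
    """Equally weighted mean and covariance (sum of outer products / count)."""
    k, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / k for i in range(d)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / k
            for j in range(d)] for i in range(d)]
    return mean, cov

m = [12.3, 7.6]
M_diag = [1.44, 2.89]
n = len(m)

# Columns of (n M)^(1/2); for a diagonal M each column has one nonzero entry.
offsets = [math.sqrt(n * v) for v in M_diag]

sigma = []
for j in range(n):
    plus, minus = list(m), list(m)
    plus[j] += offsets[j]
    minus[j] -= offsets[j]
    sigma += [plus, minus]

# Sanity check: the four points carry the original mean and covariance.
check_mean, check_cov = mean_and_cov(sigma)

transformed = [to_polar(p) for p in sigma]
m_ut, M_ut = mean_and_cov(transformed)
print("sigma-point mean and covariance:", check_mean, check_cov)
print("UT mean:", m_ut)
print("UT covariance:", M_ut)
```

Comparing this output with the three-point simplex result shows how the choice of sigma set alone changes the estimate, even though both sets encode the same mean and covariance.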
Uhlmann noted that given only the mean and covariance of an otherwise unknown probability distribution, the transformation problem is ill-defined because there is an infinite number of possible underlying distributions with the same first two moments. Without any a priori information or assumptions about the characteristics of the underlying distribution, any choice of distribution used to compute the transformed mean and covariance is as reasonable as any other. In other words, there is no choice of distribution with a given mean and covariance that is superior to that provided by the set of sigma points; therefore, the unscented transform is trivially optimal.
This general statement of optimality is of course useless for making any quantitative statements about the performance of the UT, e.g., compared to linearization; consequently he, Julier and others have performed analyses under various assumptions about the characteristics of the distribution and/or the form of the nonlinear transformation function. For example, if the function is differentiable, which is essential for linearization, these analyses validate the expected and empirically-corroborated superiority of the unscented transform.[6] [7]
The unscented transform can be used to develop a nonlinear generalization of the Kalman filter, known as the unscented Kalman filter (UKF). This filter has largely replaced the EKF in many nonlinear filtering and control applications, including underwater,[8] ground and air navigation,[9] and spacecraft.[10] The unscented transform has also been used as a computational framework for Riemann-Stieltjes optimal control.[11] This computational approach is known as unscented optimal control.[12] [13]
Uhlmann and Simon Julier published several papers showing that the UKF provides significant performance improvements over the EKF in a variety of applications.[14] Julier and Uhlmann also published papers using a particular parameterized form of the unscented transform in the context of the UKF, which used negative weights to capture assumed distribution information. That form of the UT is susceptible to a variety of numerical errors that the original formulations (the symmetric set originally proposed by Uhlmann) do not suffer. Julier has subsequently described parameterized forms that do not use negative weights and are not subject to those issues.[15]