In computer vision, the Kanade–Lucas–Tomasi (KLT) feature tracker is an approach to feature extraction. It was proposed mainly to deal with the problem that traditional image registration techniques are generally costly. KLT makes use of spatial intensity information to direct the search for the position that yields the best match. It is faster than traditional techniques because it examines far fewer potential matches between the images.
The traditional image registration problem can be characterized as follows: given two functions F(x) and G(x), which give the respective pixel values at each location x (a vector) in two images, we wish to find the disparity vector h that minimizes some measure of the difference between F(x+h) and G(x), for x in some region of interest R.

Some measures of the difference between F(x+h) and G(x) are the L1 norm, the negative of the normalized correlation, and the L2 norm,

\sqrt{\sum_{x \in R} \left[ F(x+h) - G(x) \right]^{2}},

which is the measure minimized in the derivations below.
The KLT feature tracker is based on two papers:
In the first paper, Lucas and Kanade[1] developed the idea of a local search using gradients weighted by an approximation to the second derivative of the image.
If h is the displacement between two images F(x) and G(x) = F(x+h), then for small h the derivative of F can be approximated by

F'(x) \approx \frac{F(x+h) - F(x)}{h} = \frac{G(x) - F(x)}{h},

so that

h \approx \frac{G(x) - F(x)}{F'(x)}.

This approximation to the gradient of the image is only accurate if the displacement of the local area between the two images to be registered is not too large. The approximation to h depends on x; to combine the various estimates of h at different values of x, it is natural to average them:

h \approx \frac{\sum_{x} \frac{G(x) - F(x)}{F'(x)}}{\sum_{x} 1}.
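As a concrete illustration of this per-sample estimate and its average, here is a minimal 1-D sketch in NumPy; the function name estimate_shift, the finite-difference derivative, and the guard that drops near-flat samples are choices made here, not part of the original formulation.

    import numpy as np

    def estimate_shift(F, G, x, eps=1e-2):
        # Average of the per-sample estimates h ~ (G(x) - F(x)) / F'(x).
        # Samples where F'(x) is nearly zero are dropped (the ratio is unreliable there).
        dF = np.gradient(F, x)                 # finite-difference estimate of F'(x)
        valid = np.abs(dF) > eps
        return np.mean((G[valid] - F[valid]) / dF[valid])

    x = np.linspace(0.0, 2 * np.pi, 400)
    F = np.sin(x)
    G = np.sin(x + 0.2)                        # G(x) = F(x + h) with h = 0.2
    print(estimate_shift(F, G, x))             # close to 0.2 (samples with small F' add noise, motivating the weighting below)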
The average can be further improved by weighting the contribution of each term to it, inversely proportionally to an estimate of |F''(x)|, where the second derivative can be approximated by

F''(x) \approx \frac{G'(x) - F'(x)}{h}.

Since h is the same for every x, the constant factor can be dropped, and a weighting function is defined to simplify the expression:

w(x) = \frac{1}{\left| G'(x) - F'(x) \right|}.

The average with weighting is thereby

h \approx \frac{\sum_{x} \frac{w(x) \left[ G(x) - F(x) \right]}{F'(x)}}{\sum_{x} w(x)}.

Upon obtaining the estimate, F(x) can be shifted by it, and the procedure applied repeatedly, yielding a Newton–Raphson-style iteration whose sequence of estimates h_k ideally converges to the best h:

h_0 = 0, \qquad h_{k+1} = h_k + \frac{\sum_{x} \frac{w(x) \left[ G(x) - F(x + h_k) \right]}{F'(x + h_k)}}{\sum_{x} w(x)}.
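Continuing the NumPy example above, the following sketch applies the weighting and the Newton–Raphson-style refinement; the linear interpolation used to resample F(x + h_k), the small floor inside the weights, and the optional starting estimate h0 are implementation choices, not part of the original method.

    def register_1d(F, G, x, iterations=5, h0=0.0):
        # Iterative 1-D registration: shift F by the current estimate h_k, then
        # take a weighted average of the per-sample estimates, with
        # w(x) = 1 / |G'(x) - F'(x)| as defined above.
        dG = np.gradient(G, x)
        h = h0
        for _ in range(iterations):
            Fs = np.interp(x + h, x, F)                   # F(x + h_k); edges are clamped
            dFs = np.gradient(Fs, x)
            w = 1.0 / np.maximum(np.abs(dG - dFs), 1e-3)  # floor avoids division by zero
            valid = np.abs(dFs) > 1e-2                    # skip near-flat samples
            h += np.sum(w[valid] * (G[valid] - Fs[valid]) / dFs[valid]) / np.sum(w[valid])
        return h

    print(register_1d(F, G, x))                           # converges toward 0.2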
The derivation above does not generalize well to two dimensions, because the 2-D linear approximation takes a different form. This can be corrected by applying the linear approximation in the form

F(x+h) \approx F(x) + h F'(x)

to find the h that minimizes the L2 norm measure of the error between the curves,

E = \sum_{x} \left[ F(x+h) - G(x) \right]^{2} \approx \sum_{x} \left[ F(x) + h F'(x) - G(x) \right]^{2}.

To minimize the error with respect to h, partially differentiate E and set the result to zero:

0 = \frac{\partial E}{\partial h} \approx \sum_{x} 2 F'(x) \left[ F(x) + h F'(x) - G(x) \right],

h \approx \frac{\sum_{x} F'(x) \left[ G(x) - F(x) \right]}{\sum_{x} F'(x)^{2}}.

This is basically the same as the 1-D case, except that the weighting function is now w(x) = F'(x)^{2}.
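This least-squares form has a direct one-line implementation. Continuing the sketch above (same assumptions about F, G, and x):

    def estimate_shift_lsq(F, G, x):
        # Closed-form minimizer of sum_x [F(x) + h F'(x) - G(x)]^2,
        # i.e. the alternative derivation with weighting w(x) = F'(x)^2.
        dF = np.gradient(F, x)
        return np.sum(dF * (G - F)) / np.sum(dF ** 2)

    print(estimate_shift_lsq(F, G, x))         # again close to 0.2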
To evaluate the performance of the algorithm, we are naturally curious about the conditions under which, and how fast, the sequence of estimates h_k converges to the real h.

Consider the case

F(x) = \sin x, \qquad G(x) = F(x+h) = \sin(x+h).

Both versions of the registration algorithm will converge to the correct h as long as |h| < \pi, i.e., for initial misregistrations as large as one-half wavelength. The range of convergence can be extended by suppressing high spatial frequencies in the images, that is, by smoothing them.
Since lowpass-filtered images can be sampled at lower resolution with no loss of information, a coarse-to-fine strategy is adopted. A low-resolution smoothed version of the image can be used to obtain an approximate match. Applying the algorithm to higher resolution images will refine the match obtained at lower resolution.
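A minimal two-level version of this coarse-to-fine strategy, reusing register_1d from the sketch above (the box filter and the subsampling factor are arbitrary choices, and a real implementation would use a proper image pyramid):

    def register_coarse_to_fine(F, G, x, step=4):
        # Estimate on a smoothed, subsampled copy first, then refine the
        # estimate at full resolution.  The displacement is expressed in
        # x-units, so the coarse estimate carries over directly.
        kernel = np.ones(2 * step + 1) / (2 * step + 1)   # crude lowpass filter
        Fs = np.convolve(F, kernel, mode="same")
        Gs = np.convolve(G, kernel, mode="same")
        h_coarse = register_1d(Fs[::step], Gs[::step], x[::step])
        return register_1d(F, G, x, h0=h_coarse)

    print(register_coarse_to_fine(F, G, x))               # same displacement, about 0.2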
While smoothing extends the range of convergence, the weighting function improves the accuracy of the approximation, speeding up the convergence. Without weighting, the calculated displacement h_1 of the first iteration with F(x) = \sin x falls off as the actual displacement approaches one-half wavelength.
The implementation requires the calculation of the weighted sums of the quantities F'G, F'F, and (F')^{2} over the region of interest R. Although F'(x) cannot be calculated exactly, it can be approximated by

F'(x) \approx \frac{F(x + \Delta x) - F(x)}{\Delta x},

where \Delta x is chosen appropriately small.
More sophisticated techniques can be used for estimating the first derivatives, but in general such techniques are equivalent to first smoothing the function and then taking the difference.
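For instance, a smoothed finite-difference derivative along the lines described above might look like this (the box filter and its radius are arbitrary choices):

    def smoothed_derivative(F, x, radius=2):
        # Estimate F' by first smoothing F and then taking differences.
        kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
        return np.gradient(np.convolve(F, kernel, mode="same"), x)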
The registration algorithm for 1-D and 2-D can be generalized to more dimensions. To do so, we try to minimize the L2 norm measure of error

E = \sum_{x \in R} \left[ F(x+h) - G(x) \right]^{2},

where x and h are now n-dimensional row vectors. The analogous linear approximation is

F(x+h) \approx F(x) + h \frac{\partial}{\partial x} F(x),

where \partial F / \partial x denotes the gradient of F, treated as a column vector. Partially differentiating E with respect to h and setting the result to zero gives

0 = \frac{\partial E}{\partial h} \approx \sum_{x} 2 \left[ F(x) + h \frac{\partial F}{\partial x} - G(x) \right] \left( \frac{\partial F}{\partial x} \right)^{T},

h \approx \left[ \sum_{x} \left[ G(x) - F(x) \right] \left( \frac{\partial F}{\partial x} \right)^{T} \right] \left[ \sum_{x} \frac{\partial F}{\partial x} \left( \frac{\partial F}{\partial x} \right)^{T} \right]^{-1},
which has much the same form as the 1-D version.
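In the 2-D case this reduces to a 2-by-2 linear system per window. A minimal single-step sketch, assuming F and G are same-sized 2-D NumPy arrays (grayscale windows), that the sums run over the whole array, and with h written as a column vector:

    def translation_step(F, G):
        # One linearized step for a pure 2-D translation:
        # solve (sum g g^T) h = sum g [G - F], with g the gradient of F.
        gy, gx = np.gradient(F)                      # partial derivatives of F
        A = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                      [np.sum(gx * gy), np.sum(gy * gy)]])
        b = np.array([np.sum(gx * (G - F)), np.sum(gy * (G - F))])
        return np.linalg.solve(A, b)                 # estimated (h_x, h_y)

    # As in 1-D, the step is iterated: warp F by the current estimate and repeat.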
The method can also be extended to take into account registration based on more complex transformations, such as rotation, scaling, and shearing, by considering

G(x) = F(xA + h),

where A is a matrix expressing the linear spatial transformation between F(x) and G(x). The quantity to be minimized becomes

E = \sum_{x} \left[ F(xA + h) - G(x) \right]^{2}.

To determine the adjustment \Delta A to A and the adjustment \Delta h to h, once again use a linear approximation:

F(x(A + \Delta A) + h + \Delta h) \approx F(xA + h) + \left( x \, \Delta A + \Delta h \right) \frac{\partial}{\partial x} F(x).

Substituting this approximation into the error expression makes it quadratic in the quantities to be determined. Differentiating the error with respect to those quantities, setting the results to zero, and solving the resulting set of linear equations yields \Delta A and \Delta h.
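A sketch of one such linearized step, expanded around A = I and h = 0, for 2-D images; the coordinate convention (first coordinate = column index), the least-squares solver, and the variable names are choices made here, not prescribed by the method:

    def affine_step(F, G):
        # One linearized least-squares step for the model G(x) ~ F(xA + h),
        # expanded around A = I, h = 0.  Returns the adjustments (dA, dh).
        gy, gx = np.gradient(F)                       # dF/d(row), dF/d(column)
        rows, cols = np.mgrid[0:F.shape[0], 0:F.shape[1]]
        # Derivatives of F(xA + h) w.r.t. A11, A12, A21, A22, h1, h2
        J = np.stack([cols * gx, cols * gy,
                      rows * gx, rows * gy,
                      gx, gy], axis=-1).reshape(-1, 6)
        r = (G - F).reshape(-1)
        p = np.linalg.lstsq(J, r, rcond=None)[0]
        return p[:4].reshape(2, 2), p[4:]             # dA, dh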
A further generalization accounts for the fact that the brightness may differ in the two views, due to the difference in viewpoint of the cameras or to differences in the processing of the two images. Assume the difference to be a linear transformation:

F(x) = \alpha G(x) + \beta,

where \alpha denotes a contrast adjustment and \beta denotes a brightness adjustment. Combining this expression with the general linear transformation registration problem gives

E = \sum_{x \in R} \left[ F(xA + h) - \left( \alpha G(x) + \beta \right) \right]^{2}

as the quantity to minimize with respect to \alpha, \beta, A, and h.
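As a narrow illustration of just the photometric part of this model, with the geometric registration already done, \alpha and \beta can be fitted by ordinary least squares; in the full method they are estimated jointly with A and h. A sketch under that assumption:

    def fit_contrast_brightness(F, G):
        # Least-squares fit of F ~ alpha * G + beta over the region of interest.
        M = np.stack([G.ravel(), np.ones(G.size)], axis=1)
        alpha, beta = np.linalg.lstsq(M, F.ravel(), rcond=None)[0]
        return alpha, beta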
In the second paper, Tomasi and Kanade[2] used the same basic method for finding the registration due to translation, but improved the technique by tracking only features that are suitable for the tracking algorithm. The proposed features are selected if both eigenvalues of the gradient matrix are larger than some threshold.
By a very similar derivation, the problem is formulated as

\nabla d = e,

where \nabla is the gradient matrix of the window, d is the displacement, and e is the error computed from the difference between the two images. A window is accepted for tracking if both eigenvalues \lambda_1 and \lambda_2 of \nabla exceed the chosen threshold.
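A sketch of this selection criterion for a single window (a 2-D NumPy array); the helper name and the way the gradient matrix is accumulated over the window are choices made here:

    def is_good_feature(window, threshold):
        # Accept the window if both eigenvalues of its gradient matrix
        # exceed the threshold (the Tomasi-Kanade selection criterion).
        gy, gx = np.gradient(window)
        Z = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                      [np.sum(gx * gy), np.sum(gy * gy)]])
        return np.linalg.eigvalsh(Z).min() > threshold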
A tracking method based on these two papers is generally considered a KLT tracker.
In a third paper, Shi and Tomasi[3] proposed an additional stage of verifying that features were tracked correctly.
An affine transformation is fit between the image of the currently tracked feature and its image from a non-consecutive previous frame. If the affine-compensated image is too dissimilar, the feature is dropped.
The reasoning is that between consecutive frames a translation is a sufficient model for tracking, but when frames are further apart a more complex model is required because of more complex motion, perspective effects, and so on.
Using a similar derivation as for the KLT, Shi and Tomasi showed that the search can be performed using the formula

T z = a,

where T is a matrix built from the image gradients, z is a vector collecting the affine deformation and displacement parameters, and a is a vector computed from the difference between the two images. This system plays the same role for the affine model that the equation \nabla d = e above plays for pure translation.
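A sketch of the verification step described above, assuming the affine parameters A and h have already been estimated (e.g. from the system T z = a) and that SciPy is available; the warping convention, the RMS dissimilarity measure, and the threshold are choices made here, not specified by the paper:

    from scipy.ndimage import affine_transform

    def keep_feature(reference_patch, current_patch, A, h, max_rms=10.0):
        # Affine-compensate the current patch and drop the feature if it has
        # become too dissimilar from its appearance in the earlier frame.
        warped = affine_transform(current_patch, A, offset=h)
        rms = np.sqrt(np.mean((warped - reference_patch) ** 2))
        return rms < max_rms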