Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D reconstruction and object recognition. Corner detection overlaps with the topic of interest point detection.
A corner can be defined as the intersection of two edges. A corner can also be defined as a point for which there are two dominant and different edge directions in a local neighbourhood of the point.
An interest point is a point in an image which has a well-defined position and can be robustly detected. This means that an interest point can be a corner but it can also be, for example, an isolated point of local intensity maximum or minimum, line endings, or a point on a curve where the curvature is locally maximal.
In practice, most so-called corner detection methods detect interest points in general, and in fact, the term "corner" and "interest point" are used more or less interchangeably through the literature. As a consequence, if only corners are to be detected it is necessary to do a local analysis of detected interest points to determine which of these are real corners. Examples of edge detection that can be used with post-processing to detect corners are the Kirsch operator and the Frei-Chen masking set.[1]
"Corner", "interest point" and "feature" are used interchangeably in literature, confusing the issue. Specifically, there are several blob detectors that can be referred to as "interest point operators", but which are sometimes erroneously referred to as "corner detectors". Moreover, there exists a notion of ridge detection to capture the presence of elongated objects.
Corner detectors are not usually very robust and often require large redundancies introduced to prevent the effect of individual errors from dominating the recognition task.
One determination of the quality of a corner detector is its ability to detect the same corner in multiple similar images, under conditions of different lighting, translation, rotation and other transforms.
A simple approach to corner detection in images is using correlation, but this gets very computationally expensive and suboptimal. An alternative approach used frequently is based on a method proposed by Harris and Stephens (below), which in turn is an improvement of a method by Moravec.
This is one of the earliest corner detection algorithms and defines a corner to be a point with low self-similarity. The algorithm tests each pixel in the image to see if a corner is present, by considering how similar a patch centered on the pixel is to nearby, largely overlapping patches. The similarity is measured by taking the sum of squared differences (SSD) between the corresponding pixels of two patches. A lower number indicates more similarity.
If the pixel is in a region of uniform intensity, then the nearby patches will look similar. If the pixel is on an edge, then nearby patches in a direction perpendicular to the edge will look quite different, but nearby patches in a direction parallel to the edge will result in only a small change. If the pixel is on a feature with variation in all directions, then none of the nearby patches will look similar.
The corner strength is defined as the smallest SSD between the patch and its neighbours (horizontal, vertical and on the two diagonals). The reason is that if this number is high, then the variation along all shifts is either equal to it or larger than it, so capturing that all nearby patches look different.
If the corner strength number is computed for all locations, that it is locally maximal for one location indicates that a feature of interest is present in it.
As pointed out by Moravec, one of the main problems with this operator is that it is not isotropic: if an edge is present that is not in the direction of the neighbours (horizontal, vertical, or diagonal), then the smallest SSD will be large and the edge will be incorrectly chosen as an interest point.[2]
See also: Harris Corner Detector.
Harris and Stephens improved upon Moravec's corner detector by considering the differential of the corner score with respect to direction directly, instead of using shifted patches. (This corner score is often referred to as autocorrelation, since the term is used in the paper in which this detector is described. However, the mathematics in the paper clearly indicate that the sum of squared differences is used.)
Without loss of generality, we will assume a grayscale 2-dimensional image is used. Let this image be given by
I
(u,v)
(x,y)
S
S(x,y)=\sumu\sumvw(u,v)\left(I(u+x,v+y)-I(u,v)\right)2
I(u+x,v+y)
Ix
Iy
I
I(u+x,v+y) ≈ I(u,v)+Ix(u,v)x+Iy(u,v)y
This produces the approximation
S(x,y) ≈ \sumu\sumvw(u,v)\left(Ix(u,v)x+Iy(u,v)y\right)2,
which can be written in matrix form:
S(x,y) ≈ \begin{bmatrix}x&y\end{bmatrix}A\begin{bmatrix}x\ y\end{bmatrix},
where A is the structure tensor,
A=\sumu\sumvw(u,v)
2 | |
\begin{bmatrix} I | |
x(u,v) |
&Ix(u,v)Iy(u,v)\\ Ix(u,v)Iy(u,v)&
2 | |
I | |
y(u,v) |
\end{bmatrix} = \begin{bmatrix} \langle
2 | |
I | |
x |
\rangle&\langleIxIy\rangle\\ \langleIxIy\rangle&\langle
2 | |
I | |
y |
\rangle \end{bmatrix}
In words, we find the covariance of the partial derivative of the image intensity
I
x
y
Angle brackets denote averaging (i.e. summation over
(u,v)
w(u,v)
A corner (or in general an interest point) is characterized by a large variation of
S
\begin{bmatrix}x&y\end{bmatrix}
A
A
λ1 ≈ 0
λ2 ≈ 0
(x,y)
λ1 ≈ 0
λ2
λ1
λ2
Harris and Stephens note that exact computation of the eigenvalues is computationally expensive, since it requires the computation of a square root, and instead suggest thefollowing function
Mc
\kappa
Mc=λ1λ2-\kappa\left(λ1+
2 = | |
λ | |
2\right) |
\det(A)-\kappa\operatorname{trace}2(A)
Therefore, the algorithm does not have to actually compute the eigenvalue decomposition of the matrix
A
A
The Shi–Tomasi corner detector directly computes
min(λ1,λ2)
The value of
\kappa
One can avoid setting the parameter
\kappa
Mc'
Mc'=2
\det(A) | |
\operatorname{trace |
(A)+\epsilon},
\epsilon
If
A
A-1
1 | ||||||||||||||
|
\begin{bmatrix} \langle
2 | |
I | |
y |
\rangle&-\langleIxIy\rangle\\ -\langleIxIy\rangle&\langle
2 | |
I | |
x |
\rangle \end{bmatrix}.
The sum of the eigenvalues of
A-1
Mc'
-1 | |
λ | |
1(A |
)+
-1 | |
λ | |
2(A |
)=
\operatorname{trace | |
(A)}{\det(A)} |
≈
2 | |
Mc' |
.
In some cases, one may wish to compute the location of a corner with subpixel accuracy. To achieve an approximate solution, the Förstner[3] algorithm solves for the point closest to all the tangent lines of the corner in a given window and is a least-square solution. The algorithm relies on the fact that for an ideal corner, tangent lines cross at a single point.
The equation of a tangent line
Tx'(x)
x'
Tx'(x)=\nablaI(x')\top(x-x')=0
where
\nablaI(x')=\begin{bmatrix}Ix&Iy\end{bmatrix}\top
I
x'
The point
x0
N
x0=\underset{x\inR2 x
The distance from
x0
Tx'
Solving for
x0
\begin{align} x0&=\underset{x\inR2 x
A\inR2 x ,bf{b}\inR2 x ,c\inR
\begin{align} A&=\int\nablaI(x')\nablaI(x')\topdx'\\ b&=\int\nablaI(x')\nablaI(x')\topx'dx'\\ c&=\intx'\top\nablaI(x')\nablaI(x')\topx'dx'\\ \end{align}
Minimizing this equation can be done by differentiating with respect to
x
2Ax-2b=0 ⇒ Ax=b
Note that
A\inR2 x
A
A
x0=A-1b
only exists where an actual corner exists in the window
N
A methodology for performing automatic scale selection for this corner localization method has been presented by Lindeberg by minimizing the normalized residual
\tilde{d}min=
c-bTA-1b | |
\operatorname{trace |
A}
over scales. Thereby, the method has the ability to automatically adapt the scale levels for computing the image gradients to the noise level in the image data, by choosing coarser scale levels for noisy image data and finer scale levels for near ideal corner-like structures.
Notes:
c
c=0
The computation of the second moment matrix (sometimes also referred to as the structure tensor)
A
Ix,Iy
With
I
L
I
g(x,y,t)=
1 | |
2{\pi |
t}
-\left(x2+y2\right)/2t | |
e |
t
L(x,y,t) =g(x,y,t)*I(x,y)
Lx=\partialxL
Ly=\partialyL
L
g(x,y,s)
s
\mu(x,y;t,s)
infty | |
= \int | |
\xi=-infty |
infty | |
\int | |
η=-infty |
2(x-\xi, | |
\begin{bmatrix} L | |
x |
y-η;t)&Lx(x-\xi,y-η;t)Ly(x-\xi,y-η;t)\\ Lx(x-\xi,y-η;t)Ly(x-\xi,y-η;t)&
2(x-\xi, | |
L | |
y |
y-η;t)\end{bmatrix} g(\xi,η;s)d\xidη.
\mu
A
Mc(x,y;t,s)=\det(\mu(x,y;t,s))-\kappa\operatorname{trace}2(\mu(x,y;t,s)).
t
s
\gamma
s=\gamma2t
\gamma
[1,2]
Mc(x,y;t,\gamma2t)
t
In practice, this multi-scale corner detector is often complemented by a scale selection step, where the scale-normalized Laplacian operator
2 | |
\nabla | |
norm |
L(x,y;t) =t\nabla2L(x,y,t)=t(Lxx(x,y,t)+Lyy(x,y,t))
Mc(x,y;t,\gamma2t)
(\hat{x},\hat{y};t)=\operatorname{argmaxlocal}(x,Mc\left(x,y;t,\gamma2t\right)
2 | |
\nabla | |
norm(x, |
y,t)
\hat{t}=\operatorname{argmaxminlocal}t
2 | |
\nabla | |
normL(\hat{x}, |
\hat{y};t)
An earlier approach to corner detection is to detect points where the curvature of level curves and the gradient magnitude are simultaneously high. A differential way to detect such points is by computing the rescaled level curve curvature (the product of the level curve curvature and the gradient magnitude raised to the power of three)
\tilde{\kappa}(x,y;t)=
2 | |
L | |
x |
Lyy+
2 | |
L | |
y |
Lxx-2LxLyLxy
t
L
\gamma
\tilde{\kappa}norm(x,y;t)=t2
2 | |
(L | |
x |
Lyy+
2 | |
L | |
y |
Lxx-2LxLyLxy)
\gamma=7/8
(\hat{x},\hat{y};\hat{t})=\operatorname{argminmaxlocal}(x,\tilde{\kappa}norm(x,y;t)
LoG is an acronym standing for Laplacian of Gaussian, DoG is an acronym standing for difference of Gaussians (DoG is an approximation of LoG), and DoH is an acronym standing for determinant of the Hessian. These scale-invariant interest points are all extracted by detecting scale-space extrema of scale-normalized differential expressions, i.e., points in scale-space where the corresponding scale-normalized differential expressions assume local extrema with respect to both space and scale
(\hat{x},\hat{y};\hat{t})=\operatorname{argminmaxlocal}(x,(DnormL)(x,y;t)
DnormL
These detectors are more completely described in blob detection. The scale-normalized Laplacian of the Gaussian and difference-of-Gaussian features (Lindeberg 1994, 1998; Lowe 2004)
2 | |
\begin{align} \nabla | |
norm |
L(x,y;t) &=t(Lxx+Lyy)\\ & ≈
t\left(L(x,y;t+\Deltat)-L(x,y;t)\right) | |
\Deltat |
\end{align}
The scale-normalized determinant of the Hessian operator (Lindeberg 1994, 1998)
\detHnormL=t2(LxxLyy-
2) | |
L | |
xy |
The scale selection properties, affine transformation properties and experimental properties of these and other scale-space interest point detectors are analyzed in detail in (Lindeberg 2013, 2015).[4] [5]
Inspired by the structurally similar properties of the Hessian matrix
Hf
f
\mu
(Hf')=A-T(Hf)A-1
\mu'=A-T\muA-1
D1,normL= \begin{cases} t2(\detHL-k\operatorname{trace}2HL)&if\detHL-k\operatorname{trace}2HL>0\\ 0&otherwise \end{cases}
\tilde{D}1,normL= \begin{cases} t2(\detHL-k\operatorname{trace}2HL)&if\detHL-k\operatorname{trace}2HL>0\\ t2(\detHL+k\operatorname{trace}2HL)&if\detHL+k\operatorname{trace}2HL<0\\ 0&otherwise \end{cases}
D2,normL=tmin(|λ1(HL)|,|λ2(HL)|)
\tilde{D}2,normL= \begin{cases} tλ1(HL) &if|λ1(HL)|<|λ2(HL)|\\ tλ2(HL) &if|λ2(HL)|<|λ1(HL)|\\ t(λ1(HL)+λ2(HL))/2&otherwise \end{cases}
\operatorname{trace}HL=Lxx+Lyy
\detHL=LxxLyy-
2 | |
L | |
xy |
HL
L
t
λ1(HL)=Lpp=
1 | |
2 |
\left(Lxx+Lyy-\sqrt{(Lxx-Lyy)2+4
2} | |
L | |
xy |
\right)
λ2(HL)=Lqq=
1 | |
2 |
\left(Lxx+Lyy+\sqrt{(Lxx-Lyy)2+4
2} | |
L | |
xy |
\right)
The unsigned Hessian feature strength measure
D1,normL
\tilde{D}1,normL
D2,normL
\tilde{D}2,normL
In Lindeberg (2015)[4] these four differential entities were combined with local scale selection based on either scale-space extrema detection
(\hat{x},\hat{y};\hat{t})=\operatorname{argminmaxlocal}(x,(DnormL)(x,y;t)
D2,normL
\tilde{D}2,normL
D1,normL>0
By experiments on image matching under scaling transformations on a poster dataset with 12 posters with multi-view matching over scaling transformations up to a scaling factor of 6 and viewing direction variations up to a slant angle of 45 degrees with local image descriptors defined from reformulations of the pure image descriptors in the SIFT and SURF operators to image measurements in terms of Gaussian derivative operators (Gauss-SIFT and Gauss-SURF) instead of original SIFT as defined from an image pyramid or original SURF as defined from Haar wavelets, it was shown that scale-space interest point detection based on the unsigned Hessian feature strength measure
D1,normL
\detHnormL=t2\left(LxxLyy-
2\right) | |
L | |
xy |
D1,normL
\tilde{D}1,normL
\detHnormL
2 | |
\nabla | |
norm |
L=t(Lxx+Lyy)
D1,normL>0
\tilde{D}2,normL
2 | |
\nabla | |
norm |
L
Furthermore, it was shown that all these differential scale-space interest point detectors defined from the Hessian matrix allow for the detection of a larger number of interest points and better matching performance compared to the Harris and Shi-and-Tomasi operators defined from the structure tensor (second-moment matrix).
A theoretical analysis of the scale selection properties of these four Hessian feature strength measures and other differential entities for detecting scale-space interest points, including the Laplacian of the Gaussian and the determinant of the Hessian, is given in Lindeberg (2013)[5] and an analysis of their affine transformation properties as well as experimental properties in Lindeberg (2015).[4]
The interest points obtained from the multi-scale Harris operator with automatic scale selection are invariant to translations, rotations and uniform rescalings in the spatial domain. The images that constitute the input to a computer vision system are, however, also subject to perspective distortions. To obtain an interest point operator that is more robust to perspective transformations, a natural approach is to devise a feature detector that is invariant to affine transformations. In practice, affine invariant interest points can be obtained by applying affine shape adaptation where the shape of the smoothing kernel is iteratively warped to match the local image structure around the interest point or equivalently a local image patch is iteratively warped while the shape of the smoothing kernel remains rotationally symmetric (Lindeberg 1993, 2008; Lindeberg and Garding 1997; Mikolajzcyk and Schmid 2004).[7] Hence, besides the commonly used multi-scale Harris operator, affine shape adaptation can be applied to other corner detectors as listed in this article as well as to differential blob detectors such as the Laplacian/difference of Gaussian operator, the determinant of the Hessian and the Hessian–Laplace operator.
The Wang and Brady detector considers the image to be a surface, and looks for places where there is large curvature along an image edge. In other words, the algorithm looks for places where the edge changes direction rapidly. The corner score,
C
C=\left(
\delta2I | |
\deltat2 |
\right)2-c|\nablaI|2,
where
\bf{t}
c
Smoothing also causes displacement of corners, so the authors derive an expression for the displacement of a 90 degree corner, and apply this as a correction factor to the detected corners.
SUSAN is an acronym standing for smallest univalue segment assimilating nucleus. This method is the subject of a 1994 UK patent which is no longer in force.[8]
For feature detection, SUSAN places a circular mask over the pixel to be tested (the nucleus). The region of the mask is
M
\vec{m}\inM
\vec{m}0
c(\vec{m})=
6} | |
e | |
0)}{t}\right) |
where
t
I
n(M)=\sum\vec{m\inM}c(\vec{m})
If
c
n
t
R(M)=\begin{cases} g-n(M)&if n(M)<g\\ 0&otherwise, \end{cases}
where
g
The value
t
g
g
For corner detection, two further steps are used. Firstly, the centroid of the SUSAN is found. A proper corner will have the centroid far from the nucleus. The second step insists that all points on the line from the nucleus through the centroid out to the edge of the mask are in the SUSAN.
In a manner similar to SUSAN, this detector directly tests whether a patch under a pixel is self-similar by examining nearby pixels.
\vec{c}
\vec{p}\inP
P
\vec{c}
\vec{p}'
\vec{p}
The response function is defined as:
r(\vec{c})=min\vec{p\inP}\left(\left(I(\vec{p})-I(\vec{c})\right)2+\left(I(\vec{p}')-I(\vec{c})\right)2\right)
This will be large when there is no direction in which the centre pixel is similar to two nearby pixels along a diameter.
P
min
c
AST is an acronym standing for accelerated segment test. This test is a relaxed version of the SUSAN corner criterion. Instead of evaluating the circular disc, only the pixels in a Bresenham circle of radius
r
n
t
t
The first corner detection algorithm based on the AST is FAST (features from accelerated segment test). Although
r
n
n
Trujillo and Olague introduced a method by which genetic programming is used to automatically synthesize image operators that can detect interest points. The terminal and function sets contain primitive operations that are common in many previously proposed man-made designs. Fitness measures the stability of each operator through the repeatability rate, and promotes a uniform dispersion of detected points across the image plane. The performance of the evolved operators has been confirmed experimentally using training and testing sequences of progressively transformed images. Hence, the proposed GP algorithm is considered to be human-competitive for the problem of interest point detection.
The Harris operator has been extended to space-time by Laptev and Lindeberg.Let
\mu
A=\sumu\sumv\sumwh(u,v,w)
2 | |
\begin{bmatrix} L | |
x(u,v,w) |
&Lx(u,v,w)Ly(u,v,w)&Lx(u,v,w)Lt(u,v,w)\\ Lx(u,v,w)Ly(u,v,w)&
2 | |
L | |
y(u,v,w) |
&Ly(u,v,w)Lt(u,v,w)\\ Lx(u,v,w)Lt(u,v,w)&Ly(u,v,w)Lt(u,v,w)&
2 | |
L | |
t(u,v,w) |
\\ \end{bmatrix} = \begin{bmatrix} \langle
2 | |
L | |
x |
\rangle&\langleLxLy\rangle&\langleLxLt\rangle\\ \langleLxLy\rangle&\langle
2 | |
L | |
y |
\rangle&\langleLyLt\rangle\\ \langleLxLt\rangle&\langleLyLt\rangle&\langle
2 | |
L | |
t |
\rangle\\ \end{bmatrix}
Then, for a suitable choice of
k<1/27
H=\det(\mu)-\kappa\operatorname{trace}2(\mu).
The determinant of the Hessian operator has been extended to joint space-time by Willems et al and Lindeberg, leading to the following scale-normalized differential expression:
\det(H(x,y,t),normL)=
2\gammas | |
s |
\gamma\tau | |
\tau |
\left(LxxLyyLtt+2LxyLxtLyt-Lxx
2 | |
L | |
yt |
-Lyy
2 | |
L | |
xt |
-Ltt
2 | |
L | |
xy |
\right).
In the work by Willems et al, a simpler expression corresponding to
\gammas=1
\gamma\tau=1
\gammas=5/4
\gamma\tau=5/4
s=s0
\tau=\tau0
The Laplacian operator has been extended to spatio-temporal video data by Lindeberg, leading to the following two spatio-temporal operators, which also constitute models of receptive fields of non-lagged vs. lagged neurons in the LGN:
\partialt,norm
2 | |
(\nabla | |
(x,y),norm |
L)=
\gammas | |
s |
\gamma\tau/2 | |
\tau |
(Lxxt+Lyyt),
\partialtt,norm
2 | |
(\nabla | |
(x,y),norm |
L)=
\gammas | |
s |
\gamma\tau | |
\tau |
(Lxxtt+Lyytt).
For the first operator, scale selection properties call for using
\gammas=1
\gamma\tau=1/2
\gammas=1
\gamma\tau=3/4
Colour extensions of spatio-temporal interest point detectors have been investigated by Everts et al.
This section provides external links to reference implementations of some of the detectors described above. These reference implementations are provided by the authors of the paper in which the detector is first described. These may contain details not present or explicit in the papers describing the features.