The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing.[1] The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.
The classical Hough transform was concerned with the identification of lines in the image, but later the Hough transform has been extended to identifying positions of arbitrary shapes, most commonly circles or ellipses. The Hough transform as it is universally used today was invented by Richard Duda and Peter Hart in 1972, who called it a "generalized Hough transform"[2] after the related 1962 patent of Paul Hough.[3] [4] The transform was popularized in the computer vision community by Dana H. Ballard through a 1981 journal article titled "Generalizing the Hough transform to detect arbitrary shapes".
It was initially invented for machine analysis of bubble chamber photographs (Hough, 1959).
The Hough transform was patented as in 1962 and assigned to the U.S. Atomic Energy Commission with the name "Method and Means for Recognizing Complex Patterns". This patent uses a slope-intercept parametrization for straight lines, which awkwardly leads to an unbounded transform space since the slope can go to infinity.
The rho-theta parametrization universally used today was first described in
Duda . R.O. . Hart . P. E. . Use of the Hough Transformation to Detect Lines and Curves in Pictures . Comm. ACM . 15 . 11–15 . January 1972 . 10.1145/361237.361242. 1105637 . free .
although it was already standard for the Radon transform since at least the 1930s.
O'Gorman and Clowes' variation is described in
O'Gorman . Frank . Clowes . MB . 1976 . Finding Picture Edges Through Collinearity of Feature Points . IEEE Trans. Comput. . 25 . 4. 449–456 . 10.1109/TC.1976.1674627 . 10851078 .
The story of how the modern form of the Hough transform was invented is given in
Hart . P. E. . How the Hough Transform was Invented . https://web.archive.org/web/20180516103712/https://pdfs.semanticscholar.org/f7e8/cbca97de34fd3695e538e164a1b40d27b04e.pdf . dead . 2018-05-16 . IEEE Signal Processing Magazine . 26 . 6 . 18–22 . November 2009 . 10.1109/msp.2009.934181. 16245096 .
In automated analysis of digital images, a subproblem often arises of detecting simple shapes, such as straight lines, circles or ellipses. In many cases an edge detector can be used as a pre-processing stage to obtain image points or image pixels that are on the desired curve in the image space. Due to imperfections in either the image data or the edge detector, however, there may be missing points or pixels on the desired curves as well as spatial deviations between the ideal line/circle/ellipse and the noisy edge points as they are obtained from the edge detector. For these reasons, it is often non-trivial to group the extracted edge features to an appropriate set of lines, circles or ellipses. The purpose of the Hough transform is to address this problem by making it possible to perform groupings of edge points into object candidates by performing an explicit voting procedure over a set of parameterized image objects (Shapiro and Stockman, 304).
The simplest case of Hough transform is detecting straight lines. In general, the straight line can be represented as a point (b, m) in the parameter space. However, vertical lines pose a problem. They would give rise to unbounded values of the slope parameter m. Thus, for computational reasons, Duda and Hart[5] proposed the use of the Hesse normal form
r=x\cos\theta+y\sin\theta,
where
r
\theta
x
The intuition for this form, similarly to the plane equation, is that every vector on the line must be perpendicular (orthogonal) to the straight line of length
r
P0=(r\cos\theta,r\sin\theta)
P
P-P0
P0-0=P0
P=(x,y)
(P-P0) ⋅ P0=0
P ⋅ P0=P0 ⋅ P0
P=(x,y)
P0=(r\cos\theta,r\sin\theta)
r(x\cos\theta+y\sin\theta)=r2(\cos2\theta+\sin2\theta)
\cos2\theta+\sin2\theta=1
x\cos\theta+y\sin\theta=r
It is therefore possible to associate with each line of the image a pair
(r,\theta)
(r,\theta)
Given a single point in the plane, the set of all straight lines going through that point corresponds to a sinusoidal curve in the (r, θ) plane, which is unique to that point. A set of two or more points that form a straight line will produce sinusoids crossing at the (r, θ) for that line. Thus, the problem of detecting collinear points can be converted to the problem of finding concurrent curves.
Given a shape parametrized by
(a1,...,at)
S
S
Explicitly, the Hough transform performs an approximate naive Bayes inference. We start with a uniform prior on the shape space. We consider only the positive evidence, and ignore all negative evidence, so that we can detect partially occluded shapes.
We add up the log-likelihood in the shape space up to an additive constant. The assumption of naive Bayes means that all pixels in the image space provide independent evidence, so that their likelihoods multiply, that is, their log-likelihoods add. The freedom in additive constant allows us to perform no operation on the "background pixels" in shape space.
Finally, we perform maximum likelihood estimation by picking out the peaks in the log-likelihood on the shape space.[7]
Derivations
ImplementationThe linear Hough transform algorithm estimates the two parameters that define a straight line. The transform space has two dimensions, and every point in the transform space is used as an accumulator to detect or identify a line described by
r=x\cos\theta+y\sin\theta
The dimension of the accumulator equals the number of unknown parameters, i.e., two, considering quantized values of
r
\theta
(r,\theta)
(x,y)
(r,\theta)
By finding the bins with the highest values, typically by looking for local maxima in the accumulator space, the most likely lines can be extracted, and their (approximate) geometric definitions read off (Shapiro and Stockman, 304). The simplest way of finding these peaks is by applying some form of threshold, but other techniques may yield better results in different circumstances – determining which lines are found, as well as how many. Since the lines returned do not contain any length information, it is often necessary, in the next step, to find which parts of the image match up with which lines. Moreover, due to imperfection errors in the edge-detection step, there will usually be errors in the accumulator space, which may make it non-trivial to find the appropriate peaks, and thus the appropriate lines.
The final result of the linear Hough transform is a two-dimensional array (matrix) similar to the accumulator—one dimension of this matrix is the quantized angle
\theta
r
(r,\theta)
Consider three data points, shown here as black dots.
From the calculations, it can be seen that in either case the support line at 60° has a similar length.Hence, it is understood that the corresponding lines (the blue ones in the above picture) are very similar.One can thus assume that all points lie close to the blue line.
The following is a different example showing the results of a Hough transform on a raster image containing two thick lines.
The results of this transform were stored in a matrix. Cell value represents the number of curves through any point. Higher cell values are rendered brighter. The two distinctly bright spots are the Hough parameters of the two lines. From these spots' positions, angle and distance from image center of the two lines in the input image can be determined.
An improvement suggested by O'Gorman and Clowes can be used to detect lines if one takes into account that the local gradient of the image intensity will necessarily be orthogonal to the edge. Since edge detection generally involves computing the intensity gradient magnitude, the gradient direction is often found as a side effect. If a given point of coordinates (x,y) happens to indeed be on a line, then the local direction of the gradient gives the θ parameter corresponding to said line, and the r parameter is then immediately obtained. (Shapiro and Stockman, 305) The gradient direction can be estimated to within 20°, which shortens the sinusoid trace from the full 180° to roughly 45°. This reduces the computation time and has the interesting effect of reducing the number of useless votes, thus enhancing the visibility of the spikes corresponding to real lines in the image.
Fernandes and Oliveira [9] suggested an improved voting scheme for the Hough transform that allows a software implementation to achieve real-time performance even on relatively large images (e.g., 1280×960). The Kernel-based Hough transform uses the same
(r,\theta)
Limberger and Oliveira[10] suggested a deterministic technique for plane detection in unorganized point clouds whose cost is
nlog(n)
105
\theta,\phi,\rho
Although the version of the transform described above applies only to finding straight lines, a similar transform can be used for finding any shape which can be represented by a set of parameters. A circle, for instance, can be transformed into a set of three parameters, representing its center and radius, so that the Hough space becomes three dimensional. Arbitrary ellipses and curves can also be found this way, as can any shape easily expressed as a set of parameters.
The generalization of the Hough transform for detecting analytical shapes in spaces having any dimensionality was proposed by Fernandes and Oliveira.[11] In contrast to other Hough transform-based approaches for analytical shapes, Fernandes' technique does not depend on the shape one wants to detect nor on the input data type. The detection can be driven to a type of analytical shape by changing the assumed model of geometry where data have been encoded (e.g., euclidean space, projective space, conformal geometry, and so on), while the proposed formulation remains unchanged. Also, it guarantees that the intended shapes are represented with the smallest possible number of parameters, and it allows the concurrent detection of different kinds of shapes that best fit an input set of entries with different dimensionalities and different geometric definitions (e.g., the concurrent detection of planes and spheres that best fit a set of points, straight lines and circles).
For more complicated shapes in the plane (i.e., shapes that cannot be represented analytically in some 2D space), the Generalised Hough transform[12] is used, which allows a feature to vote for a particular position, orientation and/or scaling of the shape using a predefined look-up table.The Hough transform accumulates contributions from all pixels in the detected edge.
See main article: Circle Hough Transform. Altering the algorithm to detect circular shapes instead of lines is relatively straightforward.
(i-a)2+(j-b)2=r2
a
a
b
If we do not know the radius of the circle we are trying to locate beforehand, we can use a three-dimensional accumulator space to search for circles with an arbitrary radius. Naturally, this is more computationally expensive.
This method can also detect circles that are partially outside of the accumulator space, as long as enough of the circle's area is still present within it.
Hough transform can also be used for the detection of 3D objects in range data or 3D point clouds. The extension of classical Hough transform for plane detection is quite straightforward. A plane is represented by its explicit equation
z=axx+ayy+d
ax
ay
d
ax
ay
For generalized plane detection using Hough transform, the plane can be parametrized by its normal vector
n
\rho
Hough transform has also been used to find cylindrical objects in point clouds using a two step approach. The first step finds the orientation of the cylinder and the second step finds the position and radius.[16]
One common variation detail. That is, finding the bins with the highest count in one stage can be used to constrain the range of values searched in the next.
A high-dimensional parameter space for the Hough transform is not only slow, but if implemented without forethought can easily overrun the available memory. Even if the programming environment allows the allocation of an array larger than the available memory space through virtual memory, the number of page swaps required for this will be very demanding because the accumulator array is used in a randomly accessed fashion, rarely stopping in contiguous memory as it skips from index to index.
Consider the task of finding ellipses in an 800x600 image. Assuming that the radii of the ellipses are oriented along principal axes, the parameter space is four-dimensional. (x, y) defines the center of the ellipse, and a and b denote the two radii. Allowing the center to be anywhere in the image, adds the constraint 0 A program thus conceived is unlikely to be allowed to allocate sufficient memory. This doesn't mean that the problem can't be solved, but only that new ways to constrain the size of the accumulator array are to be found, which makes it feasible. For instance: By applying just the first three of these constraints to the example stated about, the size of the accumulator array is reduced by almost a factor of 1000, bringing it down to a size that is much more likely to fit within a modern computer's memory. Yonghong Xie and Qiang Ji give an efficient way of implementing the Hough transform for ellipse detection by overcoming the memory issues.[17] As discussed in the algorithm (on page 2 of the paper), this approach uses only a one-dimensional accumulator (for the minor axis) in order to detect ellipses in the image. The complexity is O(N3) in the number of non-zero points in the image. The Hough transform is only efficient if a high number of votes fall in the right bin, so that the bin can be easily detected amid the background noise. This means that the bin must not be too small, or else some votes will fall in the neighboring bins, thus reducing the visibility of the main bin.[18] Also, when the number of parameters is large (that is, when we are using the Hough transform with typically more than three parameters), the average number of votes cast in a single bin is very low, and those bins corresponding to a real figure in the image do not necessarily appear to have a much higher number of votes than their neighbors. The complexity increases at a rate of l{O}\left({Am-2 A m Finally, much of the efficiency of the Hough transform is dependent on the quality of the input data: the edges must be detected well for the Hough transform to be efficient. Use of the Hough transform on noisy images is a very delicate matter and generally, a denoising stage must be used before. In the case where the image is corrupted by speckle, as is the case in radar images, the Radon transform is sometimes preferred to detect lines, because it attenuates the noise through summation.Efficient ellipse detection algorithm
Limitations
See also
External links
Notes and References