In computer vision, blob detection methods aim to detect regions in a digital image that differ in properties, such as brightness or color, from surrounding regions. Informally, a blob is a region of an image in which some properties are constant or approximately constant; all the points in a blob can be considered in some sense similar to each other. The most common method for blob detection is convolution.
Given some property of interest expressed as a function of position on the image, there are two main classes of blob detectors: (i) differential methods, which are based on derivatives of the function with respect to position, and (ii) methods based on local extrema, which are based on finding the local maxima and minima of the function. With the more recent terminology used in the field, these detectors can also be referred to as interest point operators, or alternatively interest region operators (see also interest point detection and corner detection).
There are several motivations for studying and developing blob detectors. One main reason is to provide complementary information about regions, which is not obtained from edge detectors or corner detectors. In early work in the area, blob detection was used to obtain regions of interest for further processing. These regions could signal the presence of objects or parts of objects in the image domain with application to object recognition and/or object tracking. In other domains, such as histogram analysis, blob descriptors can also be used for peak detection with application to segmentation. Another common use of blob descriptors is as main primitives for texture analysis and texture recognition. In more recent work, blob descriptors have found increasingly popular use as interest points for wide baseline stereo matching and to signal the presence of informative image features for appearance-based object recognition based on local image statistics. There is also the related notion of ridge detection to signal the presence of elongated objects.
One of the first and also most common blob detectors is based on the Laplacian of the Gaussian (LoG). Given an input image f(x,y), this image is convolved with a Gaussian kernel

g(x, y, t) = \frac{1}{2\pi t} e^{-\frac{x^2 + y^2}{2t}}

at a certain scale t to give a scale-space representation L(x, y; t) = g(x, y, t) * f(x, y). Then, the result of applying the Laplacian operator

\nabla^2 L = L_{xx} + L_{yy}

is computed, which usually gives strong positive responses for dark blobs of radius \sqrt{2t} and strong negative responses for bright blobs of similar size. A main problem when applying this operator at a single scale, however, is that the operator response depends strongly on the relationship between the size of the blob structures in the image domain and the size of the Gaussian kernel used for pre-smoothing.
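As a rough illustration, the single-scale LoG response can be computed with standard tools; the sketch below assumes SciPy is available and uses its gaussian_laplace filter, which combines the Gaussian smoothing and the Laplacian into one call.

```python
import numpy as np
from scipy import ndimage

def laplacian_of_gaussian(image, t):
    """Convolve the image with a Gaussian of variance t and apply the Laplacian."""
    sigma = np.sqrt(t)  # the scale parameter t is the variance; sigma is the standard deviation
    return ndimage.gaussian_laplace(image.astype(float), sigma=sigma)
```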
A straightforward way to obtain a multi-scale blob detector with automatic scale selection is to consider the scale-normalized Laplacian operator

\nabla^2_{\mathrm{norm}} L = t\,(L_{xx} + L_{yy})

and to detect scale-space maxima/minima, that is, points that are simultaneously local maxima/minima of \nabla^2_{\mathrm{norm}} L with respect to both space and scale (Lindeberg 1994, 1998). Thus, given a discrete two-dimensional input image f(x,y), a three-dimensional discrete scale-space volume L(x,y,t) is computed, and a point is regarded as a bright (dark) blob if its value is greater (smaller) than the values of all its 26 neighbours over space and scale. Simultaneous selection of interest points (\hat{x},\hat{y}) and scales \hat{t} is thus performed according to

(\hat{x}, \hat{y}; \hat{t}) = \operatorname{argmaxminlocal}_{(x, y; t)} \left( (\nabla^2_{\mathrm{norm}} L)(x, y; t) \right).

The responses of this operator are covariant with translations, rotations and rescalings in the image domain: if a scale-space maximum is assumed at a point (x_0, y_0; t_0), then under a rescaling of the image by a scale factor s there will be a scale-space maximum at \left(s x_0, s y_0; s^2 t_0\right) in the rescaled image.
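Under these definitions, the scale-space extremum detection can be sketched in Python as follows; the set of scale levels, the 3x3x3 neighbourhood comparison and the response threshold are illustrative choices rather than part of the original formulation.

```python
import numpy as np
from scipy import ndimage

def laplacian_blobs(image, scales, threshold=0.1):
    """Detect blobs as scale-space extrema of the scale-normalized Laplacian t*(Lxx + Lyy).

    image:     2D grey-level array
    scales:    list of scale (variance) values t
    threshold: illustrative minimum |response| for accepting an extremum
    Returns a list of (x, y, t, response) tuples.
    """
    image = image.astype(float)
    # Scale-space volume of scale-normalized Laplacian responses, one slice per scale.
    volume = np.stack([t * ndimage.gaussian_laplace(image, sigma=np.sqrt(t))
                       for t in scales])
    # A point is a scale-space maximum (minimum) if it is larger (smaller)
    # than all of its 26 neighbours over space and scale.
    footprint = np.ones((3, 3, 3))
    is_max = volume == ndimage.maximum_filter(volume, footprint=footprint)
    is_min = volume == ndimage.minimum_filter(volume, footprint=footprint)
    extrema = (is_max | is_min) & (np.abs(volume) > threshold)
    return [(x, y, scales[k], volume[k, y, x])
            for k, y, x in zip(*np.nonzero(extrema))]
```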
The scale selection properties of the Laplacian operator and other closely related scale-space interest point detectors are analyzed in detail in (Lindeberg 2013a).[1] In (Lindeberg 2013b, 2015)[2] it is shown that there exist other scale-space interest point detectors, such as the determinant of the Hessian operator, that perform better than the Laplacian operator or its difference-of-Gaussians approximation for image-based matching using local SIFT-like image descriptors.
From the fact that the scale-space representation L(x,y,t) satisfies the diffusion equation

\partial_t L = \frac{1}{2} \nabla^2 L,

it follows that the Laplacian of the Gaussian \nabla^2 L(x,y,t) can also be computed as the limit case of the difference between two Gaussian-smoothed images (scale-space representations):

\nabla^2_{\mathrm{norm}} L(x, y; t) \approx \frac{t}{\Delta t} \left( L(x, y; t + \Delta t) - L(x, y; t) \right).

In the computer vision literature, this approach is referred to as the difference of Gaussians (DoG) approach.
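Assuming a small scale increment \Delta t, this approximation can be written directly in code; the following is a minimal sketch rather than any particular reference implementation:

```python
import numpy as np
from scipy import ndimage

def dog_normalized_laplacian(image, t, dt):
    """Approximate t*(Lxx + Lyy) by a scaled difference of two Gaussian smoothings."""
    image = image.astype(float)
    L1 = ndimage.gaussian_filter(image, sigma=np.sqrt(t))
    L2 = ndimage.gaussian_filter(image, sigma=np.sqrt(t + dt))
    return (t / dt) * (L2 - L1)
```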
By considering the scale-normalized determinant of the Hessian, also referred to as the Monge–Ampère operator,

\det H_{\mathrm{norm}} L = t^2 \left( L_{xx} L_{yy} - L_{xy}^2 \right),

where H L denotes the Hessian matrix of the scale-space representation L, and then detecting scale-space maxima of this operator, one obtains another straightforward differential blob detector with automatic scale selection:

(\hat{x}, \hat{y}; \hat{t}) = \operatorname{argmaxlocal}_{(x, y; t)} \left( (\det H_{\mathrm{norm}} L)(x, y; t) \right).

The blob points (\hat{x},\hat{y}) and scales \hat{t} defined in this way are also covariant with translations, rotations and rescalings in the image domain.
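A corresponding sketch for the scale-normalized determinant of the Hessian, using Gaussian derivative filters from SciPy, might look as follows; detection of scale-space maxima can then reuse the same 3x3x3 neighbourhood comparison as in the Laplacian sketch above.

```python
import numpy as np
from scipy import ndimage

def det_hessian_response(image, t):
    """Scale-normalized determinant of the Hessian: t^2 * (Lxx*Lyy - Lxy^2)."""
    image = image.astype(float)
    sigma = np.sqrt(t)
    # order=(row, col): second-order Gaussian derivatives of the smoothed image.
    Lxx = ndimage.gaussian_filter(image, sigma, order=(0, 2))
    Lyy = ndimage.gaussian_filter(image, sigma, order=(2, 0))
    Lxy = ndimage.gaussian_filter(image, sigma, order=(1, 1))
    return t ** 2 * (Lxx * Lyy - Lxy ** 2)
```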
A detailed analysis of the scale selection properties of the determinant of the Hessian operator and other closely related scale-space interest point detectors is given in (Lindeberg 2013a),[1] showing that the determinant of the Hessian operator has better scale selection properties under affine image transformations than the Laplacian operator. In (Lindeberg 2013b, 2015)[2] [4] it is shown that the determinant of the Hessian operator performs significantly better than the Laplacian operator or its difference-of-Gaussians approximation, as well as better than the Harris or Harris-Laplace operators, for image-based matching using local SIFT-like or SURF-like image descriptors, leading to higher efficiency values and lower 1-precision scores.
A hybrid operator between the Laplacian and the determinant of the Hessian blob detectors has also been proposed, where spatial selection is done by the determinant of the Hessian and scale selection is performed with the scale-normalized Laplacian (Mikolajczyk and Schmid 2004):
(\hat{x}, \hat{y}) = \operatorname{argmaxlocal}_{(x, y)} \left( (\det H L)(x, y; t) \right)

\hat{t} = \operatorname{argmaxminlocal}_{t} \left( (\nabla^2_{\mathrm{norm}} L)(\hat{x}, \hat{y}; t) \right)
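A simplified sketch of this hybrid scheme, reusing det_hessian_response from the sketch above, could look as follows; it omits thresholding, sub-scale interpolation and duplicate removal across scales.

```python
import numpy as np
from scipy import ndimage

def hessian_laplace_points(image, scales):
    """Hybrid detector sketch: spatial maxima of the determinant of the Hessian,
    with the scale chosen from the scale-normalized Laplacian at each point."""
    image = image.astype(float)
    # Scale-normalized Laplacian responses over all scales, used for scale selection.
    lap = np.stack([t * ndimage.gaussian_laplace(image, sigma=np.sqrt(t)) for t in scales])
    points = []
    for t in scales:
        det_h = det_hessian_response(image, t)   # from the previous sketch
        spatial_max = (det_h == ndimage.maximum_filter(det_h, size=3)) & (det_h > 0)
        for y, x in zip(*np.nonzero(spatial_max)):
            # Simplified scale selection: take the scale with the strongest
            # scale-normalized Laplacian magnitude at (x, y).
            k_hat = int(np.argmax(np.abs(lap[:, y, x])))
            points.append((x, y, scales[k_hat]))
    return points
```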
The blob descriptors obtained from these blob detectors with automatic scale selection are invariant to translations, rotations and uniform rescalings in the spatial domain. The images that constitute the input to a computer vision system are, however, also subject to perspective distortions. To obtain blob descriptors that are more robust to perspective transformations, a natural approach is to devise a blob detector that is invariant to affine transformations. In practice, affine invariant interest points can be obtained by applying affine shape adaptation to a blob descriptor, where the shape of the smoothing kernel is iteratively warped to match the local image structure around the blob, or equivalently a local image patch is iteratively warped while the shape of the smoothing kernel remains rotationally symmetric (Lindeberg and Garding 1997; Baumberg 2000; Mikolajczyk and Schmid 2004, Lindeberg 2008). In this way, we can define affine-adapted versions of the Laplacian/Difference of Gaussian operator, the determinant of the Hessian and the Hessian-Laplace operator (see also Harris-Affine and Hessian-Affine).
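As a rough sketch of the quantity that drives affine shape adaptation (not the full iterative procedure from the cited works), the windowed second-moment matrix around a candidate point can be computed as follows; the local and integration scales t_deriv and t_int are illustrative parameters of this sketch.

```python
import numpy as np
from scipy import ndimage

def second_moment_matrix(image, x, y, t_deriv, t_int):
    """Windowed second-moment matrix around (x, y).

    t_deriv: local (derivative) scale; t_int: integration scale."""
    image = image.astype(float)
    s = np.sqrt(t_deriv)
    Lx = ndimage.gaussian_filter(image, s, order=(0, 1))  # derivative along x (columns)
    Ly = ndimage.gaussian_filter(image, s, order=(1, 0))  # derivative along y (rows)
    w = np.sqrt(t_int)
    # Gaussian-weighted averages of the derivative products, read out at (x, y).
    mu_xx = ndimage.gaussian_filter(Lx * Lx, w)[y, x]
    mu_xy = ndimage.gaussian_filter(Lx * Ly, w)[y, x]
    mu_yy = ndimage.gaussian_filter(Ly * Ly, w)[y, x]
    return np.array([[mu_xx, mu_xy], [mu_xy, mu_yy]])
```

One adaptation step would then warp the local patch (or, equivalently, the smoothing kernel) by the inverse square root of this matrix and recompute it, iterating until the matrix is close to a multiple of the identity.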
The determinant of the Hessian operator has been extended to joint space-time by Willems et al.[5] and Lindeberg,[6] leading to the following scale-normalized differential expression:
\det(H_{(x, y, t), \mathrm{norm}} L) = s^{2\gamma_s} \tau^{\gamma_\tau} \left( L_{xx} L_{yy} L_{tt} + 2 L_{xy} L_{xt} L_{yt} - L_{xx} L_{yt}^2 - L_{yy} L_{xt}^2 - L_{tt} L_{xy}^2 \right).
In the work by Willems et al.,[5] a simpler expression corresponding to \gamma_s = 1 and \gamma_\tau = 1 was used. In Lindeberg,[6] it was shown that \gamma_s = 5/4 and \gamma_\tau = 5/4 lead to better scale selection properties, in the sense that the selected scale levels obtained from a spatio-temporal Gaussian blob with spatial extent s = s_0 and temporal extent \tau = \tau_0 will match the spatial extent and the temporal duration of the blob.
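Assuming a video volume stored with axes (t, y, x), the operator can be sketched as follows; spatio-temporal scale selection would then proceed by searching for extrema over the spatial and temporal scale parameters, analogously to the purely spatial case.

```python
import numpy as np
from scipy import ndimage

def spatiotemporal_det_hessian(video, s, tau, gamma_s=5/4, gamma_tau=5/4):
    """Scale-normalized determinant of the spatio-temporal Hessian for a video
    volume with axes (t, y, x); s and tau are the spatial and temporal variances."""
    L = video.astype(float)
    sigmas = (np.sqrt(tau), np.sqrt(s), np.sqrt(s))  # smoothing along (t, y, x)

    def d(ot, oy, ox):
        # Gaussian derivative of the given orders along the (t, y, x) axes.
        return ndimage.gaussian_filter(L, sigmas, order=(ot, oy, ox))

    Lxx, Lyy, Ltt = d(0, 0, 2), d(0, 2, 0), d(2, 0, 0)
    Lxy, Lxt, Lyt = d(0, 1, 1), d(1, 0, 1), d(1, 1, 0)
    norm = s ** (2 * gamma_s) * tau ** gamma_tau
    return norm * (Lxx * Lyy * Ltt + 2 * Lxy * Lxt * Lyt
                   - Lxx * Lyt ** 2 - Lyy * Lxt ** 2 - Ltt * Lxy ** 2)
```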
The Laplacian operator has been extended to spatio-temporal video data by Lindeberg,[6] leading to the following two spatio-temporal operators, which also constitute models of receptive fields of non-lagged vs. lagged neurons in the LGN:
\partial_{t, \mathrm{norm}} (\nabla^2_{(x, y), \mathrm{norm}} L) = s^{\gamma_s} \tau^{\gamma_\tau / 2} (L_{xxt} + L_{yyt}),

\partial_{tt, \mathrm{norm}} (\nabla^2_{(x, y), \mathrm{norm}} L) = s^{\gamma_s} \tau^{\gamma_\tau} (L_{xxtt} + L_{yytt}).
For the first operator, the scale selection properties call for using \gamma_s = 1 and \gamma_\tau = 1/2, whereas for the second operator they call for using \gamma_s = 1 and \gamma_\tau = 3/4, if the operators are to assume their maxima over spatio-temporal scales at scale levels reflecting the spatio-temporal extent of the underlying blob structure.
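Under the same assumptions as in the previous sketch (a video volume with axes (t, y, x)), the two operators can be computed as follows; note that the preferred value of \gamma_\tau differs between the two operators, so a single default is used here only for brevity.

```python
import numpy as np
from scipy import ndimage

def spatiotemporal_laplacians(video, s, tau, gamma_s=1.0, gamma_tau=0.5):
    """The two scale-normalized spatio-temporal operators above for a video
    volume with axes (t, y, x).

    Note: the preferred gamma_tau is 1/2 for the first operator and 3/4 for
    the second; a single value is used here only to keep the sketch short."""
    L = video.astype(float)
    sigmas = (np.sqrt(tau), np.sqrt(s), np.sqrt(s))

    def d(ot, oy, ox):
        # Gaussian derivative of the given orders along the (t, y, x) axes.
        return ndimage.gaussian_filter(L, sigmas, order=(ot, oy, ox))

    first = s ** gamma_s * tau ** (gamma_tau / 2) * (d(1, 0, 2) + d(1, 2, 0))
    second = s ** gamma_s * tau ** gamma_tau * (d(2, 0, 2) + d(2, 2, 0))
    return first, second
```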
A natural approach to detect blobs is to associate a bright (dark) blob with each local maximum (minimum) in the intensity landscape. A main problem with such an approach, however, is that local extrema are very sensitive to noise. To address this problem, Lindeberg (1993, 1994) studied the problem of detecting local maxima with extent at multiple scales in scale space. A region with spatial extent defined from a watershed analogy was associated with each local maximum, as well as a local contrast defined from a so-called delimiting saddle point. A local extremum with extent defined in this way was referred to as a grey-level blob. Moreover, by proceeding with the watershed analogy beyond the delimiting saddle point, a grey-level blob tree was defined to capture the nested topological structure of level sets in the intensity landscape, in a way that is invariant to affine deformations in the image domain and monotone intensity transformations. By studying how these structures evolve with increasing scales, the notion of scale-space blobs was introduced. Beyond local contrast and extent, these scale-space blobs also measured how stable image structures are in scale space, by measuring their scale-space lifetime.
It was proposed that regions of interest and scale descriptors obtained in this way, with associated scale levels defined from the scales at which normalized measures of blob strength assume their maxima over scales, could be used for guiding other early visual processing. An early prototype of a simplified vision system was developed in which such regions of interest and scale descriptors were used for directing the focus of attention of an active vision system. While the specific technique used in these prototypes can be substantially improved with current knowledge in computer vision, the overall approach is still valid, for example in the way that local extrema over scales of the scale-normalized Laplacian operator are nowadays used for providing scale information to other visual processes.
For the purpose of detecting grey-level blobs (local extrema with extent) from a watershed analogy, Lindeberg developed an algorithm based on pre-sorting the pixels, or alternatively connected regions having the same intensity, in decreasing order of the intensity values. Then, comparisons were made between nearest neighbours of either pixels or connected regions.
For simplicity, consider the case of detecting bright grey-level blobs and let the notation "higher neighbour" stand for "neighbour pixel having a higher grey-level value". Then, at any stage in the algorithm (carried out in decreasing order of intensity values), the classification of a pixel or connected region is based on the following rules:

1. If it has no higher neighbour, then it is a local maximum and becomes the seed of a new blob.
2. Else, if it has at least one higher neighbour that is background, then it cannot be part of any blob and must be background.
3. Else, if it has higher neighbours belonging to more than one distinct blob, then it cannot be part of any blob and must be background.
4. Else, it has one or more higher neighbours that are all parts of the same blob, and it is assigned to that blob.
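A compact sketch of the pixel-based variant of this procedure, ignoring the grouping of equal-intensity pixels into connected regions and using 4-connectivity, might look as follows; it is meant to illustrate the classification rules rather than to reproduce the original implementation.

```python
import numpy as np

def bright_greylevel_blobs(image):
    """Sketch of grey-level blob detection by processing pixels in decreasing
    order of intensity. Labels: 0 = unvisited, -1 = background, k > 0 = blob k."""
    img = np.asarray(image, dtype=float)
    labels = np.zeros(img.shape, dtype=int)
    next_label = 1
    # Pixel coordinates sorted by decreasing grey-level.
    order = np.column_stack(np.unravel_index(np.argsort(img, axis=None)[::-1], img.shape))
    for y, x in order:
        # Collect labels of already-processed (i.e. higher) 4-neighbours.
        neighbour_labels = set()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < img.shape[0] and 0 <= nx < img.shape[1] and labels[ny, nx] != 0:
                neighbour_labels.add(labels[ny, nx])
        if not neighbour_labels:
            # Rule 1: no higher neighbour, so this pixel seeds a new blob.
            labels[y, x] = next_label
            next_label += 1
        elif neighbour_labels == {-1} or len(neighbour_labels) > 1:
            # Rules 2 and 3: touches the background, or touches more than one blob.
            labels[y, x] = -1
        else:
            # Rule 4: exactly one neighbouring blob, so the pixel joins it.
            labels[y, x] = neighbour_labels.pop()
    return labels
```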
Compared to other watershed methods, the flooding in this algorithm stops once the intensity level falls below the intensity value of the so-called delimiting saddle point associated with the local maximum. However, it is rather straightforward to extend this approach to other types of watershed constructions. For example, by proceeding beyond the first delimiting saddle point a "grey-level blob tree" can be constructed. Moreover, the grey-level blob detection method was embedded in a scale space representation and performed at all levels of scale, resulting in a representation called the scale-space primal sketch.
This algorithm, with its applications in computer vision, is described in more detail in Lindeberg's thesis[7] as well as in the monograph on scale-space theory[8] partially based on that work. Earlier presentations of this algorithm can also be found in [9] and [10]. More detailed treatments of applications of grey-level blob detection and the scale-space primal sketch to computer vision and medical image analysis are given in [11], [12] and [13].
See main article: Maximally stable extremal regions. Matas et al. (2002) were interested in defining image descriptors that are robust under perspective transformations. They studied level sets in the intensity landscape and measured how stable these were along the intensity dimension. Based on this idea, they defined a notion of maximally stable extremal regions and showed how these image descriptors can be used as image features for stereo matching.
There are close relations between this notion and the above-mentioned notion of grey-level blob tree. The maximally stable extremal regions can be seen as making a specific subset of the grey-level blob tree explicit for further processing.
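For experimentation, maximally stable extremal regions are available in common libraries; assuming OpenCV is installed (and with a placeholder file name), a minimal usage sketch is:

```python
import cv2

# Usage sketch: detect maximally stable extremal regions on a grey-level image.
# "image.png" is a placeholder; parameters such as delta and the minimum/maximum
# region area control the stability criterion.
gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
mser = cv2.MSER_create()
regions, bounding_boxes = mser.detectRegions(gray)
```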