Image segmentation explained

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.[1] [2] Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property,[3] such as color, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of geometry reconstruction algorithms like marching cubes.[4]

Applications

Some of the practical applications of image segmentation are:

Several general-purpose algorithms and techniques have been developed for image segmentation. To be useful, these techniques must typically be combined with a domain's specific knowledge in order to effectively solve the domain's segmentation problems.

Classes of segmentation techniques

There are two classes of segmentation techniques.

Groups of image segmentation

Thresholding

See main article: Thresholding (image processing). The simplest method of image segmentation is called the thresholding method. This method uses a clip level (or threshold value) to turn a grayscale image into a binary image.

The key to this method is selecting the threshold value (or values, when multiple levels are selected). Several popular methods are used in industry, including the maximum entropy method, balanced histogram thresholding, Otsu's method (maximum variance), and k-means clustering.
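As a concrete illustration of threshold selection, here is a minimal sketch of Otsu's method, one of the methods named above. It assumes 8-bit gray levels given as a flat Python list and picks the threshold that maximizes the between-class variance. The helper names `otsu_threshold` and `binarize` are hypothetical, and a practical implementation would typically operate on arrays rather than lists.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form the background class, the rest the foreground.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_b = 0    # number of background pixels for the current t
    sum_b = 0  # intensity sum of background pixels for the current t
    for t in range(levels):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        # Between-class variance, up to the constant factor 1 / total**2
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map pixels above the threshold to 1 and the rest to 0."""
    return [1 if p > t else 0 for p in pixels]
```

For the pixels [10, 12, 11, 200, 210, 205], the selected threshold falls between the dark and bright groups, so `binarize` returns [0, 0, 0, 1, 1, 1].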

Recently, methods have been developed for thresholding computed tomography (CT) images. The key idea is that, unlike Otsu's method, the thresholds are derived from the radiographs instead of the (reconstructed) image.[19] [20]

Newer methods use multi-dimensional, fuzzy rule-based, non-linear thresholds. In these works, the decision about each pixel's membership in a segment is based on multi-dimensional rules derived from fuzzy logic and evolutionary algorithms, which take into account the image's lighting environment and the application.[21]

Clustering methods

See main article: Data clustering. The K-means algorithm is an iterative technique that is used to partition an image into K clusters.[22] The basic algorithm is

  1. Pick K cluster centers, either randomly or based on some heuristic method, for example K-means++
  2. Assign each pixel in the image to the cluster that minimizes the distance between the pixel and the cluster center
  3. Re-compute the cluster centers by averaging all of the pixels in the cluster
  4. Repeat steps 2 and 3 until convergence is attained (i.e. no pixels change clusters)

In this case, distance is the squared or absolute difference between a pixel and a cluster center. The difference is typically based on pixel color, intensity, texture, and location, or a weighted combination of these factors. K can be selected manually, randomly, or by a heuristic. This algorithm is guaranteed to converge, but it may not return the optimal solution. The quality of the solution depends on the initial set of clusters and the value of K.
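The four steps above can be sketched for scalar intensities as follows. This is a minimal illustration that assumes grayscale pixels in a flat list, random initialization from the distinct pixel values, and squared intensity difference as the distance. `kmeans_intensity` is a hypothetical name. With color, texture, or location features, the scalar difference would be replaced by a (weighted) vector distance.

```python
import random


def kmeans_intensity(pixels, k, max_iter=100, seed=0):
    """Cluster scalar pixel intensities into k groups (Lloyd's K-means).

    Returns (labels, centers). Distance is the squared intensity difference.
    Requires at least k distinct pixel values.
    """
    rng = random.Random(seed)
    # Step 1: pick k distinct pixel values as the initial centers
    centers = [float(c) for c in rng.sample(sorted(set(pixels)), k)]
    labels = [None] * len(pixels)
    for _ in range(max_iter):
        # Step 2: assign each pixel to the nearest center
        new_labels = [min(range(k), key=lambda j: (p - centers[j]) ** 2)
                      for p in pixels]
        # Step 4: stop once no pixel changes cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its member pixels
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = sum(members) / len(members)
    return labels, centers
```

On harder data, changing `seed` can change the result, which reflects the dependence on the initial clusters noted above.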

The Mean Shift algorithm is a technique that is used to partition an image into a number of clusters that is not known a priori. This has the advantage of not requiring an initial guess of that parameter, which makes it a better general solution for more diverse cases.
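A one-dimensional sketch shows how the number of clusters emerges from the data instead of being fixed in advance. It assumes scalar intensities, a flat kernel of radius `bandwidth`, and a simple rule that merges modes closer than half the bandwidth. These choices, and the function name, are illustrative only. The bandwidth still has to be chosen, so in effect mean shift replaces the choice of K with the choice of a kernel scale.

```python
def mean_shift_1d(values, bandwidth, tol=1e-6, max_iter=100):
    """Flat-kernel mean shift on scalar features such as intensities.

    Returns (labels, centers); the number of clusters equals the number
    of distinct modes found.
    """
    modes = []
    for x in values:
        m = float(x)
        for _ in range(max_iter):
            # Shift m to the mean of all values within the kernel window
            window = [v for v in values if abs(v - m) <= bandwidth]
            new_m = sum(window) / len(window)
            if abs(new_m - m) < tol:
                break
            m = new_m
        modes.append(m)
    # Merge modes closer than half the bandwidth into one cluster
    centers, labels = [], []
    for m in modes:
        for j, c in enumerate(centers):
            if abs(m - c) < bandwidth / 2:
                labels.append(j)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers
```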

Motion and interactive segmentation

Motion based segmentation is a technique that relies on motion in the image to perform segmentation.

The idea is simple: look at the differences between a pair of images. Assuming the object of interest is the only thing moving, the nonzero differences will be caused by that object.
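A minimal sketch of the frame-difference idea. It assumes two aligned grayscale frames stored as nested lists and a hypothetical noise threshold `threshold`. For a moving object, the mask marks both the area the object left and the area it moved into.

```python
def motion_mask(frame_a, frame_b, threshold):
    """Mark pixels whose intensity changed by more than `threshold`."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]
```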

Improving on this idea, Kenney et al. proposed interactive segmentation http://www.robotics.tu-berlin.de/fileadmin/fg170/Publikationen_pdf/2009-icra.pdf. They use a robot to poke objects in order to generate the motion signal necessary for motion-based segmentation.

Interactive segmentation follows the interactive perception framework proposed by Dov Katz http://www.dubikatz.com and Oliver Brock http://www.robotics.tu-berlin.de/menue/team/oliver_brock.

Another technique that is based on motion is rigid motion segmentation.

Compression-based methods

Compression-based methods postulate that the optimal segmentation is the one that minimizes, over all possible segmentations, the coding length of the data.[23] [24] The connection between these two concepts is that segmentation tries to find patterns in an image and any regularity in the image can be used to compress it. The method describes each segment by its texture and boundary shape. Each of these components is modeled by a probability distribution function and its coding length is computed as follows:

  1. The boundary encoding leverages the fact that regions in natural images tend to have a smooth contour. This prior is used by Huffman coding to encode the difference chain code of the contours in an image. Thus, the smoother a boundary is, the shorter coding length it attains.
  2. Texture is encoded by lossy compression in a way similar to minimum description length (MDL) principle, but here the length of the data given the model is approximated by the number of samples times the entropy of the model. The texture in each region is modeled by a multivariate normal distribution whose entropy has a closed form expression. An interesting property of this model is that the estimated entropy bounds the true entropy of the data from above. This is because among all distributions with a given mean and covariance, normal distribution has the largest entropy. Thus, the true coding length cannot be more than what the algorithm tries to minimize.

For any given segmentation of an image, this scheme yields the number of bits required to encode that image based on the given segmentation. Thus, among all possible segmentations of an image, the goal is to find the segmentation which produces the shortest coding length. This can be achieved by a simple agglomerative clustering method. The distortion in the lossy compression determines the coarseness of the segmentation and its optimal value may differ for each image. This parameter can be estimated heuristically from the contrast of textures in an image. For example, when the textures in an image are similar, such as in camouflage images, stronger sensitivity and thus lower quantization is required.

Histogram-based methods

Histogram-based methods are very efficient compared to other image segmentation methods because they typically require only one pass through the pixels. In this technique, a histogram is computed from all of the pixels in the image, and the peaks and valleys in the histogram are used to locate the clusters in the image. Color or intensity can be used as the measure.

A refinement of this technique is to recursively apply the histogram-seeking method to clusters in the image in order to divide them into smaller clusters. This operation is repeated with smaller and smaller clusters until no more clusters are formed.[25]

One disadvantage of the histogram-seeking method is that it may be difficult to identify significant peaks and valleys in the histogram.
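One simple way to make the peak-and-valley search more robust is to smooth the histogram first. The sketch below finds the two highest peaks of the smoothed histogram and returns the lowest bin between them as a threshold. It returns None when no second peak exists. It assumes 8-bit integer intensities and a moving-average smoothing radius `smooth`; both are illustrative choices, as is the function name.

```python
def valley_threshold(pixels, levels=256, smooth=2):
    """Split a bimodal intensity histogram at the valley between its two main peaks."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Moving-average smoothing suppresses spurious local peaks
    sm = []
    for i in range(levels):
        lo, hi = max(0, i - smooth), min(levels, i + smooth + 1)
        sm.append(sum(hist[lo:hi]) / (hi - lo))
    # Local maxima of the smoothed histogram (last bin of a flat top)
    peaks = [i for i in range(levels)
             if sm[i] > 0
             and (i == 0 or sm[i] >= sm[i - 1])
             and (i == levels - 1 or sm[i] > sm[i + 1])]
    if len(peaks) < 2:
        return None  # no clear valley: the histogram is not bimodal
    # The two highest peaks, in left-to-right order
    p1, p2 = sorted(sorted(peaks, key=lambda i: sm[i], reverse=True)[:2])
    # Valley: lowest smoothed bin between the two dominant peaks
    return min(range(p1, p2 + 1), key=lambda i: sm[i])
```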

Histogram-based approaches can also be quickly adapted to apply to multiple frames, while maintaining their single-pass efficiency. The histogram can be computed in several ways when multiple frames are considered. The same approach that is taken with one frame can be applied to multiple frames, and after the results are merged, peaks and valleys that were previously difficult to identify are more likely to be distinguishable. The histogram can also be applied on a per-pixel basis, where the resulting information is used to determine the most frequent color for each pixel location. This approach segments based on active objects and a static environment, resulting in a different type of segmentation useful in video tracking.

Edge detection

Edge detection is a well-developed field on its own within image processing. Region boundaries and edges are closely related, since there is often a sharp change in intensity at region boundaries. Edge detection techniques have therefore been used as the base of another segmentation technique.

The edges identified by edge detection are often disconnected. To segment an object from an image, however, one needs closed region boundaries. The desired edges are the boundaries between such objects or spatial-taxons.[26] [27]

Spatial-taxons[28] are information granules,[29] consisting of a crisp pixel region, stationed at abstraction levels within a hierarchical nested scene architecture. They are similar to the Gestalt psychological designation of figure-ground, but are extended to include foreground, object groups, objects and salient object parts. Edge detection methods can be applied to the spatial-taxon region, in the same manner they would be applied to a silhouette. This method is particularly useful when the disconnected edge is part of an illusory contour.[30] [31]

Segmentation methods can also be applied to edges obtained from edge detectors. Lindeberg and Li[32] developed an integrated method that segments edges into straight and curved edge segments for parts-based object recognition, based on a minimum description length (MDL) criterion that was optimized by a split-and-merge-like method with candidate breakpoints obtained from complementary junction cues to obtain more likely points at which to consider partitions into different segments.

Isolated Point Detection

The detection of isolated points in an image is a fundamental part of image segmentation. This process primarily depends on the second derivative, which suggests the use of the Laplacian operator. The Laplacian of a function f(x, y) is given by:

\nabla^2 f(x,y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}
In discrete form, the partial derivatives are approximated by finite differences. The second partial derivatives of f(x, y) with respect to x and y are given by:

\frac{\partial^2 f(x,y)}{\partial x^2} = f(x+1,y) + f(x-1,y) - 2f(x,y)

\frac{\partial^2 f(x,y)}{\partial y^2} = f(x,y+1) + f(x,y-1) - 2f(x,y)

These partial derivatives are then used to compute the Laplacian as:

\nabla^2 f(x,y) = f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4f(x,y)

This expression can be implemented by convolving the image with an appropriate mask. The same approach extends to three dimensions (x, y, z) by adding the second difference along z, which uses the intensities of the voxels adjacent to the central voxel at (x, y, z). These finite-difference formulas assume that all pixels have unit spacing along each axis.
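Written out under the same unit-spacing assumption, the three-dimensional version adds the second difference along z and uses the six face neighbors of the central voxel:

```latex
\nabla^2 f(x,y,z) = f(x+1,y,z) + f(x-1,y,z) + f(x,y+1,z) + f(x,y-1,z)
                  + f(x,y,z+1) + f(x,y,z-1) - 6f(x,y,z)
```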

A sphere mask has been developed for use with three-dimensional datasets. The sphere mask is designed to use only integer arithmetic during calculations, thereby eliminating the need for floating-point hardware or software.

When applying these concepts to actual images represented as arrays of numbers, we need to consider what happens when the mask reaches an edge or border region of the image. Given the mask response R(x, y) at each pixel, the output g(x, y) is defined as:

g(x,y) = \begin{cases} 1 & \text{if } |R(x,y)| \geq T \\ 0 & \text{otherwise} \end{cases}

This equation determines whether a point in the image is an isolated point based on the response magnitude |R(x, y)| and a threshold value T. If the response magnitude is greater than or equal to the threshold, the function returns 1, indicating the presence of an isolated point; otherwise, it returns 0. This helps in the effective detection and segmentation of isolated points in the image.[33]
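The following sketch combines the discrete Laplacian mask with the thresholding rule g(x, y). It assumes a grayscale image stored as nested lists and uses the four-neighbor mask implied by the discrete formula; an eight-neighbor variant is also common. As one possible border policy, it leaves border pixels at 0. The function name and the threshold are illustrative.

```python
# Four-neighbor Laplacian mask read off the discrete formula
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def detect_isolated_points(image, threshold, mask=LAPLACIAN):
    """Return g(x, y): 1 where |R(x, y)| >= threshold, else 0.

    Border pixels, where the 3x3 mask does not fit, are left at 0.
    """
    h, w = len(image), len(image[0])
    g = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # R(x, y): mask response (correlation; this mask is symmetric)
            r = sum(mask[dy + 1][dx + 1] * image[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            g[y][x] = 1 if abs(r) >= threshold else 0
    return g
```

On a uniform image with a single bright pixel, the response is largest in magnitude at that pixel. A suitable threshold therefore keeps only that pixel.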

Application of Isolated Point Detection in X-ray Image Processing

The detection of isolated points has significant applications in various fields, including X-ray image processing. For instance, an original X-ray image of a turbine blade can be examined pixel by pixel to detect porosity in the upper-right quadrant of the blade. The result of applying an edge detector to this X-ray image can be approximated, demonstrating the segmentation of isolated points in an image with the aid of single-pixel probes.[34]

Dual clustering method

This method combines three characteristics of the image: a partition of the image based on histogram analysis is checked by the high compactness of the clusters (objects) and by the high gradients of their borders. For that purpose, two spaces have to be introduced. The first is the one-dimensional histogram of brightness H = H(B); the second is the dual three-dimensional space of the original image itself, B = B(x, y). The first space allows one to measure how compactly the brightness of the image is distributed by calculating a minimal clustering k_min. The threshold brightness T corresponding to k_min defines the binary (black-and-white) image, a bitmap b = φ(x, y), where φ(x, y) = 0 if B(x, y) < T, and φ(x, y) = 1 if B(x, y) ≥ T. The bitmap b is an object in the dual space. On that bitmap, a measure has to be defined that reflects how compactly the black (or white) pixels are distributed. The goal, then, is to find objects with good borders. For all T, the measure MDC = G/(k × L) has to be calculated, where k is the difference in brightness between the object and the background, L is the length of all borders, and G is the mean gradient on the borders. The maximum of MDC defines the segmentation.[35]

Region-growing methods

Region-growing methods rely mainly on the assumption that the neighboring pixels within one region have similar values. The common procedure is to compare one pixel with its neighbors. If a similarity criterion is satisfied, the pixel can be set to belong to the same cluster as one or more of its neighbors. The selection of the similarity criterion is significant and the results are influenced by noise in all instances.

The method of Statistical Region Merging[36] (SRM) starts by building the graph of pixels using 4-connectedness with edges weighted by the absolute value of the intensity difference. Initially each pixel forms a single pixel region. SRM then sorts those edges in a priority queue and decides whether or not to merge the current regions belonging to the edge pixels using a statistical predicate.

One region-growing method is the seeded region growing method. This method takes a set of seeds as input along with the image. The seeds mark each of the objects to be segmented. The regions are iteratively grown by comparing all unallocated neighboring pixels to the regions. The difference between a pixel's intensity value and the region's mean, δ, is used as a measure of similarity. The pixel with the smallest difference measured in this way is assigned to the respective region. This process continues until all pixels are assigned to a region. Because seeded region growing requires seeds as additional input, the segmentation results are dependent on the choice of seeds, and noise in the image can cause the seeds to be poorly placed.
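A minimal sketch of seeded region growing with a priority queue. It assumes a grayscale image as nested lists, 4-connectivity, and seeds given as a hypothetical `{label: (row, col)}` dictionary with integer labels. As a simplification, each queued pixel keeps the δ computed from the region mean at the time it was queued. It is not re-scored whenever the mean changes.

```python
import heapq


def seeded_region_growing(image, seeds):
    """Grow one region per seed, always absorbing the unallocated pixel
    whose intensity is closest (smallest delta) to an adjacent region's mean.

    image: 2-D list of intensities; seeds: {label: (row, col)}.
    Returns a 2-D list of region labels.
    """
    h, w = len(image), len(image[0])
    labels = [[None] * w for _ in range(h)]
    total, count = {}, {}  # running sums for each region's mean
    heap = []              # entries: (delta, row, col, label)

    def push_neighbours(r, c, lab):
        mean = total[lab] / count[lab]
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and labels[nr][nc] is None:
                heapq.heappush(heap, (abs(image[nr][nc] - mean), nr, nc, lab))

    for lab, (r, c) in seeds.items():
        labels[r][c] = lab
        total[lab], count[lab] = image[r][c], 1
    for lab, (r, c) in seeds.items():
        push_neighbours(r, c, lab)

    while heap:
        _, r, c, lab = heapq.heappop(heap)
        if labels[r][c] is not None:
            continue  # already claimed with a smaller delta
        labels[r][c] = lab
        total[lab] += image[r][c]
        count[lab] += 1
        push_neighbours(r, c, lab)
    return labels
```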

Another region-growing method is the unseeded region growing method. It is a modified algorithm that does not require explicit seeds. It starts with a single region A_1; the pixel chosen here does not markedly influence the final segmentation. At each iteration it considers the neighboring pixels in the same way as seeded region growing. It differs from seeded region growing in that if the minimum δ is less than a predefined threshold T, then the pixel is added to the respective region A_j. If not, then the pixel is considered different from all current regions A_i, and a new region A_{n+1} is created with this pixel.
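The unseeded variant can be sketched as follows. This version assumes 4-connectivity, starts region A_1 at the top-left pixel, and compares each candidate pixel only with the regions adjacent to it. It rescans the whole image at every step. That keeps the code short, but the running time is quadratic in the number of pixels. Names are illustrative.

```python
def unseeded_region_growing(image, T):
    """Start from one pixel; at each step take the unallocated pixel with the
    smallest delta to an adjacent region's mean, then either add it to that
    region (delta < T) or start a new region with it."""
    h, w = len(image), len(image[0])
    labels = [[None] * w for _ in range(h)]
    total, count = [image[0][0]], [1]  # region A_1 starts at pixel (0, 0)
    labels[0][0] = 0
    for _ in range(h * w - 1):
        best = None  # (delta, row, col, region)
        for r in range(h):
            for c in range(w):
                if labels[r][c] is not None:
                    continue
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < h and 0 <= nc < w and labels[nr][nc] is not None:
                        k = labels[nr][nc]
                        delta = abs(image[r][c] - total[k] / count[k])
                        if best is None or delta < best[0]:
                            best = (delta, r, c, k)
        delta, r, c, k = best
        if delta >= T:  # too different from every adjacent region
            k = len(total)
            total.append(0)
            count.append(0)
        labels[r][c] = k
        total[k] += image[r][c]
        count[k] += 1
    return labels
```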

One variant of this technique, proposed by Haralick and Shapiro (1985), is based on pixel intensities. The mean and scatter of the region and the intensity of the candidate pixel are used to compute a test statistic. If the test statistic is sufficiently small, the pixel is added to the region, and the region's mean and scatter are recomputed. Otherwise, the pixel is rejected, and is used to form a new region.

A special region-growing method is called λ-connected segmentation (see also lambda-connectedness). It is based on pixel intensities and neighborhood-linking paths. A degree of connectivity (connectedness) is calculated based on a path that is formed by pixels. For a certain value of λ, two pixels are called λ-connected if there is a path linking those two pixels and the connectedness of this path is at least λ. λ-connectedness is an equivalence relation.[37]

Split-and-merge segmentation is based on a quadtree partition of an image. It is sometimes called quadtree segmentation.

This method starts at the root of the tree that represents the whole image. If it is found non-uniform (not homogeneous), then it is split into four child squares (the splitting process), and so on. If, in contrast, four child squares are homogeneous, they are merged as several connected components (the merging process). The node in the tree is a segmented node. This process continues recursively until no further splits or merges are possible.[38] [39] When a special data structure is used to implement the method, its time complexity can reach O(n log n), which makes it an optimal algorithm for the method.[40]
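The splitting phase can be sketched recursively. The sketch assumes a square image whose side is a power of two and uses max − min ≤ `tol` as the homogeneity test, which is an illustrative choice. It returns the leaf blocks and omits the merging phase, which would then join adjacent homogeneous leaves.

```python
def quadtree_split(image, tol, r0=0, c0=0, size=None):
    """Recursively split a square 2^k x 2^k image into homogeneous blocks.

    A block counts as homogeneous when max - min <= tol.
    Returns a list of (row, col, size) leaves; no merging is done.
    """
    if size is None:
        size = len(image)
    block = [image[r][c] for r in range(r0, r0 + size)
             for c in range(c0, c0 + size)]
    if size == 1 or max(block) - min(block) <= tol:
        return [(r0, c0, size)]
    half = size // 2
    leaves = []
    for dr in (0, half):        # split into four child squares
        for dc in (0, half):
            leaves += quadtree_split(image, tol, r0 + dr, c0 + dc, half)
    return leaves
```

Consider an image whose top-right quadrant differs from the rest. The root is split once and the four quadrants are returned as leaves. A subsequent merge step would join the three identical quadrants into one region.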

Partial differential equation-based methods

Using a partial differential equation (PDE)-based method and solving the PDE by a numerical scheme, one can segment the image.[41] Curve propagation is a popular technique in this category, with numerous applications to object extraction, object tracking, stereo reconstruction, etc. The central idea is to evolve an initial curve towards the lowest potential of a cost function, where its definition reflects the task to be addressed. As for most inverse problems, the minimization of the cost functional is non-trivial and imposes certain smoothness constraints on the solution, which in the present case can be expressed as geometrical constraints on the evolving curve.

Parametric methods

Lagrangian techniques are based on parameterizing the contour according to some sampling strategy and then evolving each element according to image and internal terms. Such techniques are fast and efficient. However, the original "purely parametric" formulation (due to Kass, Witkin and Terzopoulos in 1987 and known as "snakes") is generally criticized for its limitations regarding the choice of sampling strategy, the internal geometric properties of the curve, topology changes (curve splitting and merging), addressing problems in higher dimensions, etc. Nowadays, efficient "discretized" formulations have been developed to address these limitations while maintaining high efficiency. In both cases, energy minimization is generally conducted using steepest-gradient descent, whereby derivatives are computed using, e.g., finite differences.

Level-set methods

The level-set method was initially proposed to track moving interfaces by Dervieux and Thomasset[42] [43] in 1979 and 1981 and was later reinvented by Osher and Sethian in 1988.[44] It spread across various imaging domains in the late 1990s. It can be used to efficiently address the problem of curve/surface/etc. propagation in an implicit manner. The central idea is to represent the evolving contour using a signed function whose zero level set corresponds to the actual contour. Then, according to the motion equation of the contour, one can easily derive a similar flow for the implicit surface that, when applied to the zero level, will reflect the propagation of the contour. The level-set method affords numerous advantages: it is implicit, is parameter-free, provides a direct way to estimate the geometric properties of the evolving structure, allows for change of topology, and is intrinsic. It can be used to define an optimization framework, as proposed by Zhao, Merriman and Osher in 1996. One can conclude that it is a very convenient framework for addressing numerous applications of computer vision and medical image analysis.[45] Research into various level-set data structures has led to very efficient implementations of this method.

Fast marching methods

The fast marching method has been used in image segmentation,[46] and this model has been improved (permitting both positive and negative propagation speeds) in an approach called the generalized fast marching method.

Variational methods

The goal of variational methods is to find a segmentation which is optimal with respect to a specific energy functional. The functionals consist of a data-fitting term and a regularizing term. A classical representative is the Potts model, defined for an image f by

\operatorname{argmin}_u \; \gamma \|\nabla u\|_0 + \int (u - f)^2 \, dx.

A minimizer u* is a piecewise constant image which has an optimal tradeoff between the squared L2 distance to the given image f and the total length of its jump set. The jump set of u* defines a segmentation. The relative weight of the energies is tuned by the parameter γ > 0. The binary variant of the Potts model, i.e., when the range of u is restricted to two values, is often called the Chan-Vese model.[47] An important generalization is the Mumford-Shah model,[48] given by

\operatorname{argmin}_{u, K} \; \gamma |K| + \mu \int_{K^C} |\nabla u|^2 \, dx + \int (u - f)^2 \, dx.

The functional value is the sum of the total length of the segmentation curve K, the smoothness of the approximation u, and its distance to the original image f. The weight of the smoothness penalty is adjusted by μ > 0. The Potts model is often called the piecewise constant Mumford-Shah model, as it can be seen as the degenerate case μ → ∞. The optimization problems are known to be NP-hard in general, but near-minimizing strategies work well in practice. Classical algorithms are graduated non-convexity and the Ambrosio-Tortorelli approximation.
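A one-dimensional discrete analogue makes the Potts tradeoff concrete. In 1-D the jump set is a set of points, so the sketch below assumes the energy γ · (number of jumps) + Σ (u_i − f_i)². Unlike the general two-dimensional case, this 1-D problem can be minimized exactly by dynamic programming over the position of the last jump. The function name `potts_1d` is hypothetical.

```python
def potts_1d(f, gamma):
    """Exact minimizer of gamma * (#jumps of u) + sum_i (u_i - f_i)^2 in 1-D.

    Each segment is set to the mean of f on it, which is the optimal
    constant under squared error. Returns (u, minimal energy).
    """
    n = len(f)
    # Prefix sums give any segment's squared error in O(1)
    s1, s2 = [0.0], [0.0]
    for v in f:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def seg_cost(i, j):  # squared error of f[i:j] around its mean
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal energy of f[:j]
    last = [0] * (n + 1)               # start of the final segment of f[:j]
    for j in range(1, n + 1):
        for i in range(j):
            # A jump is paid for every segment except the first one
            e = best[i] + (gamma if i > 0 else 0.0) + seg_cost(i, j)
            if e < best[j]:
                best[j], last[j] = e, i
    # Backtrack the segment boundaries and fill each segment with its mean
    u, j = [0.0] * n, n
    while j > 0:
        i = last[j]
        u[i:j] = [(s1[j] - s1[i]) / (j - i)] * (j - i)
        j = i
    return u, best[n]
```

With a small γ, the step in [0, 0, 0, 10, 10, 10] is kept as one jump. With a large γ, the whole signal collapses to its mean, 5.0.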

Graph partitioning methods

Graph partitioning methods are effective tools for image segmentation, since they model the impact of pixel neighborhoods on a given pixel or cluster of pixels, under the assumption of homogeneity in images. In these methods, the image is modeled as a weighted, undirected graph. Usually a pixel or a group of pixels is associated with a node, and edge weights define the (dis)similarity between neighboring pixels. The graph (image) is then partitioned according to a criterion designed to model "good" clusters. Each partition of the nodes (pixels) output from these algorithms is considered an object segment in the image; see Segmentation-based object categorization. Some popular algorithms of this category are normalized cuts,[49] random walker,[50] minimum cut,[51] isoperimetric partitioning,[52] minimum spanning tree-based segmentation,[53] and segmentation-based object categorization.

Markov random fields

The application of Markov random fields (MRF) for images was suggested in early 1984 by Geman and Geman.[54] Their strong mathematical foundation and ability to provide a global optimum even when defined on local features proved to be the foundation for novel research in the domain of image analysis, de-noising and segmentation. MRFs are completely characterized by their prior probability distributions, marginal probability distributions, cliques, smoothing constraint as well as criterion for updating values. The criterion for image segmentation using MRFs is restated as finding the labelling scheme which has maximum probability for a given set of features. The broad categories of image segmentation using MRFs are supervised and unsupervised segmentation.

Supervised image segmentation using MRF and MAP

In terms of image segmentation, the function that MRFs seek to maximize is the probability of identifying a labelling scheme given that a particular set of features is detected in the image. This is a restatement of the maximum a posteriori estimation method.

The generic algorithm for image segmentation using MAP is given below:

Optimization algorithms

Each optimization algorithm is an adaptation of models from a variety of fields and they are set apart by their unique cost functions. The common trait of cost functions is to penalize change in pixel value as well as difference in pixel label when compared to labels of neighboring pixels.

Iterated conditional modes/gradient descent

The iterated conditional modes (ICM) algorithm tries to reconstruct the ideal labeling scheme by changing the values of each pixel over each iteration and evaluating the energy of the new labeling scheme using the cost function given below:

\alpha \left(1 - \delta(\ell_i, \ell_i^{\text{initial}})\right) + \beta \sum_{q \in N(i)} \left(1 - \delta(\ell_i, \ell_q)\right)

where α is the penalty for a change in pixel label and β is the penalty for a difference in label between neighboring pixels and the chosen pixel. Here N(i) is the neighborhood of pixel i and δ is the Kronecker delta function. A major issue with ICM is that, similar to gradient descent, it tends to settle in a local optimum and thus may not obtain a globally optimal labeling scheme.
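The update can be sketched for a label grid as follows. The sketch assumes 4-neighborhoods, a finite list of candidate labels, and sequential (in-place) updates within a sweep. The initial labeling would come from an earlier step such as thresholding. The Kronecker delta is written as an equality test, and the function name is illustrative.

```python
def icm(initial, labels, alpha, beta, max_sweeps=10):
    """Iterated conditional modes on a 2-D label grid.

    Each pixel takes the label minimizing
        alpha * [label != initial label]
        + beta * (number of 4-neighbours with a different label).
    """
    h, w = len(initial), len(initial[0])
    cur = [row[:] for row in initial]
    for _ in range(max_sweeps):
        changed = False
        for r in range(h):
            for c in range(w):
                nbrs = [cur[nr][nc]
                        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= nr < h and 0 <= nc < w]
                best = min(labels, key=lambda lab: alpha * (lab != initial[r][c])
                           + beta * sum(lab != q for q in nbrs))
                if best != cur[r][c]:
                    cur[r][c] = best
                    changed = True
        if not changed:  # no pixel changed: a local optimum was reached
            break
    return cur
```

With α = β = 1, a lone differing pixel is relabeled to match its neighbors. With a large α, the penalty for departing from the initial label keeps it unchanged.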

Simulated annealing (SA)

Derived as an analogue of annealing in metallurgy, simulated annealing (SA) changes pixel labels over iterations and estimates the difference in energy between each newly formed graph and the initial data. The change in energy is

\Delta U = U^{\text{new}} - U^{\text{old}}

and the label of pixel i is updated as

\ell_i = \begin{cases} \ell_i^{\text{new}}, & \text{if } \Delta U \leq 0, \\ \ell_i^{\text{new}}, & \text{if } \Delta U > 0 \text{ and } \delta < e^{-\Delta U / T}, \\ \ell_i^{\text{old}}, & \text{otherwise,} \end{cases}

where δ is a random number drawn uniformly from [0, 1) and T is the current temperature. In other words, the algorithm always selects a newly formed graph that is more profitable in terms of low energy cost. It accepts a costlier one only with probability e^{-ΔU/T}. Simulated annealing requires the input of a temperature schedule, which directly affects the speed of convergence of the system, as well as an energy threshold for minimization to occur.
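The update rule above is the Metropolis acceptance test. The sketch below applies it to binary labels with the ICM cost function from the previous subsection (α for departing from the initial label, β for each disagreeing 4-neighbor). Only terms involving pixel i change when its label flips, so ΔU can be computed locally. The geometric cooling schedule t ← 0.95 · t, the parameter defaults, and the function names are illustrative assumptions, not taken from the source.

```python
import math
import random


def accept(delta_u, temperature, rng=random):
    """Metropolis rule: always accept lower energy, else with prob exp(-dU/T)."""
    if delta_u <= 0:
        return True
    return rng.random() < math.exp(-delta_u / temperature)


def anneal(initial, alpha, beta, t0=2.0, cooling=0.95, sweeps=200, seed=0):
    """Simulated annealing over binary labels with the ICM cost function."""
    rng = random.Random(seed)
    h, w = len(initial), len(initial[0])
    cur = [row[:] for row in initial]

    def local_energy(r, c, lab):
        # alpha * (1 - delta(lab, initial)) + beta * disagreeing 4-neighbours
        nbrs = [cur[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < h and 0 <= nc < w]
        return alpha * (lab != initial[r][c]) + beta * sum(lab != q for q in nbrs)

    t = t0
    for _ in range(sweeps):
        for r in range(h):
            for c in range(w):
                new = 1 - cur[r][c]  # propose flipping the binary label
                delta_u = local_energy(r, c, new) - local_energy(r, c, cur[r][c])
                if accept(delta_u, t, rng):
                    cur[r][c] = new
        t *= cooling  # assumed geometric temperature schedule
    return cur
```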

Alternative algorithms

A range of other methods exist for solving simple as well as higher order MRFs. They include Maximization of Posterior Marginal, Multi-scale MAP estimation,[55] Multiple Resolution segmentation[56] and more. Apart from likelihood estimates, graph-cut using maximum flow[57] and other highly constrained graph based methods[58] [59] exist for solving MRFs.

Image segmentation using MRF and expectation–maximization

The expectation–maximization algorithm is utilized to iteratively estimate the a posteriori probabilities and distributions of labeling when no training data is available and no estimate of the segmentation model can be formed. A general approach is to use histograms to represent the features of an image and proceed as outlined briefly in this three-step algorithm:

1. A random estimate of the model parameters is utilized.

2. E step: Estimate class statistics based on the random segmentation model defined. Using these, compute the conditional probability of belonging to a label given the feature set, using naive Bayes' theorem:

P(\lambda \mid f_i) = \frac{P(f_i \mid \lambda) \, P(\lambda)}{\sum_{\lambda' \in \Lambda} P(f_i \mid \lambda') \, P(\lambda')}

Here λ ∈ Λ, the set of all possible labels.

3. M step: The established relevance of a given feature set to a labeling scheme is now used to compute the a priori estimate of a given label in the second part of the algorithm. Since the actual number of total labels is unknown (from a training data set), a hidden estimate of the number of labels given by the user is utilized in computations.

P(\lambda) = \frac{\sum_{f_i \in \Omega} P(\lambda \mid f_i)}{|\Omega|}

where Ω is the set of all possible features.
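A minimal sketch of these steps for scalar pixel intensities. The sketch goes beyond the text in three ways. Each label's class-conditional density P(f_i | λ) is modeled as a Gaussian. The initial parameters are passed in explicitly; a random estimate would be used the same way. The E step is computed in log space to avoid underflow. The function name is hypothetical.

```python
import math


def em_gaussian_mixture(x, means, variances, priors, iters=50):
    """EM for a 1-D Gaussian mixture over pixel intensities x (the f_i).

    Returns (labels, means, variances, priors); each label is the class
    with the highest posterior P(label | f_i).
    """
    k = len(means)
    means, variances, priors = list(means), list(variances), list(priors)
    for _ in range(iters):
        # E step: P(label | f_i) by Bayes' theorem, computed in log space
        post = []
        for xi in x:
            logs = [math.log(priors[j])
                    - 0.5 * math.log(2 * math.pi * variances[j])
                    - (xi - means[j]) ** 2 / (2 * variances[j])
                    for j in range(k)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            z = sum(w)
            post.append([v / z for v in w])
        # M step: P(label) = sum_i P(label | f_i) / |Omega|, then class statistics
        for j in range(k):
            nj = sum(p[j] for p in post)
            if nj == 0:
                continue  # label received no responsibility; keep old values
            priors[j] = nj / len(x)
            means[j] = sum(p[j] * xi for p, xi in zip(post, x)) / nj
            var = sum(p[j] * (xi - means[j]) ** 2 for p, xi in zip(post, x)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing variance
    labels = [max(range(k), key=lambda j: p[j]) for p in post]
    return labels, means, variances, priors
```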

Disadvantages of MAP and EM based image segmentation

  1. Exact MAP estimates cannot be easily computed.
  2. Approximate MAP estimates are computationally expensive to calculate.
  3. Extension to multi-class labeling degrades performance and increases storage required.
  4. Reliable estimation of parameters for EM is required for global optima to be achieved.
  5. Depending on the optimization method, segmentation may converge to local minima.

Watershed transformation

The watershed transformation considers the gradient magnitude of an image as a topographic surface. Pixels having the highest gradient magnitude intensities (GMIs) correspond to watershed lines, which represent the region boundaries. Water placed on any pixel enclosed by a common watershed line flows downhill to a common local intensity minimum (LIM). Pixels draining to a common minimum form a catch basin, which represents a segment.
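Below is a simplified "rain-falling" sketch of this idea. Each pixel follows the steepest descent on the surface (for example, a gradient-magnitude image) until it reaches a local minimum, and pixels that reach the same minimum share a label. This is an illustrative variant, not the flooding (immersion) formulation found in many implementations. It assumes 8-connectivity and does not treat plateaus or ridge pixels specially. As a result, a pixel on a watershed line joins whichever basin its descent reaches first.

```python
def watershed_by_descent(surface):
    """Label each pixel by the local minimum it drains to when following
    steepest descent on `surface` (e.g. a gradient-magnitude image).

    A pixel with no strictly lower neighbour is treated as a minimum.
    """
    h, w = len(surface), len(surface[0])

    def lowest_neighbour(r, c):
        best = (r, c)
        for nr in range(r - 1, r + 2):
            for nc in range(c - 1, c + 2):
                if (0 <= nr < h and 0 <= nc < w
                        and surface[nr][nc] < surface[best[0]][best[1]]):
                    best = (nr, nc)
        return best

    basin_of_min, labels = {}, [[None] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            p = (r, c)
            while True:  # walk downhill until no neighbour is lower
                q = lowest_neighbour(*p)
                if q == p:
                    break
                p = q
            # Pixels draining to the same minimum form one catch basin
            labels[r][c] = basin_of_min.setdefault(p, len(basin_of_min))
    return labels
```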

Model-based segmentation

The central assumption of model-based approaches is that the structures of interest have a tendency towards a particular shape. Therefore, one can seek a probabilistic model that characterizes the shape and its variation. When segmenting an image, constraints can be imposed using this model as a prior.[60] Such a task may involve (i) registration of the training examples to a common pose, (ii) probabilistic representation of the variation of the registered samples, and (iii) statistical inference between the model and the image. Other important methods in the literature for model-based segmentation include active shape models and active appearance models.

Multi-scale segmentation

Image segmentations are computed at multiple scales in scale space and sometimes propagated from coarse to fine scales; see scale-space segmentation.

Segmentation criteria can be arbitrarily complex and may take into account global as well as local criteria. A common requirement is that each region must be connected in some sense.

One-dimensional hierarchical signal segmentation

Witkin's seminal work[61] [62] in scale space included the notion that a one-dimensional signal could be unambiguously segmented into regions, with one scale parameter controlling the scale of segmentation.

A key observation is that the zero-crossings of the second derivatives (minima and maxima of the first derivative or slope) of multi-scale-smoothed versions of a signal form a nesting tree, which defines hierarchical relations between segments at different scales. Specifically, slope extrema at coarse scales can be traced back to corresponding features at fine scales. When a slope maximum and slope minimum annihilate each other at a larger scale, the three segments that they separated merge into one segment, thus defining the hierarchy of segments.
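The nesting idea can be explored numerically with a short sketch. It smooths a 1-D signal with a sampled Gaussian of standard deviation `sigma` (with edge values replicated). It then reports where the discrete second derivative changes sign, i.e., the slope extrema that separate segments at that scale. The tolerance `eps` suppresses sign changes caused by rounding in flat stretches. The tolerance and the function names are illustrative assumptions.

```python
import math


def gaussian_smooth(signal, sigma):
    """Convolve with a sampled, normalized Gaussian; edge values are replicated."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(i * i) / (2 * sigma * sigma))
              for i in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    n = len(signal)
    return [sum(kernel[j + radius] * signal[min(max(i + j, 0), n - 1)]
                for j in range(-radius, radius + 1))
            for i in range(n)]


def segment_boundaries(signal, sigma, eps=1e-9):
    """Positions where the smoothed signal's second difference changes sign."""
    g = gaussian_smooth(signal, sigma)
    boundaries, prev_sign, prev_pos = [], 0, None
    for i in range(1, len(g) - 1):
        d2 = g[i - 1] - 2 * g[i] + g[i + 1]
        if abs(d2) <= eps:
            continue  # ignore numerically flat stretches
        sign = 1 if d2 > 0 else -1
        if prev_sign and sign != prev_sign:
            # A slope extremum lies between prev_pos and i
            boundaries.append(prev_pos)
        prev_sign, prev_pos = sign, i
    return boundaries
```

On a noisy signal, increasing `sigma` typically returns fewer boundaries as pairs of slope extrema annihilate. Tracing which boundaries survive from scale to scale gives the kind of hierarchy described above.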

Image segmentation and primal sketch

There have been numerous research works in this area, out of which a few have now reached a state where they can be applied either with interactive manual intervention (usually with application to medical imaging) or fully automatically. The following is a brief overview of some of the main research ideas that current approaches are based upon.

The nesting structure that Witkin described is, however, specific for one-dimensional signals and does not trivially transfer to higher-dimensional images. Nevertheless, this general idea has inspired several other authors to investigate coarse-to-fine schemes for image segmentation. Koenderink[63] proposed to study how iso-intensity contours evolve over scales and this approach was investigated in more detail by Lifshitz and Pizer.[64] Unfortunately, however, the intensity of image features changes over scales, which implies that it is hard to trace coarse-scale image features to finer scales using iso-intensity information.

Lindeberg[65] [66] studied the problem of linking local extrema and saddle points over scales, and proposed an image representation called the scale-space primal sketch which makes explicit the relations between structures at different scales, and also makes explicit which image features are stable over large ranges of scale including locally appropriate scales for those. Bergholm proposed to detect edges at coarse scales in scale-space and then trace them back to finer scales with manual choice of both the coarse detection scale and the fine localization scale.

Gauch and Pizer[67] studied the complementary problem of ridges and valleys at multiple scales and developed a tool for interactive image segmentation based on multi-scale watersheds. The use of multi-scale watersheds applied to the gradient map has also been investigated by Olsen and Nielsen[68] and carried over to clinical use by Dam.[69] Vincken et al.[70] proposed a hyperstack for defining probabilistic relations between image structures at different scales. The use of image structures that are stable over scales has been developed further by Ahuja[71] [72] and his co-workers into a fully automated system. A fully automatic brain segmentation algorithm based on closely related ideas of multi-scale watersheds has been presented by Undeman and Lindeberg[73] and extensively tested on brain databases.
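A minimal sketch of the gradient-watershed step at several scales, assuming 4-connectivity, Gaussian smoothing at each scale σ, a central-difference gradient magnitude, and seeds taken from the regional minima of that gradient map. The flooding is a simple priority-queue variant that assigns every pixel to a region and does not draw explicit watershed lines; it is not the implementation used in any of the cited works, and the function names are illustrative.

```python
import heapq
import math
from collections import deque


def smooth(image, sigma):
    """Separable Gaussian smoothing of a 2D list (truncated at 4 sigma, borders replicated)."""
    radius = max(1, math.ceil(4 * sigma))
    kernel = [math.exp(-k * k / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]

    def smooth_line(line):
        n = len(line)
        return [sum(w * line[min(max(i + k, 0), n - 1)]
                    for k, w in zip(range(-radius, radius + 1), kernel))
                for i in range(n)]

    rows = [smooth_line(row) for row in image]
    cols = [smooth_line(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def gradient_magnitude(image):
    """Central-difference gradient magnitude; out-of-range pixels are replicated."""
    h, w = len(image), len(image[0])

    def at(y, x):
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[math.hypot(at(y, x + 1) - at(y, x - 1), at(y + 1, x) - at(y - 1, x)) / 2
             for x in range(w)] for y in range(h)]


def neighbours(y, x, h, w):
    """4-connected neighbours that lie inside an h x w grid."""
    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx


def watershed(surface):
    """Flood `surface` from its regional minima; returns (labels, number of regions)."""
    h, w = len(surface), len(surface[0])
    labels = [[0] * w for _ in range(h)]
    visited = [[False] * w for _ in range(h)]
    count = 0
    # Seeds: connected plateaus of equal height with no strictly lower neighbour.
    for y in range(h):
        for x in range(w):
            if visited[y][x]:
                continue
            height, plateau, is_minimum = surface[y][x], [], True
            queue = deque([(y, x)])
            visited[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                plateau.append((cy, cx))
                for ny, nx in neighbours(cy, cx, h, w):
                    if surface[ny][nx] < height:
                        is_minimum = False
                    elif surface[ny][nx] == height and not visited[ny][nx]:
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            if is_minimum:
                count += 1
                for py, px in plateau:
                    labels[py][px] = count
    # Priority flood: unlabeled pixels are claimed in order of increasing height.
    heap, order = [], 0
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                for ny, nx in neighbours(y, x, h, w):
                    if not labels[ny][nx]:
                        heapq.heappush(heap, (surface[ny][nx], order, ny, nx, labels[y][x]))
                        order += 1
    while heap:
        _, _, y, x, label = heapq.heappop(heap)
        if labels[y][x]:
            continue
        labels[y][x] = label
        for ny, nx in neighbours(y, x, h, w):
            if not labels[ny][nx]:
                heapq.heappush(heap, (surface[ny][nx], order, ny, nx, label))
                order += 1
    return labels, count


def multiscale_watershed(image, scales):
    """Watershed of the gradient magnitude of the image smoothed at each scale."""
    return {sigma: watershed(gradient_magnitude(smooth(image, sigma))) for sigma in scales}


image = [[0.0] * 6 + [10.0] * 6 for _ in range(12)]  # two flat halves
for sigma, (labels, count) in multiscale_watershed(image, [1, 2]).items():
    print(sigma, count)
```

The test image is deliberately trivial, so both scales give two regions, one per half. On real images one would compare the partitions obtained at different scales; relating those regions across scales is what the multi-scale approaches cited above add on top of this single-scale step.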

These ideas for multi-scale image segmentation by linking image structures over scales have also been taken up by Florack and Kuijper.[74] Bijaoui and Rué[75] associate structures detected in scale-space above a minimum noise threshold into an object tree, which spans multiple scales and corresponds to a kind of feature in the original signal. The extracted features are accurately reconstructed using an iterative conjugate gradient matrix method.

Semi-automatic segmentation

In one kind of segmentation, the user outlines the region of interest with mouse clicks, and algorithms are applied to show the path that best fits the edge of the image.

Techniques like SIOX, Livewire, Intelligent Scissors or IT-SNAPS are used in this kind of segmentation. In an alternative kind of semi-automatic segmentation, the algorithms return a spatial-taxon (i.e. foreground, object-group, object or object-part) selected by the user or designated via prior probabilities.[76] [77]
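The idea behind Livewire and Intelligent Scissors is a minimum-cost path between user-placed points over a cost map that is cheap along edges. The sketch below assumes a cost of 1 / (1 + gradient magnitude), 8-connected moves with diagonal steps weighted by √2, and Dijkstra's algorithm for the shortest path. The published Intelligent Scissors cost combines several edge features, so this is a simplified stand-in rather than any specific tool's algorithm; the function names are illustrative.

```python
import heapq
import math


def edge_cost(image):
    """Per-pixel cost that is low on strong edges: 1 / (1 + gradient magnitude)."""
    h, w = len(image), len(image[0])

    def at(y, x):  # replicate pixels outside the image
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[1.0 / (1.0 + math.hypot((at(y, x + 1) - at(y, x - 1)) / 2,
                                     (at(y + 1, x) - at(y - 1, x)) / 2))
             for x in range(w)] for y in range(h)]


def live_wire(image, start, end):
    """Minimum-cost 8-connected path from `start` to `end` (Dijkstra's algorithm)."""
    cost = edge_cost(image)
    h, w = len(cost), len(cost[0])
    dist, previous = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if (y, x) == end:
            break
        if d > dist[(y, x)]:
            continue  # stale queue entry
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                    # Entering a pixel costs its edge cost, scaled by the step length.
                    nd = d + math.hypot(dy, dx) * cost[ny][nx]
                    if nd < dist.get((ny, nx), math.inf):
                        dist[(ny, nx)] = nd
                        previous[(ny, nx)] = (y, x)
                        heapq.heappush(heap, (nd, (ny, nx)))
    path = [end]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


image = [[0] * 5 + [100] * 5 for _ in range(10)]  # vertical edge between columns 4 and 5
path = live_wire(image, (0, 2), (9, 2))
print(path)
```

Although both clicks lie two columns away from the edge, the returned path snaps to it: it moves over to column 4, follows the edge down, and only then returns to the end point.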

Trainable segmentation

Most of the aforementioned segmentation methods are based only on the color information of pixels in the image. Humans use much more knowledge when performing image segmentation, but implementing this knowledge would require considerable human engineering effort and computational time, as well as a huge domain-knowledge database that does not currently exist. Trainable segmentation methods, such as neural network segmentation, overcome these issues by modeling the domain knowledge from a dataset of labeled pixels.
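A trainable segmenter learns its decision rule from labeled pixels rather than from hand-written rules. The sketch below is a deliberately minimal stand-in for a neural network: a single logistic neuron trained by full-batch gradient descent on the RGB values of a few labeled pixels, then applied to every pixel of an image. The feature choice, the hypothetical training data, the learning rate and the number of epochs are assumptions made for this example.

```python
import math


def train_pixel_classifier(pixels, labels, epochs=2000, lr=1.0):
    """Fit one logistic neuron p(label = 1 | pixel) by full-batch gradient descent."""
    n = len(pixels[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(pixels, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # d(log-loss)/dz
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(pixels) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(pixels)
    return weights, bias


def segment(image, weights, bias):
    """Label every pixel of an image (rows of RGB tuples) with 1 or 0."""
    return [[int(bias + sum(w * c for w, c in zip(weights, px)) > 0) for px in row]
            for row in image]


# Hypothetical labeled pixels, RGB in [0, 1]: 1 = sky, 0 = grass.
sky = [(0.50, 0.70, 0.90), (0.45, 0.65, 0.95), (0.55, 0.75, 0.85)]
grass = [(0.20, 0.60, 0.20), (0.25, 0.55, 0.15), (0.15, 0.65, 0.25)]
weights, bias = train_pixel_classifier(sky + grass, [1, 1, 1, 0, 0, 0])

image = [[(0.52, 0.72, 0.88), (0.48, 0.68, 0.92)],
         [(0.22, 0.58, 0.18), (0.18, 0.62, 0.22)]]
print(segment(image, weights, bias))
```

A real trainable segmenter would use richer features (for example neighbourhood statistics) and a deeper model, but the workflow is the same: fit parameters on labeled pixels, then label every pixel.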

An image segmentation neural network can process small areas of an image to extract simple features such as edges.[78] Another neural network, or any other decision-making mechanism, can then combine these features to label the areas of an image accordingly. One type of network designed in this way is the Kohonen map.
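As an illustration of a Kohonen map acting as the decision mechanism, the sketch below trains a one-dimensional self-organizing map on per-pixel features and labels each pixel with its best-matching unit. For brevity the "feature" is simply the grey value; the number of units, the evenly spaced initialization, and the linearly shrinking learning rate and Gaussian neighbourhood are assumptions made for this example.

```python
import math
import random


def train_som(samples, n_units=3, epochs=20, lr0=0.5, radius0=1.0, seed=0):
    """Train a one-dimensional Kohonen map whose units hold scalar prototypes.

    Units start evenly spaced between the smallest and largest sample; the
    learning rate and neighbourhood radius shrink linearly during training.
    """
    rng = random.Random(seed)
    lo, hi = min(samples), max(samples)
    units = [lo + (hi - lo) * i / (n_units - 1) for i in range(n_units)]
    total_steps, step = epochs * len(samples), 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            progress = step / total_steps
            lr = lr0 * (1 - progress)
            radius = max(radius0 * (1 - progress), 1e-3)
            winner = min(range(n_units), key=lambda i: abs(units[i] - x))
            for i in range(n_units):
                # Gaussian neighbourhood on the 1D grid of units.
                h = math.exp(-((i - winner) ** 2) / (2 * radius ** 2))
                units[i] += lr * h * (x - units[i])
            step += 1
    return units


def segment(image, units):
    """Label each pixel with the index of its best-matching unit."""
    return [[min(range(len(units)), key=lambda i: abs(units[i] - v)) for v in row]
            for row in image]


image = [
    [10, 12, 120, 118, 240],
    [11, 9, 122, 241, 239],
    [8, 121, 119, 242, 238],
]
units = train_som([v for row in image for v in row])
print(segment(image, units))
```

Because neighbouring units are pulled together during training, the units stay ordered by intensity, so the resulting labels are ordered as well (0 for the darkest group, 2 for the brightest).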

Pulse-coupled neural networks (PCNNs) are neural models derived from a model of the cat's visual cortex and developed for high-performance biomimetic image processing. In 1989, Reinhard Eckhorn introduced a neural model to emulate the mechanism of the cat's visual cortex. The Eckhorn model provided a simple and effective tool for studying the visual cortex of small mammals, and was soon recognized as having significant application potential in image processing. In 1994, the Eckhorn model was adapted into an image processing algorithm by John L. Johnson, who termed this algorithm the Pulse-Coupled Neural Network.[79] Over the past decade, PCNNs have been used for a variety of image processing applications, including image segmentation, feature generation, face extraction, motion detection, region growing and noise reduction, among others.

A PCNN is a two-dimensional neural network. Each neuron in the network corresponds to one pixel in an input image and receives its pixel's color information (e.g. intensity) as an external stimulus. Each neuron also connects with its neighboring neurons, receiving local stimuli from them. The external and local stimuli are combined in an internal activation system, which accumulates the stimuli until they exceed a dynamic threshold, resulting in a pulse output. Through iterative computation, PCNN neurons produce temporal series of pulse outputs. These series contain information about the input image and can be used for various image processing applications, such as image segmentation and feature generation.

Compared with conventional image processing methods, PCNNs have several significant merits, including robustness against noise, independence of geometric variations in input patterns, and the capability of bridging minor intensity variations in input patterns.
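The pulse mechanism described above can be sketched with a simplified discrete PCNN. In this sketch the feeding input F is the pixel value, the linking input L counts the 4-neighbours that pulsed in the previous iteration, the internal activity is U = F·(1 + β·L), a neuron pulses when U exceeds its threshold Θ, and Θ decays by a factor e^(-α) each iteration but jumps by V_Θ after a pulse. These simplifications and the parameter values are assumptions for this example, not Johnson's exact model. Pixels are grouped by the iteration of their first pulse.

```python
import math


def pcnn_first_firing(image, iterations=10, alpha=0.3, beta=0.1, v_theta=20.0, theta0=1.0):
    """Simplified PCNN: per pixel, the iteration of its first pulse (0 = never pulsed)."""
    h, w = len(image), len(image[0])
    decay = math.exp(-alpha)
    theta = [[theta0] * w for _ in range(h)]
    pulses = [[0] * w for _ in range(h)]
    first = [[0] * w for _ in range(h)]
    for n in range(1, iterations + 1):
        new_pulses = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # Linking input: 4-neighbours that pulsed in the previous iteration.
                link = sum(pulses[ny][nx]
                           for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                           if 0 <= ny < h and 0 <= nx < w)
                # Internal activity: feeding input modulated by the linking input.
                u = image[y][x] * (1 + beta * link)
                if u > theta[y][x]:
                    new_pulses[y][x] = 1
                    if not first[y][x]:
                        first[y][x] = n
        for y in range(h):
            for x in range(w):
                # Dynamic threshold: exponential decay plus a jump after each pulse.
                theta[y][x] = decay * theta[y][x] + v_theta * new_pulses[y][x]
        pulses = new_pulses
    return first


image = [[0.9] * 3 + [0.35] * 3 for _ in range(4)]  # bright left half, dark right half
for row in pcnn_first_firing(image):
    print(row)
```

Brighter neurons reach their decaying threshold sooner, so with these parameters the bright half first pulses at iteration 2 and the dark half at iteration 5; the first-pulse map therefore already separates the two regions.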

U-Net is a convolutional neural network that takes an image as input and outputs a label for each pixel.[80] U-Net was initially developed to detect cell boundaries in biomedical images. It follows the classical autoencoder (encoder-decoder) architecture and therefore contains two sub-structures. The encoder follows the traditional stack of convolutional and max pooling layers, which increases the receptive field from layer to layer; it is used to capture the context in the image. The decoder uses transposed convolution layers for upsampling so that the final dimensions are close to those of the input image. Skip connections link the convolution and transposed convolution layers at the same resolution level (in the original design the encoder feature maps are cropped to match) in order to preserve details that would otherwise be lost.
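Why the output is only close to the input size can be checked with a little arithmetic. The function below assumes the configuration described in the original U-Net paper: two unpadded 3×3 convolutions per level (each removing 2 pixels), four 2×2 max-pooling steps, and 2×2 up-convolutions that double the size. Because the convolutions are unpadded, each encoder map is larger than the decoder map it is joined to and must be center-cropped before concatenation. The function name and parameters are illustrative.

```python
def unet_output_size(input_size, depth=4, convs_per_level=2, kernel_size=3):
    """Trace the spatial size of a square tile through a U-Net with unpadded convolutions.

    Returns the output size and, for each decoder level, the (encoder size,
    cropped size) pair of the feature map passed through the skip connection.
    """
    shrink = convs_per_level * (kernel_size - 1)  # pixels lost per level of valid convolutions
    size, skip_sizes = input_size, []
    for _ in range(depth):  # contracting path
        size -= shrink
        if size <= 0 or size % 2:
            raise ValueError("2x2 max pooling needs a positive, even feature-map size")
        skip_sizes.append(size)
        size //= 2  # 2x2 max pooling with stride 2
    size -= shrink  # bottleneck convolutions
    crops = []
    for skip in reversed(skip_sizes):  # expanding path
        size *= 2  # 2x2 up-convolution
        crops.append((skip, size))  # encoder map is center-cropped to this size
        size -= shrink
    return size, crops


print(unet_output_size(572))
```

For a 572×572 input tile this gives a 388×388 output, and the first-level encoder map is cropped from 568 to 392 pixels before it is concatenated in the decoder.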

In addition to pixel-level semantic segmentation tasks, which assign a given category to each pixel, modern segmentation applications include instance-level semantic segmentation tasks, in which each individual in a given category must be uniquely identified, as well as panoptic segmentation tasks, which combine these two tasks to provide a more complete scene segmentation.[18]
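One way to see how panoptic segmentation combines the two tasks is to merge a semantic label map and an instance map into a single id per pixel. The encoding category × label_divisor + instance, with instance 0 for "stuff" classes and label_divisor = 1000, is an assumption here (it resembles conventions used by some datasets and toolkits), and the class ids in the example are hypothetical.

```python
def panoptic_ids(semantic, instance, thing_classes, label_divisor=1000):
    """Merge a semantic map and an instance map into one panoptic id per pixel.

    'Thing' pixels get category * label_divisor + instance (instance >= 1);
    'stuff' pixels ignore the instance map and get category * label_divisor.
    """
    panoptic = []
    for sem_row, inst_row in zip(semantic, instance):
        row = []
        for category, inst in zip(sem_row, inst_row):
            if category in thing_classes:
                if not 0 < inst < label_divisor:
                    raise ValueError("thing pixels need an instance id in [1, label_divisor)")
                row.append(category * label_divisor + inst)
            else:
                row.append(category * label_divisor)  # one segment per stuff category
        panoptic.append(row)
    return panoptic


# Hypothetical classes: 0 = road ("stuff"), 1 = car ("thing").
semantic = [[0, 0, 1, 1],
            [0, 0, 1, 1]]
instance = [[0, 0, 1, 2],
            [0, 0, 1, 2]]
ids = panoptic_ids(semantic, instance, thing_classes={1})
print(ids)                      # [[0, 0, 1001, 1002], [0, 0, 1001, 1002]]
print(divmod(ids[0][3], 1000))  # (1, 2): category 1, instance 2
```

Every pixel thus carries both its semantic category and, for countable objects, the identity of the individual it belongs to, which is exactly the information the two separate tasks provide.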

Segmentation of related images and videos

See main article: Object co-segmentation.

Related images such as a photo album or a sequence of video frames often contain semantically similar objects and scenes, so it is often beneficial to exploit such correlations.[81] The task of simultaneously segmenting scenes from related images or video frames is termed co-segmentation, which is typically used in human action localization. Unlike conventional bounding-box-based object detection, human action localization methods provide finer-grained results, typically per-image segmentation masks delineating the human object of interest and its action category (e.g., Segment-Tube). Techniques such as dynamic Markov networks, CNNs and LSTMs are often employed to exploit the inter-frame correlations.

Other methods

There are many other segmentation methods, such as multispectral segmentation or connectivity-based segmentation based on diffusion tensor imaging (DTI).[82] [83]

Notes and References

  1. Linda G. Shapiro.
  2. Barghout, Lauren, and Lawrence W. Lee. "Perceptual information processing system." Paravue Inc. U.S. Patent Application 10/618,543, filed July 11, 2003.
  3. Nielsen, Frank; Nock, Richard (2003). "On region merging: The statistical soundness of fast sorting, with applications". 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings. Vol. 2. IEEE. pp. II:19–26. doi:10.1109/CVPR.2003.1211447. ISBN 0-7695-1900-8.
  4. Zachow, Stefan, Michael Zilske, and Hans-Christian Hege. "3D reconstruction of individual anatomy from medical image data: Segmentation and geometry processing." (2007).
  5. Belongie, Serge, et al. "Color-and texture-based image segmentation using EM and its application to content-based image retrieval." Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271). IEEE, 1998.
  6. Pham, Dzung L.; Xu, Chenyang; Prince, Jerry L. (2000). "Current Methods in Medical Image Segmentation". Annual Review of Biomedical Engineering. 2: 315–337. doi:10.1146/annurev.bioeng.2.1.315. PMID 11701515.
  7. Forghani, M.; Forouzanfar, M.; Teshnehlab, M. (2010). "Parameter optimization of improved fuzzy c-means clustering algorithm for brain MR image segmentation". Engineering Applications of Artificial Intelligence. 23 (2): 160–168. doi:10.1016/j.engappai.2009.10.002.
  8. Reznikov, Natalie; Buss, Dan J.; Provencher, Benjamin; McKee, Marc D.; Piché, Nicolas (October 2020). "Deep learning for 3D imaging and image analysis in biomineralization research". Journal of Structural Biology. 212 (1): 107598. doi:10.1016/j.jsb.2020.107598. ISSN 1047-8477. PMID 32783967. S2CID 221126896.
  9. Wu, Wei; Chen, Albert Y. C.; Zhao, Liang; Corso, Jason J. (2014). "Brain tumor detection and segmentation in a CRF (Conditional random fields) framework with pixel-pairwise affinity and superpixel-level features". International Journal of Computer Assisted Radiology and Surgery. 9 (2): 241–253. doi:10.1007/s11548-013-0922-7. PMID 23860630. S2CID 13474403.
  10. E. B. George and M. Karnan (2012): "MR Brain image segmentation using Bacteria Foraging Optimization Algorithm", International Journal of Engineering and Technology, Vol. 4.
  11. Ye, Run Zhou; Noll, Christophe; Richard, Gabriel; Lepage, Martin; Turcotte, Éric E.; Carpentier, André C. (February 2022). "DeepImageTranslator: A free, user-friendly graphical interface for image translation using deep-learning and its applications in 3D CT image analysis". SLAS Technology. 27 (1): 76–84. doi:10.1016/j.slast.2021.10.014. ISSN 2472-6303. PMID 35058205.
  12. Ye, En Zhou; Ye, En Hui; Bouthillier, Maxime; Ye, Run Zhou (2022-02-18). "DeepImageTranslator V2: analysis of multimodal medical images using semantic segmentation maps generated through deep learning". bioRxiv 10.1101/2021.10.12.464160v2. doi:10.1101/2021.10.12.464160. S2CID 239012446.
  13. Kamalakannan, Sridharan; Gururajan, Arunkumar; Sari-Sarraf, Hamed; Long, Rodney; Antani, Sameer (17 February 2010). "Double-Edge Detection of Radiographic Lumbar Vertebrae Images Using Pressurized Open DGVF Snakes". IEEE Transactions on Biomedical Engineering. 57 (6): 1325–1334. doi:10.1109/tbme.2010.2040082. PMID 20172792. S2CID 12766600.
  14. Georgescu, Mariana-Iuliana; Ionescu, Radu Tudor; Miron, Andreea-Iuliana (2022-12-21). "Diversity-Promoting Ensemble for Medical Image Segmentation". arXiv:2210.12388 [eess.IV].
  15. J. A. Delmerico, P. David and J. J. Corso (2011): "Building façade detection, segmentation and parameter estimation for mobile robot localization and guidance", International Conference on Intelligent Robots and Systems, pp. 1632–1639.
  16. Guo, Dazhou; Pei, Yanting; Zheng, Kang; Yu, Hongkai; Lu, Yuhang; Wang, Song (2020). "Degraded Image Semantic Segmentation With Dense-Gram Networks". IEEE Transactions on Image Processing. 29: 782–795. Bibcode:2020ITIP...29..782G. doi:10.1109/TIP.2019.2936111. ISSN 1057-7149. PMID 31449020. S2CID 201753511.
  17. Yi, Jingru; Wu, Pengxiang; Jiang, Menglin; Huang, Qiaoying; Hoeppner, Daniel J.; Metaxas, Dimitris N. (July 2019). "Attentive neural cell instance segmentation". Medical Image Analysis. 55: 228–240. doi:10.1016/j.media.2019.05.004. PMID 31103790. S2CID 159038604.
  18. Kirillov, Alexander; He, Kaiming; Girshick, Ross; Rother, Carsten; Dollár, Piotr (2018). "Panoptic Segmentation". arXiv:1801.00868 [cs.CV].
  19. Batenburg, K. J.; Sijbers, J. (2009). "Adaptive thresholding of tomograms by projection distance minimization". Pattern Recognition. 42 (10): 2297–2305. Bibcode:2009PatRe..42.2297B. CiteSeerX 10.1.1.182.8483. doi:10.1016/j.patcog.2008.11.027.
  20. Batenburg, K. J.; Sijbers, J. (June 2009). "Optimal Threshold Selection for Tomogram Segmentation by Projection Distance Minimization" (PDF). IEEE Transactions on Medical Imaging. 28 (5): 676–686. doi:10.1109/tmi.2008.2010437. PMID 19272989. S2CID 10994501. Archived on 2013-05-03 at https://web.archive.org/web/20130503171943/http://www.visielab.ua.ac.be/publications/optimal-threshold-selection-tomogram-segmentation-projection-distance-minimization. Retrieved 2012-07-31.
  21. Kashanipour, A.; Milani, N.; Kashanipour, A.; Eghrary, H. (May 2008). "Robust Color Classification Using Fuzzy Rule-Based Particle Swarm Optimization". 2008 Congress on Image and Signal Processing. IEEE Congress on Image and Signal Processing. Vol. 2. pp. 110–114. doi:10.1109/CISP.2008.770. ISBN 978-0-7695-3119-9. S2CID 8422475.
  22. Barghout, Lauren; Sheynin, Jacob (2013). "Real-world scene perception and perceptual organization: Lessons from Computer Vision". Journal of Vision. 13 (9): 709. doi:10.1167/13.9.709.
  23. Mobahi, Hossein; Rao, Shankar; Yang, Allen; Sastry, Shankar; Ma, Yi (2011). "Segmentation of Natural Images by Texture and Boundary Compression". International Journal of Computer Vision. 95: 86–98. arXiv:1006.3679. CiteSeerX 10.1.1.180.3579. doi:10.1007/s11263-011-0444-0. S2CID 11070572. Archived on 2017-08-08 at https://web.archive.org/web/20170808173212/http://perception.csl.illinois.edu/coding//papers/MobahiH2011-IJCV.pdf. Retrieved 2011-05-08.
  24. Shankar Rao, Hossein Mobahi, Allen Yang, Shankar Sastry and Yi Ma Natural Image Segmentation with Adaptive Texture and Boundary Encoding, Proceedings of the Asian Conference on Computer Vision (ACCV) 2009, H. Zha, R.-i. Taniguchi, and S. Maybank (Eds.), Part I, LNCS 5994, pp. 135–146, Springer.
  25. Ohlander, Ron; Price, Keith; Reddy, D. Raj (1978). "Picture Segmentation Using a Recursive Region Splitting Method". Computer Graphics and Image Processing. 8 (3): 313–333. doi:10.1016/0146-664X(78)90060-6.
  26. R. Kimmel and A. M. Bruckstein.
  27. R. Kimmel.
  28. Barghout, Lauren. Visual Taxometric approach Image Segmentation using Fuzzy-Spatial Taxon Cut Yields Contextually Relevant Regions. Communications in Computer and Information Science (CCIS). Springer-Verlag. 2014
  29. Witold Pedrycz (Editor), Andrzej Skowron (Co-Editor), Vladik Kreinovich (Co-Editor). Handbook of Granular Computing. Wiley 2008
  30. Barghout, Lauren (2014). Vision. Global Conceptual Context Changes Local Contrast Processing (Ph.D. Dissertation 2003). Updated to include Computer Vision Techniques. Scholars' Press.
  31. Barghout, Lauren, and Lawrence Lee. "Perceptual information processing system." Google Patents
  32. Lindeberg, T.; Li, M.-X. (1997). "Segmentation and classification of edges using minimum description length approximation and complementary junction cues". Computer Vision and Image Understanding. 67 (1): 88–98. doi:10.1006/cviu.1996.0510.
  33. Digital Image Processing (2007, Pearson) by Rafael C. Gonzalez, Richard E. Woods
  34. Digital Image Processing (2007, Pearson) by Rafael C. Gonzalez, Richard E. Woods
  35. http://gth.krammerbuch.at/sites/default/files/articles/AHAH%20callback/01_Guberman_KORR.pdf
  36. R. Nock and F. Nielsen, Statistical Region Merging, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 26, No 11, pp 1452–1458, 2004.
  37. L. Chen, H. D. Cheng, and J. Zhang, Fuzzy subfiber and its application to seismic lithology classification, Information Sciences: Applications, Vol 1, No 2, pp 77–95, 1994.
  38. S.L. Horowitz and T. Pavlidis, Picture Segmentation by a Directed Split and Merge Procedure, Proc. ICPR, 1974, Denmark, pp. 424–433.
  39. S.L. Horowitz and T. Pavlidis, Picture Segmentation by a Tree Traversal Algorithm, Journal of the ACM, 23 (1976), pp. 368–388.
  40. L. Chen, The lambda-connected segmentation and the optimal algorithm for split-and-merge segmentation, Chinese J. Computers, 14(1991), pp 321–331
  41. Caselles, V.; Kimmel, R.; Sapiro, G. (1997). "Geodesic active contours". International Journal of Computer Vision. 22 (1): 61–79. doi:10.1023/A:1007979827043. S2CID 406088.
  42. Dervieux, A. and Thomasset, F. 1979. A finite element method for the simulation of Rayleigh–Taylor instability. Springer Lect. Notes in Math., 771:145–158.
  43. Dervieux, A. and Thomasset, F. 1981. Multifluid incompressible flows by a finite element method. Lecture Notes in Physics, 11:158–163.
  44. Osher, Stanley; Sethian, James A. (1988). "Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations". Journal of Computational Physics. 79 (1): 12–49. Bibcode:1988JCoPh..79...12O. CiteSeerX 10.1.1.46.1266. doi:10.1016/0021-9991(88)90002-2. ISSN 0021-9991.
  45. S. Osher and N. Paragios. Geometric Level Set Methods in Imaging, Vision, and Graphics. Springer Verlag, 2003.
  46. Sethian, James A. "Segmentation in Medical Imaging" (web site). 15 January 2012.
  47. Chan, T. F.; Vese, L. (2001). "Active contours without edges". IEEE Transactions on Image Processing. 10 (2): 266–277. Bibcode:2001ITIP...10..266C. doi:10.1109/83.902291. PMID 18249617. S2CID 7602622.
  48. David Mumford.
  49. Jianbo Shi and Jitendra Malik (2000): "Normalized Cuts and Image Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp 888–905, Vol. 22, No. 8
  50. Leo Grady (2006): "Random Walks for Image Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1768–1783, Vol. 28, No. 11
  51. Z. Wu and R. Leahy (1993): "An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1101–1113, Vol. 15, No. 11. ftp://sipi.usc.edu/pub/leahy/pdfs/MAP93.pdf
  52. Leo Grady and Eric L. Schwartz (2006): "Isoperimetric Graph Partitioning for Image Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 469–475, Vol. 28, No. 3
  53. C. T. Zahn (1971): "Graph-theoretical methods for detecting and describing gestalt clusters", IEEE Transactions on Computers, pp. 68–86, Vol. 20, No. 1
  54. S. Geman and D. Geman (1984): "Stochastic relaxation, Gibbs Distributions and Bayesian Restoration of Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 721–741, Vol. 6, No. 6.
  55. C. A. Bouman and M. Shapiro (1994): "A multiscale random field model for Bayesian image segmentation", IEEE Transactions on Image Processing, pp. 162–177, Vol. 3.
  56. J. Liu and Y. H. Yang (1994): "Multiresolution color image segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 689–700, Vol. 16.
  57. S. Vicente, V. Kolmogorov and C. Rother (2008): "Graph cut based image segmentation with connectivity priors", CVPR
  58. Corso, Z. Tu, and A. Yuille (2008): "MRF Labelling with Graph-Shifts Algorithm", Proceedings of International workshop on combinatorial Image Analysis
  59. B. J. Frey and D. MacKay (1997): "A Revolution: Belief propagation in Graphs with Cycles", Proceedings of Neural Information Processing Systems (NIPS)
  60. Staib, L. H.; Duncan, J. S. (1992). "Boundary finding with parametrically deformable models". IEEE Transactions on Pattern Analysis and Machine Intelligence. 14 (11): 1061–1075. doi:10.1109/34.166621. ISSN 0162-8828.
  61. Witkin, A. P. "Scale-space filtering", Proc. 8th Int. Joint Conf. Art. Intell., Karlsruhe, Germany, pp. 1019–1022, 1983.
  62. A. Witkin, "Scale-space filtering: A new approach to multi-scale description," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), vol. 9, San Diego, CA, March 1984, pp. 150–153.
  63. Koenderink, Jan "The structure of images", Biological Cybernetics, 50:363–370, 1984
  64. Lifshitz, L. and Pizer, S.: A multiresolution hierarchical approach to image segmentation based on intensity extrema, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12:6, 529–540, 1990. http://portal.acm.org/citation.cfm?id=80964&dl=GUIDE&coll=GUIDE
  65. Lindeberg, T.: Detecting salient blob-like image structures and their scales with a scale-space primal sketch: A method for focus-of-attention, International Journal of Computer Vision, 11(3), 283–318, 1993. http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A472969&dswid=2693
  66. Lindeberg, Tony, Scale-Space Theory in Computer Vision, Kluwer Academic Publishers, 1994. http://www.csc.kth.se/~tony/book.html
  67. Gauch, J. and Pizer, S.: Multiresolution analysis of ridges and valleys in grey-scale images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:6 (June 1993), pp. 635–646. http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=628490
  68. Olsen, O. and Nielsen, M.: Multi-scale gradient magnitude watershed segmentation, Proc. of ICIAP 97, Florence, Italy, Lecture Notes in Computer Science, pages 6–13. Springer Verlag, September 1997.
  69. Dam, E., Johansen, P., Olsen, O., Thomsen, A., Darvann, T., Dobrzenieck, A., Hermann, N., Kitai, N., Kreiborg, S., Larsen, P., Nielsen, M.: "Interactive multi-scale segmentation in clinical use" in European Congress of Radiology 2000.
  70. Vincken, K. L.; Koster, A. S. E.; Viergever, M. A. (1997). "Probabilistic multiscale image segmentation". IEEE Transactions on Pattern Analysis and Machine Intelligence. 19 (2): 109–120. doi:10.1109/34.574787.
  71. M. Tabb and N. Ahuja, Unsupervised multiscale image segmentation by integrated edge and region detection, IEEE Transactions on Image Processing, Vol. 6, No. 5, 642–655, 1997. http://vision.ai.uiuc.edu/~msingh/segmen/seg/MSS.html
  72. Akbas, Emre; Ahuja, Narendra (2010). "From Ramp Discontinuities to Segmentation Tree". Computer Vision – ACCV 2009. Lecture Notes in Computer Science. Vol. 5994. pp. 123–134. doi:10.1007/978-3-642-12307-8_12. ISBN 978-3-642-12306-1.
  73. C. Undeman and T. Lindeberg (2003): "Fully Automatic Segmentation of MRI Brain Images using Probabilistic Anisotropic Diffusion and Multi-Scale Watersheds", Proc. Scale-Space'03, Isle of Skye, Scotland, Springer Lecture Notes in Computer Science, volume 2695, pages 641–656. http://kth.diva-portal.org/smash/record.jsf?pid=diva2%3A451266&dswid=-4540
  74. Florack, L. and Kuijper, A.: The topological structure of scale-space images, Journal of Mathematical Imaging and Vision, 12:1, 65–79, 2000.
  75. Bijaoui, A.; Rué, F. (1995). "A Multiscale Vision Model". Signal Processing. 46 (3): 345. doi:10.1016/0165-1684(95)00093-4.
  76. Barghout, Lauren. Visual Taxometric Approach to Image Segmentation using Fuzzy-Spatial Taxon Cut Yields Contextually Relevant Regions. IPMU 2014, Part II. A. Laurent et al (Eds.) CCIS 443, pp 163–173. Springer International Publishing Switzerland
  77. Barghout, Lauren (2014). Vision: How Global Perceptual Context Changes Local Contrast Processing (Ph.D. Dissertation 2003). Updated to include Computer Vision Techniques. Scholars Press. ISBN 978-3-639-70962-9.
  78. Mahinda Pathegama.
  79. Johnson, John L. (September 1994). "Pulse-coupled neural nets: translation, rotation, scale, distortion, and intensity signal invariance for images". Applied Optics. 33 (26). OSA: 6239–6253. Bibcode:1994ApOpt..33.6239J. doi:10.1364/AO.33.006239. PMID 20936043.
  80. Ronneberger, Olaf; Fischer, Philipp; Brox, Thomas (2015). "U-Net: Convolutional Networks for Biomedical Image Segmentation". arXiv:1505.04597 [cs.CV].
  81. Vicente, Sara; Rother, Carsten; Kolmogorov, Vladimir (2011). "Object cosegmentation". CVPR 2011. IEEE. pp. 2217–2224. doi:10.1109/cvpr.2011.5995530. ISBN 978-1-4577-0394-2.
  82. Saygin, ZM, Osher, DE, Augustinack, J, Fischl, B, and Gabrieli, JDE. NeuroImage, 56:3, pp. 1353–61, 2011.
  83. Menke, RA, Jbabdi, S, Miller, KL, Matthews, PM and Zarei, M. NeuroImage, 52:4, pp. 1175–80, 2010.