f
C
x
f
C
f(x)
f
Not all pseudo-Boolean functions can be represented by a flow network, and in the general case the global optimization problem is NP-hard. There exist sufficient conditions to characterise families of functions that can be optimised through graph cuts, such as submodular quadratic functions. Graph cut optimization can be extended to functions of discrete variables with a finite number of values, that can be approached with iterative algorithms with strong optimality properties, computing one graph cut at each iteration.
Graph cut optimization is an important tool for inference over graphical models such as Markov random fields or conditional random fields, and it has applications in computer vision problems such as image segmentation, denoising, registration and stereo matching.
f:\{0,1\}n\toR
G=(V,E)
s
t
V0=\{v1,...,vn\}\subsetV-\{s,t\}
(x1,...,xn)\in\{0,1\}n
f(x1,...,xn)
C=(S,T)
G
vi\inS
xi=0
vi\inT
xi=1
It is possible to classify pseudo-Boolean functions according to their order, determined by the maximum number of variables contributing to each single term. All first order functions, where each term depends upon at most one variable, are always representable. Quadratic functions
f(x)=w0+\sumiwi(xi)+\sumiwij(xi,xj).
are representable if and only if they are submodular, i.e. for each quadratic term
wij
wij(0,0)+wij(1,1)\lewij(0,1)+wij(1,0).
Cubic functions
f(x)=w0+\sumiwi(xi)+\sumiwij(xi,xj)+\sumiwijk(xi,xj,xk)
are representable if and only if they are regular, i.e. all possible binary projections to two variables, obtained by fixing the value of the remaining variable, are submodular. For higher-order functions, regularity is a necessary condition for representability.
Graph construction for a representable function is simplified by the fact that the sum of two representable functions
f'
f''
G=(V'\cupV'',E'\cupE'')
G'=(V',E')
G''=(V'',E'')
The graph representing a quadratic function of
n
n+2
A unary term
wi
xi
vi
s → vi
wi(1)-wi(0)
wi(1)\gewi(0)
vi → t
wi(0)-wi(1)
wi(1)<wi(0)
A quadratic (or binary) term
wij
vi
vj
wij(xi,xj)=wij(0,0)+kixi+kjxj+kij\left((1-xi)xj+xi(1-xj)\right)
with
\begin{align} ki&=
1 | |
2 |
(wij(1,0)-wij(0,0))\\ kj&=
1 | |
2 |
(wij(1,1)-wij(1,0))\\ kij&=
1 | |
2 |
(wij(0,1)+wij(1,0)-wij(0,0)-wij(1,1)). \end{align}
In this expression, the first term is constant and it is not represented by any edge, the two following terms depend on one variable and are represented by one edge, as shown in the previous section for unary terms, while the third term is represented by an edge
vi → vj
wij(0,1)+wij(1,0)-wij(0,0)-wij(1,1)
A cubic (or ternary) term
wijk
vi
vj
vk
vijk
p=wijk(0,0,0)+wijk(0,1,1)+wijk(1,0,1)+wijk(1,1,0)
p>0
wijk(xi,xj,xk)= wijk(0,0,0) +p1(xi-1)+p2(xj-1)+p3(xk-1) +p23(xj-1)xk+p31xi(xk-1)+p12(xi-1)xj -pxixjxk
with
\begin{align} p1&=wijk(1,0,1)-wijk(0,0,1)\\ p2&=wijk(1,1,0)-wijk(1,0,1)\\ p3&=wijk(0,1,1)-wijk(0,1,0)\\ p23&=wijk(0,0,1)+wijk(0,1,0)-wijk(0,0,0)-wijk(0,1,1)\\ p31&=wijk(0,0,1)+wijk(1,0,0)-wijk(0,0,0)-wijk(1,0,1)\\ p12&=wijk(0,1,0)+wijk(1,0,0)-wijk(0,0,0)-wijk(1,1,0). \end{align}
If
p<0
p23
p31
p12
In this decomposition, the constant, unary and binary terms can be represented as shown in the previous sections. If
p>0
vi → vijk
vj → vijk
vk → vijk
vijk → t
p
p<0
vijk → vi
vijk → vj
vijk → vk
s → vijk
-p
After building a graph representing a pseudo-Boolean function, it is possible to compute a minimum cut using one among the various algorithms developed for flow networks, such as Ford–Fulkerson, Edmonds–Karp, and Boykov–Kolmogorov algorithm. The result is a partition of the graph in two connected components
S
T
s\inS
t\inT
xi=0
i
vi\in S
xi=1
i
vi\inT
Max-flow algorithms such as Boykov - Kolmogorov's are very efficient in practice for sequential computation, but they are difficult to parallelise, making them not suitable for distributed computing applications and preventing them from exploiting the potential of modern CPUs. Parallel max-flow algorithms were developed, such as push-relabel and jump-flood, that can also take advantage of hardware acceleration in GPGPU implementations.
The previous construction allows global optimization of pseudo-Boolean functions only, but it can be extended to quadratic functions of discrete variables with a finite number of values, in the form
f(x)=\sumiD(xi)+\sum(i,S(xi,xj)
where
E\subseteqV x V
xi\inΛ=\{1,...,k\}
D(xi)
S(xi,xj)
S(xi,xj)
Given a function
f:Λn\toR
Λ=\{1,...,k\}
x=(x1,...,xn)\inΛn
x
P=\{Pl|l\inΛ\}
Pl=\{xi|xi=l\inΛ\}
P
P'
\alpha\inΛ
P
P'
\alpha
P\alpha\subsetP'\alpha
P'l\subsetPl \foralll\inΛ-\{\alpha\}
\alpha
\beta
\alpha\beta
Pl=P'l \foralll\inΛ-\{\alpha,\beta\}
\alpha
x
\alpha
x
\alpha\beta
\alpha
\beta
x
For each iteration, the
\alpha
\alpha
\Alpha(x)
\alpha
x
x:=arbitraryvalueinΛn
exit:=0
exit\ne1
exit=1
\alpha\inΛ
\hat{x
f(\hat{x
x=\hat{x
exit:=0
The
\alpha\beta
\Alpha\Beta(x)
\alpha\beta
x
x:=arbitraryvalueinΛn
exit:=0
exit\ne1
exit=1
(\alpha,\beta)\inΛ2
\hat{x
f(\hat{x
x=\hat{x
exit:=0
In both cases, the optimization problem in the innermost loop can be solved exactly and efficiently with a graph cut. Both algorithms terminate certainly in a finite number of iterations of the outer loop, and in practice such number is small, with most of the improvement happening at the first iteration. The algorithms can generate different solutions depending on the initial guess, but in practice they are robust with respect to initialisation, and starting with a point where all variables are assigned to the same random value is usually sufficient to produce good quality results.
The solution generated by such algorithms is not necessarily a global optimum, but it has strong guarantees of optimality. If
S(xi,xj)
x
\alpha
S(xi,xj)
x
\alpha\beta
f(x)
f(x*)
f(x)\le2
max\alphaS(\alpha,\beta) | |
min\alphaS(\alpha,\beta) |
f(x*).
See also: Quadratic pseudo-Boolean optimization.
Generally speaking, the problem of optimizing a non-submodular pseudo-Boolean function is NP-hard and cannot be solved in polynomial time with a simple graph cut. The simplest approach is to approximate the function with a similar but submodular one, for instance truncating all non-submodular terms or replacing them with similar submodular expressions. Such approach is generally sub-optimal, and it produces acceptable results only if the number of non-submodular terms is relatively small.
In case of quadratic non-submodular functions, it is possible to compute in polynomial time a partial solution using algorithms such as QPBO. Higher-order functions can be reduced in polynomial time to a quadratic form that can be optimised with QPBO.
Quadratic functions are extensively studied and were characterised in detail, but more general results were derived also for higher-order functions. While quadratic functions can indeed model many problems of practical interest, they are limited by the fact they can represent only binary interactions between variables. The possibility to capture higher-order interactions allows to better capture the nature of the problem and it can provide higher quality results that could be difficult to achieve with quadratic models. For instance in computer vision applications, where each variable represents a pixel or voxel of the image, higher-order interactions can be used to model texture information, that would be difficult to capture using only quadratic functions.
Sufficient conditions analogous to submodularity were developed to characterise higher-order pseudo-Boolean functions that can be optimised in polynomial time, and there exists algorithms analogous to
\alpha
\alpha\beta