In dual decomposition, a problem is broken into smaller subproblems, and a solution to the relaxed problem is found by combining the subproblem solutions. This method can be employed for MRF optimization.[1] Dual decomposition has also been applied to Markov logic programs as an inference technique.[2]
Discrete MRF optimization (inference) is important in machine learning and computer vision, and has been realized on CUDA graphics processing units.[3] Consider a graph G=(V,E) with vertices V and edges E. The goal is to assign a label l_p to each vertex p \in V so that the MRF energy is minimized:

(1) \min \sum_{p \in V} \theta_p(l_p) + \sum_{pq \in E} \theta_{pq}(l_p, l_q)
Major MRF optimization methods are based on graph cuts or message passing. They rely on the following integer linear programming formulation:

(2) \min_x E(\theta, x) = \theta \cdot x = \sum_p \theta_p \cdot x_p + \sum_{pq} \theta_{pq} \cdot x_{pq}
In many applications, the MRF variables are \{0,1\}-variables that satisfy: x_p(l) = 1 \Leftrightarrow label l is assigned to p, while x_{pq}(l, l') = 1 \Leftrightarrow labels l, l' are assigned to p, q.
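As a minimal illustration of formulation (2) (a toy example with made-up potentials, not from the source), the energy of a labeling can be evaluated through the \{0,1\} indicator variables, and on a tiny graph the minimizer can be found by brute force:

```python
# Toy example: 2-node graph, 2 labels. Evaluate
# E(theta, x) = sum_p theta_p . x_p + sum_pq theta_pq . x_pq,
# where x_p(l) = 1 iff label l is assigned to p, so theta_p . x_p = theta_p(l_p).
import itertools

theta_p = {0: [0.0, 1.0], 1: [0.5, 0.2]}       # unary potentials theta_p(l)
theta_pq = {(0, 1): [[0.0, 2.0], [2.0, 0.0]]}  # pairwise potentials theta_pq(l, l')

def energy(labels):
    # unary part: theta_p . x_p picks out theta_p(l_p)
    e = sum(theta_p[p][labels[p]] for p in theta_p)
    # pairwise part: theta_pq . x_pq picks out theta_pq(l_p, l_q)
    e += sum(theta_pq[(p, q)][labels[p]][labels[q]] for (p, q) in theta_pq)
    return e

# brute-force minimization over all labelings (feasible only for tiny graphs)
best = min(itertools.product(range(2), repeat=2), key=energy)
print(best, energy(best))
```

Brute force is exponential in |V|; the decomposition machinery below exists precisely to avoid it.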
The main idea behind decomposition is surprisingly simple: first decompose the original problem into smaller, solvable subproblems, then extract a solution by cleverly combining the solutions from these subproblems.

A sample problem to decompose:

\min_x \sum_i f_i(x) \quad \text{s.t.} \quad x \in C

In this problem, separately minimizing every single f_i(x) over x is easy, but minimizing the sum is hard, because the terms are coupled through the shared variable x. To decouple them, we introduce auxiliary copies \{x^i\} of x and rewrite the problem as:

\min_{\{x^i\}, x} \sum_i f_i(x^i) \quad \text{s.t.} \quad x^i \in C, \; x^i = x
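The copy reformulation changes nothing about the optimum; it only exposes the coupling as explicit equality constraints. A small numeric check of this equivalence (toy functions f_1, f_2 and a finite set C chosen for illustration):

```python
# Check: min over a shared x of f1(x) + f2(x) equals min over independent
# copies (x1, x2) subject to the consensus constraint x1 == x2.
import itertools

C = [0, 1, 2, 3]
f1 = lambda x: (x - 1) ** 2
f2 = lambda x: abs(x - 3)

# coupled form: a single shared variable x
coupled = min(f1(x) + f2(x) for x in C)

# copy form: per-term copies x1, x2, constrained to agree
copied = min(f1(x1) + f2(x2)
             for x1, x2 in itertools.product(C, C) if x1 == x2)

print(coupled, copied)  # identical by construction
```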
Now we can relax the coupling constraints x^i = x by introducing multipliers \{\lambda^i\}:

g(\{\lambda^i\}) = \min_{\{x^i \in C\}, x} \sum_i f_i(x^i) + \sum_i \lambda^i \cdot (x^i - x) = \min_{\{x^i \in C\}, x} \sum_i \left[ f_i(x^i) + \lambda^i \cdot x^i \right] - \left( \sum_i \lambda^i \right) \cdot x
Now we eliminate x from the minimization: since x is unconstrained, the last term makes g unbounded below unless \sum_i \lambda^i = 0, so we restrict the multipliers to satisfy this condition, which yields:

g(\{\lambda^i\}) = \min_{\{x^i \in C\}} \sum_i \left[ f_i(x^i) + \lambda^i \cdot x^i \right]
We can set up a Lagrangian dual problem:

(3) \max_{\{\lambda^i\} \in \Lambda} g(\{\lambda^i\}) = \sum_i g_i(\lambda^i),

where each function g_i is defined as:

(4) g_i(\lambda^i) = \min_{x^i} f_i(x^i) + \lambda^i \cdot x^i \quad \text{s.t.} \quad x^i \in C
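The dual (3)-(4) can be maximized by projected subgradient ascent: each slave problem (4) is solved independently, and the multipliers are pushed toward agreement of the copies. A minimal sketch, assuming toy functions f_1, f_2 over a finite set C (all names and the two-term split are illustrative, not from the source); the update \lambda^i \mathrel{+}= \alpha_t (x^i - \bar{x}) keeps \sum_i \lambda^i = 0, as g requires:

```python
# Projected subgradient ascent on the dual g({lambda^i}) = sum_i g_i(lambda^i).
C = [0.0, 1.0, 2.0, 3.0]
fs = [lambda x: (x - 1) ** 2, lambda x: abs(x - 3)]

def slave(i, lam):
    # solves subproblem (4): argmin over x^i in C of f_i(x^i) + lambda^i * x^i
    return min(C, key=lambda x: fs[i](x) + lam * x)

lams = [0.0, 0.0]
for t in range(1, 201):
    alpha = 1.0 / t                              # diminishing step size
    xs = [slave(i, lams[i]) for i in range(2)]   # slave minimizers = subgradients
    mean = sum(xs) / len(xs)
    # ascent step, projected onto the subspace sum_i lambda^i = 0
    lams = [lams[i] + alpha * (xs[i] - mean) for i in range(2)]

# dual value at the final multipliers: a lower bound on min_x f1(x) + f2(x) = 2
dual_value = sum(fs[i](slave(i, lams[i])) + lams[i] * slave(i, lams[i])
                 for i in range(2))
print(dual_value)
```

By weak duality the printed value never exceeds the primal optimum; with the diminishing steps it climbs toward it.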
The original MRF optimization problem is NP-hard and we need to transform it into something easier.
We choose a set \tau of subtrees of the graph G such that every vertex and every edge of G is contained in at least one tree T \in \tau. Each tree T carries its own parameter vector \theta^T and variable vector x^T, whose coordinates correspond to the restriction of \theta, x to T. The vectors \theta^T are required to satisfy:

(5) \sum_{T \in \tau(p)} \theta_p^T = \theta_p, \quad \sum_{T \in \tau(pq)} \theta_{pq}^T = \theta_{pq},

where \tau(p) and \tau(pq) denote the sets of trees in \tau that contain vertex p and edge pq, respectively. The energy then decomposes as:

(6) E(\theta, x) = \sum_T E(\theta^T, x^T)
And our constraints will be:
(7) x^T \in \chi^T, \quad x^T = x|_T, \quad \forall T \in \tau
Our original MRF problem will become:
(8) \min_{\{x^T\}, x} \sum_T E(\theta^T, x^T) \quad \text{s.t.} \quad x^T \in \chi^T, \; \forall T \in \tau; \quad x^T = x|_T, \; \forall T \in \tau
And we'll have the dual problem we were seeking:
(9) \max_{\{\lambda^T\} \in \Lambda} g(\{\lambda^T\}) = \sum_T g^T(\lambda^T),

where each function g^T(\cdot) is defined as:

(10) g^T(\lambda^T) = \min_{x^T} E(\theta^T + \lambda^T, x^T) \quad \text{s.t.} \quad x^T \in \chi^T
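A small numeric check of the sharing condition (5) and the resulting identity (6), on an assumed toy decomposition (a 3-cycle split into two trees; potentials are divided evenly among the trees containing each vertex or edge):

```python
# Split a 3-cycle into trees T1 = {(0,1),(1,2)} and T2 = {(0,1),(0,2)} and
# share potentials per (5): theta_p^T = theta_p / |tau(p)|,
#                           theta_pq^T = theta_pq / |tau(pq)|.
import itertools

V = [0, 1, 2]
theta_p = {0: [0.0, 1.0], 1: [0.3, 0.0], 2: [0.5, 0.5]}
theta_pq = {(0, 1): [[0, 1], [1, 0]],
            (1, 2): [[0, 1], [1, 0]],
            (0, 2): [[0, 1], [1, 0]]}

trees = [[(0, 1), (1, 2)], [(0, 1), (0, 2)]]
tau_p = {p: [T for T in trees if any(p in e for e in T)] for p in V}
tau_pq = {e: [T for T in trees if e in T] for e in theta_pq}

def tree_energy(T, labels):
    # E(theta^T, x^T): tree-local energy with the shared-out potentials
    verts = {v for e in T for v in e}
    e = sum(theta_p[p][labels[p]] / len(tau_p[p]) for p in verts)
    e += sum(theta_pq[pq][labels[pq[0]]][labels[pq[1]]] / len(tau_pq[pq]) for pq in T)
    return e

def energy(labels):
    return (sum(theta_p[p][labels[p]] for p in V)
            + sum(theta_pq[(p, q)][labels[p]][labels[q]] for (p, q) in theta_pq))

# identity (6): E(theta, x) = sum_T E(theta^T, x^T) for every labeling
for labels in itertools.product([0, 1], repeat=3):
    assert abs(energy(labels) - sum(tree_energy(T, labels) for T in trees)) < 1e-9
print("decomposition consistent")
```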
Theorem 1. Lagrangian relaxation (9) is equivalent to the LP relaxation of (2).
\min_{\{x^T\}, x} \left\{ E(\theta, x) \;\middle|\; x_p^T = x_p, \; x^T \in \mathrm{CONVEXHULL}(\chi^T) \right\}
Theorem 2. If the sequence of step sizes \{\alpha_t\} satisfies \alpha_t \geq 0, \lim_{t \to \infty} \alpha_t = 0 and \sum_{t=0}^{\infty} \alpha_t = \infty, then the subgradient method converges to the optimum of (9).
Theorem 3. The distance of the current solution \{\theta^T\} to the optimal solution \{\bar{\theta}^T\} decreases at every iteration.
Theorem 4. Any solution obtained by the method satisfies the WTA (weak tree agreement) condition.
Theorem 5. For binary MRFs with sub-modular energies, the method computes a globally optimal solution.