The generalized distributive law (GDL) is a generalization of the distributive property which gives rise to a general message passing algorithm.[1] It is a synthesis of the work of many authors in the information theory, digital communications, signal processing, statistics, and artificial intelligence communities. The law and the algorithm were introduced in a semi-tutorial paper of the same title by Srinivas M. Aji and Robert J. McEliece.
The distributive law in mathematics is the law relating the operations of multiplication and addition, stated symbolically as

a \cdot (b + c) = a \cdot b + a \cdot c

where the single-term form a(b + c) on the left-hand side is equivalent to the two-term form ab + ac on the right-hand side.

As can be observed from the definition, applying the distributive law to an arithmetic expression reduces the number of operations in it. In the previous example, the total number of operations is reduced from three (two multiplications and one addition in a \cdot b + a \cdot c) to two (one multiplication and one addition in a \cdot (b + c)).
This is explained in a more formal way in the example below:
\alpha(a,b) \stackrel{\text{def}}{=} \sum_{c \in A} \sum_{d \in A} \sum_{e \in A} f(a,b,c) \cdot g(a,d,e)

where f(\cdot) and g(\cdot) are real-valued functions, a,b,c,d,e \in A, and |A| = q. Here we are "marginalizing out" the independent variables c, d, and e. There are q^2 possible values of the pair (a,b), and for each of them the sum on the right-hand side runs over the q^3 triples (c,d,e). Hence computing the table of values \alpha(a,b) directly requires a total of 2 \cdot q^2 \cdot q^3 = 2q^5 arithmetic operations, i.e. complexity O(q^5).
If we apply the distributive law to the RHS of the equation, we get the following:
\alpha(a,b) \stackrel{\text{def}}{=} \left( \sum_{c \in A} f(a,b,c) \right) \cdot \left( \sum_{d \in A} \sum_{e \in A} g(a,d,e) \right)

This implies that \alpha(a,b) can be written as a product \alpha_1(a,b) \cdot \alpha_2(a), where

\alpha_1(a,b) \stackrel{\text{def}}{=} \sum_{c \in A} f(a,b,c)

\alpha_2(a) \stackrel{\text{def}}{=} \sum_{d \in A} \sum_{e \in A} g(a,d,e)

Now, when we calculate the computational complexity, we can see that it takes about q^3 operations to build the table of values \alpha_1(a,b) (one sum over c for each of the q^2 pairs (a,b)) and about q^3 operations to build \alpha_2(a) (one sum over the q^2 pairs (d,e) for each of the q values of a). Forming the product \alpha_1(a,b) \cdot \alpha_2(a) then takes one multiplication for each of the q^2 pairs (a,b), so the table of values \alpha(a,b) is obtained in a total of q^3 + q^3 + q^2 = 2q^3 + q^2 operations. Hence the complexity drops from O(q^5) to O(q^3).
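The factorization above can be checked numerically. The following is a minimal sketch in pure Python, in which random tables stand in for the real-valued functions f and g (the tables and the alphabet size q = 4 are illustrative assumptions, not data from the source):

```python
import random

q = 4
A = range(q)
# Random tables standing in for the real-valued functions f(a,b,c) and g(a,d,e).
f = {(a, b, c): random.random() for a in A for b in A for c in A}
g = {(a, d, e): random.random() for a in A for d in A for e in A}

# Direct computation: for each (a, b), sum over all q^3 triples (c, d, e).
def alpha_naive(a, b):
    return sum(f[a, b, c] * g[a, d, e] for c in A for d in A for e in A)

# Factored computation via the distributive law: alpha(a,b) = alpha1(a,b) * alpha2(a).
alpha1 = {(a, b): sum(f[a, b, c] for c in A) for a in A for b in A}
alpha2 = {a: sum(g[a, d, e] for d in A for e in A) for a in A}

for a in A:
    for b in A:
        assert abs(alpha_naive(a, b) - alpha1[a, b] * alpha2[a]) < 1e-9
```

The factored version touches each table entry O(q^3) times in total, versus O(q^5) for the direct sum, which is the reduction discussed above.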
Some of the problems that have been solved using the distributive law can be grouped as follows:
1. Decoding algorithms
A GDL-like algorithm was used by Gallager for decoding low-density parity-check codes. Based on Gallager's work, Tanner introduced the Tanner graph and expressed Gallager's work in message-passing form; the Tanner graph also helped explain the Viterbi algorithm. Forney observed that Viterbi's maximum-likelihood decoding of convolutional codes also used algorithms of GDL-like generality.
2. Forward-backward algorithm
The forward–backward algorithm serves as an algorithm for tracking the states of a Markov chain, and it too is an algorithm of GDL-like generality.
3. Artificial intelligence
The notion of junction trees has been used to solve many problems in AI; the concept of bucket elimination builds on many of the same ideas.
MPF, or marginalize a product function, is a general computational problem which includes as special cases many classical problems, such as the computation of the discrete Hadamard transform, maximum-likelihood decoding of a linear code over a memoryless channel, and matrix chain multiplication. The power of the GDL lies in the fact that it applies to situations in which additions and multiplications are generalized. A commutative semiring is a good framework for explaining this behavior: it is defined over a set K together with two operations "+" and "\cdot", where (K, +) and (K, \cdot) are commutative monoids and the distributive law holds.
Let p_1, \ldots, p_n be variables such that p_1 \in A_1, \ldots, p_n \in A_n, where the A_i are finite sets with |A_i| = q_i for i = 1, \ldots, n. If S = \{i_1, \ldots, i_r\} is a subset of \{1, \ldots, n\}, we write

A_S = A_{i_1} \times \cdots \times A_{i_r}, \quad p_S = (p_{i_1}, \ldots, p_{i_r}), \quad q_S = |A_S|.

Further, let A = A_1 \times \cdots \times A_n and p = (p_1, \ldots, p_n).
Let S = \{S_j\}_{j=1}^{M}, where S_j \subset \{1, \ldots, n\}. The sets S_j are called the local domains. For i = 1, \ldots, M, let \alpha_i : A_{S_i} \to R be a function of the variables p_{S_i}, called the local kernel at S_i, where R is a commutative semiring. Now the global kernel \beta : A \to R is defined as

\beta(p_1, \ldots, p_n) = \prod_{i=1}^{M} \alpha_i(p_{S_i}).
Definition of the MPF problem: For one or more of the indices i = 1, \ldots, M, compute a table of the values of the S_i-marginalization of the global kernel \beta, which is the function \beta_i : A_{S_i} \to R defined as

\beta_i(p_{S_i}) = \sum_{p_{S_i^c}} \beta(p)

where S_i^c is the complement of S_i with respect to \{1, \ldots, n\}. The function \beta_i(p_{S_i}) is called the i-th objective function, or the objective function at S_i. Computing the i-th objective function in the obvious way requires about M q_1 q_2 \cdots q_n arithmetic operations: (M-1) q_1 q_2 \cdots q_n multiplications to evaluate the global kernel at each of the q_1 q_2 \cdots q_n points of A, and roughly q_1 q_2 \cdots q_n additions for the marginalization.
The following is an example of the MPF problem. Let p_1, p_2, p_3, p_4, and p_5 be variables such that p_1 \in A_1, p_2 \in A_2, p_3 \in A_3, p_4 \in A_4, and p_5 \in A_5. Here M = 4 and S = \{\{1,2,5\}, \{2,4\}, \{1,4\}, \{2\}\}. The given functions using these variables are f(p_1, p_2, p_5) and g(p_2, p_4), and we need to calculate the objective functions \alpha(p_1, p_4) and \beta(p_2), defined as

\alpha(p_1, p_4) = \sum_{p_2 \in A_2,\, p_3 \in A_3,\, p_5 \in A_5} f(p_1, p_2, p_5) \cdot g(p_2, p_4)

\beta(p_2) = \sum_{p_1 \in A_1,\, p_3 \in A_3,\, p_4 \in A_4,\, p_5 \in A_5} f(p_1, p_2, p_5) \cdot g(p_2, p_4)
Here the local domains and local kernels are defined as follows:

| local domains | local kernels |
|---|---|
| \{p_1, p_2, p_5\} | f(p_1, p_2, p_5) |
| \{p_2, p_4\} | g(p_2, p_4) |
| \{p_1, p_4\} | 1 |
| \{p_2\} | 1 |

where \alpha(p_1, p_4) is the objective function at the 3rd local domain and \beta(p_2) is the objective function at the 4th local domain.
Consider another example where p_1, p_2, p_3, p_4, r_1, r_2, r_3, r_4 \in \{0,1\} and f(r_1, r_2, r_3, r_4) is a real-valued function. Now the MPF problem uses the following six local domains and kernels:

| local domains | local kernels |
|---|---|
| \{r_1, r_2, r_3, r_4\} | f(r_1, r_2, r_3, r_4) |
| \{p_1, r_1\} | (-1)^{p_1 r_1} |
| \{p_2, r_2\} | (-1)^{p_2 r_2} |
| \{p_3, r_3\} | (-1)^{p_3 r_3} |
| \{p_4, r_4\} | (-1)^{p_4 r_4} |
| \{p_1, p_2, p_3, p_4\} | 1 |
Now since the global kernel is defined as the product of the local kernels, it is

F(p_1, p_2, p_3, p_4, r_1, r_2, r_3, r_4) = f(r_1, r_2, r_3, r_4) \cdot (-1)^{p_1 r_1 + p_2 r_2 + p_3 r_3 + p_4 r_4}

and the objective function at the local domain \{p_1, p_2, p_3, p_4\} is

F(p_1, p_2, p_3, p_4) = \sum_{r_1, r_2, r_3, r_4} f(r_1, r_2, r_3, r_4) \cdot (-1)^{p_1 r_1 + p_2 r_2 + p_3 r_3 + p_4 r_4}.

This is the Hadamard transform of the function f(\cdot).
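This objective function can be computed directly from the sum above, or by the "fast" butterfly recursion that the GDL-style factorization yields. A minimal sketch in Python is given below; the random table f is illustrative data, and the inner product p_1 r_1 + \cdots + p_4 r_4 is computed as the popcount of the bitwise AND of the index bit vectors:

```python
import random

n = 4                      # number of bits, as in the example above
N = 1 << n                 # table size 2^n = 16
f = [random.random() for _ in range(N)]

# Direct evaluation of the sum: F[p] = sum_r f[r] * (-1)^(p1 r1 + ... + pn rn).
def hadamard_naive(f):
    return [sum(fr * (-1) ** bin(p & r).count("1") for r, fr in enumerate(f))
            for p in range(len(f))]

# Fast Walsh-Hadamard transform: same result in N log N operations,
# obtained by factoring the transform one bit at a time (butterflies).
def hadamard_fast(f):
    a, h = list(f), 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

assert all(abs(u - v) < 1e-9 for u, v in zip(hadamard_naive(f), hadamard_fast(f)))
```

The direct sum costs 2^n \cdot 2^n kernel evaluations, while the butterfly form costs n \cdot 2^n operations, mirroring the savings the GDL formalizes.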
If one can find a relationship among the elements of a given set S, then one can solve the MPF problem efficiently by organizing S into a junction tree: a tree T whose vertices are the local domains of S, with the property that for any two vertices v_i and v_j with i \neq j, the intersection S_i \cap S_j is a subset of every local domain on the path between v_i and v_j (the running-intersection property).
Example 1: Consider the following nine local domains:

1. \{p_2\}
2. \{p_3, p_2\}
3. \{p_2, p_1\}
4. \{p_3, p_4\}
5. \{p_3\}
6. \{p_1, p_4\}
7. \{p_1\}
8. \{p_4\}
9. \{p_2, p_4\}
For the above given set of local domains, one can organize them into a junction tree as shown below:
Example 2: Consider the following four local domains:

1. \{p_1, p_2\}
2. \{p_2, p_3\}
3. \{p_3, p_4\}
4. \{p_1, p_4\}

Constructing a junction tree from these local domains alone is not possible, since this set contains no common domain that can be placed between any two of the domains above. However, if we add the two dummy domains shown below, then organizing the updated set into a junction tree becomes possible:
5. \{p_1, p_2, p_4\}
6. \{p_2, p_3, p_4\}
For this updated set of domains, the junction tree looks as shown below:
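Since the junction-tree figures are not reproduced here, the running-intersection property can be checked programmatically. The sketch below uses the equivalent formulation that each variable must induce a connected subtree of the tree; the tree over the augmented Example 2 domains is an assumed layout consistent with the dummy domains above, not taken from the figure:

```python
from collections import defaultdict

def is_junction_tree(domains, edges):
    """Check that every variable appears in a connected subtree of the tree."""
    variables = set().union(*domains)
    for x in variables:
        holders = {i for i, d in enumerate(domains) if x in d}
        # Edges of the tree restricted to domains containing x.
        adj = defaultdict(set)
        for i, j in edges:
            if i in holders and j in holders:
                adj[i].add(j)
                adj[j].add(i)
        # The restricted vertices must form one connected component.
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            for nb in adj[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if seen != holders:
            return False
    return True

# Augmented Example 2: the four original domains plus the two dummy domains.
domains = [{"p1", "p2"}, {"p2", "p3"}, {"p3", "p4"}, {"p1", "p4"},
           {"p1", "p2", "p4"}, {"p2", "p3", "p4"}]
tree = [(0, 4), (3, 4), (4, 5), (1, 5), (2, 5)]       # assumed layout
assert is_junction_tree(domains, tree)

# The original four domains alone (e.g. chained in a path) fail the property.
assert not is_junction_tree(domains[:4], [(0, 1), (1, 2), (2, 3)])
```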
Input: A set of local domains.
Output: For the given set of domains, the minimum number of operations required to solve the problem.
So, if v_i and v_j are connected by an edge in the junction tree (i.e. are neighbors), then the message from v_i to v_j is defined as a table containing the values of a function \mu_{i,j} : A_{S_i \cap S_j} \to R. To begin with, all the messages are initialized to 1, i.e. initially \mu_{i,j} = 1. When v_i sends a message to v_j, the table is updated as

\mu_{i,j}(p_{S_i \cap S_j}) = \sum_{p_{S_i \setminus S_j}} \alpha_i(p_{S_i}) \prod_{v_k \operatorname{adj} v_i,\, k \neq j} \mu_{k,i}(p_{S_k \cap S_i}) \quad (1)

where v_k \operatorname{adj} v_i means that v_k is adjacent to v_i in the tree.
Similarly, each vertex v_i has a state, defined as a table containing the values of a function \sigma_i : A_{S_i} \to R. Initially the state of v_i is just its local kernel \alpha_i(p_{S_i}). When the state of v_i is updated, \sigma_i is set to

\sigma_i(p_{S_i}) = \alpha_i(p_{S_i}) \prod_{v_k \operatorname{adj} v_i} \mu_{k,i}(p_{S_k \cap S_i}) \quad (2).
Given a set of local domains as input, we first determine whether a junction tree can be created, either from the set directly or by adding dummy domains to the set first. If constructing a junction tree is not possible, the algorithm reports that there is no way to reduce the number of steps needed to compute the given problem. Once we have a junction tree, the algorithm schedules messages and computes states; this is how the number of steps is reduced, as discussed below.
There are two special cases we are going to discuss here, namely the single-vertex problem, in which the objective function is computed at only one vertex v_0, and the all-vertices problem, in which the objective functions are computed at all vertices.
Let us begin with the single-vertex problem. GDL starts by directing each edge towards the target vertex v_0. A vertex sends its message towards v_0 only after it has received messages from all of its other neighbors. The messages thus flow inwards until v_0 has received messages from all of its neighbors, at which point the state of v_0 is computed; this state is the desired objective function at v_0.
For example, let us consider a junction tree constructed from the set of local domains given above, i.e. the set from Example 1. The scheduling table for these domains, where the target vertex is v_1 (the domain \{p_2\}), is:
| Round | Message or state computation |
|---|---|
| 1 | \mu_{8,4}(p_4) = \alpha_8(p_4) |
| 2 | \mu_{9,4}(p_4) = \sum_{p_2} \alpha_9(p_2, p_4) |
| 3 | \mu_{5,2}(p_3) = \alpha_5(p_3) |
| 4 | \mu_{6,3}(p_1) = \sum_{p_4} \alpha_6(p_1, p_4) |
| 5 | \mu_{7,3}(p_1) = \alpha_7(p_1) |
| 6 | \mu_{4,2}(p_3) = \sum_{p_4} \alpha_4(p_3, p_4) \cdot \mu_{8,4}(p_4) \cdot \mu_{9,4}(p_4) |
| 7 | \mu_{3,1}(p_2) = \sum_{p_1} \alpha_3(p_2, p_1) \cdot \mu_{6,3}(p_1) \cdot \mu_{7,3}(p_1) |
| 8 | \mu_{2,1}(p_2) = \sum_{p_3} \alpha_2(p_3, p_2) \cdot \mu_{4,2}(p_3) \cdot \mu_{5,2}(p_3) |
| 9 | \sigma_1(p_2) = \alpha_1(p_2) \cdot \mu_{2,1}(p_2) \cdot \mu_{3,1}(p_2) |
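The mechanics of such a schedule can be sketched on a smaller assumed example: a three-vertex chain junction tree with local domains \{p_1,p_2\}, \{p_2,p_3\}, \{p_3,p_4\} and random kernels (this chain is illustrative and not one of the figures above). The state computed at the target vertex matches the brute-force marginalization of the global kernel:

```python
import random

q = 3
A = range(q)
# Random local kernels on a chain junction tree:
#   v1: {p1,p2}   v2: {p2,p3}   v3: {p3,p4},  edges v1-v2 and v2-v3.
a1 = {(p1, p2): random.random() for p1 in A for p2 in A}
a2 = {(p2, p3): random.random() for p2 in A for p3 in A}
a3 = {(p3, p4): random.random() for p3 in A for p4 in A}

# Single-vertex GDL with target v1: messages flow v3 -> v2 -> v1,
# each message marginalizing out the variables not shared with the receiver.
mu32 = {p3: sum(a3[p3, p4] for p4 in A) for p3 in A}
mu21 = {p2: sum(a2[p2, p3] * mu32[p3] for p3 in A) for p2 in A}
sigma1 = {(p1, p2): a1[p1, p2] * mu21[p2] for p1 in A for p2 in A}

# Brute force: marginalize the global kernel directly.
for p1 in A:
    for p2 in A:
        direct = sum(a1[p1, p2] * a2[p2, p3] * a3[p3, p4]
                     for p3 in A for p4 in A)
        assert abs(sigma1[p1, p2] - direct) < 1e-9
```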
Thus the complexity of single-vertex GDL can be shown to be

\sum_v d(v)\, |A_{S(v)}|

where S(v) is the local domain at vertex v and d(v) is the degree of v.
To solve the all-vertices problem, we can schedule GDL in several ways. One of them is the fully parallel implementation, where in each round every state is updated and every message is computed and transmitted at the same time. In this type of implementation the states and messages stabilize after a number of rounds that is at most equal to the diameter of the tree. At this point all of the states of the vertices are equal to the desired objective functions.
Another way to schedule GDL for this problem is the serial implementation, which is similar to the single-vertex problem except that the algorithm does not stop until all the vertices of the required set have received all the messages from all their neighbors and have computed their states. Thus the number of arithmetic operations this implementation requires is at most

\sum_v d(v)\, |A_{S(v)}|.
The key to constructing a junction tree lies in the local domain graph G_{LD}, which is a weighted complete graph with M vertices v_1, v_2, \ldots, v_M, one for each local domain, in which the edge e_{i,j} : v_i \leftrightarrow v_j is given the weight \omega_{i,j} = |S_i \cap S_j|; each variable x_k \in S_i \cap S_j contributes one unit to the weight of e_{i,j}. Let \omega_{\max} denote the weight of a maximal-weight spanning tree of G_{LD}. Then a junction tree with vertex set S exists if and only if \omega_{\max} = \omega^*, where

\omega^* = \sum_{i=1}^{M} |S_i| - n

and n is the number of distinct variables. For more clarity and details, please refer to these references.[3][4]
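This criterion can be checked mechanically. The sketch below computes \omega^* and the weight of a maximum-weight spanning tree (greedy Kruskal with union-find) for the two sets from Example 2 above; the four-domain set fails the test, while the set augmented with the dummy domains passes it:

```python
from itertools import combinations

def has_junction_tree(domains):
    """Criterion from the text: a junction tree exists iff w_max == w*."""
    variables = set().union(*domains)
    w_star = sum(len(d) for d in domains) - len(variables)
    # Kruskal's algorithm on the complete local domain graph,
    # edges weighted by intersection size, taken in decreasing order.
    edges = sorted(((len(domains[i] & domains[j]), i, j)
                    for i, j in combinations(range(len(domains)), 2)),
                   reverse=True)
    parent = list(range(len(domains)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    w_max = 0
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            w_max += w
    return w_max == w_star

ex2 = [{"p1", "p2"}, {"p2", "p3"}, {"p3", "p4"}, {"p1", "p4"}]
assert not has_junction_tree(ex2)                    # w_max = 3 < w* = 4
augmented = ex2 + [{"p1", "p2", "p4"}, {"p2", "p3", "p4"}]
assert has_junction_tree(augmented)                  # w_max = w* = 10
```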
Let T be a junction tree with vertex set V and edge set E. In this algorithm, messages are sent in both directions on an edge, so we regard the edge set E as a set of ordered pairs. For example, for the junction tree of Example 1 shown in Figure 1,

E = \{(1,2),(2,1),(1,3),(3,1),(4,2),(2,4),(5,2),(2,5),(6,3),(3,6),(7,3),(3,7),(8,4),(4,8),(9,4),(4,9)\}

NOTE: E gives the direction of the message flow. A schedule for the GDL is defined as a finite sequence of subsets of E, typically written \mathcal{E} = \{E_1, E_2, \ldots, E_N\}, where E_i is the set of messages sent during the i-th round.
Having defined this notation, we can state the theorem. Given a schedule \mathcal{E} = \{E_1, E_2, E_3, \ldots, E_N\}, the corresponding message trellis is a finite directed graph with vertex set V \times \{0, 1, 2, \ldots, N\}, in which the vertex v_i(t) stands for v_i at round t \in \{0, 1, 2, \ldots, N\}. Under the schedule \mathcal{E}, the vertex v_j computes the j-th objective function, i.e. its final state satisfies

\sigma_j(p_{S_j}) = \alpha_j(p_{S_j}) \prod_{v_k \operatorname{adj} v_j} \mu_{k,j}(p_{S_k \cap S_j}) = \beta_j(p_{S_j}),

if and only if for every vertex v_i there is a path from v_i(0) to v_j(N) in the message trellis.
Here we explain the complexity of solving the MPF problem in terms of the number of arithmetic operations required for the calculation, i.e. we compare the number of operations required by the normal method (by which we mean a method that does not use message passing or junction trees, in short one that does not use the concepts of GDL) with the number of operations required when using the generalized distributive law.
Example: Consider the simplest case, where we need to compute the expression ab + ac. Evaluating this expression naively requires two multiplications and one addition. Expressed using the distributive law, it can be written as a(b + c), which requires only one multiplication and one addition.
Similar to the example just explained, we express the equations in different forms so as to perform as few operations as possible by applying the GDL.
As explained in the previous sections, we solve the problem by using the concept of junction trees. The optimization obtained by the use of these trees is comparable to the optimization obtained by solving a semigroup problem on trees. For example, to find the minimum of a group of numbers, observe that if we have a tree whose leaves hold all the elements, then we can compare pairs of items in parallel and write each resultant minimum to the parent. When this process is propagated up the tree, the minimum of the group of elements is found at the root.
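A minimal sketch of this tree reduction in pure Python is shown below. The pairwise reduction here runs sequentially, but the comparisons within each level are independent and could be performed in parallel, which is the point of the tree layout:

```python
def tree_min(values):
    """Reduce a list to its minimum by repeated pairwise comparison,
    level by level, as if propagating values up a binary tree."""
    level = list(values)
    while len(level) > 1:
        # Each pair of siblings writes its minimum to the parent node;
        # an odd element at the end is carried up unchanged.
        level = [min(level[i:i + 2]) for i in range(0, len(level), 2)]
    return level[0]

assert tree_min([7, 3, 9, 1, 4, 8, 2, 6]) == 1
```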
The following is the complexity of solving the junction tree using message passing. We rewrite the formulas used earlier in the following form: the message to be sent from vertex v to vertex w is

\mu_{v,w}(p_{S(v) \cap S(w)}) = \sum_{p_{S(v) \setminus S(w)}} \alpha_v(p_{S(v)}) \prod_{u \operatorname{adj} v,\, u \neq w} \mu_{u,v}(p_{S(u) \cap S(v)})

and the state at vertex v is

\sigma_v(p_{S(v)}) = \alpha_v(p_{S(v)}) \prod_{u \operatorname{adj} v} \mu_{u,v}(p_{S(u) \cap S(v)}).
We first analyze the single-vertex problem, assuming the target vertex is v_0, so that each vertex v \neq v_0 sends exactly one message along the edge (v, w) directed towards v_0. (We write q_v for |A_{S(v)}| and q_{v \cap w} \stackrel{\text{def}}{=} |A_{S(v) \cap S(w)}|.) For a fixed value of p_{S(v) \cap S(w)}, the sum in the message formula runs over q_v / q_{v \cap w} terms, each of which is a product of d(v) - 1 multiplications of the factors \alpha_v and \mu_{u,v}; hence it requires q_v / q_{v \cap w} - 1 additions and (q_v / q_{v \cap w})(d(v) - 1) multiplications. Since the message table has q_{v \cap w} entries, computing the whole message \mu_{v,w} requires

q_{v \cap w}(q_v / q_{v \cap w} - 1) = q_v - q_{v \cap w} additions and

q_{v \cap w}(q_v / q_{v \cap w})(d(v) - 1) = (d(v) - 1) q_v multiplications.

The total number of arithmetic operations required to send all the messages towards v_0 is therefore \sum_{v \neq v_0} (q_v - q_{v \cap w}) additions and \sum_{v \neq v_0} (d(v) - 1) q_v multiplications.
Once all the messages have been transmitted, the algorithm terminates with the computation of the state at v_0, which requires d(v_0) q_{v_0} multiplications. Hence the algorithm as a whole requires

\sum_{v \neq v_0} (q_v - q_{v \cap w}) additions and \sum_{v \neq v_0} (d(v) - 1) q_v + d(v_0) q_{v_0} multiplications.

Thus the grand total of the number of calculations is

\chi(T) = \sum_v d(v) q_v - \sum_e q_e \quad (1)

where e = (v, w) denotes an edge and q_e = |A_{S(v) \cap S(w)}|.
The formula above gives us an upper bound.
If we define the complexity of an edge e = (v, w) as

\chi(e) = q_v + q_w - q_e,

then (1) can be rewritten as

\chi(T) = \sum_e \chi(e).
We now calculate the edge complexities for the problem defined in Figure 1 as follows:

\chi(1,2) = q_2 + q_2 q_3 - q_2
\chi(2,4) = q_3 q_4 + q_2 q_3 - q_3
\chi(2,5) = q_3 + q_2 q_3 - q_3
\chi(4,8) = q_4 + q_3 q_4 - q_4
\chi(4,9) = q_2 q_4 + q_3 q_4 - q_4
\chi(1,3) = q_2 + q_2 q_1 - q_2
\chi(3,7) = q_1 + q_1 q_2 - q_1
\chi(3,6) = q_1 q_4 + q_1 q_2 - q_1

The total complexity is therefore

3 q_2 q_3 + 3 q_3 q_4 + 3 q_1 q_2 + q_2 q_4 + q_1 q_4 - q_1 - q_3 - q_4.
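The edge-by-edge tally above can be double-checked numerically. The sketch below evaluates \chi(e) = q_v + q_w - q_e for each edge of the Figure 1 tree (the nine domains from Example 1, edges as in the set E above) at arbitrary alphabet sizes, and compares the sum with the expanded total:

```python
# Local domains of the nine vertices from Example 1, keyed by vertex number;
# each domain is the set of variable indices it contains.
domains = {1: {2}, 2: {3, 2}, 3: {2, 1}, 4: {3, 4}, 5: {3},
           6: {1, 4}, 7: {1}, 8: {4}, 9: {2, 4}}
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (4, 8), (4, 9)]

q = {1: 3, 2: 4, 3: 5, 4: 6}   # arbitrary alphabet sizes q1..q4

def size(dom):
    """Table size of a local domain: product of its variables' alphabet sizes."""
    out = 1
    for x in dom:
        out *= q[x]
    return out

# chi(e) = q_v + q_w - q_e, summed over all edges of the tree.
chi_T = sum(size(domains[v]) + size(domains[w]) - size(domains[v] & domains[w])
            for v, w in edges)

q1, q2, q3, q4 = q[1], q[2], q[3], q[4]
total = 3*q2*q3 + 3*q3*q4 + 3*q1*q2 + q2*q4 + q1*q4 - q1 - q3 - q4
assert chi_T == total
```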
Now we consider the all-vertices problem, where messages must be sent in both directions on every edge and the state must be computed at every vertex. Done naively this would take O(\sum_v d(v)^2 q_v) operations, but the cost can be reduced with the following trick: given d numbers (a_1, \ldots, a_d), all d products of d - 1 of the a_i can be computed with only 3(d - 2) multiplications, rather than the d(d - 2) multiplications of the obvious method. First compute the prefix products

b_1 = a_1,\; b_2 = b_1 \cdot a_2 = a_1 a_2,\; \ldots,\; b_{d-1} = b_{d-2} \cdot a_{d-1} = a_1 a_2 \cdots a_{d-1}

and the suffix products

c_d = a_d,\; c_{d-1} = a_{d-1} \cdot c_d = a_{d-1} a_d,\; \ldots,\; c_2 = a_2 \cdot c_3 = a_2 a_3 \cdots a_d,

which together take 2(d - 2) multiplications. Then each product m_j of all the a_i with i \neq j is obtained as m_1 = c_2, m_d = b_{d-1}, and m_j = b_{j-1} \cdot c_{j+1} for 1 < j < d, which takes d - 2 further multiplications, for a total of 3(d - 2).
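A sketch of this prefix/suffix trick (the "product of all but one" computation), checked against the obvious method:

```python
def products_except_one(a):
    """Return [m_1, ..., m_d], where m_j is the product of all a_i with i != j,
    using prefix products b and suffix products c: 3(d - 2) multiplications."""
    d = len(a)
    b = [a[0]] * d                    # b[j] = a[0] * ... * a[j]   (prefix)
    for j in range(1, d - 1):         # d - 2 multiplications
        b[j] = b[j - 1] * a[j]
    c = [a[-1]] * d                   # c[j] = a[j] * ... * a[d-1] (suffix)
    for j in range(d - 2, 0, -1):     # d - 2 multiplications
        c[j] = a[j] * c[j + 1]
    # m_1 = c_2, m_d = b_{d-1}, and m_j = b_{j-1} * c_{j+1} in between:
    return [c[1]] + [b[j - 1] * c[j + 1] for j in range(1, d - 1)] + [b[d - 2]]

a = [2, 3, 5, 7, 11]
expected = [3*5*7*11, 2*5*7*11, 2*3*7*11, 2*3*5*11, 2*3*5*7]
assert products_except_one(a) == expected
```

In the all-vertices GDL, each a_i plays the role of an incoming message at a vertex, so all outgoing messages can share the prefix and suffix tables instead of recomputing each product from scratch.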
There is not much choice in the construction of the junction tree, except that we may have several maximal-weight spanning trees, in which case we should choose the spanning tree with the least \chi(T).
It may seem that GDL is correct only when the local domains can be expressed as a junction tree. But even in cases where there are cycles, after a number of iterations the messages will approximately equal the objective function. Experiments on the Gallager–Tanner–Wiberg algorithm for low-density parity-check codes support this claim.