In graph theory a minimum spanning tree (MST) T of a graph G = (V, E) with |V| = n vertices and |E| = m edges is a tree that is a subgraph of G, connects all vertices, and has minimal total edge weight among all spanning trees of G.
MSTs are useful and versatile tools utilised in a wide variety of practical and theoretical fields. For example, a company looking to supply multiple stores with a certain product from a single warehouse might use an MST of the road network to find the cheapest set of road connections linking the warehouse to every store. In this case the stores and the warehouse are represented as vertices and the road connections between them as edges. Each edge is labelled with the length of the corresponding road connection.
If G is edge-unweighted, every spanning tree possesses the same number of edges and thus the same weight. In the edge-weighted case, a spanning tree whose sum of edge weights is lowest among all spanning trees of G is called a minimum spanning tree.
As finding MSTs is a widespread problem in graph theory, there exist many sequential algorithms for solving it. Among them are Prim's, Kruskal's and Borůvka's algorithms, each utilising different properties of MSTs. They all operate in a similar fashion: a subset of E is iteratively grown until it forms an MST.
Prim's algorithm utilises the cut property of MSTs. A simple high-level pseudocode implementation is provided below:
    T ← ∅
    S ← {s}, where s is an arbitrary start vertex of V
    repeat |V| − 1 times:
        find the lightest edge (u, v) such that u ∈ S and v ∈ V ∖ S
        S ← S ∪ {v}
        T ← T ∪ {(u, v)}
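As a concrete illustration of the pseudocode above, the following Python sketch implements Prim's algorithm with a binary heap and lazy deletion rather than a Fibonacci heap (so its runtime is O(m log n) instead of O(m + n log n)); the function name and the adjacency-list input format are choices made for this example.

```python
import heapq

def prim_mst(n, adj, start=0):
    """Prim's algorithm with a binary heap (lazy deletion).

    n   -- number of vertices, labelled 0..n-1
    adj -- adjacency lists: adj[u] = [(weight, v), ...]
    Returns the list of MST edges as (u, v) pairs.
    """
    in_tree = [False] * n
    tree_edges = []
    # Heap entries (weight, u, v) are candidate edges leaving the tree S.
    heap = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    in_tree[start] = True
    while heap and len(tree_edges) < n - 1:
        w, u, v = heapq.heappop(heap)
        if in_tree[v]:          # stale entry: v joined S earlier
            continue
        in_tree[v] = True       # S <- S ∪ {v}
        tree_edges.append((u, v))
        for w2, x in adj[v]:
            if not in_tree[x]:
                heapq.heappush(heap, (w2, v, x))
    return tree_edges
```

Lazy deletion stands in for the decrease-key operation of the pseudocode's priority queue: outdated heap entries are simply skipped when popped.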
Each edge is observed exactly twice, namely when each of its endpoints is examined, and each vertex is processed exactly once, for a total of O(n + m) operations in addition to the priority-queue operations. The queue can be implemented as a Fibonacci heap, which supports insertion and decrease-key in amortised O(1) time and extraction of the minimum in O(log n) time, yielding a total runtime in O(m + n log n).
It is important to note that the loop is inherently sequential and cannot be properly parallelised, since the lightest edge with one endpoint in S and the other in V ∖ S may change with each addition of an edge to T.
One possible idea is to use O(n) processors to support priority-queue access in O(1) time on an EREW-PRAM machine, thus lowering the total runtime to O(n + m).
Kruskal's MST algorithm utilises the cycle property of MSTs. A high-level pseudocode representation is provided below.
    T ← ∅
    foreach (u, v) ∈ E in ascending order of weight:
        if u and v are in different subtrees of T:
            T ← T ∪ {(u, v)}
The subtrees of T are stored in a union-find data structure, so checking whether two vertices are in the same subtree is possible in amortised O(α(m, n)) time, where α(m, n) is the inverse Ackermann function. The total runtime is thus in O(sort(m) + m·α(n)), dominated by the initial sorting of the edges.
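A short Python sketch of Kruskal's algorithm with a union-find structure (path compression plus union by size) may make this concrete; the function name and input format are assumptions of this example.

```python
def kruskal_mst(n, edges):
    """Kruskal's algorithm with union-find.

    edges -- list of (weight, u, v); vertices labelled 0..n-1.
    Returns the MST edges as (weight, u, v) triples.
    """
    parent = list(range(n))
    size = [1] * n

    def find(x):                     # root of x's subtree, with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):    # ascending order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # u and v lie in different subtrees of T
            if size[ru] < size[rv]:  # union by size: attach smaller under larger
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            tree.append((w, u, v))
    return tree
```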
Similarly to Prim's algorithm, there are components in Kruskal's approach that cannot be parallelised in its classical variant. For example, determining whether two vertices are in the same subtree is difficult to parallelise, as two union operations might attempt to join the same subtrees at the same time. The only real opportunity for parallelisation lies in the sorting step. As sorting is possible in linear time with O(log n) processors, the total runtime can be reduced to O(m·α(n)).
Another approach would be to modify the original algorithm by growing T more aggressively. This idea was presented by Osipov et al. The basic idea behind filter-Kruskal is to partition the edges in a similar way to quicksort and to filter out edges that connect vertices within the same subtree, in order to reduce the cost of sorting. A high-level pseudocode representation is provided below.
    filterKruskal(G):
        if m < kruskalThreshold:
            return kruskal(G)
        pivot ← chooseRandom(E)
        (E≤, E>) ← partition(E, pivot)
        A ← filterKruskal(E≤)
        E> ← filter(E>)
        A ← A ∪ filterKruskal(E>)
        return A

    partition(E, pivot):
        E≤ ← ∅
        E> ← ∅
        foreach (u, v) ∈ E:
            if weight(u, v) ≤ pivot:
                E≤ ← E≤ ∪ {(u, v)}
            else:
                E> ← E> ∪ {(u, v)}
        return (E≤, E>)

    filter(E):
        Efiltered ← ∅
        foreach (u, v) ∈ E:
            if find-set(u) ≠ find-set(v):
                Efiltered ← Efiltered ∪ {(u, v)}
        return Efiltered
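A sequential Python sketch may clarify how the three routines interact. The threshold value, the degenerate-pivot fallback (needed when every edge weight equals the pivot), and all names are choices made for this illustration.

```python
import random

def filter_kruskal(n, edges, threshold=16):
    """Sequential sketch of filter-Kruskal.

    edges -- list of (weight, u, v); `threshold` plays the role of
    kruskalThreshold in the pseudocode.
    """
    parent = list(range(n))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def kruskal(es, tree):            # base case: plain Kruskal on a small set
        for w, u, v in sorted(es):
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[rv] = ru
                tree.append((w, u, v))

    def rec(es, tree):
        if len(es) <= threshold:
            kruskal(es, tree)
            return
        pivot = random.choice(es)[0]                  # a random edge weight
        light = [e for e in es if e[0] <= pivot]      # E<=
        heavy = [e for e in es if e[0] > pivot]       # E>
        if not heavy:                 # degenerate pivot: avoid infinite recursion
            kruskal(light, tree)
            return
        rec(light, tree)
        # filter: drop heavy edges whose endpoints are already connected
        heavy = [(w, u, v) for w, u, v in heavy if find(u) != find(v)]
        rec(heavy, tree)

    tree = []
    rec(edges, tree)
    return tree
```

Because every edge in E≤ is no heavier than every edge in E>, edges are still considered in globally nondecreasing weight order, so the result is a correct MST.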
Filter-Kruskal is better suited for parallelisation, since sorting, partitioning and filtering have intuitively easy parallelisations in which the edges are simply divided between the cores.
The main idea behind Borůvka's algorithm is edge contraction. An edge {u, v} is contracted by removing v and redirecting every edge {w, v} ∈ E to {w, u}; these new edges retain their old edge weights. A high-level pseudocode representation is provided below.
    T ← ∅
    while |V| > 1:
        S ← ∅
        foreach v ∈ V:
            S ← S ∪ {the lightest edge {u, v} ∈ E incident to v}
        foreach {u, v} ∈ S:
            contract {u, v}
        T ← T ∪ S
It is possible that contractions lead to multiple edges between a pair of vertices; the intuitive way of choosing the lightest of them is not possible in O(m) time. Since each round of contractions at least halves the number of vertices, at most log n iterations are performed, for a total runtime in O(m log n).
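The following Python sketch follows the pseudocode above, but tracks components with a union-find structure instead of physically contracting edges, which has the same effect; the input format and the assumption of distinct edge weights (which sidesteps tie-breaking) are choices of this example.

```python
def boruvka_mst(n, edges):
    """Sequential Borůvka's algorithm.

    edges -- list of (weight, u, v) with pairwise distinct weights,
    so the MST is unique and ties need no special handling.
    """
    parent = list(range(n))

    def find(x):                         # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    components = n
    while components > 1:
        best = {}                        # S: lightest edge leaving each component
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue                 # endpoints already contracted together
            for r in (ru, rv):
                if r not in best or w < best[r][0]:
                    best[r] = (w, u, v)
        if not best:
            break                        # the graph is disconnected
        for w, u, v in best.values():    # contract all chosen edges
            ru, rv = find(u), find(v)
            if ru != rv:                 # the same edge may be chosen twice
                parent[rv] = ru
                tree.append((w, u, v))
                components -= 1
    return tree
```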
One possible parallelisation of this algorithm[3][4][5] yields a polylogarithmic time complexity, i.e. T(m, n, p) ⋅ p ∈ O(m log n) and there exists a constant c so that T(m, n, p) ∈ O(log^c m). Here T(m, n, p) denotes the runtime for a graph with m edges and n vertices on a machine with p processors.
Each iteration of the while |V| > 1 loop consists of three major steps:
    finding the lightest incident edge of every vertex, in O(m/p + log n + log p);
    assigning each vertex the connected component it belongs to, in O(n/p + log n);
    contracting the graph, in O(m/p + log n).
The MST then consists of all the found lightest edges.
This parallelisation utilises the adjacency array graph representation for G = (V, E). It consists of three arrays: Γ of length n + 1, γ of length m, and c of length m. The array Γ stores for each vertex the starting index of its incident edges, so the endpoints of the edges incident to vertex u are found in γ[Γ[u]], …, γ[Γ[u+1] − 1], and c[i] stores the weight of the i-th edge in γ. In other words, an edge (u, v) has an index i with Γ[u] ≤ i < Γ[u+1] and γ[i] = v.
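A small Python sketch shows how such a representation might be built from an edge list via counting and prefix sums; the function name and input conventions are assumptions of this example.

```python
def build_adjacency_array(n, edges):
    """Build the (Gamma, gamma, c) adjacency-array representation.

    edges -- list of directed (u, v, weight) triples; for an undirected
    graph both directions must be listed, giving m entries in gamma and c.
    """
    m = len(edges)
    Gamma = [0] * (n + 1)
    for u, _, _ in edges:               # count the edges leaving each vertex
        Gamma[u + 1] += 1
    for i in range(n):                  # prefix sums: Gamma[u] = first index of u
        Gamma[i + 1] += Gamma[i]
    gamma = [0] * m
    c = [0] * m
    fill = Gamma[:]                     # next free slot per vertex
    for u, v, w in edges:
        i = fill[u]
        gamma[i] = v                    # edge i has endpoints (u, v) ...
        c[i] = w                        # ... and weight c[i]
        fill[u] += 1
    return Gamma, gamma, c
```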
First the edges are distributed between the p processors: the i-th processor receives the edges stored between γ[im/p] and γ[(i+1)m/p − 1]. Furthermore, each processor needs to know to which vertex these edges belong, since γ stores only one of the endpoints; this information is stored in the array pred. Obtaining it is possible in O(log n) time using p binary searches in Γ, or in O(n/p + p) time using a linear search.
Now each processor determines the lightest edge incident to each of its vertices.
    v ← vertex owning the edge with index im/p, found via binary search in Γ
    for e ← im/p; e ≤ (i+1)m/p − 1; e++:
        if Γ[v+1] = e:
            v++
        if c[e] < c[pred[v]]:
            pred[v] ← e
Here an issue arises: some vertices are handled by more than one processor. A possible solution is for every processor to maintain its own copy of the pred array, with the copies afterwards combined into one using a reduction in O(log p) time. Thus this step takes O(m/p + log n + log p) time in total.
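The per-processor scan can be simulated sequentially. The following Python sketch divides the edge array into p slices, keeps one local best array per slice, and merges them with a final reduction, mirroring the pred-array combination described above; for simplicity the owner of the first edge of a slice is found here by a linear scan over Γ rather than a binary search.

```python
def lightest_incident_edges(n, Gamma, gamma, c, p):
    """Simulate p processors finding the lightest edge incident to each vertex.

    Returns a list mapping each vertex to the index of its lightest
    incident edge, or -1 for a vertex with no edges.
    """
    m = len(gamma)
    INF = float('inf')
    local_best = []
    for i in range(p):                       # one pass per simulated processor
        lo, hi = i * m // p, (i + 1) * m // p
        best = [(INF, -1)] * n               # per-vertex (weight, edge index)
        u = 0                                # owner of edge lo (linear scan here;
        for e in range(lo, hi):              # the text uses binary search in Gamma)
            while Gamma[u + 1] <= e:         # advance to the vertex owning edge e
                u += 1
            if c[e] < best[u][0]:
                best[u] = (c[e], e)
        local_best.append(best)
    # reduction: combine the p local arrays into a single best array
    combined = [min(b[v] for b in local_best) for v in range(n)]
    return [e for _, e in combined]
```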
Consider the graph that consists solely of the edges collected in the previous step, with each edge directed away from the vertex for which it is the lightest incident edge. This graph decomposes into multiple weakly connected components, and the goal of this step is to assign to each vertex the component of which it is a part. Since every vertex has exactly one outgoing edge, each component is a pseudotree: a tree with a single extra edge, which runs antiparallel to the lightest edge of the component. The following code mutates this extra edge into a loop:
    parallel forAll v ∈ V:
        w ← pred[v]
        if pred[w] = v ∧ v < w:
            pred[v] ← v
Now every weakly connected component is a directed tree where the root has a loop. This root is chosen as the representative of each component. The following code uses doubling to assign each vertex its representative:
    while ∃ v ∈ V : pred[v] ≠ pred[pred[v]]:
        forAll v ∈ V:
            pred[v] ← pred[pred[v]]
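The doubling loop can be illustrated directly in Python. This sequential sketch recomputes the whole array once per round; on a PRAM all updates of one round happen in parallel, so only the O(log n) rounds remain.

```python
def find_representatives(pred):
    """Pointer doubling on the pred array.

    pred[v] is the parent pointer of v; roots satisfy pred[v] == v.
    After the loop every vertex points directly at the root of its tree.
    """
    pred = pred[:]                       # work on a copy
    # Each round halves the pointer distance to the root, so the number of
    # rounds is logarithmic in the longest chain.
    while any(pred[v] != pred[pred[v]] for v in range(len(pred))):
        pred = [pred[pred[v]] for v in range(len(pred))]
    return pred
```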
Now every component is a star, with each vertex pointing directly at its representative. With some advanced techniques this step needs O(n/p + log n) time.
In this step each subgraph is contracted to a single vertex.
    k ← number of components
    V' ← {0, …, k − 1}
    find a bijective function f : component representatives → {0, …, k − 1}
    E' ← {(f(pred[v]), f(pred[w]), c, e_old) : (v, w) ∈ E ∧ pred[v] ≠ pred[w]}
Finding the bijective function is possible in O(n/p + log p) using a prefix sum. As there is now a new set of vertices and edges, the adjacency array must be rebuilt, which can be done in O(m/p + log p) using integer sorting on E'.
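The relabelling step can be sketched in Python as follows; a dictionary stands in for the bijection f (computed with a prefix sum in the parallel setting), and for brevity only the weight is carried along rather than the (c, e_old) pair of the pseudocode.

```python
def contract(n, edges, pred):
    """Contract each component to a single vertex.

    edges -- list of (weight, u, v); pred must already map every vertex
    to its component representative.
    Returns (k, new_edges) with vertices relabelled to 0..k-1.
    """
    roots = sorted({pred[v] for v in range(n)})
    f = {r: i for i, r in enumerate(roots)}      # bijection roots -> {0,...,k-1}
    new_edges = [(w, f[pred[u]], f[pred[v]])
                 for w, u, v in edges
                 if pred[u] != pred[v]]          # drop intra-component edges
    return len(roots), new_edges
```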
Each iteration now needs O(m/p + log n) time and, just as in the sequential case, there are log n iterations, resulting in a total runtime in O(log n (m/p + log n)). If m ∈ Ω(p log² p), the efficiency of the algorithm is in Θ(1), i.e. the O(m log n) work is divided evenly among the p processors; this holds even for sparse graphs with m ∈ O(n), provided n is large enough relative to p.
There are multiple other parallel algorithms that deal with the problem of finding an MST. With a linear number of processors it is possible to achieve this in O(log n) time.
Another challenge is the external memory model. Dementiev et al. have proposed an algorithm that is claimed to be only two to five times slower than an algorithm that makes use of internal memory only.[6]