Graph Aligner Explained

Graph Aligner (GRAAL)^[1] is an algorithm for global network alignment that is based solely on network topology. It aligns two networks

and

by producing an alignment that consists of a set of ordered pairs

(x,y)

, where

is a node in

and

is a node in

. GRAAL matches pairs of nodes originating in different networks based on their graphlet degree signature similarities,^[2] where a higher similarity between two nodes corresponds to a higher topological similarity between their extended neighborhoods (out to distance 4). GRAAL produces global alignments, i.e., it aligns each node in the smaller network to exactly one node in the larger network. The matching proceeds using a technique analogous to the "seed and extend" approach of the popular BLAST algorithm for sequence alignment: it first chooses a single "seed" pair of nodes (one node from each network) with high graphlet degree signature similarity. It then expands the alignment radially outward around the seed as far as practical using a greedy algorithm (see [Kuchaiev et al., 2010] for details).

Method

When aligning two graphs

G(V,E)

and

H(U,F)

, GRAAL first computes costs of aligning each node

in G with each node

. The cost of aligning two nodes takes into account the graphlet degree signature similarity between them, modified to reduce the cost as the degrees of both nodes increase, since higher degree nodes with similar signatures provide a tighter constraint than correspondingly similar low degree nodes. In this way, GRAAL align the densest parts of the networks first. Let

deg(v)

be the degree of a node

in network

, let

max_deg(G)

be the maximum degree of nodes in

, let

S(v,u)

be the graphlet degree signature similarity of nodes

and

, and let

\alpha

be a parameter in [0, 1] that controls the contribution of the node signature similarity to the cost function (that is,

1-\alpha

is the parameter that controls the contribution of node degrees to the cost function), then the cost of aligning nodes

and

is computed as:

C(v,u)=2-((1-\alpha) x

	deg(v)+deg(u)
	max\_deg(G)+max\_deg(H)

+\alpha x S(v,u))

.
A cost of

corresponds to a pair of topologically identical nodes

and

, while a cost close to

corresponds to a pair of topologically different nodes.

GRAAL chooses as the initial seed a pair of nodes

(v,u)

v\inV

and

u\inU

, that have the smallest cost. Ties are broken randomly. Once the seed is found, GRAAL builds "spheres" of all possible radii around nodes

and

. A sphere of radius

around node

is the set of nodes

S_G(v,r)=\{x\inV:d(v,x)=r\}

that are at distance

from

, where the distance

d(v,x)

is the length of the shortest path from

. Spheres of the same radius in two networks are then greedily aligned together by searching for the pairs

(v',u'):v'\inS_G(v,r)

and

u'\inS_H(u,r)

that are not already aligned and that can be aligned with the minimal cost. When all spheres around the seed

(v,u)

have been aligned, some nodes in both networks may remain unaligned. For this reason, GRAAL repeats the same algorithm on a pair of networks

(G^p,H^p)

for

p=1,2,

and

, and searches for the new seed again, if necessary. Network

G^p

is defined as a new network

G^p=(V,E^p)

having the same set of nodes as

and having

(v,x)\inE^p

if and only if the distance between nodes

and

is less than or equal to

, i.e.,

d_G(v,x)\leqp

. Note that

G^1=G

. Using

G^p

p>1

, allows us to align a path of length

in one network to a single edge in another network, which is analogous to allowing "insertions" or "deletions" in a sequence alignment. GRAAL stops when each node from

is aligned to exactly one node in

Application

GRAAL was used to align two protein-protein interaction (PPI) networks and predict biological function of unannotated proteins based on the function of their annotated aligned partners. Also, GRAAL was used to compute a pairwise all-to-all network similarity matrix between metabolic networks of a group of species and then build their phylogenetic tree. All of this was achieved using solely network topological information.

Notes and References

Oleksii Kuchaiev, Tijana Milenković, Vesna Memisević, Wayne Hayes, and Nataša Pržulj, Topological network alignment uncovers biological function and phylogeny, Journal of the Royal Society Interface 2010.
Tijana Milenkovic and Nataša Pržulj, Uncovering Biological Network Function via Graphlet Degree Signatures, Cancer Informatics 2008, 6:257–273.