In graph theory, the Weisfeiler Leman graph isomorphism test is a heuristic test for the existence of an isomorphism between two graphs G and H. It is a generalization of the color refinement algorithm and has been first described by Weisfeiler and Leman in 1968.[1] The original formulation is based on graph canonization, a normal form for graphs, while there is also a combinatorial interpretation in the spirit of color refinement and a connection to logic.
There are several versions of the test (e.g. k-WL and k-FWL) referred to in the literature by various names, which easily leads to confusion. Additionally, Andrey Leman is spelled `Lehman' in several older articles.
All variants of color refinement are one-sided heuristics that take as input two graphs G and H and output a certificate that they are different or 'I don't know'. This means that if the heuristic is able to tell G and H apart, then they are definitely different, but the other direction does not hold: for every variant of the WL-test (see below) there are non-isomorphic graphs where the difference is not detected. Those graphs are highly symmetric graphs such as regular graphs for 1-WL/color refinement.
The 1-dimensional graph isomorphism test is essentially the same as the color refinement algorithm (the difference has to do with non-edges and is irrelevant for all practical purposes as it is trivial to see that graphs with a different number of nodes are non-isomorphic). The algorithm proceeds as follows:
Initialization All nodes are initialized with the same color 0
Refinement Two nodes u,v get a different color if a) they had a different color before or b) there is a color c such that u and v have a different number of c-colored neighbors
Termination The algorithm ends if the partition induced by two successive refinement steps is the same.
In order to use this algorithm as a graph isomorphism test, one runs the algorithm on two input graphs G and H in parallel, i.e. using the colors when splitting such that some color c (after one iteration) might mean `a node with exactly 5 neighbors of color 0'. In practice this is achieved by running color refinement on the disjoint union graph of G and H. One can then look at the histogram of colors of both graphs (counting the number of nodes after color refinement stabilized) and if they differ, this is a certificate that both graphs are not isomorphic.
The algorithm terminates after at most
n-1
n
The refinement of the partition in each step is by processing for each node its label and the labels of its nearest neighbors. Therefore WLtest can be viewed as a message passing algorithm which also connects it to graph neural networks.
This is the place where the aforementioned two variants of the WL algorithm appear. Both the k-dimensional Weisfeiler-Leman (k-WL) and the k-dimensional folklore Weisfeiler-Leman algorithm (k-FWL) are extensions of 1-WL from above operating on k-tuples instead of individual nodes. While their difference looks innocent on the first glance, it can be shown that k-WL and (k-1)-FWL (for k>2) distinguish the same pairs of graphs.
Input: a graph G = (V,E) # initialize
c0(\barv)=Hash(type(\barv))
\barv\inVk
\ell | |
c | |
i(\bar |
v)=\{\{c\ell-1(\barw):\barw\inlNi(\barv)\}\}
\barv\inVk,i\in[k]
c\ell(\barv)=Hash(c\ell-1(\barv),
\ell | |
c | |
1(\bar |
\ell | |
v),...,c | |
k(\bar |
v))
c\ell\equivc\ell-1
Vk
c\ell
Here the neighborhood
lNi(\barv)
\barv
\barv
lNi(\barv)=\{(v1,...,vi-1,w,vi+1,...,vk):w\inV\}
type(\barv)
\barv
type(\barv)
The key idea of k-WL is to expand the neighborhood notion to k-tuples and then effectively run color refinement on the resulting graph.
Input: a graph G = (V,E) # initialize
c0(\barv)=Hash(type(\barv)
\barv\inVk
\ell | |
c | |
w(\bar |
v)=(c\ell-1(\barv[1]\leftarroww),...,c\ell-1(\barv[k]\leftarroww))
\barv\inVk,w\inV
c\ell(\barv)=Hash(c\ell-1(\barv),
\ell | |
\{\{c | |
w(\bar |
v):w\inV\}\})
c\ell\equivc\ell-1
Vk
c\ell
Here
\barv[i]\leftarroww
\barv
w
Note that there is one major difference between k-WL and k-FWL: k-FWL checks what happens if a single node w is placed at any position of the k-tuple (and then computes the multiset of these k-tuples) while k-WL looks at the multisets that you get when changing the i-th component only of the original k-tuple. It then uses all those multisets in the hash that computes the new color.
It can be shown (although only through the connection to logic) that k-FWL and (k+1)-WL are equivalent (for
k\geq2
U = combineTwo(G, H)glabels = initializeLabels(U) # dictionary where every node gets the same label 0labels = # dictionary that will provide translation from a string of labels of a node and its neighbors to an integernewLabel = 1done = Falsewhile not(done): glabelsNew = # set up the dictionary of labels for the next step for node in U: label = str(glabels[node]) + str([glabels[x] for x in neighbors of node].sort) if not(label in labels): # a combination of labels from the node and its neighbors is encountered for the first time labels[label] = newLabel # assign the string of labels to a new number as an abbreviated label newLabel += 1 # increase the counter for assigning new abbreviated labels glabelsNew[node] = labels[label] if (number of different labels in glabels)
certificateH: test = Trueelse: test = False
Here is some actual Python code which includes the description of the first examples.
def combineTwo(g1, g2): g = n = len(g1) for node in g1: s = set for neighbor in g1[node]: s.add(neighbor) g[node] = s.copy for node in g2: s = set for neighbor in g2[node]: s.add(neighbor + n) g[node + n] = s.copy return g
g = combineTwo(g5_00, g5_02)labels = glabels = for i in range(len(g)): glabels[i] = 0glabelsCount = 1newlabel = 1
done = Falsewhile not (done): glabelsNew = glabelsCountNew = 0 for node in g: label = str(glabels[node]) s2 = [] for neighbor in g[node]: s2.append(glabels[neighbor]) s2.sort for i in range(len(s2)): label += "_" + str(s2[i]) if not (label in labels): labels[label] = newlabel newlabel += 1 glabelsCountNew += 1 glabelsNew[node] = labels[label] if glabelsCount
g0labels = []for i in range(len(g0)): g0labels.append(glabels[i])g0labels.sortcertificate0 = ""for i in range(len(g0)): certificate0 += str(g0labels[i]) + "_"g1labels = []for i in range(len(g1)): g1labels.append(glabels[i + len(g0)])g1labels.sortcertificate1 = ""for i in range(len(g1)): certificate1 += str(g1labels[i]) + "_"
if certificate0
The first three examples are for graphs of order 5.[2]
WLpair takes 3 rounds on 'G0' and 'G1'. The test succeeds as the certificates agree.
WLpair takes 4 rounds on 'G0' and 'G2'. The test fails as the certificates disagree. Indeed 'G0' has a cycle of length 5, while 'G2' doesn't, thus 'G0' and 'G2' cannot be isomorphic.
WLpair takes 4 rounds on 'G1' and 'G2'. The test fails as the certificates disagree. From the previous two instances we already know
G1\congG0\not\congG2
Indeed G0 and G1 are isomorphic. Any isomorphism must respect the components and therefore the labels. This can be used for kernelization of the graph isomorphism problem. Note that not every map of vertices that respects the labels gives an isomorphism. Let
\varphi:G0 → G1
\psi:G0 → G1
\varphi(a)=D,\varphi(b)=C,\varphi(c)=B,\varphi(d)=E,\varphi(e)=A
\psi(a)=B,\psi(b)=C,\psi(c)=D,\psi(d)=E,\psi(e)=A
\varphi
\psi
When applying WLpair to G0 and G2 we get for G0 the certificate 7_7_8_9_9. But the isomorphic G1 gets the certificate 7_7_8_8_9 when applying WLpair to G1 and G2. This illustrates the phenomenon about labels depending on the execution order of the WLtest on the nodes. Either one finds another relabeling method that keeps uniqueness of labels, which becomes rather technical, or one skips the relabeling altogether and keeps the label strings, which blows up the length of the certificate significantly, or one applies WLtest to the union of the two tested graphs, as we did in the variant WLpair. Note that although G1 and G2 can get distinct certificates when WLtest is executed on them separately, they do get the same certificate by WLpair.
The next example is about regular graphs. WLtest cannot distinguish regular graphs of equal order,[3] but WLpair can distinguish regular graphs of distinct degree even if they have the same order. In fact WLtest terminates after a single round as seen in these examples of order 8, which are all 3-regular except the last one which is 5-regular.
All four graphs are pairwise non-isomorphic. G8_00 has two connected components, while the others do not. G8_03 is 5-regular, while the others are 3-regular. G8_01 has no 3-cycle while G8_02 does have 3-cycles.
Another example of two non-isomorphic graphs that WLpair cannot distinguish is given here.[4]
The theory behind the Weisfeiler Leman test is applied in graph neural networks.
In machine learning of nonlinear data one uses kernels to represent the data in a high dimensional feature space after which linear techniques such as support vector machines can be applied. Data represented as graphs often behave nonlinear. Graph kernels are method to preprocess such graph based nonlinear data to simplify subsequent learning methods. Such graph kernels can be constructed by partially executing a Weisfeiler Leman test and processing the partition that has been constructed up to that point.[5] These Weisfeiler Leman graph kernels have attracted considerable research in the decade after their publication. They are also implemented in dedicated libraries for graph kernels such as GraKeL.[6]
Note that kernels for artificial neural network in the context of machine learning such as graph kernels are not to be confused with kernels applied in heuristic algorithms to reduce the computational cost for solving problems of high complexity such as instances of NP-hard problems in the field of complexity theory. As stated above the Weisfeiler Leman test can also be applied in the later context.