Modularity (networks) explained

Modularity is a measure of the structure of networks or graphs that quantifies the strength of division of a network into modules (also called groups, clusters or communities). Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules. Modularity is often used in optimization methods for detecting community structure in networks. Biological networks, including animal brains, exhibit a high degree of modularity. However, modularity maximization is not statistically consistent: it finds communities in its own null model, i.e. fully random graphs, and therefore cannot be used to find statistically significant community structures in empirical networks. Furthermore, modularity has been shown to suffer from a resolution limit, which makes it unable to detect small communities.

Motivation

Many scientifically important problems can be represented and empirically studied using networks. For example, biological and social patterns, the World Wide Web, metabolic networks, food webs, neural networks and pathological networks are real-world systems that can be mathematically represented and topologically studied to reveal unexpected structural features.^[1] Most of these networks possess a community structure that is important for understanding the dynamics of the network. For instance, information or rumor spreads faster within a closely connected social community than within a loosely connected one. Thus, if a network is represented by a number of individual nodes connected by links which signify a certain degree of interaction between the nodes, communities are defined as groups of densely interconnected nodes that are only sparsely connected with the rest of the network. Hence, it may be imperative to identify the communities in networks, since the communities may have properties such as node degree, clustering coefficient, betweenness, centrality,^[2] etc., that differ considerably from those of the average network. Modularity is one such measure, which when maximized, leads to the appearance of communities in a given network.

Definition

Modularity is the fraction of the edges that fall within the given groups minus the expected fraction if edges were distributed at random. The value of the modularity for unweighted and undirected graphs lies in the range $[-1/2, 1]$.^[3] It is positive if the number of edges within groups exceeds the number expected on the basis of chance. For a given division of the network's vertices into some modules, modularity reflects the concentration of edges within modules compared with random distribution of links between all nodes regardless of modules.

There are different methods for calculating modularity.^[1] In the most common version of the concept, the randomization of the edges is done so as to preserve the degree of each vertex. Consider a graph with $n$ nodes and $m$ links (edges) such that the graph can be partitioned into two communities using a membership variable $s$. If a node $v$ belongs to community 1, $s_v=1$, or if $v$ belongs to community 2, $s_v=-1$. Let the adjacency matrix for the network be represented by $A$, where $A_{vw}=0$ means there is no edge (no interaction) between nodes $v$ and $w$, and $A_{vw}=1$ means there is an edge between the two. Also for simplicity we consider an undirected network. Thus $A_{vw}=A_{wv}$. (It is important to note that multiple edges may exist between two nodes, but here we assess the simplest case.)

Modularity $Q$ is then defined as the fraction of edges that fall within group 1 or 2, minus the expected fraction of edges within groups 1 and 2 for a random graph with the same node degree distribution as the given network.

The expected number of edges is computed using the concept of a configuration model.^[4] The configuration model is a randomized realization of a particular network. Given a network with $n$ nodes, where each node $v$ has a node degree $k_v$, the configuration model cuts each edge into two halves, and then each half edge, called a stub, is rewired randomly with any other stub in the network, even allowing self-loops (which occur when a stub is rewired to another stub from the same node) and multiple edges between the same two nodes. Thus, even though the node degree distribution of the graph remains intact, the configuration model results in a completely random network.
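The stub-rewiring step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `configuration_model` and `degree_sequence` and the sample degree list are assumptions made for the example. Shuffling the list of stubs and pairing consecutive entries is one simple way to draw a uniformly random matching of stubs.

```python
import random

def configuration_model(degrees, seed=None):
    """Pair stubs uniformly at random and return a list of (v, w) edges.

    Self-loops and multi-edges are allowed, as in the description above.
    (Illustrative helper; the name is an assumption, not a library API.)
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sum must be even")
    rng = random.Random(seed)
    # One stub per unit of degree: node v appears degrees[v] times.
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list are joined into an edge.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

def degree_sequence(n, edges):
    """Recompute node degrees from an edge list (a self-loop counts twice)."""
    deg = [0] * n
    for v, w in edges:
        deg[v] += 1
        deg[w] += 1
    return deg

degrees = [3, 2, 2, 1]                      # hypothetical degree sequence
edges = configuration_model(degrees, seed=0)
print(edges)
print(degree_sequence(len(degrees), edges))  # matches `degrees` for any seed
```

Whatever the seed, the rewired graph keeps every node's degree, which is the only property of the original network that the null model preserves.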

Expected Number of Edges Between Nodes

Now consider two nodes $v$ and $w$, with node degrees $k_v$ and $k_w$ respectively, from a randomly rewired network as described above. We calculate the expected number of full edges between these nodes.

Let us consider each of the $k_v$ stubs of node $v$ and create associated indicator variables $I_i^{(v,w)}$ for them, $i=1,\ldots,k_v$, with $I_i^{(v,w)}=1$ if the $i$-th stub happens to connect to one of the $k_w$ stubs of node $w$ in this particular random graph. If it does not, then $I_i^{(v,w)}=0$. Since the $i$-th stub of node $v$ can connect to any of the $2m-1$ remaining stubs with equal probability (where $m$ is the number of edges in the original graph), and since there are $k_w$ stubs it can connect to associated with node $w$, evidently

$$p(I_i^{(v,w)}=1)=E[I_i^{(v,w)}]=\frac{k_w}{2m-1}.$$

The total number of full edges $J_{vw}$ between $v$ and $w$ is just $J_{vw}=\sum_{i=1}^{k_v}I_i^{(v,w)}$, so the expected value of this quantity is

$$E[J_{vw}]=E\left[\sum_{i=1}^{k_v}I_i^{(v,w)}\right]=\sum_{i=1}^{k_v}E[I_i^{(v,w)}]=\sum_{i=1}^{k_v}\frac{k_w}{2m-1}=\frac{k_vk_w}{2m-1}.$$
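For a very small degree sequence, the result $E[J_{vw}]=k_vk_w/(2m-1)$ can be checked exactly by enumerating every possible pairing of stubs and averaging the number of $v$–$w$ edges. The sketch below does this with exact fractions; the helper names and the degree sequence `[3, 2, 2, 1]` are assumptions chosen for illustration, and the enumeration is only feasible for a handful of stubs.

```python
from fractions import Fraction

def perfect_matchings(stubs):
    """Yield every way to pair up the stubs; each pairing is a list of 2-tuples."""
    if not stubs:
        yield []
        return
    first, rest = stubs[0], stubs[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching

def expected_edges(degrees, v, w):
    """Average number of v-w edges over all equally likely stub pairings (v != w)."""
    # Each stub is represented by the node it belongs to; list positions keep
    # stubs of the same node distinct during the enumeration.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    total = 0
    count = 0
    for matching in perfect_matchings(stubs):
        total += sum(1 for a, b in matching if {a, b} == {v, w})
        count += 1
    return Fraction(total, count)

degrees = [3, 2, 2, 1]                     # hypothetical example, 2m = 8 stubs
two_m = sum(degrees)
exact = expected_edges(degrees, 0, 1)
formula = Fraction(degrees[0] * degrees[1], two_m - 1)
print(exact, formula)                      # both print 6/7
```

With eight stubs there are 105 pairings, and the enumerated average agrees with the closed form $3\cdot 2/7$.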

Many texts then make the following approximations, for random networks with a large number of edges. When $m$ is large, they drop the subtraction of $1$ in the denominator above and simply use the approximate expression $\frac{k_vk_w}{2m}$ for the expected number of edges between two nodes. Additionally, in a large random network, the number of self-loops and multi-edges is vanishingly small.^[5] Ignoring self-loops and multi-edges allows one to assume that there is at most one edge between any two nodes. In that case, $J_{vw}$ becomes a binary indicator variable, so its expected value is also the probability that it equals $1$, which means one can approximate the probability of an edge existing between nodes $v$ and $w$ as $\frac{k_vk_w}{2m}$.

Modularity

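Hence, the difference between the actual number of edges between nodes $v$ and $w$ and the expected number of edges between them is

$$A_{vw}-\frac{k_vk_w}{2m}.$$

Summing over all node pairs gives the equation for modularity,

$$Q=\frac{1}{2m}\sum_{vw}\left[A_{vw}-\frac{k_vk_w}{2m}\right]\frac{s_vs_w+1}{2}, \qquad (3)$$

where the factor $\frac{s_vs_w+1}{2}$ equals 1 if $v$ and $w$ belong to the same community and 0 otherwise.

It is important to note that (3) holds for partitioning into two communities only. Hierarchical partitioning (i.e. partitioning into two communities, then further partitioning the two sub-communities into two smaller sub-communities, each time only to maximize $Q$) is a possible approach to identify multiple communities in a network. Additionally, (3) can be generalized for partitioning a network into $c$ communities:^[6]

$$Q=\frac{1}{2m}\sum_{vw}\left[A_{vw}-\frac{k_vk_w}{2m}\right]\delta(c_v,c_w)=\sum_{i=1}^{c}\left(e_{ii}-a_i^2\right),$$

where $c_v$ is the community of node $v$, $\delta(c_v,c_w)$ is 1 if $c_v=c_w$ and 0 otherwise, and $e_{ij}$ is the fraction of edges with one end vertex in community $i$ and the other in community $j$:

$$e_{ij}=\sum_{vw}\frac{A_{vw}}{2m}1_{v\in c_i}1_{w\in c_j},$$

and $a_i$ is the fraction of ends of edges that are attached to vertices in community $i$:

$$a_i=\frac{k_i}{2m}=\sum_je_{ij},$$

with $k_i$ the sum of the degrees of the vertices in community $i$.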
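The two equivalent forms of the multi-community modularity, the pairwise sum over $\left[A_{vw}-k_vk_w/2m\right]\delta(c_v,c_w)$ and the per-community sum $\sum_i(e_{ii}-a_i^2)$, can be compared numerically. The following sketch uses exact fractions on a hypothetical six-node graph (two triangles joined by one edge); the function names and the test graph are assumptions made for this illustration.

```python
from fractions import Fraction

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of a simple undirected graph."""
    adj = [[0] * n for _ in range(n)]
    for v, w in edges:
        adj[v][w] = adj[w][v] = 1
    return adj

def modularity_pairwise(adj, labels):
    """Q = (1/2m) * sum_vw [A_vw - k_v k_w / 2m] * delta(c_v, c_w)."""
    n = len(adj)
    k = [sum(row) for row in adj]
    two_m = sum(k)
    total = Fraction(0)
    for v in range(n):
        for w in range(n):
            if labels[v] == labels[w]:
                total += adj[v][w] - Fraction(k[v] * k[w], two_m)
    return total / two_m

def modularity_by_community(adj, labels):
    """Q = sum_i (e_ii - a_i^2)."""
    n = len(adj)
    k = [sum(row) for row in adj]
    two_m = sum(k)
    q = Fraction(0)
    for c in set(labels):
        members = [v for v in range(n) if labels[v] == c]
        e_cc = Fraction(sum(adj[v][w] for v in members for w in members), two_m)
        a_c = Fraction(sum(k[v] for v in members), two_m)
        q += e_cc - a_c ** 2
    return q

# Hypothetical test graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = adjacency(6, [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])
labels = [0, 0, 0, 1, 1, 1]
q_pairwise = modularity_pairwise(adj, labels)
q_community = modularity_by_community(adj, labels)
print(q_pairwise, q_community)             # both print 5/14
```

Placing all six nodes in a single community gives $Q=0$ with either function, consistent with the null model preserving the total number of edge ends.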

Example of multiple community detection

We consider an undirected network with 10 nodes and 12 edges and the following adjacency matrix.

Node ID	1	2	3	4	5	6	7	8	9	10
1	0	1	1	0	0	0	0	0	0	1
2	1	0	1	0	0	0	0	0	0	0
3	1	1	0	0	0	0	0	0	0	0
4	0	0	0	0	1	1	0	0	0	1
5	0	0	0	1	0	1	0	0	0	0
6	0	0	0	1	1	0	0	0	0	0
7	0	0	0	0	0	0	0	1	1	1
8	0	0	0	0	0	0	1	0	1	0
9	0	0	0	0	0	0	1	1	0	0
10	1	0	0	1	0	0	1	0	0	0

The communities in the graph are represented by the red, green and blue node clusters in Fig 1. The optimal community partitions are depicted in Fig 2.
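The modularity of a candidate partition of this network can be evaluated directly from the adjacency matrix. The sketch below is only an illustration: the two partitions it scores (node 10 grouped with nodes 1 to 3, and node 10 left on its own) are assumptions, since the figures are not reproduced here, and the function name `modularity` is hypothetical.

```python
from fractions import Fraction

# Adjacency matrix from the table above (row/column i-1 is node i).
A = [
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
]

def modularity(adj, labels):
    """Q = (1/2m) * sum over same-community pairs of [A_vw - k_v k_w / 2m]."""
    n = len(adj)
    k = [sum(row) for row in adj]
    two_m = sum(k)
    total = Fraction(0)
    for v in range(n):
        for w in range(n):
            if labels[v] == labels[w]:
                total += adj[v][w] - Fraction(k[v] * k[w], two_m)
    return total / two_m

# Assumed partitions (not taken from the figures):
three_groups = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]   # node 10 joins nodes 1-3
four_groups = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]    # node 10 forms its own group

q_three = modularity(A, three_groups)
q_four = modularity(A, four_groups)
print(sum(map(sum, A)) // 2)                     # 12 edges
print(q_three, float(q_three))                   # 47/96, about 0.4896
print(q_four, float(q_four))                     # 23/48, about 0.4792
```

Because node 10 has one link to each triangle, attaching it to any of the three triangles gives the same value by symmetry, so a maximizing partition of this network is not unique.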

Matrix formulation

An alternative formulation of the modularity, useful particularly in spectral optimization algorithms, is as follows. Define $S_{vr}$ to be $1$ if vertex $v$ belongs to group $r$ and $0$ otherwise. Then the indicator $\delta(c_v,c_w)$, which is 1 when $v$ and $w$ share a community and 0 otherwise, can be written as

$$\delta(c_v,c_w)=\sum_rS_{vr}S_{wr}$$

and hence

$$Q=\frac{1}{2m}\sum_{vw}\sum_r\left[A_{vw}-\frac{k_vk_w}{2m}\right]S_{vr}S_{wr}=\frac{1}{2m}\mathrm{Tr}(S^{\mathsf{T}}BS),$$

where $S$ is the (non-square) matrix having elements $S_{vr}$ and $B$ is the so-called modularity matrix, which has elements

$$B_{vw}=A_{vw}-\frac{k_vk_w}{2m}.$$

All rows and columns of the modularity matrix sum to zero, which means that the modularity of an undivided network is also always $0$.

For networks divided into just two communities, one can alternatively define $s_v=\pm1$ to indicate the community to which node $v$ belongs, which then leads to

$$Q=\frac{1}{4m}\sum_{vw}B_{vw}s_vs_w=\frac{1}{4m}s^{\mathsf{T}}Bs,$$

where $s$ is the column vector with elements $s_v$.
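As a small check of the matrix form, the sketch below builds the modularity matrix $B$ for a hypothetical six-node graph (two triangles joined by one edge), confirms that its rows sum to zero, and evaluates $Q=\frac{1}{4m}s^{\mathsf{T}}Bs$ for a two-way split. The function names and the test graph are assumptions made for the example; exact fractions are used so the comparisons involve no rounding.

```python
from fractions import Fraction

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of a simple undirected graph."""
    adj = [[0] * n for _ in range(n)]
    for v, w in edges:
        adj[v][w] = adj[w][v] = 1
    return adj

def modularity_matrix(adj):
    """B_vw = A_vw - k_v k_w / 2m."""
    n = len(adj)
    k = [sum(row) for row in adj]
    two_m = sum(k)
    return [[adj[v][w] - Fraction(k[v] * k[w], two_m) for w in range(n)]
            for v in range(n)]

def two_way_modularity(adj, s):
    """Q = s^T B s / 4m for a membership vector s with entries +1 or -1."""
    n = len(adj)
    B = modularity_matrix(adj)
    two_m = sum(sum(row) for row in adj)
    quad = sum(B[v][w] * s[v] * s[w] for v in range(n) for w in range(n))
    return quad / (2 * two_m)              # 4m = 2 * (2m)

# Hypothetical test graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = adjacency(6, [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])
B = modularity_matrix(adj)
row_sums = [sum(row) for row in B]
q_split = two_way_modularity(adj, [1, 1, 1, -1, -1, -1])
q_undivided = two_way_modularity(adj, [1, 1, 1, 1, 1, 1])
print(row_sums)                            # six zeros
print(q_split, q_undivided)                # 5/14 and 0
```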

This function has the same form as the Hamiltonian of an Ising spin glass, a connection that has been exploited to create simple computer algorithms, for instance using simulated annealing, to maximize the modularity. The general form of the modularity for arbitrary numbers of communities is equivalent to a Potts spin glass and similar algorithms can be developed for this case also.^[7]

Overfitting

Although the method of modularity maximization is motivated by computing a deviation from a null model, this deviation is not computed in a statistically consistent manner.^[8] As a result, the method notoriously finds high-scoring communities in its own null model (the configuration model), which by definition cannot be statistically significant. The method therefore cannot be used to reliably obtain statistically significant community structure in empirical networks.

Resolution limit

Modularity compares the number of edges inside a cluster with the expected number of edges that one would find in the cluster if the network were a random network with the same number of nodes and where each node keeps its degree, but edges are otherwise randomly attached. This random null model implicitly assumes that each node can get attached to any other node of the network. This assumption is however unreasonable if the network is very large, as the horizon of a node includes only a small part of the network, ignoring most of it. Moreover, this implies that the expected number of edges between two groups of nodes decreases if the size of the network increases. So, if a network is large enough, the expected number of edges between two groups of nodes in modularity's null model may be smaller than one. If this happens, a single edge between the two clusters would be interpreted by modularity as a sign of a strong correlation between the two clusters, and optimizing modularity would lead to the merging of the two clusters, independently of the clusters' features. So, even weakly interconnected complete graphs, which have the highest possible density of internal edges, and represent the best identifiable communities, would be merged by modularity optimization if the network were sufficiently large.^[9] For this reason, optimizing modularity in large networks would fail to resolve small communities, even when they are well defined. This bias is inevitable for methods like modularity optimization, which rely on a global null model.^[10]
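The merging effect can be reproduced on a ring of complete graphs, each linked to its two neighbours by a single edge. The sketch below is an illustration under assumptions: the ring construction, the clique size of 5, the ring lengths 10 and 30, and the helper names are chosen for the example and are not taken from the text above. It compares the modularity of the partition that keeps every clique separate with the partition that merges neighbouring cliques in pairs.

```python
from itertools import combinations

def ring_of_cliques(num_cliques, clique_size):
    """Edge list of complete graphs joined into a ring by single edges."""
    edges = []
    for c in range(num_cliques):
        base = c * clique_size
        # All edges inside clique c.
        edges.extend(combinations(range(base, base + clique_size), 2))
        # One bridge from the last node of clique c to the first node of the next.
        nxt = ((c + 1) % num_cliques) * clique_size
        edges.append((base + clique_size - 1, nxt))
    return edges

def modularity(edges, labels):
    """Q = sum_c [ L_c / m - (d_c / 2m)^2 ] with L_c internal edges, d_c degree sum."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for v, w in edges:
        for node in (v, w):
            degree_sum[labels[node]] = degree_sum.get(labels[node], 0) + 1
        if labels[v] == labels[w]:
            internal[labels[v]] = internal.get(labels[v], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

def compare(num_cliques, clique_size=5):
    """Return (Q with one community per clique, Q with neighbouring cliques paired)."""
    edges = ring_of_cliques(num_cliques, clique_size)
    n = num_cliques * clique_size
    single = [v // clique_size for v in range(n)]
    pairs = [v // (2 * clique_size) for v in range(n)]   # needs an even ring length
    return modularity(edges, single), modularity(edges, pairs)

for ring_length in (10, 30):
    q_single, q_pairs = compare(ring_length)
    print(ring_length, round(q_single, 4), round(q_pairs, 4))
```

In this sketch the short ring scores higher with separate cliques, while the ring of 30 cliques scores higher with the cliques merged in pairs, even though each clique is as dense as a community can be.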

Multiresolution methods

There are two main approaches which try to solve the resolution limit within the modularity context: the addition of a resistance r to every node, in the form of a self-loop, which increases (r>0) or decreases (r<0) the aversion of nodes to form communities;^[11] or the addition of a parameter γ>0 in front of the null-case term in the definition of modularity, which controls the relative importance between internal links of the communities and the null model. By optimizing modularity for values of these parameters in their respective appropriate ranges, it is possible to recover the whole mesoscale of the network, from the macroscale in which all nodes belong to the same community, to the microscale in which every node forms its own community, hence the name multiresolution methods. However, it has been shown that these methods have limitations when communities are very heterogeneous in size.^[12]
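The effect of the parameter γ can be seen on a toy graph by scoring a few fixed partitions with the null-case term multiplied by γ. The sketch below is an assumption-laden illustration, not an optimizer: it only ranks three hand-picked partitions of a hypothetical six-node graph (two triangles joined by one edge), and the names `modularity_gamma` and `best_partition` are invented for the example.

```python
def modularity_gamma(edges, labels, gamma=1.0):
    """Q_gamma = sum_c [ L_c / m - gamma * (d_c / 2m)^2 ].

    L_c is the number of edges inside community c and d_c its degree sum;
    gamma = 1 gives the standard modularity.
    """
    m = len(edges)
    internal, degree_sum = {}, {}
    for v, w in edges:
        for node in (v, w):
            degree_sum[labels[node]] = degree_sum.get(labels[node], 0) + 1
        if labels[v] == labels[w]:
            internal[labels[v]] = internal.get(labels[v], 0) + 1
    return sum(internal.get(c, 0) / m - gamma * (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical test graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
candidates = {
    "one community": [0, 0, 0, 0, 0, 0],
    "two triangles": [0, 0, 0, 1, 1, 1],
    "singletons": [0, 1, 2, 3, 4, 5],
}

def best_partition(gamma):
    """Name of the highest-scoring candidate partition for this gamma."""
    return max(candidates,
               key=lambda name: modularity_gamma(edges, candidates[name], gamma))

for gamma in (0.2, 1.0, 3.0):
    print(gamma, best_partition(gamma))
```

Among these three candidates, a small γ favours the single large community, γ = 1 favours the two triangles, and a large γ favours singletons, which mirrors the sweep from macroscale to microscale described above.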

Software Tools

Several software tools are available that can compute clusterings in graphs with good modularity:

Original implementation of the multi-level Louvain method.

The Leiden algorithm which additionally avoids unconnected communities.

The Vienna Graph Clustering (VieClus) algorithm, a parallel memetic algorithm.

References

1. Newman, M. E. J. (2006). "Modularity and community structure in networks". Proceedings of the National Academy of Sciences of the United States of America. 103 (23): 8577–8582. doi:10.1073/pnas.0601602103. PMID 16723398. PMC 1482622. arXiv:physics/0602124. Bibcode:2006PNAS..103.8577N.
2. Newman, M. E. J. (2007). "Mathematics of networks". The New Palgrave Encyclopedia of Economics (2nd ed.). Palgrave Macmillan, Basingstoke.
3. Brandes, U.; Delling, D.; Gaertler, M.; Gorke, R.; Hoefer, M.; Nikoloski, Z.; Wagner, D. (February 2008). "On Modularity Clustering". IEEE Transactions on Knowledge and Data Engineering. 20 (2): 172–188. doi:10.1109/TKDE.2007.190689. S2CID 150684.
4. van der Hofstad, Remco (2013). "Chapter 7". Random Graphs and Complex Networks. http://www.win.tue.nl/~rhofstad/NotesRGCN.pdf#page=149. Retrieved 2013-12-08. Archived 2013-12-18 at https://web.archive.org/web/20131218121403/http://www.win.tue.nl/~rhofstad/NotesRGCN.pdf#page=149.
5. Barabási, Albert-László. "NetworkScience". Retrieved 2020-03-20. Archived 2020-03-05 at https://web.archive.org/web/20200305044424/http://networksciencebook.com/.
6. Clauset, Aaron; Newman, M. E. J.; Moore, Cristopher (2004). "Finding community structure in very large networks". Phys. Rev. E. 70 (6): 066111. doi:10.1103/PhysRevE.70.066111. arXiv:cond-mat/0408187. Bibcode:2004PhRvE..70f6111C. PMID 15697438. S2CID 8977721.
7. Reichardt, Joerg; Bornholdt, Stefan (2006). "Statistical mechanics of community detection". Physical Review E. 74 (1): 016110. doi:10.1103/PhysRevE.74.016110. arXiv:cond-mat/0603718. Bibcode:2006PhRvE..74a6110R. PMID 16907154. S2CID 792965.
8. Peixoto, Tiago P. (2021). "Descriptive vs. inferential community detection: pitfalls, myths and half-truths". arXiv:2112.00183.
9. Fortunato, Santo; Barthelemy, Marc (2007). "Resolution limit in community detection". Proceedings of the National Academy of Sciences of the United States of America. 104 (1): 36–41. doi:10.1073/pnas.0605965104. PMID 17190818. PMC 1765466. arXiv:physics/0607100. Bibcode:2007PNAS..104...36F.
10. Kumpula, J. M.; Saramäki, J.; Kaski, K.; Kertész, J. (2007). "Limited resolution in complex network community detection with Potts model approach". European Physical Journal B. 56 (1): 41–45. doi:10.1140/epjb/e2007-00088-4. arXiv:cond-mat/0610370. Bibcode:2007EPJB...56...41K. S2CID 4411525.
11. Arenas, Alex; Fernández, Alberto; Gómez, Sergio (2008). "Analysis of the structure of complex networks at different resolution levels". New Journal of Physics. 10 (5): 053039. doi:10.1088/1367-2630/10/5/053039. arXiv:physics/0703218. Bibcode:2008NJPh...10e3039A. S2CID 11544197.
12. Lancichinetti, Andrea; Fortunato, Santo (2011). "Limits of modularity maximization in community detection". Physical Review E. 84 (6): 066122. doi:10.1103/PhysRevE.84.066122. PMID 22304170. arXiv:1107.1155. Bibcode:2011PhRvE..84f6122L. S2CID 16180375.
