Clustering coefficient explained

In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes (Holland and Leinhardt, 1971;^[1] Watts and Strogatz, 1998^[2]).

Two versions of this measure exist: the global and the local. The global version was designed to give an overall indication of the clustering in the network, whereas the local gives an indication of the extent of "clustering" of a single node.

Local clustering coefficient

The local clustering coefficient of a vertex (node) in a graph quantifies how close its neighbours are to being a clique (complete graph). Duncan J. Watts and Steven Strogatz introduced the measure in 1998 to determine whether a graph is a small-world network.

A graph $G=(V,E)$ formally consists of a set of vertices $V$ and a set of edges $E$ between them. An edge $e_{ij}$ connects vertex $v_i$ with vertex $v_j$.

The neighbourhood $N_i$ for a vertex $v_i$ is defined as its immediately connected neighbours as follows:

$$N_i = \{v_j : e_{ij} \in E \lor e_{ji} \in E\}.$$

We define $k_i$ as the number of vertices, $|N_i|$, in the neighbourhood, $N_i$, of vertex $v_i$.

The local clustering coefficient $C_i$ for a vertex $v_i$ is then given by the number of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them. For a directed graph, $e_{ij}$ is distinct from $e_{ji}$, and therefore for each neighbourhood $N_i$ there are $k_i(k_i-1)$ links that could exist among the vertices within the neighbourhood ($k_i$ is the number of neighbours of a vertex). Thus, the local clustering coefficient for directed graphs is given as

$$C_i = \frac{|\{e_{jk} : v_j, v_k \in N_i, e_{jk} \in E\}|}{k_i(k_i-1)}.$$
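The directed definition can be sketched in a few lines of Python. This is only an illustration: the function name `local_clustering_directed`, the edge-set representation and the example graph are assumptions made here, and returning 0 for vertices with fewer than two neighbours is a convention chosen for the sketch.

```python
def local_clustering_directed(edges, i):
    """C_i for a directed graph given as a set of (source, target) pairs."""
    # N_i: vertices linked to i in either direction (self-loops ignored).
    neighbours = {v for (u, v) in edges if u == i and v != i}
    neighbours |= {u for (u, v) in edges if v == i and u != i}
    k = len(neighbours)
    if k < 2:
        # Assumed convention: no link among neighbours is possible.
        return 0.0
    # Directed links j -> k with both endpoints inside the neighbourhood.
    links = sum(
        1 for (u, v) in edges
        if u in neighbours and v in neighbours and u != v
    )
    return links / (k * (k - 1))


# Hypothetical example graph (not from the text).
edges = {(0, 1), (0, 2), (3, 0), (1, 2), (2, 1), (2, 3)}
print(local_clustering_directed(edges, 0))  # 3 of 6 possible links -> 0.5
```

Vertex 0 has the neighbourhood {1, 2, 3}; of the 3 × 2 = 6 directed links that could exist among these vertices, three are present (1→2, 2→1 and 2→3).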

An undirected graph has the property that $e_{ij}$ and $e_{ji}$ are considered identical. Therefore, if a vertex $v_i$ has $k_i$ neighbours, $\frac{k_i(k_i-1)}{2}$ edges could exist among the vertices within the neighbourhood. Thus, the local clustering coefficient for undirected graphs can be defined as

$$C_i = \frac{2\,|\{e_{jk} : v_j, v_k \in N_i, e_{jk} \in E\}|}{k_i(k_i-1)}.$$
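A minimal Python sketch of the undirected definition follows. The adjacency-dictionary representation, the function name `local_clustering` and the four-vertex example are assumptions for illustration, as is the choice to return 0 when a vertex has fewer than two neighbours.

```python
from itertools import combinations


def local_clustering(adj, i):
    """C_i for an undirected graph stored as {vertex: set_of_neighbours}."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        # Assumed convention: fewer than two neighbours gives C_i = 0.
        return 0.0
    # Count unordered neighbour pairs that are themselves joined by an edge.
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# Hypothetical example: a triangle a-b-c with a pendant vertex d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for vertex in adj:
    print(vertex, local_clustering(adj, vertex))
```

Here vertex a has three neighbours with one link among them (b and c), so its coefficient is 2 × 1 / (3 × 2) = 1/3, while b and c each have a fully connected neighbourhood.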

Let $\lambda_G(v)$ be the number of triangles on $v \in V(G)$ for undirected graph $G$. That is, $\lambda_G(v)$ is the number of subgraphs of $G$ with 3 edges and 3 vertices, one of which is $v$. Let $\tau_G(v)$ be the number of triples on $v \in G$. That is, $\tau_G(v)$ is the number of subgraphs (not necessarily induced) with 2 edges and 3 vertices, one of which is $v$ and such that $v$ is incident to both edges. Then we can also define the clustering coefficient as

$$C_i = \frac{\lambda_G(v)}{\tau_G(v)}.$$

It is simple to show that the two preceding definitions are the same, since

$$\tau_G(v) = \binom{k_i}{2} = \frac{1}{2} k_i(k_i-1).$$
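As a quick check of this equivalence, consider a hypothetical vertex $v_i$ with $k_i = 3$ neighbours, exactly one pair of which is joined by an edge (the numbers are chosen purely for illustration):

```latex
\begin{align*}
\tau_G(v_i) &= \binom{3}{2} = \tfrac{1}{2}\cdot 3 \cdot 2 = 3,
\qquad \lambda_G(v_i) = 1, \\
C_i &= \frac{\lambda_G(v_i)}{\tau_G(v_i)} = \frac{1}{3}
     = \frac{2 \cdot 1}{3 \cdot 2}.
\end{align*}
```

The last expression is the edge-counting form, with one link among the neighbours in the numerator and $k_i(k_i-1) = 6$ in the denominator, so both definitions give the same value.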

These measures are 1 if every neighbour connected to $v_i$ is also connected to every other vertex within the neighbourhood, and 0 if no vertex that is connected to $v_i$ connects to any other vertex that is connected to $v_i$.

Since any graph is fully specified by its adjacency matrix A, the local clustering coefficient for a simple undirected graph can be expressed in terms of A as:^[3]

$$C_i = \frac{1}{k_i(k_i-1)} \sum_{j,k} A_{ij} A_{jk} A_{ki}$$

where:

$$k_i = \sum_j A_{ij}$$

and C_i=0 when k_i is zero or one. In the above expression, the numerator counts twice the number of complete triangles that vertex i is involved in. In the denominator, k_i² counts the number of edge pairs that vertex i is involved in plus the number of single edges traversed twice. k_i is the number of edges connected to vertex i, and subtracting k_i then removes the latter, leaving only a set of edge pairs that could conceivably be connected into triangles. For every such edge pair, there will be another edge pair which could form the same triangle, so the denominator counts twice the number of conceivable triangles that vertex i could be involved in.
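The adjacency-matrix expression translates almost literally into code. The sketch below uses plain nested lists rather than a matrix library; the function name `local_clustering_from_matrix` and the example matrix are assumptions for illustration.

```python
def local_clustering_from_matrix(A, i):
    """C_i from a symmetric 0/1 adjacency matrix A given as nested lists."""
    n = len(A)
    k_i = sum(A[i])  # degree of vertex i
    if k_i < 2:
        return 0.0  # C_i = 0 when k_i is zero or one
    # Sum over j, k of A_ij * A_jk * A_ki: twice the number of triangles on i.
    twice_triangles = sum(
        A[i][j] * A[j][k] * A[k][i] for j in range(n) for k in range(n)
    )
    return twice_triangles / (k_i * (k_i - 1))


# Hypothetical example: triangle 0-1-2 plus a pendant vertex 3 attached to 0.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print([local_clustering_from_matrix(A, i) for i in range(4)])
```

For vertex 0 the double sum equals 2 (the single triangle is found once as (j, k) = (1, 2) and once as (2, 1)) and the denominator is 3 × 2 = 6, giving 1/3.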

Global clustering coefficient

The global clustering coefficient is based on triplets of nodes. A triplet is three nodes that are connected by either two (open triplet) or three (closed triplet) undirected ties. A triangle graph therefore includes three closed triplets, one centred on each of the nodes (n.b. this means the three triplets in a triangle come from overlapping selections of nodes). The global clustering coefficient is the number of closed triplets (or 3 × triangles) over the total number of triplets (both open and closed). The first attempt to measure it was made by Luce and Perry (1949).^[4] This measure gives an indication of the clustering in the whole network (global), and can be applied to both undirected and directed networks (often called transitivity, see Wasserman and Faust, 1994, page 243^[5]).

The global clustering coefficient is defined as:

$$C = \frac{\text{number of closed triplets}}{\text{number of all triplets (open and closed)}}.$$

The number of closed triplets has also been referred to as 3 × triangles in the literature, so:

$$C = \frac{3 \times \text{number of triangles}}{\text{number of all triplets}}.$$

A generalisation to weighted networks was proposed by Opsahl and Panzarasa (2009),^[6] and a redefinition to two-mode networks (both binary and weighted) by Opsahl (2009).^[7]

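Since any simple graph is fully specified by its adjacency matrix A, the global clustering coefficient for an undirected graph can be expressed in terms of A as:

$$C = \frac{\sum_{i,j,k} A_{ij} A_{jk} A_{ki}}{\sum_i k_i(k_i-1)}$$

where:

$$k_i = \sum_j A_{ij}$$

and C=0 when the denominator is zero. The numerator counts every triangle six times, which is twice the number of closed triplets, and the denominator counts every triplet twice, so the factors of two cancel.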
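The triplet-ratio definition of the global coefficient can be evaluated directly from an adjacency matrix. The following is a sketch under stated assumptions: the function name `global_clustering`, the nested-list matrix format and the example graph are not from the text. Both sums in the code are doubled counts (of closed triplets and of all triplets respectively), so their ratio equals closed triplets over all triplets.

```python
def global_clustering(A):
    """Closed triplets over all triplets, from a symmetric 0/1 matrix A."""
    n = len(A)
    degrees = [sum(row) for row in A]
    # Each triangle appears six times in this sum: twice per closed triplet.
    closed_twice = sum(
        A[i][j] * A[j][k] * A[k][i]
        for i in range(n) for j in range(n) for k in range(n)
    )
    # k(k-1) counts each triplet centred on a vertex twice.
    triplets_twice = sum(k * (k - 1) for k in degrees)
    if triplets_twice == 0:
        return 0.0  # C = 0 when there are no triplets at all
    return closed_twice / triplets_twice


# Hypothetical example: triangle 0-1-2 plus a pendant vertex 3 attached to 0.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print(global_clustering(A))  # 3 closed triplets out of 5 triplets
```

In this example there is one triangle (three closed triplets) and five triplets in total (three centred on vertex 0, one each on vertices 1 and 2), so the coefficient is 3/5.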

Network average clustering coefficient

As an alternative to the global clustering coefficient, the overall level of clustering in a network is measured by Watts and Strogatz as the average of the local clustering coefficients of all $n$ vertices:^[8]

$$\bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i.$$

This metric places more weight on low-degree nodes, while the transitivity ratio (the global clustering coefficient) places more weight on high-degree nodes.
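The difference in weighting is easy to see on a small example. The sketch below compares the two measures on a hypothetical "windmill" graph, in which one hub joins three otherwise separate triangles; the graph, the function names and the convention that vertices with fewer than two neighbours contribute 0 are all assumptions made for illustration.

```python
from itertools import combinations


def linked_pairs(adj, v):
    """Number of neighbour pairs of v that are themselves connected."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])


def average_clustering(adj):
    """Mean of the local coefficients (0 assumed when degree < 2)."""
    total = 0.0
    for v in adj:
        k = len(adj[v])
        if k >= 2:
            total += 2 * linked_pairs(adj, v) / (k * (k - 1))
    return total / len(adj)


def transitivity(adj):
    """Closed triplets over all triplets, each counted at its centre vertex."""
    closed = sum(linked_pairs(adj, v) for v in adj)
    triplets = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)
    return closed / triplets if triplets else 0.0


# Hub 0 shares an edge with every other vertex; (1,2), (3,4), (5,6) are paired.
adj = {
    0: {1, 2, 3, 4, 5, 6},
    1: {0, 2}, 2: {0, 1},
    3: {0, 4}, 4: {0, 3},
    5: {0, 6}, 6: {0, 5},
}
print(average_clustering(adj))  # dominated by the six degree-2 vertices
print(transitivity(adj))        # dominated by the hub's 15 triplets
```

The six low-degree vertices each have a local coefficient of 1 and the hub has 0.2, so the average is 6.2/7 (about 0.89), whereas the transitivity is 9/21 (about 0.43) because 15 of the 21 triplets are centred on the hub.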

A generalisation to weighted networks was proposed by Barrat et al. (2004),^[9] and a redefinition to bipartite graphs (also called two-mode networks) by Latapy et al. (2008)^[10] and Opsahl (2009).^[7]

Alternative generalisations to weighted and directed graphs have been provided by Fagiolo (2007)^[11] and Clemente and Grassi (2018).^[12]

This formula is not, by default, defined for graphs with isolated vertices; see Kaiser (2008)^[13] and Barmpoutis et al.^[14] The networks with the largest possible average clustering coefficient are found to have a modular structure, and at the same time, they have the smallest possible average distance among the different nodes.

Percolation of clustered networks

For a random tree-like network without degree-degree correlation, it can be shown that such a network can have a giant component, and the percolation threshold (transmission probability) is given by

$$p_c = \frac{1}{g_1'(1)},$$

where $g_1(z)$ is the generating function corresponding to the excess degree distribution.

In networks with low clustering, $0 < C \ll 1$, the critical point gets scaled by $(1-C)^{-1}$ such that:

$$p_c = \frac{1}{1-C} \, \frac{1}{g_1'(1)}.$$^[15]
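As a numerical illustration, the threshold can be computed from a degree distribution. The sketch below relies on the standard generating-function identity $g_1'(1) = (\langle k^2 \rangle - \langle k \rangle) / \langle k \rangle$, which is not stated in the text above and is used here as an assumption; the function name `percolation_threshold` and the example distribution are likewise hypothetical. The clustering correction is only meaningful in the low-clustering regime.

```python
def percolation_threshold(degree_probs, clustering=0.0):
    """p_c = 1 / ((1 - C) * g_1'(1)) for a degree distribution {k: P(k)}."""
    mean_k = sum(k * p for k, p in degree_probs.items())
    mean_k2 = sum(k * k * p for k, p in degree_probs.items())
    if mean_k == 0:
        raise ValueError("the network has no edges")
    # Assumed identity: g_1'(1) = (<k^2> - <k>) / <k>.
    g1_prime = (mean_k2 - mean_k) / mean_k
    if g1_prime == 0:
        raise ValueError("g_1'(1) is zero, so no finite threshold exists")
    if not 0.0 <= clustering < 1.0:
        raise ValueError("clustering must lie in [0, 1)")
    return 1.0 / ((1.0 - clustering) * g1_prime)


# Hypothetical example: every vertex has degree 3, so g_1'(1) = 2.
regular = {3: 1.0}
print(percolation_threshold(regular))                   # unclustered case
print(percolation_threshold(regular, clustering=0.1))   # scaled by 1 / (1 - C)
```

For this distribution the unclustered threshold is 1/2, and a small clustering coefficient of 0.1 raises it to 1/(0.9 × 2), about 0.56.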

This indicates that, for a given degree distribution, clustering leads to a larger percolation threshold, mainly because, for a fixed number of links, the clustering structure reinforces the core of the network at the price of diluting the global connections. For networks with high clustering, strong clustering could induce a core–periphery structure, in which the core and periphery might percolate at different critical points, and the above approximate treatment is not applicable.^[16]

A percolation approach has been developed for studying the robustness of clustered networks.^[17]^[18]

Notes and References

[1] P. W. Holland and S. Leinhardt (1971). "Transitivity in structural models of small groups". Comparative Group Studies. 2 (2): 107–124. doi:10.1177/104649647100200201.
[2] D. J. Watts and Steven Strogatz (June 1998). "Collective dynamics of 'small-world' networks". Nature. 393 (6684): 440–442. doi:10.1038/30918.
[3] Yu Wang, Eshwar Ghumare, Rik Vandenberghe and Patrick Dupont (2017). "Comparison of Different Generalizations of Clustering Coefficient and Local Efficiency for Weighted Undirected Graphs". Neural Computation. 29 (2): 313–331. doi:10.1162/NECO_a_00914. Archived at https://web.archive.org/web/20200810032708/https://www.mitpressjournals.org/.
[4] R. D. Luce and A. D. Perry (1949). "A method of matrix analysis of group structure". Psychometrika. 14 (1): 95–116. doi:10.1007/BF02289146.
[5] Stanley Wasserman and Katherine Faust (1994). Social Network Analysis: Methods and Applications. Cambridge University Press.
[6] Tore Opsahl and Pietro Panzarasa (2009). "Clustering in Weighted Networks". Social Networks. 31 (2): 155–163. doi:10.1016/j.socnet.2009.02.002. Archived at https://web.archive.org/web/20190701224513/https://toreopsahl.com/2009/04/03/article-clustering-in-weighted-networks/.
[7] Tore Opsahl (2009). "Clustering in Two-mode Networks". Conference and Workshop on Two-Mode Social Analysis (Sept 30-Oct 2, 2009). Archived at https://web.archive.org/web/20160321002314/http://toreopsahl.com/2009/09/11/clustering-in-two-mode-networks/.
[8] Andreas Kemper (2009). Valuation of Network Effects in Software Markets: A Complex Networks Approach. Springer. p. 142. ISBN 9783790823660.
[9] A. Barrat, M. Barthelemy, R. Pastor-Satorras and A. Vespignani (2004). "The architecture of complex weighted networks". Proceedings of the National Academy of Sciences. 101 (11): 3747–3752. arXiv:cond-mat/0311416. doi:10.1073/pnas.0400087101.
[10] M. Latapy, C. Magnien and N. Del Vecchio (2008). "Basic Notions for the Analysis of Large Two-mode Networks". Social Networks. 30 (1): 31–48. doi:10.1016/j.socnet.2007.04.006.
[11] G. Fagiolo (2007). "Clustering in complex directed networks". Physical Review E. 76 (2 Pt 2): 026107. arXiv:physics/0612169. doi:10.1103/PhysRevE.76.026107.
[12] G.P. Clemente and R. Grassi (2018). "Directed clustering in weighted networks: A new perspective". Chaos, Solitons & Fractals. 107: 26–38. arXiv:1706.07322. doi:10.1016/j.chaos.2017.12.007.
[13] Marcus Kaiser (2008). "Mean clustering coefficients: the role of isolated nodes and leafs on clustering measures for small-world networks". New Journal of Physics. 10 (8): 083042. arXiv:0802.2512. doi:10.1088/1367-2630/10/8/083042.
[14] D. Barmpoutis and R. M. Murray (2010). "Networks with the Smallest Average Distance and the Largest Average Clustering". arXiv:1007.4031 [q-bio.MN].
[15] Yakir Berchenko, Yael Artzy-Randrup, Mina Teicher and Lewi Stone (2009-03-30). "Emergence and Size of the Giant Component in Clustered Random Graphs with a Given Degree Distribution". Physical Review Letters. 102 (13): 138701. doi:10.1103/PhysRevLett.102.138701. ISSN 0031-9007. Archived at https://web.archive.org/web/20230204143725/https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.102.138701.
[16] Yakir Berchenko, Yael Artzy-Randrup, Mina Teicher and Lewi Stone (2009-03-30). "Emergence and Size of the Giant Component in Clustered Random Graphs with a Given Degree Distribution". Physical Review Letters. 102 (13): 138701. doi:10.1103/PhysRevLett.102.138701. ISSN 0031-9007. Archived at https://web.archive.org/web/20230204143725/https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.102.138701.
[17] M. E. J. Newman (2009). "Random Graphs with Clustering". Phys. Rev. Lett. 103 (5): 058701. arXiv:0903.4009. doi:10.1103/PhysRevLett.103.058701.
[18] A. Hackett, S. Melnik and J. P. Gleeson (2011). "Cascades on a class of clustered random networks". Phys. Rev. E. 83 (5 Pt 2): 056107. arXiv:1012.3651. doi:10.1103/PhysRevE.83.056107.
