Intersection number (graph theory)
In the mathematical field of graph theory, the intersection number of a graph $G$ is the smallest number of elements in a representation of $G$ as an intersection graph of finite sets. In such a representation, each vertex is represented as a set, and two vertices are connected by an edge whenever their sets have a common element. Equivalently, the intersection number is the smallest number of cliques needed to cover all of the edges of $G$.
A set of cliques that cover all edges of a graph is called a clique edge cover or edge clique cover, or even just a clique cover, although the last term is ambiguous: a clique cover can also be a set of cliques that cover all vertices of a graph. Sometimes "covering" is used in place of "cover". As well as being called the intersection number, the minimum number of these cliques has been called the R-content, edge clique cover number, or clique cover number. The problem of computing the intersection number has been called the intersection number problem, the intersection graph basis problem, covering by cliques, the edge clique cover problem, and the keyword conflict problem.
Every graph with $n$ vertices and $m$ edges has intersection number at most $\min(m, n^2/4)$. The intersection number is NP-hard to compute or approximate, but fixed-parameter tractable.
Definitions
Intersection graphs
Let $\mathcal{F}$ be any family of sets, allowing sets in $\mathcal{F}$ to be repeated. Then the intersection graph of $\mathcal{F}$ is an undirected graph that has a vertex for each set in $\mathcal{F}$ and an edge between each two sets that have a nonempty intersection. Every graph can be represented as an intersection graph in this way. The intersection number of the graph is the smallest number $k$ such that there exists a representation of this type for which the union of the sets in $\mathcal{F}$ has $k$ elements. The problem of finding an intersection representation of a graph with a given number of elements is known as the intersection graph basis problem.
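As a minimal sketch of this definition, the following Python function builds the edge set of the intersection graph of a small family of sets. The function name and the example family are illustrative assumptions, not part of any standard library. Vertices are list indices, so a repeated set would simply give two distinct vertices.

```python
from itertools import combinations

def intersection_graph(family):
    """Return the edge set of the intersection graph of a list of sets.

    Vertices are the indices 0..len(family)-1, so repeated sets in the
    family correspond to distinct vertices.
    """
    edges = set()
    for i, j in combinations(range(len(family)), 2):
        if family[i] & family[j]:  # nonempty intersection -> edge
            edges.add((i, j))
    return edges

# Hypothetical family: a three-vertex path plus one isolated vertex.
family = [{1}, {1, 2}, {2}, set()]
edges = intersection_graph(family)
# Number of elements used by this particular representation.
universe_size = len(set().union(*family))
```

Here the representation uses two elements, and the path it represents cannot be covered by a single clique, so two is also its intersection number.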
Clique edge covers
An alternative definition of the intersection number of a graph $G$ is that it is the smallest number of cliques in $G$ (complete subgraphs of $G$) that together cover all of the edges of $G$. A set of cliques with this property is known as a clique edge cover or edge clique cover, and for this reason the intersection number is also sometimes called the edge clique cover number.
Equivalence
The equality of the intersection number and the edge clique cover number is straightforward to prove. In one direction, suppose that $G$ is the intersection graph of a family $\mathcal{F}$ of sets whose union $U$ has $k$ elements. Then for any element $x \in U$, the subset of vertices of $G$ corresponding to sets that contain $x$ forms a clique: any two vertices in this subset are adjacent, because their sets have a nonempty intersection containing $x$. Further, every edge in $G$ is contained in one of these cliques: if an edge comes from a nonempty intersection of sets containing an element $x$, then that edge is contained in the clique for $x$. Therefore, the edges of $G$ can be covered by $k$ cliques, one per element of $U$.
In the other direction, if a graph $G$ can be covered by $k$ cliques, then each vertex $v$ of $G$ may be represented by a subset of the cliques, the ones that contain vertex $v$. Two of these subsets, for two vertices $u$ and $v$, have a nonempty intersection if and only if there is a clique in the intersection that contains both of them, if and only if there is an edge $uv$ included in one of the covering cliques.
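The two directions of this argument can be sketched in Python as a pair of conversions between a clique edge cover and a set representation. The function names and the example graph are assumptions chosen for illustration.

```python
from itertools import combinations

def cover_to_sets(vertices, cliques):
    """Represent each vertex by the set of indices of cliques containing it."""
    return {v: {i for i, c in enumerate(cliques) if v in c} for v in vertices}

def sets_to_cover(rep):
    """Build one clique per element: the vertices whose set contains it."""
    elements = set().union(*rep.values())
    return [{v for v, s in rep.items() if x in s} for x in sorted(elements)]

# Hypothetical example: the 4-cycle 0-1-2-3 with chord 0-2,
# covered by its two triangles.
vertices = [0, 1, 2, 3]
cliques = [{0, 1, 2}, {0, 2, 3}]
rep = cover_to_sets(vertices, cliques)

# Edges recovered from the set representation alone.
edges = {frozenset(p) for p in combinations(vertices, 2)
         if rep[p[0]] & rep[p[1]]}
```

Converting the representation back with `sets_to_cover(rep)` returns the original two triangles, and the recovered edge set contains every pair except the non-adjacent pair 1, 3.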
Applications
The representation of a graph as an abstract intersection graph of sets can be used to construct more concrete geometric intersection representations of the same graph. In particular, if a graph has intersection number $k$, it can be represented as an intersection graph of $k$-dimensional unit hyperspheres (its sphericity is at most $k$).
A clique cover can be used as a kind of adjacency labelling scheme for a graph, in which one labels each vertex by a binary value with a bit for each clique, zero if it does not belong to the clique and one if it belongs. Then two vertices are adjacent if and only if the bitwise and of their labels is nonzero. The length of the labels is the intersection number of the graph. This method was used in an early application of intersection numbers, for labeling a set of keywords so that conflicting keywords could be quickly detected, by E. Kellerman of IBM. For this reason, another name for the problem of computing intersection numbers is the keyword conflict problem. Similarly, in computational geometry, representations based on the intersection number have been considered as a compact representation for visibility graphs, but there exist geometric inputs for which this representation requires a near-quadratic number of cliques.
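The labelling scheme described above can be sketched as follows, using Python integers as bit vectors. The function names and the four-vertex example are illustrative assumptions.

```python
def clique_labels(vertices, cliques):
    """Give each vertex an integer whose bit i is 1 iff it lies in clique i."""
    labels = {v: 0 for v in vertices}
    for i, clique in enumerate(cliques):
        for v in clique:
            labels[v] |= 1 << i
    return labels

def adjacent(labels, u, v):
    """Two distinct vertices are adjacent iff their labels share a 1 bit."""
    return u != v and (labels[u] & labels[v]) != 0

# Hypothetical graph: triangle a, b, c with a pendant edge c-d,
# covered by the two cliques {a, b, c} and {c, d}.
labels = clique_labels("abcd", [{"a", "b", "c"}, {"c", "d"}])
```

With this cover the labels need two bits each: `a` and `b` get `0b01`, `c` gets `0b11`, and `d` gets `0b10`, so `a` and `d` are correctly reported as non-adjacent.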
Another class of applications comes from scheduling problems in which multiple users of a shared resource should be scheduled for time slots, in such a way that incompatible requests are never scheduled for the same time slot but all pairs of compatible requests are given at least one time slot together. The intersection number of a graph of compatibilities gives the minimum number of time slots needed for such a schedule. In the design of compilers for very long instruction word computers, a small clique cover of a graph of incompatible operations can be used to represent their incompatibilities by a small number of artificial resources, allowing resource-based scheduling techniques to be used to assign operations to instruction slots.
Shepherd and Vetta observe that the intersection number of any network equals the minimum number of constraints needed in an integer programming formulation of the problem of computing maximum independent sets, in which one has a 0-1 variable per vertex and a constraint that in each clique of a clique cover the variables sum to at most one. They argue that, for the intersection graphs of paths in certain fiber optic communications networks, these intersection numbers are small, explaining the relative ease of solving certain optimization problems in allocating bandwidth on the networks.
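Written out, the formulation takes roughly the following shape, where the symbols $V$ (the vertex set), $x_v$ (the 0-1 variable for vertex $v$), and $\mathcal{C}$ (an edge clique cover) are notation introduced here for illustration:

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{v \in V} x_v \\
\text{subject to}\quad & \sum_{v \in C} x_v \le 1 && \text{for each clique } C \in \mathcal{C},\\
                       & x_v \in \{0, 1\}        && \text{for each vertex } v \in V.
\end{aligned}
```

Because every edge lies in at least one clique of the cover, these constraints prevent both endpoints of any edge from being selected, so the feasible solutions are exactly the independent sets.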
In statistics and data visualization, edge clique covers of a graph representing statistically indistinguishable pairs of variables are used to produce compact letter displays that assist in visualizing multiple pairwise comparisons, by assigning a letter or other visual marker for each clique and using these to provide a graphical representation of which variables are indistinguishable.
In the analysis of food webs describing predator-prey relationships among animal species, a competition graph or niche overlap graph is an undirected graph in which the vertices represent species, and edges represent pairs of species that both compete for the same prey. These can be derived from a directed acyclic graph representing predator-prey relations by drawing an edge $uv$ in the competition graph whenever there exists a prey species $w$ such that the predator-prey relation graph has edges $u \to w$ and $v \to w$. Every competition graph must have at least one isolated vertex, and the competition number of an arbitrary graph represents the smallest number of isolated vertices that could be added to make it into a competition graph. Biologically, if part of a competition graph is observed, then the competition number represents the smallest possible number of unobserved prey species needed to explain it. The competition number is at most equal to the intersection number: one can transform any undirected graph into a competition graph by adding a prey species for each clique in an edge clique cover. However, this relation is not exact, because it is also possible for the predator species to be prey of other species. In a graph with $n$ vertices, at most $n-2$ of them can be the prey of more than one other species, so the competition number is at least the intersection number minus $n-2$.
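As a small sketch of how a competition graph is derived from predator-prey relations, the following Python function joins two species whenever they share a prey species. The function name and the toy food web are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def competition_graph(prey_of):
    """prey_of maps each species to the set of species it eats.

    Returns the set of competing pairs: species that share some prey.
    """
    edges = set()
    for u, v in combinations(sorted(prey_of), 2):
        if prey_of[u] & prey_of[v]:  # a common prey species w exists
            edges.add((u, v))
    return edges

# Hypothetical acyclic food web.
food_web = {
    "fox": {"rabbit", "mouse"},
    "owl": {"mouse"},
    "hawk": {"rabbit"},
    "rabbit": {"grass"},
    "mouse": {"grass"},
    "grass": set(),
}
competition = competition_graph(food_web)
```

In this example `grass` eats nothing and so ends up isolated. The predators of each prey species form a clique, which is the link between competition graphs and edge clique covers.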
Edge clique covers have also been used to infer the existence of protein complexes, systems of mutually interacting proteins, from protein–protein interaction networks describing only the pairwise interactions between proteins. More generally, Guillaume and Latapy have argued that, for complex networks of all types, replacing the network by a bipartite graph connecting its vertices to the cliques in a clique cover highlights the structure in the network.
Upper bounds
Trivially, a graph with $m$ edges has intersection number at most $m$. Each edge is itself a two-vertex clique. There are $m$ of these cliques and together they cover all the edges. It is also true that every graph with $n$ vertices has intersection number at most $\lfloor n^2/4 \rfloor$. More strongly, the edges of every $n$-vertex graph can be covered by at most $\lfloor n^2/4 \rfloor$ cliques, all of which are either single edges or triangles. An algorithm for finding this cover is simple: remove any two adjacent vertices and inductively cover the remaining graph. Restoring the two removed vertices, cover edges to their shared neighbors by triangles, leaving edges to unshared neighbors as two-vertex cliques. The inductive cover has at most $\lfloor (n-2)^2/4 \rfloor$ cliques, and the two removed vertices contribute at most $n-1$ cliques, maximized when all other vertices are unshared neighbors and the edge between the two vertices must be used as a clique. Adding these two quantities gives $\lfloor n^2/4 \rfloor$ cliques total. This generalizes Mantel's theorem that a triangle-free graph has at most $\lfloor n^2/4 \rfloor$ edges, for in a triangle-free graph the only optimal clique edge cover has one clique per edge and therefore the intersection number equals the number of edges.
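The inductive argument above can be sketched as an iterative Python procedure that repeatedly peels off a pair of adjacent vertices. This is a minimal illustration under the assumption that the graph is given as a dictionary of neighbor sets. The function name is hypothetical, and the cover it returns meets the bound but is not necessarily minimum.

```python
def edge_triangle_cover(adj):
    """Cover all edges with single edges and triangles.

    adj: dict mapping each vertex to the set of its neighbors
    (undirected, no self-loops). Returns a list of vertex sets.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    cover = []
    while True:
        # Pick any vertex that still has a neighbor, and one neighbor.
        u = next((x for x in adj if adj[x]), None)
        if u is None:
            break  # no edges remain
        v = next(iter(adj[u]))
        shared = adj[u] & adj[v]
        for w in shared:                 # triangle for each shared neighbor
            cover.append({u, v, w})
        for w in adj[u] - shared - {v}:  # single edges to unshared neighbors
            cover.append({u, w})
        for w in adj[v] - shared - {u}:
            cover.append({v, w})
        if not shared:                   # edge uv is not in any triangle used
            cover.append({u, v})
        # Remove u and v, then continue with the remaining graph.
        for x in (u, v):
            for w in adj[x]:
                adj[w].discard(x)
            del adj[x]
    return cover

# Hypothetical inputs: the complete graph K4 and the 4-cycle.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
cover_k4 = edge_triangle_cover(k4)
cover_c4 = edge_triangle_cover(c4)
```

On the triangle-free 4-cycle the procedure is forced to use all four edges, matching the bound of $\lfloor 4^2/4 \rfloor = 4$.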
An even tighter bound is possible when the number of edges is strictly greater than $n^2/4$. Let $p$ be the number of pairs of vertices that are not connected by an edge in the given graph $G$, and let $t$ be the unique integer for which $(t-1)t \le p < t(t+1)$. Then the intersection number of $G$ is at most $p + t$. Graphs that are the complement of a sparse graph have small intersection numbers: the intersection number of any $n$-vertex graph $G$ is at most $2e^2(d+1)^2 \ln n$, where $e$ is the base of the natural logarithm and $d$ is the maximum degree of the complement graph of $G$.
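To make the first of these bounds concrete, the following sketch evaluates it from the number of vertices and edges. It assumes the bound has the form $p + t$, with $p$ the number of non-adjacent pairs and $t$ the unique integer satisfying $(t-1)t \le p < t(t+1)$. The function name is hypothetical.

```python
def complement_bound(n, m):
    """Evaluate the bound p + t for a graph with n vertices and m edges.

    p is the number of non-adjacent vertex pairs, and t is the unique
    integer with (t - 1) * t <= p < t * (t + 1).
    """
    p = n * (n - 1) // 2 - m
    t = 1
    while t * (t + 1) <= p:
        t += 1
    return p + t
```

For six vertices, the complete graph ($m = 15$, $p = 0$) gives a bound of 1, removing one edge gives 2, and removing two edges gives 4, all well below $\lfloor 6^2/4 \rfloor = 9$.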
It follows from deep results on the structure of claw-free graphs that, when a connected $n$-vertex claw-free graph has at least three independent vertices, it has intersection number at most $n$. It remains an unsolved problem whether this is true of all claw-free graphs without requiring them to have large independent sets. An important subclass of the claw-free graphs are the line graphs $L(G)$, graphs representing edges and touching pairs of edges of some other graph $G$. An optimal clique cover of the line graph $L(G)$ may be formed with one clique for each triangle in $G$ that has two or three degree-2 vertices, and one clique for each vertex that has degree at least two and is not a degree-two vertex of one of these triangles. The intersection number is the number of cliques of these two types.
In the Erdős–Rényi–Gilbert model of random graphs, in which all graphs on $n$ labeled vertices are equally likely (or equivalently, each edge is present or absent, independently of other edges, with probability $1/2$), the intersection number of an $n$-vertex random graph is with high probability $\Theta(n^2/\log^2 n)$, smaller by a factor of $\log^2 n$ than the number of edges. In these graphs, the maximum cliques have (with high probability) only a logarithmic number of vertices, implying that this many of them are needed to cover all edges. The trickier part of the bound is proving that it is possible to find enough logarithmically-sized cliques to cover many edges, allowing the remaining leftover edges to be covered by two-vertex cliques.
Much of the early research on intersection numbers involved calculating these numbers on various specific graphs, such as the graphs formed by removing a complete subgraph or a perfect matching from a larger complete graph.
Computational complexity
Testing whether a given graph $G$ has intersection number at most a given number $k$ is NP-complete. Therefore, it is also NP-hard to compute the intersection number of a given graph. In turn, the hardness of the intersection number has been used to prove that it is NP-complete to recognize the squares of split graphs.
The problem of computing the intersection number is, however, fixed-parameter tractable: that is, it can be solved in an amount of time bounded by a polynomial in $n$ multiplied by a larger but computable function of the intersection number $k$. This may be shown by observing that there are at most $2^k$ distinct closed neighborhoods in the graph (two vertices that belong to the same set of cliques have the same neighborhood) and that the graph formed by selecting one vertex per closed neighborhood has the same intersection number as the original graph. Therefore, in polynomial time the input can be reduced to a smaller kernel with at most $2^k$ vertices and $4^k$ edges. Applying an exponential time dynamic programming search procedure over subsets of edges of this kernel gives time $2^{O(4^k)} + n^{O(1)}$, double exponential in $k$. The double-exponential dependence on $k$ cannot be reduced to single exponential by a kernelization of polynomial size, unless the polynomial hierarchy collapses, and if the exponential time hypothesis is true then double-exponential dependence is necessary regardless of whether kernelization is used. On graphs of bounded treewidth, dynamic programming on a tree decomposition of the graph can find the intersection number in linear time, but simpler algorithms based on finite sets of reduction rules do not work.
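The closed-neighborhood reduction behind the kernel can be sketched in a few lines of Python. This is only an illustration of the grouping step, with a hypothetical function name. It assumes the graph is a dictionary of neighbor sets, and it ignores corner cases such as a connected component that is a single clique, which would collapse to one vertex and needs separate handling.

```python
def twin_kernel(adj):
    """Keep one representative vertex per distinct closed neighborhood.

    adj: dict mapping each vertex to the set of its neighbors.
    Returns the subgraph induced by the representatives.
    """
    reps = {}
    for v, ns in adj.items():
        # Closed neighborhood N[v] = neighbors of v together with v itself.
        reps.setdefault(frozenset(ns | {v}), v)  # first vertex seen wins
    kept = set(reps.values())
    return {v: adj[v] & kept for v in kept}

# Hypothetical graph: triangle a, b, c with a pendant edge c-d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
kernel = twin_kernel(graph)
```

Here `a` and `b` have the same closed neighborhood, so only `a` is kept. The kernel is the path a, c, d, whose two edges need two cliques, the same number as the triangle plus pendant edge in the original graph.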
The problem cannot be approximated in polynomial time with an approximation ratio better than $n^{\varepsilon}$, for some constant $\varepsilon > 0$, and the best approximation ratio that is known is better than the trivial $O(n^2)$ by only a polylogarithmic factor. Researchers in this area have also investigated the computational efficiency of heuristics, without guarantees on the solution quality they produce, and their behavior on real-world networks.
More efficient algorithms are known for certain special classes of graphs. The intersection number of an interval graph is always equal to its number of maximal cliques, which may be computed in polynomial time. More generally, in chordal graphs, the intersection number may be computed by an algorithm that considers the vertices in an elimination ordering of the graph (an ordering in which each vertex and its later neighbors form a clique) and that, for each vertex $v$, forms a clique for $v$ and its later neighbors whenever at least one of the edges incident to $v$ is not covered by any earlier clique. It is also possible to find the intersection number in linear time in circular-arc graphs. However, although these graphs have only a polynomial number of cliques to choose among for the cover, having few cliques alone is not enough to make the problem easy: there exist families of graphs with polynomially many cliques for which the intersection number remains NP-hard. The intersection number can also be found in polynomial time for graphs whose maximum degree is five, but is NP-hard for graphs of maximum degree six. On planar graphs, computing the intersection number exactly remains NP-hard, but it has a polynomial-time approximation scheme based on Baker's technique.
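The elimination-ordering procedure for chordal graphs described above can be sketched as follows. The sketch assumes that a valid elimination ordering is supplied by the caller and does not check it. The function name is hypothetical. Only edges to later neighbors are tested, because any edge to an earlier neighbor was already covered when that neighbor was processed.

```python
def chordal_clique_cover(adj, order):
    """Edge clique cover of a chordal graph from an elimination ordering.

    adj: dict mapping each vertex to the set of its neighbors.
    order: list of vertices in which each vertex and its later
    neighbors form a clique (assumed valid, not verified here).
    """
    position = {v: i for i, v in enumerate(order)}
    covered = set()  # edges covered so far, stored as frozensets
    cover = []
    for v in order:
        later = {w for w in adj[v] if position[w] > position[v]}
        if any(frozenset((v, w)) not in covered for w in later):
            clique = later | {v}
            cover.append(clique)
            members = list(clique)
            for i, a in enumerate(members):
                for b in members[i + 1:]:
                    covered.add(frozenset((a, b)))
    return cover

# Hypothetical chordal graph: triangle 1, 2, 3 with a pendant edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cover = chordal_clique_cover(graph, [1, 2, 4, 3])
```

For this input the procedure forms the triangle at vertex 1, skips vertex 2 because its only later edge is already covered, and adds the pendant edge at vertex 4, giving two cliques.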
See also
- Bipartite dimension, the smallest number of bicliques needed to cover all edges of a graph
- Bound graph, a type of graph characterized by clique edge covers of a special form
- Clique cover, the NP-hard problem of finding a small number of cliques that cover all vertices of a graph