Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes (or vertices) and the connections between the elements or actors as links (or edges). The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."[1]
The study of networks has emerged in diverse disciplines as a means of analyzing complex relational data. The earliest known paper in this field is the famous Seven Bridges of Königsberg written by Leonhard Euler in 1736. Euler's mathematical description of vertices and edges was the foundation of graph theory, a branch of mathematics that studies the properties of pairwise relations in a network structure. The field of graph theory continued to develop and found applications in chemistry (Sylvester, 1878).
Dénes Kőnig, a Hungarian mathematician and professor, wrote the first book on graph theory, entitled "Theory of Finite and Infinite Graphs", in 1936.[2]
In the 1930s Jacob Moreno, a psychologist in the Gestalt tradition, arrived in the United States. He developed the sociogram and presented it to the public in April 1933 at a convention of medical scholars. Moreno claimed that "before the advent of sociometry no one knew what the interpersonal structure of a group 'precisely' looked like" (Moreno, 1953). The sociogram was a representation of the social structure of a group of elementary school students. The boys were friends of boys and the girls were friends of girls with the exception of one boy who said he liked a single girl. The feeling was not reciprocated. This network representation of social structure was found so intriguing that it was printed in The New York Times (April 3, 1933, page 17). The sociogram has found many applications and has grown into the field of social network analysis.
Probabilistic theory in network science developed as an offshoot of graph theory with Paul Erdős and Alfréd Rényi's eight famous papers on random graphs. For social networks the exponential random graph model or p* is a notational framework used to represent the probability space of a tie occurring in a social network. An alternate approach to network probability structures is the network probability matrix, which models the probability of edges occurring in a network, based on the historic presence or absence of the edge in a sample of networks.
In 1998, David Krackhardt and Kathleen Carley introduced the idea of a meta-network with the PCANS Model. They suggest that "all organizations are structured along these three domains, Individuals, Tasks, and Resources". Their paper introduced the concept that networks occur across multiple domains and that they are interrelated. This field has grown into another sub-discipline of network science called dynamic network analysis.
More recently, other network science efforts have focused on mathematically describing different network topologies. Duncan Watts and Steven Strogatz reconciled empirical data on networks with mathematical representation, describing the small-world network. Albert-László Barabási and Réka Albert discovered scale-free networks, a property that captures the fact that in real networks hubs coexist with many small-degree vertices, and offered a dynamical model to explain the origin of this scale-free state.
The U.S. military first became interested in network-centric warfare as an operational concept based on network science in 1996. John A. Parmentola, the U.S. Army Director for Research and Laboratory Management, proposed to the Army's Board on Science and Technology (BAST) on December 1, 2003 that Network Science become a new Army research area. The BAST, the Division on Engineering and Physical Sciences for the National Research Council (NRC) of the National Academies, serves as a convening authority for the discussion of science and technology issues of importance to the Army and oversees independent Army-related studies conducted by the National Academies. The BAST conducted a study to find out whether identifying and funding a new field of investigation in basic research, Network Science, could help close the gap between what is needed to realize Network-Centric Operations and the current primitive state of fundamental knowledge of networks.
As a result, the BAST issued the NRC study in 2005 titled Network Science (referenced above) that defined a new field of basic research in Network Science for the Army. Based on the findings and recommendations of that study and the subsequent 2007 NRC report titled Strategy for an Army Center for Network Science, Technology, and Experimentation, Army basic research resources were redirected to initiate a new basic research program in Network Science. To build a new theoretical foundation for complex networks, some of the key Network Science research efforts now ongoing in Army laboratories address:
As initiated in 2004 by Frederick I. Moxley with support he solicited from David S. Alberts, the Department of Defense helped to establish the first Network Science Center in conjunction with the U.S. Army at the United States Military Academy (USMA). Under the tutelage of Dr. Moxley and the faculty of the USMA, the first interdisciplinary undergraduate courses in Network Science were taught to cadets at West Point.[3] [4] [5] In order to better instill the tenets of network science among its cadre of future leaders, the USMA has also instituted a five-course undergraduate minor in Network Science. [6]
In 2006, the U.S. Army and the United Kingdom (UK) formed the Network and Information Science International Technology Alliance, a collaborative partnership among the Army Research Laboratory, the UK Ministry of Defence, and a consortium of industries and universities in the U.S. and UK. The goal of the alliance is to perform basic research in support of Network-Centric Operations across the needs of both nations.
In 2009, the U.S. Army formed the Network Science CTA, a collaborative research alliance among the Army Research Laboratory, CERDEC, and a consortium of about 30 industrial R&D labs and universities in the U.S. The goal of the alliance is to develop a deep understanding of the underlying commonalities among intertwined social/cognitive, information, and communications networks, and as a result improve our ability to analyze, predict, design, and influence complex systems interweaving many kinds of networks.
Subsequently, as a result of these efforts, the U.S. Department of Defense has sponsored numerous research projects that support Network Science.
A deterministic network is defined in contrast to a probabilistic network. In an unweighted deterministic network, an edge either exists or does not: conventionally, 0 represents the absence of an edge and 1 its presence. In a weighted deterministic network, the value attached to each edge represents its weight, for example the strength of the connection.
In a probabilistic network, the value attached to each edge represents the likelihood that the edge exists. For example, if an edge has a value of 0.9, the probability that the edge exists is 0.9.[7]
Often, networks have certain attributes that can be calculated to analyze the properties and characteristics of the network. The behavior of these network properties often defines network models and can be used to analyze how certain models contrast with each other. Many of the definitions for other terms used in network science can be found in the Glossary of graph theory.
The size of a network can refer to the number of nodes N or, less commonly, the number of edges E, which (for connected graphs with no multi-edges) can range from N-1 (a tree) to E_{max} (a complete graph). In the case of a simple graph (a network in which at most one undirected edge exists between each pair of vertices, and in which no vertices connect to themselves), E_{max} = \tbinom{N}{2} = N(N-1)/2; for directed graphs without self-loops, E_{max} = N(N-1); for directed graphs with self-loops allowed, E_{max} = N^2. If multiple edges are allowed between a pair of vertices, E_{max} = \infty.
The density D of a network is defined as the normalized ratio between the number of edges E and the number of possible edges in a network with N nodes:
D = \frac{E - E_{min}}{E_{max} - E_{min}},
where E_{min} and E_{max} are the minimum and maximum possible numbers of edges in a connected network with N nodes. For a connected simple graph, E_{max} = \tbinom{N}{2} and E_{min} = N-1, so
D = \frac{E - (N-1)}{E_{max} - (N-1)} = \frac{2(E - N + 1)}{N(N-3) + 2}.
Counting unidirectional ties T = 2E instead of edges gives the equivalent form
D = \frac{T - 2N + 2}{N(N-3) + 2}.
The density D of a network in which edges may not intersect (a planar network) is defined as the ratio of the number of edges E to the number of possible edges in a planar network with N nodes, for which E_{max} = 3N - 6, giving
D = \frac{E - N + 1}{2N - 5}.
The degree k of a node is the number of edges connected to it. Closely related to the density of a network is its average degree, \langle k \rangle = \tfrac{2E}{N} (or, for directed graphs, \langle k \rangle = \tfrac{E}{N}), the factor of 2 arising because each undirected edge contributes to the degrees of two distinct vertices. In the ER random-graph model G(N,p), we can compute the expected value of \langle k \rangle (equal to the expected degree k of an arbitrary vertex): a random vertex has N-1 other vertices available in the network and connects to each of them with probability p, so
E[\langle k \rangle] = E[k] = p(N-1).
The average shortest path length is calculated by finding the shortest path between all pairs of nodes and taking the average of their lengths (the length being the number of edges contained in the path, i.e., the distance d_{u,v} between the two vertices u and v in the graph). This shows, on average, the number of steps it takes to get from one member of the network to another. The behavior of the expected average shortest path length as a function of the number of vertices N of a random network model determines whether that model exhibits the small-world effect: if it scales as O(\ln N), the model generates small-world networks. For faster-than-logarithmic growth, the model does not produce small worlds. The special case O(\ln\ln N) is known as the ultra-small-world effect.
As another means of measuring network graphs, we can define the diameter of a network as the longest of all the calculated shortest paths between pairs of nodes. It is the shortest distance between the two most distant nodes in the network. In other words, once the shortest path length from every node to all other nodes is calculated, the diameter is the longest of all the calculated path lengths. The diameter is representative of the linear size of a network. For example, if nodes A, B, C, and D are connected in a chain, going from A to D gives a diameter of 3 (3 hops, 3 links).
The clustering coefficient is a measure of an "all-my-friends-know-each-other" property. This is sometimes described as the friends of my friends are my friends. More precisely, the clustering coefficient of a node is the ratio of existing links connecting a node's neighbors to each other to the maximum possible number of such links. The clustering coefficient for the entire network is the average of the clustering coefficients of all the nodes. A high clustering coefficient for a network is another indication of a small world.
The clustering coefficient of the i'th node is
C_i = \frac{2e_i}{k_i(k_i - 1)},
where k_i is the number of neighbours of the i'th node and e_i is the number of connections between those neighbours. The maximum possible number of connections between neighbours is then
\binom{k}{2} = \frac{k(k-1)}{2}.
From a probabilistic standpoint, the expected local clustering coefficient is the likelihood of a link existing between two arbitrary neighbors of the same node.
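As a concrete illustration of the measures above, the following is a minimal sketch assuming the Python library networkx and a small, purely illustrative undirected simple graph.

```python
# Minimal sketch: computing the basic properties defined above with networkx.
# The small edge list is purely illustrative.
import networkx as nx

G = nx.Graph()
G.add_edges_from([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (2, 5)])

N = G.number_of_nodes()
E = G.number_of_edges()
E_min, E_max = N - 1, N * (N - 1) // 2

density = E / E_max                                   # common convention (same as nx.density)
density_normalized = (E - E_min) / (E_max - E_min)    # D = (E - E_min)/(E_max - E_min) as above
avg_degree = 2 * E / N                                # <k> = 2E/N
avg_path_length = nx.average_shortest_path_length(G)  # mean d_{u,v} over all node pairs
diameter = nx.diameter(G)                             # longest shortest path
avg_clustering = nx.average_clustering(G)             # mean of the local C_i

print(density, density_normalized, avg_degree, avg_path_length, diameter, avg_clustering)
```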
The way in which a network is connected plays a large part in how networks are analyzed and interpreted. Networks are classified into four categories according to their connectedness: clique/complete graph, giant component, weakly connected component, and strongly connected component.
See main article: Centrality. Centrality indices produce rankings which seek to identify the most important nodes in a network model. Different centrality indices encode different contexts for the word "importance." The betweenness centrality, for example, considers a node highly important if it forms bridges between many other nodes. The eigenvalue centrality, in contrast, considers a node highly important if many other highly important nodes link to it. Hundreds of such measures have been proposed in the literature.
Centrality indices are only accurate for identifying the most important nodes. The measures are seldom, if ever, meaningful for the remainder of network nodes.[9] [10] Also, their indications are only accurate within their assumed context for importance, and tend to "get it wrong" for other contexts.[11] For example, imagine two separate communities whose only link is an edge between the most junior member of each community. Since any transfer from one community to the other must go over this link, the two junior members will have high betweenness centrality. But, since they are junior, (presumably) they have few connections to the "important" nodes in their community, meaning their eigenvalue centrality would be quite low.
See main article: Node influence metric. Limitations to centrality measures have led to the development of more general measures. Two examples are the accessibility, which uses the diversity of random walks to measure how accessible the rest of the network is from a given start node,[12] and the expected force, derived from the expected value of the force of infection generated by a node.[9] Both of these measures can be meaningfully computed from the structure of the network alone.
See main article: Community structure. Nodes in a network may be partitioned into groups representing communities. Depending on the context, communities may be distinct or overlapping. Typically, nodes in such communities will be strongly connected to other nodes in the same community, but weakly connected to nodes outside the community. In the absence of a ground truth describing the community structure of a specific network, several algorithms have been developed to infer possible community structures using either supervised or unsupervised clustering methods.
Network models serve as a foundation to understanding interactions within empirical complex networks. Various random graph generation models produce network structures that may be used in comparison to real-world complex networks.
The Erdős–Rényi model, named for Paul Erdős and Alfréd Rényi, is used for generating random graphs in which edges are set between nodes with equal probabilities. It can be used in the probabilistic method to prove the existence of graphs satisfying various properties, or to provide a rigorous definition of what it means for a property to hold for almost all graphs.
To generate an Erdős–Rényi model G(n,p), two parameters must be specified: the total number of nodes n and the probability p that any pair of nodes is connected by an edge.
Because the model is generated without bias to particular nodes, the degree distribution is binomial: for a randomly chosen vertex v,
P(\deg(v) = k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}.
In this model the clustering coefficient is 0 a.s. The behavior of G(n,p) can be broken into three regions:
Subcritical (np < 1): all components are simple and very small; the largest component has size |C_1| = O(\log n).
Critical (np = 1): |C_1| = O(n^{2/3}).
Supercritical (np > 1): |C_1| \approx yn, where y = y(np) is the positive solution to the equation e^{-npy} = 1 - y. The largest connected component has high complexity; all other components are simple and small, with |C_2| = O(\log n).
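The three regimes can be observed numerically; a minimal sketch, assuming networkx, with illustrative values of n and np:

```python
# Sketch: size of the largest component |C1| of G(n, p) in the three regimes.
import networkx as nx

n = 10_000
for c in (0.5, 1.0, 1.5):                          # c = np: subcritical, critical, supercritical
    G = nx.fast_gnp_random_graph(n, c / n, seed=42)
    C1 = max(nx.connected_components(G), key=len)
    print(f"np = {c}: |C1| = {len(C1)}")
```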
The configuration model takes a degree sequence[13] [14] or degree distribution[15] (which subsequently is used to generate a degree sequence) as the input, and produces graphs that are randomly connected in all respects other than the degree sequence. This means that for a given choice of the degree sequence, the graph is chosen uniformly at random from the set of all graphs that comply with it. The degree k of a randomly chosen vertex is an independent and identically distributed random variable with integer values, with distribution u(k). The probability w(n) that a randomly sampled node is connected to a component of size n is given by convolution powers of the degree distribution:
w(n) = \begin{cases} \frac{E[k]}{n-1} u_1^{*n}(n-2), & n > 1, \\ u(0), & n = 1, \end{cases}
where u_1(k) = \frac{(k+1)u(k+1)}{E[k]} denotes the excess degree distribution. The giant component can be destroyed by randomly removing the critical fraction p_c of all edges, a process called percolation on random networks. When the second moment of the degree distribution is finite, E[k^2] < \infty, this critical edge fraction is
p_c = 1 - \frac{E[k]}{E[k^2] - E[k]},
and the average vertex–vertex distance l in the giant component scales logarithmically with the total size of the network, l = O(\log N).
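A minimal sketch, assuming networkx, of generating a configuration-model graph from an illustrative degree sequence; the stub-matching construction can create self-loops and parallel edges, which are removed afterwards, so realized degrees may deviate slightly from the target.

```python
# Sketch: configuration model from a fixed degree sequence.
import networkx as nx

degree_sequence = [3, 3, 2, 2, 2, 1, 1, 1, 1]        # sum must be even
assert sum(degree_sequence) % 2 == 0

M = nx.configuration_model(degree_sequence, seed=1)  # multigraph, may contain self-loops
G = nx.Graph(M)                                      # collapse parallel edges
G.remove_edges_from(nx.selfloop_edges(G))            # drop self-loops

print("target:", sorted(degree_sequence))
print("realized:", sorted(d for _, d in G.degree()))
```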
In the directed configuration model, the degree of a node is given by two numbers, the in-degree k_{in} and the out-degree k_{out}, and consequently the degree distribution u(k_{in}, k_{out}) is two-variate. The expected numbers of in-edges and out-edges coincide, so that E[k_{in}] = E[k_{out}]. The size distribution of out-components is given by
h_{out}(n) = \frac{E[k_{out}]}{n-1} \tilde u_{out}^{*n}(n-2), \; n > 1, \qquad \tilde u_{out}(k_{out}) = \frac{k_{out}+1}{E[k_{out}]} \sum\limits_{k_{in} \geq 0} u(k_{in}, k_{out}+1),
for out-components; the size distribution of in-components is defined analogously.
The Watts and Strogatz model is a random graph generation model that produces graphs with small-world properties.
An initial lattice structure is used to generate a Watts–Strogatz model. Each node in the network is initially linked to its \langle k \rangle closest neighbours. Another parameter is specified as the rewiring probability p: each edge has a probability p of being rewired to the graph as a random edge. The expected number of rewired links in the model is pE = pN\langle k \rangle/2.
As the Watts–Strogatz model begins as a non-random lattice structure, it has a very high clustering coefficient along with a high average path length. Each rewire is likely to create a shortcut between highly connected clusters. As the rewiring probability increases, the clustering coefficient decreases more slowly than the average path length. In effect, this allows the average path length of the network to decrease significantly with only a slight decrease in the clustering coefficient. Higher values of p force more rewired edges, which in effect makes the Watts–Strogatz model approach a random network.
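A minimal sketch, assuming networkx and illustrative parameters, of the trade-off described above: as the rewiring probability p grows, the average path length falls quickly while the clustering coefficient decays more slowly.

```python
# Sketch: clustering vs. average path length in the Watts-Strogatz model.
import networkx as nx

n, k = 1000, 10                       # ring lattice; each node linked to its k nearest neighbours
for p in (0.0, 0.01, 0.1, 1.0):       # rewiring probability
    G = nx.watts_strogatz_graph(n, k, p, seed=0)
    C = nx.average_clustering(G)
    L = nx.average_shortest_path_length(G)
    print(f"p = {p}: C = {C:.3f}, L = {L:.2f}")
```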
The Barabási–Albert model is a random network model used to demonstrate preferential attachment, or a "rich-get-richer" effect. In this model, an edge is most likely to attach to nodes with higher degrees. The network begins with an initial network of m_0 nodes, where m_0 ≥ 2 and the degree of each node in the initial network should be at least 1; otherwise a node will always remain disconnected from the rest of the network.
In the BA model, new nodes are added to the network one at a time. Each new node is connected to m existing nodes with a probability that is proportional to the number of links the existing nodes already have. Formally, the probability p_i that the new node is connected to node i is
p_i = \frac{k_i}{\sum_j k_j},
where k_i is the degree of node i. Heavily linked nodes ("hubs") tend to quickly accumulate even more links, while nodes with only a few links are unlikely to be chosen as the destination for a new link. The new nodes have a "preference" to attach themselves to the already heavily linked nodes.
The degree distribution resulting from the BA model is scale-free; in particular, for large degree it is a power law of the form
P(k) \sim k^{-3}.
Hubs exhibit high betweenness centrality which allows short paths to exist between nodes. As a result, the BA model tends to have very short average path lengths. The clustering coefficient of this model also tends to 0.
The Barabási–Albert model was developed for undirected networks, aiming to explain the universality of the scale-free property, and has been applied to a wide range of different networks and applications. The directed version of this model is the Price model,[21] [22] which was developed to describe citation networks.
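A minimal sketch, assuming networkx and illustrative parameters, that grows a BA network and inspects the heavy tail of its degree distribution.

```python
# Sketch: degree distribution of a Barabasi-Albert network.
from collections import Counter
import networkx as nx

G = nx.barabasi_albert_graph(n=50_000, m=3, seed=7)   # each new node attaches to m = 3 existing nodes
degrees = [d for _, d in G.degree()]
counts = Counter(degrees)

# Empirical P(k) for a few large k; for a scale-free network the tail falls off roughly as k^-3.
for k in (10, 20, 40, 80):
    print(k, counts.get(k, 0) / G.number_of_nodes())
print("maximum degree (largest hub):", max(degrees))
```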
See main article: Non-linear preferential attachment.
In non-linear preferential attachment (NLPA), existing nodes in the network gain new edges proportionally to the node degree raised to a constant positive power \alpha. Formally, the probability that node i gains a new edge is
p_i = \frac{k_i^{\alpha}}{\sum_j k_j^{\alpha}}.
If \alpha = 1, NLPA reduces to the BA model and is referred to as "linear". If 0 < \alpha < 1, NLPA is referred to as "sub-linear" and the degree distribution of the network tends to a stretched exponential distribution. If \alpha > 1, NLPA is referred to as "super-linear" and a small number of nodes connect to almost all other nodes in the network. For both \alpha < 1 and \alpha > 1, the scale-free property of the network is broken in the limit of infinite system size. However, if \alpha is only slightly larger than 1, NLPA may result in degree distributions which appear to be transiently scale-free.
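A minimal sketch of NLPA growth written directly from the attachment rule above, using only the Python standard library; each new node adds a single edge, and the values of alpha and the network size are illustrative.

```python
# Sketch: non-linear preferential attachment with exponent alpha.
import random

def grow_nlpa(n_nodes, alpha, seed=0):
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}                                # start from a single edge between nodes 0 and 1
    for new in range(2, n_nodes):
        nodes = list(degree)
        weights = [degree[i] ** alpha for i in nodes]    # p_i proportional to k_i^alpha
        target = rng.choices(nodes, weights=weights)[0]
        degree[target] += 1
        degree[new] = 1
    return degree

for alpha in (0.5, 1.0, 1.5):                            # sub-linear, linear (BA-like), super-linear
    deg = grow_nlpa(5_000, alpha)
    print(f"alpha = {alpha}: maximum degree = {max(deg.values())}")
```

Super-linear attachment concentrates almost all new links on a single hub, while sub-linear attachment keeps the maximum degree small, which matches the qualitative behaviour described above.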
In the mediation-driven attachment (MDA) model, a new node arriving with m edges picks an existing connected node at random and then connects itself not with that node but with m of its neighbours, also chosen at random. The probability \Pi(i) that a node i of the existing network is picked is
\Pi(i) = \frac{k_i}{N} \, \frac{\sum_{j=1}^{k_i} \frac{1}{k_j}}{k_i}.
The factor \frac{\sum_{j=1}^{k_i} \frac{1}{k_j}}{k_i} is the inverse of the harmonic mean (IHM) of the degrees of the k_i neighbours of node i. Extensive numerical investigation suggests that for approximately m > 14 the mean IHM value in the large-N limit becomes a constant, which means \Pi(i) \propto k_i. This implies that the more links (degree) a node has, the higher its chance of gaining more links, since they can be reached in a larger number of ways through mediators, which essentially embodies the intuitive "rich get richer" mechanism (or the preferential attachment rule of the Barabási–Albert model). Therefore, the MDA network can be seen to follow the PA rule, but in disguise.
However, for m = 1 it describes a winner-takes-all mechanism: almost 99% of the total nodes have degree one while one node is super-rich in degree. As m increases, the disparity between the super-rich and the poor decreases, and for m > 14 there is a transition from a rich-get-super-richer to a rich-get-richer mechanism.
Another model where the key ingredient is the nature of the vertex has been introduced by Caldarelli et al.[26] Here a link is created between two vertices i and j with a probability given by a linking function f(\eta_i, \eta_j) of the fitnesses of the vertices involved. The degree of a vertex i is given by
k(\eta_i) = N \int_0^{\infty} f(\eta_i, \eta_j) \rho(\eta_j) \, d\eta_j.
If k(\eta_i) is an invertible and increasing function of \eta_i, then the probability distribution P(k) is given by
P(k) = \rho(\eta(k)) \cdot \eta'(k).
As a result, if the fitnesses \eta are distributed as a power law, then the node degree is as well.
Less intuitively, a fast-decaying probability distribution such as \rho(\eta) = e^{-\eta}, together with a linking function of the kind
f(\eta_i, \eta_j) = \Theta(\eta_i + \eta_j - Z),
with Z a constant and \Theta the Heaviside step function, also yields scale-free networks.
Such a model has been successfully applied to describe trade between nations by using GDP as the fitness of the various nodes i, j and a linking function of the kind
\frac{\delta \eta_i \eta_j}{1 + \delta \eta_i \eta_j}.
Exponential Random Graph Models (ERGMs) are a family of statistical models for analyzing data from social and other networks.[30] The exponential family is a broad family of models covering many types of data, not just networks. An ERGM is a model from this family which describes networks.
A random graph Y \in \mathcal{Y} on n nodes is specified by a collection of tie variables \{Y_{ij}: i = 1, \dots, n; j = 1, \dots, n\}, indexed by pairs of nodes ij, with Y_{ij} = 1 if the nodes (i,j) are connected by an edge and Y_{ij} = 0 otherwise.
The basic assumption of ERGMs is that the structure in an observed graph y can be explained by a vector of sufficient statistics s(y), which are a function of the observed network and, in some cases, nodal attributes. The probability of a graph y \in \mathcal{Y} in an ERGM is defined by
P(Y = y \mid \theta) = \frac{\exp(\theta^{T} s(y))}{c(\theta)},
where \theta is a vector of model parameters associated with s(y) and c(\theta) = \sum_{y' \in \mathcal{Y}} \exp(\theta^{T} s(y')) is a normalising constant.
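A toy sketch of the probability above, using the single statistic s(y) = number of edges on n = 4 nodes so that the normalising constant c(θ) can be computed by enumerating all possible graphs; the parameter value is illustrative.

```python
# Toy sketch: ERGM with s(y) = edge count on n = 4 labelled nodes.
from itertools import combinations, product
from math import exp

n = 4
pairs = list(combinations(range(n), 2))   # the 6 possible edges
theta = -0.5                              # illustrative parameter

# c(theta) = sum over all 2^6 graphs of exp(theta * number_of_edges)
c = sum(exp(theta * sum(g)) for g in product((0, 1), repeat=len(pairs)))

def ergm_probability(edges):
    """P(Y = y | theta) = exp(theta * s(y)) / c(theta), with s(y) the edge count."""
    return exp(theta * len(edges)) / c

triangle = {(0, 1), (1, 2), (0, 2)}
print(ergm_probability(triangle))
# With an edge-count statistic only, this ERGM coincides with G(n, p) for
# p = exp(theta) / (1 + exp(theta)).
```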
Social network analysis examines the structure of relationships between social entities.[31] These entities are often persons, but may also be groups, organizations, nation states, web sites, or scholarly publications.
Since the 1970s, the empirical study of networks has played a central role in social science, and many of the mathematical and statistical tools used for studying networks were first developed in sociology.[32] Amongst many other applications, social network analysis has been used to understand the diffusion of innovations,[33] news, and rumors. Similarly, it has been used to examine the spread of both diseases and health-related behaviors. It has also been applied to the study of markets, where it has been used to examine the role of trust in exchange relationships and of social mechanisms in setting prices. It has likewise been used to study recruitment into political movements and social organizations, and to conceptualize scientific disagreements as well as academic prestige. In the second language acquisition literature, it has an established history in study abroad research, revealing how peer learner interaction networks influence language progress.[34] More recently, network analysis (and its close cousin traffic analysis) has gained significant use in military intelligence for uncovering insurgent networks of both hierarchical and leaderless nature.[35] In criminology, it is being used to identify influential actors in criminal gangs, track offender movements and co-offending, predict criminal activity, and inform policy.[36]
Dynamic network analysis examines the shifting structure of relationships among different classes of entities in complex socio-technical systems, and reflects social stability and changes such as the emergence of new groups, topics, and leaders.[37] [38] [39] Dynamic network analysis focuses on meta-networks composed of multiple types of nodes (entities) and multiple types of links. These entities can be highly varied; examples include people, organizations, topics, resources, tasks, events, locations, and beliefs.
Dynamic network techniques are particularly useful for assessing trends and changes in networks over time, identification of emergent leaders, and examining the co-evolution of people and ideas.
With the recent explosion of publicly available high-throughput biological data, the analysis of molecular networks has gained significant interest. The type of analysis in this context is closely related to social network analysis, but often focuses on local patterns in the network. For example, network motifs are small subgraphs that are over-represented in the network. Activity motifs are similar patterns in the attributes of nodes and edges that are over-represented given the network structure. The analysis of biological networks has led to the development of network medicine, which looks at the effect of diseases in the interactome.[40]
Semantic network analysis is a sub-field of network analysis that focuses on the relationships between words and concepts in a network. Words are represented as nodes and their proximity or co-occurrences in the text are represented as edges. Semantic networks are therefore graphical representations of knowledge and are commonly used in neurolinguistics and natural language processing applications. Semantic network analysis is also used as a method to analyze large texts and identify the main themes and topics (e.g., of social media posts), to reveal biases (e.g., in news coverage), or even to map an entire research field.[41]
Link analysis is a subset of network analysis, exploring associations between objects. An example may be examining the addresses of suspects and victims, the telephone numbers they have dialed, and the financial transactions they have taken part in during a given timeframe, as well as the familial relationships between these subjects, as part of a police investigation. Link analysis here provides the crucial relationships and associations between very many objects of different types that are not apparent from isolated pieces of information. Computer-assisted or fully automatic computer-based link analysis is increasingly employed by banks and insurance agencies in fraud detection, by telecommunication operators in telecommunication network analysis, by the medical sector in epidemiology and pharmacology, in law enforcement investigations, by search engines for relevance rating (and conversely by spammers for spamdexing and by business owners for search engine optimization), and everywhere else where relationships between many objects have to be analyzed.
The SIR model is one of the most well-known algorithms for predicting the spread of global pandemics within an infectious population.
The "force" of infection acting on each susceptible unit in an infectious population is
\beta \left( \frac{I}{N} \right),
where \beta is the transmission rate of the disease and I/N is the fraction of the population that is currently infectious.
To track the change of those susceptible in an infectious population:
\Delta S = -\beta S \frac{I}{N} \, \Delta t.
The flow from infected to recovered in each time step is
\mu I \, \Delta t,
governed by the rate of recovery \mu, equal to one over the average infectious period {1 \over \tau}, the number of infectious individuals I, and the change in time \Delta t.
Whether a population will be overcome by a pandemic, with regard to the SIR model, depends on the value of R_0, the average number of people infected by one infected individual:
R_0 = \beta \tau = \frac{\beta}{\mu}.
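A minimal discrete-time sketch of these quantities, using the standard SIR updates with transmission rate beta and recovery rate mu = 1/tau; the parameter values are illustrative.

```python
# Sketch: discrete-time SIR updates and the basic reproduction number R0.
N = 1_000_000             # population size
beta, tau = 0.3, 10.0     # transmission rate and average infectious period
mu = 1.0 / tau            # recovery rate
print("R0 =", beta * tau) # R0 = beta * tau = beta / mu

S, I, R = N - 1.0, 1.0, 0.0
dt = 1.0
for day in range(300):
    new_infections = beta * S * (I / N) * dt   # force of infection beta * I/N acting on S
    new_recoveries = mu * I * dt               # flow from infected to recovered
    S -= new_infections
    I += new_infections - new_recoveries
    R += new_recoveries

print(f"final susceptible fraction: {S / N:.3f}, final recovered fraction: {R / N:.3f}")
```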
Several Web search ranking algorithms use link-based centrality metrics, including (in order of appearance) Marchiori's Hyper Search, Google's PageRank, Kleinberg's HITS algorithm, the CheiRank and TrustRank algorithms. Link analysis is also conducted in information science and communication science in order to understand and extract information from the structure of collections of web pages. For example, the analysis might be of the interlinking between politicians' web sites or blogs.
PageRank works by randomly picking "nodes" or websites and then with a certain probability, "randomly jumping" to other nodes. By randomly jumping to these other nodes, it helps PageRank completely traverse the network as some webpages exist on the periphery and would not as readily be assessed.
Each node x_i has a PageRank defined as the sum, over the pages j that link to i, of the PageRank ("importance") of j divided by the number of outlinks N_j of j:
x_i = \sum_{j \to i} \frac{1}{N_j} x_j^{(k)}.
As explained above, PageRank enlists random jumps in an attempt to assign PageRank to every website on the internet. These random jumps find websites that might not be found during normal search methodologies such as breadth-first search and depth-first search.
An improvement over the aforementioned formula for determining PageRank includes adding these random jump components. Without the random jumps, some pages would receive a PageRank of 0, which is undesirable.
The first component is \alpha, the probability that a random jump will occur; the complementary "damping factor" is 1-\alpha. The PageRank of a page p then becomes
R(p) = \frac{\alpha}{N} + (1-\alpha) \sum_{j \to p} \frac{1}{N_j} x_j^{(k)}.
Another way of looking at it:
R(A) = \sum_{B \to A} \frac{R_B}{B_{(outlinks)}},
i.e., the PageRank of page A is the sum of the PageRanks of the pages linking to A, each divided by that page's number of outlinks.
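A minimal sketch of this iteration on a tiny illustrative link graph, with alpha as the random-jump probability; every page here has at least one outlink, so no dangling-node correction is needed.

```python
# Sketch: PageRank by power iteration with a random-jump component.
links = {                      # page -> pages it links to (illustrative)
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
N = len(links)
alpha = 0.15                   # probability of a random jump

rank = {page: 1.0 / N for page in links}
for _ in range(100):
    new_rank = {page: alpha / N for page in links}                   # random-jump term alpha/N
    for j, outlinks in links.items():
        for page in outlinks:
            new_rank[page] += (1 - alpha) * rank[j] / len(outlinks)  # (1/N_j) * x_j
    rank = new_rank

print(rank)
```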
Information about the relative importance of nodes and edges in a graph can be obtained through centrality measures, widely used in disciplines like sociology. Centrality measures are essential when a network analysis has to answer questions such as: "Which nodes in the network should be targeted to ensure that a message or information spreads to all or most nodes in the network?" or, conversely, "Which nodes should be targeted to curtail the spread of a disease?". Formally established measures of centrality are degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, and Katz centrality. The objective of network analysis generally determines the type of centrality measure(s) to be used.[31]
Content in a complex network can spread via two major methods: conserved spread and non-conserved spread.[42] In conserved spread, the total amount of content that enters a complex network remains constant as it passes through. The model of conserved spread can best be represented by a pitcher containing a fixed amount of water being poured into a series of funnels connected by tubes. Here, the pitcher represents the original source and the water is the content being spread. The funnels and connecting tubing represent the nodes and the connections between nodes, respectively. As the water passes from one funnel into another, the water disappears instantly from the funnel that was previously exposed to the water. In non-conserved spread, the amount of content changes as it enters and passes through a complex network. The model of non-conserved spread can best be represented by a continuously running faucet running through a series of funnels connected by tubes. Here, the amount of water from the original source is infinite. Also, any funnels that have been exposed to the water continue to experience the water even as it passes into successive funnels. The non-conserved model is the most suitable for explaining the transmission of most infectious diseases.
In 1927, W. O. Kermack and A. G. McKendrick created a model in which they considered a fixed population with only three compartments: susceptible, S(t); infected, I(t); and recovered, R(t). Here S(t) represents the number of individuals not yet infected with the disease at time t (those susceptible to the disease), I(t) denotes the number of individuals who have been infected and are capable of spreading the disease to those in the susceptible category, and R(t) is the compartment for individuals who have been infected and then recovered; those in this category are not able to be infected again or to transmit the infection to others.
The flow of this model may be considered as follows:
\mathcal{S} \rightarrow \mathcal{I} \rightarrow \mathcal{R}
Using a fixed population, N = S(t) + I(t) + R(t), Kermack and McKendrick derived the following equations:
\begin{align}
\frac{dS}{dt} &= -\beta S I \\[8pt]
\frac{dI}{dt} &= \beta S I - \gamma I \\[8pt]
\frac{dR}{dt} &= \gamma I
\end{align}
Several assumptions were made in the formulation of these equations. First, an individual in the population must be considered as having an equal probability as every other individual of contracting the disease, with a rate of \beta, which is considered the contact or infection rate of the disease. Therefore, an infected individual makes contact and is able to transmit the disease to \beta N others per unit time, and the fraction of contacts by an infected individual with a susceptible one is S/N. The number of new infections in unit time per infective is then \beta N (S/N), giving the rate of new infections (or those leaving the susceptible category) as \beta N (S/N) I = \beta S I. For the second and third equations, the population leaving the susceptible class equals the number entering the infected class; however, a number equal to the fraction \gamma (which represents the mean recovery rate, or 1/\gamma the mean infective period) of infectives leave this class per unit time to enter the removed class.
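A minimal sketch integrating these equations numerically, assuming scipy is available; the population is expressed as fractions (so S + I + R = 1) and the rate values are illustrative.

```python
# Sketch: numerical integration of the Kermack-McKendrick SIR equations.
from scipy.integrate import solve_ivp

beta, gamma = 0.4, 0.1          # infection and recovery rates (illustrative)

def sir(t, y):
    S, I, R = y
    return [-beta * S * I, beta * S * I - gamma * I, gamma * I]

y0 = [0.999, 0.001, 0.0]        # initial susceptible, infected, recovered fractions
sol = solve_ivp(sir, (0.0, 160.0), y0, max_step=1.0)

S, I, R = sol.y
print(f"peak infected fraction: {I.max():.3f}")
print(f"final recovered fraction: {R[-1]:.3f}")
```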
More can be read on this model on the Epidemic model page.
A master equation can express the behaviour of an undirected growing network where, at each time step, a new node is added to the network, linked to an old node (randomly chosen and without preference). The initial network is formed by two nodes and two links between them at time t = 2; this configuration is necessary only to simplify further calculations, so at time t = n the network has n nodes and n links.
The master equation for this network is
p(k, s, t+1) = \frac{1}{t} p(k-1, s, t) + \left(1 - \frac{1}{t}\right) p(k, s, t),
where p(k, s, t) is the probability that the node added at time step s has degree k at time t. Note that there are only two ways for an old node s to have k links at time t+1: either node s has degree k-1 at time t and is linked by the new node with probability 1/t, or it already has degree k at time t and is not linked by the new node.
After simplifying this model, the degree distribution is
P(k) = 2^{-k}.
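A minimal sketch simulating this growing network (each new node links to a uniformly chosen old node) and comparing the empirical degree distribution with P(k) = 2^{-k}; the network size is illustrative.

```python
# Sketch: growth by uniform random attachment; P(k) should approach 2^-k.
import random
from collections import Counter

rng = random.Random(0)
degree = [2, 2]                        # initial network: two nodes with two links between them
for t in range(2, 100_000):
    old = rng.randrange(len(degree))   # old node chosen at random, without preference
    degree[old] += 1
    degree.append(1)

counts = Counter(degree)
n = len(degree)
for k in range(1, 6):
    print(k, round(counts[k] / n, 4), "theory:", 2.0 ** -k)
```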
Based on this growing network, an epidemic model is developed following a simple rule: each time a new node is added, and after choosing the old node to link to, a decision is made whether or not this new node will be infected. The master equation for this epidemic model is
p_r(k, s, t+1) = r_t \frac{1}{t} p_r(k-1, s, t) + \left(1 - \frac{1}{t}\right) p_r(k, s, t),
where r_t represents the decision to infect (r_t = 1) or not (r_t = 0). Solving this master equation yields the solution
\tilde{P}_r(k) = \left(\frac{r}{2}\right)^{k}.
See main article: Multidimensional network.
Multilayer networks are networks with multiple kinds of relations.[45] Attempts to model real-world systems as multidimensional networks have been used in various fields such as social network analysis,[46] economics, history, urban and international transport, ecology, psychology, medicine, biology, commerce, climatology, physics, computational neuroscience, operations management, and finance.
Network problems that involve finding an optimal way of doing something are studied under the name of combinatorial optimization. Examples include network flow, shortest path problem, transport problem, transshipment problem, location problem, matching problem, assignment problem, packing problem, routing problem, critical path analysis and PERT (Program Evaluation & Review Technique).
Interdependent networks are networks where the functioning of nodes in one network depends on the functioning of nodes in another. In nature, networks rarely appear in isolation; they are typically elements of larger systems and interact with elements in that complex system. Such complex dependencies can have non-trivial effects on one another. A well-studied example is the interdependency of infrastructure networks:[47] the power stations which form the nodes of the power grid require fuel delivered via a network of roads or pipes and are also controlled via the nodes of the communications network. Though the transportation network does not depend on the power network to function, the communications network does. In such infrastructure networks, the dysfunction of a critical number of nodes in either the power network or the communications network can lead to cascading failures across the system, with potentially catastrophic results for the functioning of the whole system.[48] If the two networks were treated in isolation, this important feedback effect would not be seen and predictions of network robustness would be greatly overestimated.