A biological network is a method of representing systems as complex sets of binary interactions or relations between various biological entities.[1] In general, networks or graphs are used to capture relationships between entities or objects. A typical graphing representation consists of a set of nodes connected by edges.
As early as 1736 Leonhard Euler analyzed a real-world issue known as the Seven Bridges of Königsberg, which established the foundation of graph theory. From the 1930s-1950s the study of random graphs were developed. During the mid 1990s, it was discovered that many different types of "real" networks have structural properties quite different from random networks.[2] In the late 2000's, scale-free and small-world networks began shaping the emergence of systems biology, network biology, and network medicine.https://www.frontiersin.org/articles/10.3389/fgene.2015.00307/full#B18 In 2014, graph theoretical methods were used by Frank Emmert-Streib to analyze biological networks.
In the 1980s, researchers started viewing DNA or genomes as the dynamic storage of a language system with precise computable finite states represented as a finite state machine.[3] Recent complex systems research has also suggested some far-reaching commonality in the organization of information in problems from biology, computer science, and physics.
See main article: article and interactome. Protein-protein interaction networks (PINs) represent the physical relationship among proteins present in a cell, where proteins are nodes, and their interactions are undirected edges.[4] Due to their undirected nature, it is difficult to identify all the proteins involved in an interaction. Protein–protein interactions (PPIs) are essential to the cellular processes and also the most intensely analyzed networks in biology. PPIs could be discovered by various experimental techniques, among which the yeast two-hybrid system is a commonly used technique for the study of binary interactions.[5] Recently, high-throughput studies using mass spectrometry have identified large sets of protein interactions.[6]
Many international efforts have resulted in databases that catalog experimentally determined protein-protein interactions. Some of them are the Human Protein Reference Database, Database of Interacting Proteins, the Molecular Interaction Database (MINT),[7] IntAct,[8] and BioGRID.[9] At the same time, multiple computational approaches have been proposed to predict interactions.[10] FunCoup and STRING are examples of such databases, where protein-protein interactions inferred from multiple evidences are gathered and made available for public usage. Recent studies have indicated the conservation of molecular networks through deep evolutionary time.[11] Moreover, it has been discovered that proteins with high degrees of connectedness are more likely to be essential for survival than proteins with lesser degrees.[12] This observation suggests that the overall composition of the network (not simply interactions between protein pairs) is vital for an organism's overall functioning.
See main article: article and Gene regulatory network. The genome encodes thousands of genes whose products (mRNAs, proteins) are crucial to the various processes of life, such as cell differentiation, cell survival, and metabolism. Genes produce such products through a process called transcription, which is regulated by a class of proteins called transcription factors. For instance, the human genome encodes almost 1,500 DNA-binding transcription factors that regulate the expression of more than 20,000 human genes.[13] The complete set of gene products and the interactions among them constitutes gene regulatory networks (GRN). GRNs regulate the levels of gene products within the cell and in-turn the cellular processes.
GRNs are represented with genes and transcriptional factors as nodes and the relationship between them as edges. These edges are directional, representing the regulatory relationship between the two ends of the edge. For example., the directed edge from gene A to gene B indicates that A regulates the expression of B. Thus, these directional edges can not only represent the promotion of gene regulation but also its inhibition.
GRNs are usually constructed by utilizing the gene regulation knowledge available from databases such as., Reactome and KEGG. High-throughput measurement technologies, such as microarray, RNA-Seq, ChIP-chip, and ChIP-seq, enabled the accumulation of large-scale transcriptomics data, which could help in understanding the complex gene regulation patterns.[14] [15]
See main article: article and Gene co-expression networks. Gene co-expression networks can be perceived as association networks between variables that measure transcript abundances. These networks have been used to provide a system biologic analysis of DNA microarray data, RNA-seq data, miRNA data, etc. weighted gene co-expression network analysis is extensively used to identify co-expression modules and intramodular hub genes.[16] Co-expression modules may correspond to cell types or pathways, while highly connected intramodular hubs can be interpreted as representatives of their respective modules.
See main article: article and metabolic network. Cells break down the food and nutrients into small molecules necessary for cellular processing through a series of biochemical reactions. These biochemical reactions are catalyzed by enzymes. The complete set of all these biochemical reactions in all the pathways represents the metabolic network. Within the metabolic network, the small molecules take the roles of nodes, and they could be either carbohydrates, lipids, or amino acids. The reactions which convert these small molecules from one form to another are represented as edges. It is possible to use network analyses to infer how selection acts on metabolic pathways.[17]
See main article: article and Cell signaling. Signals are transduced within cells or in between cells and thus form complex signaling networks which plays a key role in the tissue structure. For instance, the MAPK/ERK pathway is transduced from the cell surface to the cell nucleus by a series of protein-protein interactions, phosphorylation reactions, and other events.[18] Signaling networks typically integrate protein–protein interaction networks, gene regulatory networks, and metabolic networks.[19] [20] Single cell sequencing technologies allows the extraction of inter-cellular signaling, an example is NicheNet, which allows to modeling intercellular communication by linking ligands to target genes.[21]
See main article: article and Biological neural network. The complex interactions in the brain make it a perfect candidate to apply network theory. Neurons in the brain are deeply connected with one another, and this results in complex networks being present in the structural and functional aspects of the brain.[22] For instance, small-world network properties have been demonstrated in connections between cortical regions of the primate brain[23] or during swallowing in humans.[24] This suggests that cortical areas of the brain are not directly interacting with each other, but most areas can be reached from all others through only a few interactions.
See main article: article and Food web. All organisms are connected through feeding interactions. If a species eats or is eaten by another species, they are connected in an intricate food web of predator and prey interactions. The stability of these interactions has been a long-standing question in ecology.[25] That is to say if certain individuals are removed, what happens to the network (i.e., does it collapse or adapt)? Network analysis can be used to explore food web stability and determine if certain network properties result in more stable networks. Moreover, network analysis can be used to determine how selective removals of species will influence the food web as a whole.[26] This is especially important considering the potential species loss due to global climate change.
In biology, pairwise interactions have historically been the focus of intense study. With the recent advances in network science, it has become possible to scale up pairwise interactions to include individuals of many species involved in many sets of interactions to understand the structure and function of larger ecological networks.[27] The use of network analysis can allow for both the discovery and understanding of how these complex interactions link together within the system's network, a property that has previously been overlooked. This powerful tool allows for the study of various types of interactions (from competitive to cooperative) using the same general framework.[28] For example, plant-pollinator interactions are mutually beneficial and often involve many different species of pollinators as well as many different species of plants. These interactions are critical to plant reproduction and thus the accumulation of resources at the base of the food chain for primary consumers, yet these interaction networks are threatened by anthropogenic change. The use of network analysis can illuminate how pollination networks work and may, in turn, inform conservation efforts.[29] Within pollination networks, nestedness (i.e., specialists interact with a subset of species that generalists interact with), redundancy (i.e., most plants are pollinated by many pollinators), and modularity play a large role in network stability.[29] [30] These network properties may actually work to slow the spread of disturbance effects through the system and potentially buffer the pollination network from anthropogenic changes somewhat.[30] More generally, the structure of species interactions within an ecological network can tell us something about the diversity, richness, and robustness of the network.[31] Researchers can even compare current constructions of species interactions networks with historical reconstructions of ancient networks to determine how networks have changed over time.[32] Much research into these complex species interactions networks is highly concerned with understanding what factors (e.g., species richness, connectance, nature of the physical environment) lead to network stability.[33] [34]
Network analysis provides the ability to quantify associations between individuals, which makes it possible to infer details about the network as a whole at the species and/or population level.[35] One of the most attractive features of the network paradigm would be that it provides a single conceptual framework in which the social organization of animals at all levels (individual, dyad, group, population) and for all types of interaction (aggressive, cooperative, sexual, etc.) can be studied.[36]
Researchers interested in ethology across many taxa, from insects to primates, are starting to incorporate network analysis into their research. Researchers interested in social insects (e.g., ants and bees) have used network analyses better to understand the division of labor, task allocation, and foraging optimization within colonies.[37] [38] [39] Other researchers are interested in how specific network properties at the group and/or population level can explain individual-level behaviors. Studies have demonstrated how animal social network structure can be influenced by factors ranging from characteristics of the environment to characteristics of the individual, such as developmental experience and personality. At the level of the individual, the patterning of social connections can be an important determinant of fitness, predicting both survival and reproductive success. At the population level, network structure can influence the patterning of ecological and evolutionary processes, such as frequency-dependent selection and disease and information transmission.[40] For instance, a study on wire-tailed manakins (a small passerine bird) found that a male's degree in the network largely predicted the ability of the male to rise in the social hierarchy (i.e., eventually obtain a territory and matings).[41] In bottlenose dolphin groups, an individual's degree and betweenness centrality values may predict whether or not that individual will exhibit certain behaviors, like the use of side flopping and upside-down lobtailing to lead group traveling efforts; individuals with high betweenness values are more connected and can obtain more information, and thus are better suited to lead group travel and therefore tend to exhibit these signaling behaviors more than other group members.[42]
Social network analysis can also be used to describe the social organization within a species more generally, which frequently reveals important proximate mechanisms promoting the use of certain behavioral strategies. These descriptions are frequently linked to ecological properties (e.g., resource distribution). For example, network analyses revealed subtle differences in the group dynamics of two related equid fission-fusion species, Grevy's zebra and onagers, living in variable environments; Grevy's zebras show distinct preferences in their association choices when they fission into smaller groups, whereas onagers do not.[43] Similarly, researchers interested in primates have also utilized network analyses to compare social organizations across the diverse primate order, suggesting that using network measures (such as centrality, assortativity, modularity, and betweenness) may be useful in terms of explaining the types of social behaviors we see within certain groups and not others.[44]
Finally, social network analysis can also reveal important fluctuations in animal behaviors across changing environments. For example, network analyses in female chacma baboons (Papio hamadryas ursinus) revealed important dynamic changes across seasons that were previously unknown; instead of creating stable, long-lasting social bonds with friends, baboons were found to exhibit more variable relationships which were dependent on short-term contingencies related to group-level dynamics as well as environmental variability.[45] Changes in an individual's social network environment can also influence characteristics such as 'personality': for example, social spiders that huddle with bolder neighbors tend to increase also in boldness.[46] This is a very small set of broad examples of how researchers can use network analysis to study animal behavior. Research in this area is currently expanding very rapidly, especially since the broader development of animal-borne tags and computer vision can be used to automate the collection of social associations.[47] Social network analysis is a valuable tool for studying animal behavior across all animal species and has the potential to uncover new information about animal behavior and social ecology that was previously poorly understood.
Within a nucleus, DNA is constantly in motion. Perpetual actions such as genome folding and Cohesin extrusion morph the shape of a genome in real time. The spatial location of strands of chromatin relative to each other plays an important role in the activation or suppression of certain genes. DNA-DNA Chromatin Networks help biologists to understand these interactions by analyzing commonalities amongst different loci. The size of a network can vary significantly, from a few genes to several thousand and thus network analysis can provide vital support in understanding relationships among different areas of the genome. As an example, analysis of spatially similar loci within the organization in a nucleus with Genome Architecture Mapping (GAM) can be used to construct a network of loci with edges representing highly linked genomic regions.
The first graphic showcases the Hist1 region of the mm9 mouse genome with each node representing genomic loci. Two nodes are connected by an edge if their linkage disequilibrium is greater than the average across all 81 genomic windows. The locations of the nodes within the graphic are randomly selected and the methodology of choosing edges yields a, simple to show, but rudimentary graphical representation of the relationships in the dataset. The second visual exemplifies the same information as the previous; However, the network starts with every loci placed sequentially in a ring configuration. It then pulls nodes together using linear interpolation by their linkage as a percentage. The figure illustrates strong connections between the center genomic windows as well as the edge loci at the beginning and end of the Hist1 region.
To draw useful information from a biological network, an understanding of the statistical and mathematical techniques of identifying relationships within a network is vital. Procedures to identify association, communities, and centrality within nodes in a biological network can provide insight into the relationships of whatever the nodes represent whether they are genes, species, etc. Formulation of these methods transcends disciplines and relies heavily on graph theory, computer science, and bioinformatics.
There are many different ways to measure the relationships of nodes when analyzing a network. In many cases, the measure used to find nodes that share similarity within a network is specific to the application it is being used. One of the types of measures that biologists utilize is correlation which specifically centers around the linear relationship between two variables.[48] As an example, weighted gene co-expression network analysis uses Pearson correlation to analyze linked gene expression and understand genetics at a systems level.[49] Another measure of correlation is linkage disequilibrium. Linkage disequilibrium describes the non-random association of genetic sequences among loci in a given chromosome.[50] An example of its use is in detecting relationships in GAM data across genomic intervals based upon detection frequencies of certain loci.[51]
The concept of centrality can be extremely useful when analyzing biological network structures. There are many different methods to measure centrality such as betweenness, degree, Eigenvector, and Katz centrality. Every type of centrality technique can provide different insights on nodes in a particular network; However, they all share the commonality that they are to measure the prominence of a node in a network.[52] In 2005, Researchers at Harvard Medical School utilized centrality measures with the yeast protein interaction network. They found that proteins that exhibited high Betweenness centrality were more essential and translated closely to a given protein's evolutionary age.[53]
Studying the community structure of a network by subdividing groups of nodes into like-regions can be an integral tool for bioinformatics when exploring data as a network.[54] A food web of The Secaucus High School Marsh exemplifies the benefits of grouping as the relationships between nodes are far easier to analyze with well-made communities. While the first graphic is hard to visualize, the second provides a better view of the pockets of highly connected feeding relationships that would be expected in a food web. The problem of community detection is still an active problem. Scientists and graph theorists continuously discover new ways of sub sectioning networks and thus a plethora of different algorithms exist for creating these relationships.[55] Like many other tools that biologists utilize to understand data with network models, every algorithm can provide its own unique insight and may vary widely on aspects such as accuracy or time complexity of calculation.In 2002, a food web of marine mammals in the Chesapeake Bay was divided into communities by biologists using a community detection algorithm based on neighbors of nodes with high degree centrality. The resulting communities displayed a sizable split in pelagic and benthic organisms.[56] Two very common community detection algorithms for biological networks are the Louvain Method and Leiden Algorithm.
The Louvain method is a greedy algorithm that attempts to maximize modularity, which favors heavy edges within communities and sparse edges between, within a set of nodes. The algorithm starts by each node being in its own community and iteratively being added to the particular node's community that favors a higher modularity.[57] [58] Once no modularity increase can occur by joining nodes to a community, a new weighted network is constructed of communities as nodes with edges representing between-community edges and loops representing edges within a community. The process continues until no increase in modularity occurs.[59] While the Louvain Method provides good community detection, there are a few ways that it is limited. By mainly focusing on maximizing a given measure of modularity, it may be led to craft badly connected communities by degrading a model for the sake of maximizing a modularity metric; However, the Louvain Method performs fairly and is can be easy to understand comparatively to many other community detection algorithms.[58]
The Leiden Algorithm expands on the Louvain Method by providing a number of improvements. When joining nodes to a community, only neighborhoods that have been recently changed are considered. This greatly improves the speed of merging nodes. Another optimization is in the refinement phase in-which the algorithm randomly chooses for a node from a set of communities to merge with. This allows for greater depth in choosing communities as Louvain solely focuses on maximizing the modularity that was chosen. The Leiden algorithm, while more complex than Louvain, performs faster with better community detection and can be a valuable tool for identifying groups.[58]
Network motifs, or statistically significant recurring interaction patterns within a network, are a commonly used tool to understand biological networks. A major use case of network motifs is in Neurophysiology where motif analysis is commonly used to understand interconnected neuronal functions at varying scales. [60] As an example, in 2017, researchers at Beijing Normal University analyzed highly represented 2 and 3 node network motifs in directed functional brain networks constructed by Resting state fMRI data to study the basic mechanisms in brain information flow.[61]