Intro to Network Science and Graph Theory

Network Science Introduction

We live in a world of networks. The transportation infrastructure, the electricity, and water distribution systems or the internet are examples of networks we all are familiar with. Our family trees, the social interactions we have with friends and coworkers, as well as all the shows watched by the members of a household through streaming services are instances of networks. In general, whenever we encounter a group of entities related by some sort of common interaction, we are in the presence of a network.

Notwithstanding how different they might seem from one another in real life, all networks can be studied with the same theoretical framework known as Network Science. This approach, which is rather young in the history of science, encompasses theory, insights, perspectives, and techniques from a broad range of disciplines such as graph theory, physics, computer science, sociology, biology, and engineering, to provide a unified view of networks as objects of scientific inquiry in their own right.

The Study of Networks

The study of networks begins with the study of its structure. The mathematical formalism to represent a network is provided by a branch of mathematics known as graph theory and is called a graph. A graph G is comprised of a collection V of vertices or nodes v and another collection E of edges or links e that account for all existing pairwise interactions between vertices. Such interactions can be undirected, when the nodes at both ends of an edge act upon each other in the same way (Fig. 1 left); or directed, when the interaction can only happen in one direction (Fig. 1 right). Moreover, it is possible to have networks with directed and undirected links at the same time.

network science example of undirected (left) and directed (right) graphs — *Figure 1. Example of undirected (left) and directed (right) graphs.*

Figure 1 depicts two examples of networks with the same set of vertices V={v₁, v₂, v₃, v₄, v₅} but a different type of edge. The one on the left is an undirected graph, because all its links are undirected. This means that regardless of what type of relationship a link represents, it can be equally predicated of both ends of such link. For example, if the links represent streets, we see we can drive from v₂ to v₅ and back from v₅ to v₂ using the same road. The same cannot be said of the graph on the right. Since it is a directed graph, its edges are now one-way streets. A trip from v₅ to v₂ is allowed, but the trip back requires traversing a much larger path visiting nodes v₂, v₁, v₃, v₄, v₅.

Examples of both undirected and directed networks abound in the social sphere, depending on whether the relationships between individuals are symmetric or asymmetric. Examples of the former are family and friendship ties, network of coworkers or someone’s connections on LinkedIn. All these relationships are symmetric and therefore better represented by undirected graphs. By contrast, other phenomena like gene or disease transmission, author citation in academic literature, or following someone on Twitter, constitute examples of unidirectional relationships that require directed graphs for their representation.

In general, when we approach networks, there is a list of important properties that can be learned from their graphs. Such properties are usually quantified by a) counting something, and b) characterizing the thing counted statistically. Let us look at some of them along with some examples to illuminate how this knowledge can be useful in many practical applications.

Node and Link Counts and Distributions

Counting in a network can happen at different scales. At a macroscopic level, we describe a network by looking at its total number of nodes (network size) and total number of links. We can also get an idea of how dense or sparse the network is by comparing its actual number of links with the same quantity if the network were fully connected. By contrast, at the microscopic level we characterize nodes by their number of links, or node degree. Furthermore, by counting to what degree the neighbors of a node connect to each other we get a local density measure called clustering degree.

The next step in characterizing a network is to define statistics from the measures taken at intermediate or microscopic levels. This is done by calculating averages and determining frequency distributions. A typical example is the average node degree, which gives us an idea of how many links we should expect to see if we pick a node at random, and the node degree distribution which paints a complete picture of how connected the nodes really are in a network. One of the most exciting discoveries of network science is the fact that in many real networks the node degree follows a power law distribution. This implies that, while most of the nodes in the network have only a small number of connections, there is a handful of them, called hubs, that are connected to a very large portion of the network.

In social networks, the node degree and the clustering coefficients are particularly important for understanding and predicting how things get spread. Marketers may use these measures to apprise the cost effectiveness of hiring a particular individual to post a message online. In a different context, these same measures can be used to devise protection policies against a virus or an intentional attack in case of population or critical infrastructure protection when the protective resources are limited. A search engine for airflights can exploit the fact that some airports are hubs to reduce the search times and rank existing itineraries.

Path-Based Metrics

Another very important aspect of studying networks is the analysis of paths. A path is a sequence of links connecting one node to another. Between any two nodes there may exist none, one or many paths. Inspecting Fig. 2 we see that there is no path between nodes v₁ and v₅, one path between v₁ and v₃, and multiple two paths between v₁ and v₂.

network science example of a disconnected undirected graph — *Figure 2. Example of a disconnected undirected graph.*

By looking at paths and path lengths (number of edges in the path) between all possible pairs of nodes in a graph, we can characterize its connectivity and how easy it is to reach one section of the network from another. Networks can be connected (Fig. 1) or disconnected (Fig. 2) depending on whether there exists at least one path for each pair of their nodes. Whenever multiple paths connect a given pair of nodes, we often care about the shortest one. For instance, the paths linking v₁ and v₂ in Fig. 2 are v₁→v₃→v₂→v₄ (length of 3) and v₁→v₃→v₂ (length of 2), so we retain the latter.

The set of all shortest paths allows us to define a distribution of path lengths from which we can learn the expected distance to reach one node from another (average path length) and the worst-case scenario (longest shortest path or network diameter).

If a node occurs in many different shortest paths, its removal could cause a very important degradation of the network connectivity. This fact can be employed to devise protection, maintenance, and recovery policies. Moreover, when designing a network, a possible strategy is to ensure that most of the nodes are connected by a set of paths with none or very few nodes in common, thereby boosting the robustness of the whole network.

Weighed Graphs in Network Science

Sometimes the links bear some property that is important to account for while studying the network. For example, the links in distribution and transportation networks usually map to some sort of tangible connections (a cable, a pipeline, a street, etc.) with physical and operational capacity constraints. In these and similar cases, the property of interest is mapped to a set of weights that accompany the edges of a graph.

network science example of a weighted graph — *Figure 3. Example of a weighed graph.*

We can see an example of a weighed graph in Fig. 3. Suppose the weights represent distance; the path v₁→v₃→v₄→v₂ is the shortest route from v₁ to v₂ because its total distance of 5+1+1=7 units, which is smaller than v₁→v₃→v₂ with distance 5+3=8 units. If, on the other hand, the weights represent capacity, the maximum flow that can be transmitted to v₂ from v₁ is 4 if the load can be split (3 units along path v₁→v₃→v₂ plus 1 unit along path v₁→v₃→v₄→v₂), or 3 otherwise. These aspects are key to understanding the robustness of distribution networks and supply chains, where sending and transferring loads could take place through various paths.

Stochastic Aspects of Network Science

In the previous example the weights are fixed, but in real networks, they may represent random variables. Consider the time it takes to get from your home to the nearest airport or some sports venue. We all expect that time to increase significantly during rush hour since there are more drivers using the system, but we cannot predict with certainly how much the time will increase – or when an accident might slow traffic flow even further. Thus, if we wanted to model traffic through the network, or any other form of network demand scenario, we must carefully define probability distributions for the link properties we care about and perform simulations to estimate the throughput patterns of our network.

Link capacity or travel time are not the only aspects that require randomness to be accounted for. Our knowledge of the network structure may not be precise, thus necessitating links to be assigned a probability of existence. Network Scientists have studied some of these aspects in what is called random networks, but practical applications often require the use of Monte Carlo simulations to answer questions pertaining to the stochastic aspect of networks.

Networks of Networks

Before coming to an end, let us make a quick mention of networks of networks. Many of our interconnected systems can be modeled as a network of networks, where a sub-network of some nature is connected to another sub-network of a different kind.

Our water distribution system requires electricity to work and a control system to be operated. It also needs workers who rely on the transportation, electrical, and telecommunication networks to do their work. For the transportation network to function properly, electricity is needed to power the traffic lights. Moreover, power generation companies use the other networks one way or another to generate the energy needed to feed the electrical system. Thus, to some degree all these networks are dependent on one another to perform their tasks, and so their study should incorporate a comprehensive network-based approach to account for their interconnectedness while designing strategies to expand, maintain and protect them.

Network Science Conclusion

Network science and simulation provide a robust framework for the study of network systems in all their complexity. In this short post we have just scratched the surface to highlight how powerful this approach can be to answer questions coming from various disciplines like social sciences, marketing, healthcare, engineering, or critical infrastructure protection to name a few.

Intro to Network Science and Graph Theory

Published by Sel Gerosa on July 20, 2022July 20, 2022

Network Science Introduction

The Study of Networks

Node and Link Counts and Distributions

Path-Based Metrics

Weighed Graphs in Network Science

Stochastic Aspects of Network Science

Networks of Networks

Network Science Conclusion

Blogs

Neural Search Introduction: Putting GenAI to Work

Blogs

Why Airlines Need to Look at Holistic Machine Learning & Optimization Solutions to Improve Scheduling

Blogs

Machine Learning on Graphs: Knowledge Graphs and Embeddings

Have questions? Schedule a meeting below

Intro to Network Science and Graph Theory

Published by Sel Gerosa on July 20, 2022July 20, 2022

Network Science Introduction

The Study of Networks

Node and Link Counts and Distributions

Path-Based Metrics

Weighed Graphs in Network Science

Stochastic Aspects of Network Science

Networks of Networks

Network Science Conclusion

Related Posts

Blogs

Neural Search Introduction: Putting GenAI to Work

Blogs

Why Airlines Need to Look at Holistic Machine Learning & Optimization Solutions to Improve Scheduling

Blogs

Machine Learning on Graphs: Knowledge Graphs and Embeddings

Have questions? Schedule a meeting below