Graph Analytics Review from Mosaic Data Science
Graph analytics is a classic network science technique that is making waves thanks to advances in graph database (GDB) technology and the integration of machine learning techniques (e.g., neural networks) to address a wide range of use cases. There is no need for a map to figure this out: Principal Data Scientist Daniel Salazar is our guide and gives us the full scoop.
1. What are your official title, role, and responsibilities at Mosaic Data Science?
My official title at Mosaic is Principal Data Scientist, and my responsibilities range from engaging with potential clients in exploratory calls to executing data science projects, where I participate as a contributor and technical lead. The most important part of my job is ensuring that we offer high-quality data science solutions that satisfy our clients’ needs.
2. How did you first come across graph analytics in your data science journey? What was your first project?
If you search for “graph analytics” on Google or Bing, you’ll find a sizeable number of articles that are at most one year old. Before that, it was more common to talk about graph theory or network analysis. The former is the study of graphs as mathematical objects, whereas the latter is the study of networks as we encounter them in daily life (e.g., transportation networks, power grids, or social networks).
I first encountered networks about 20 years ago in a postgraduate course on Reliability Engineering. Later, in my doctoral dissertation, I studied networks from the point of view of optimally allocating resources to reduce the risk from intentional attacks. That is when I became interested in what came to be known as Network Science and in the work of Albert-László Barabási and others.
3. In your own words, how would you describe the field of graph analytics?
Before answering that question, we need to define networks and graphs. A network is a collection of entities and their relationships or connections. There are many types and instances of networks around us, so to study them we can use a mathematical abstraction called a graph, in which entities are represented by nodes and connections by links. Interestingly, very different networks can be represented by the exact same graph. What’s even more interesting is that by analyzing the graph’s structure, you can learn something about every network it represents.
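To make the idea of two different networks sharing one graph concrete, here is a minimal sketch. The two networks (a road network and a friendship network) and the helpers `to_graph` and `isomorphic` are hypothetical illustrations, not part of any library. The isomorphism check is brute force (it tries every relabeling of the nodes), so it only suits a handful of nodes.

```python
from itertools import permutations

# Two hypothetical networks with very different meanings (illustrative data only).
road_network = [("Depot", "Mill"), ("Mill", "Port"), ("Port", "Depot"), ("Port", "Market")]
friend_network = [("Ana", "Ben"), ("Ben", "Cruz"), ("Cruz", "Ana"), ("Ana", "Dee")]


def to_graph(links):
    """Abstract a network into a graph: a sorted node list and a set of undirected links."""
    nodes = sorted({n for link in links for n in link})
    edges = {frozenset(link) for link in links}
    return nodes, edges


def isomorphic(g1, g2):
    """Brute-force check: does some relabeling of g1's nodes reproduce g2's links exactly?"""
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    if len(nodes1) != len(nodes2) or len(edges1) != len(edges2):
        return False
    for perm in permutations(nodes2):
        mapping = dict(zip(nodes1, perm))
        if {frozenset(mapping[n] for n in e) for e in edges1} == edges2:
            return True
    return False


print(isomorphic(to_graph(road_network), to_graph(friend_network)))  # True
```

Both networks reduce to a triangle with one extra node attached to it, so any structural result about that graph (degrees, paths, robustness) applies to both.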
I think we can describe graph analytics as the confluence of three main components: first, the analysis of the structure, or topology, of the graph I just mentioned; second, the use of machine learning to discover new properties of the network and make predictions beyond the confines of topological analysis; and finally, the use and development of technology designed explicitly to model, visualize, and analyze graphs.
4. Why do you think graph analytics problems are so significant to solve?
It’s hard to find an aspect of reality where networks are not involved in some way. Think about it: our brains are networks of neurons, and our knowledge is a network of concepts, ideas, and memories. Our daily lives rely on social interactions, supply chains, and critical infrastructures, all of which are instances of networks. Graph analytics allows us to understand and exploit such interconnected structures more naturally and efficiently, and to answer a wide variety of questions about them.
5. What techniques and most promising use cases are associated with this technology? Which industries can benefit?
The first thing you want to do with a network is analyze its topology: its size and how its nodes and links are connected. Metrics such as node degree, centrality, and average path length relate to the relative importance of nodes (think of who’s an influencer), the robustness of the network (for example, what happens to traffic if a street is blocked or an airport is shut down), and how quickly things can spread through the network (think of viral contagion).
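The metrics mentioned above can be computed with nothing more than breadth-first search. The sketch below assumes a small, hypothetical undirected network stored as an adjacency list; the function names are illustrative, and closeness is computed over reachable nodes only (one of several conventions). Removing a node mimics the "blocked street" or "closed airport" robustness question.

```python
from collections import deque

# Hypothetical undirected network as an adjacency list (illustrative data).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}


def bfs_distances(g, source):
    """Hop distances from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree(g):
    """Number of links attached to each node."""
    return {n: len(nbrs) for n, nbrs in g.items()}


def closeness(g):
    """Closeness centrality: (reachable nodes - 1) / sum of distances to them."""
    scores = {}
    for n in g:
        d = bfs_distances(g, n)
        total = sum(d.values())
        scores[n] = (len(d) - 1) / total if total else 0.0
    return scores


def average_path_length(g):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    lengths = [d for n in g for m, d in bfs_distances(g, n).items() if m != n]
    return sum(lengths) / len(lengths) if lengths else 0.0


def is_connected(g):
    """True if every node can reach every other node."""
    return not g or len(bfs_distances(g, next(iter(g)))) == len(g)


def remove_node(g, node):
    """Copy of g with one node (e.g., a closed airport) and its links removed."""
    return {n: nbrs - {node} for n, nbrs in g.items() if n != node}


print(degree(graph))                           # {'A': 2, 'B': 3, 'C': 3, 'D': 3, 'E': 1}
print(average_path_length(graph))              # 1.5
print(is_connected(remove_node(graph, "B")))   # True
print(is_connected(remove_node(graph, "D")))   # False: E is cut off
```

In this toy network, removing B only lengthens some paths, while removing D disconnects E entirely, which is the kind of vulnerability a robustness analysis is meant to surface.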
You can also use machine learning to predict things about a network, for example, which nodes are likely to establish a link. If one group of nodes represents people and another represents products (say, movies), you can learn from existing connections who likes what and apply this knowledge to recommend new products to customers.
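As a rough illustration of link prediction on a people–movies graph, the sketch below uses a simple neighborhood heuristic rather than a trained machine learning model: an unseen movie scores higher the more paths of length three (user → liked movie → similar user → other movie) lead to it. The `likes` data and the `recommend` function are hypothetical; a production recommender would typically learn from the graph instead of counting paths.

```python
from collections import Counter

# Hypothetical bipartite graph: each user links to the movies they liked.
likes = {
    "ana": {"Alien", "Heat", "Up"},
    "ben": {"Alien", "Heat", "Jaws"},
    "cruz": {"Up", "Coco"},
    "dee": {"Heat", "Jaws"},
}


def recommend(user, likes, k=2):
    """Rank unseen movies by the number of length-3 paths from the user to them.

    A neighborhood heuristic for link prediction, not a trained model.
    """
    seen = likes[user]
    scores = Counter()
    for other, their_movies in likes.items():
        if other == user:
            continue
        overlap = len(seen & their_movies)  # shared movies = paths passing through `other`
        if overlap == 0:
            continue
        for movie in their_movies - seen:
            scores[movie] += overlap
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [movie for movie, _ in ranked[:k]]


print(recommend("ana", likes))  # ['Jaws', 'Coco']
```

Jaws wins for "ana" because two users who share her taste (ben and dee) both link to it.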
So, as you can see, there are ample opportunities to use graph analytics across sectors. The medical field, supply chains, utilities, security, and any domain where personalization plays a role can benefit significantly from graph analytics.
6. What’s the coolest thing you have seen with a graph analytics solution?
Perhaps drug discovery takes the crown. We can use knowledge graphs to represent information about drugs and diseases. What happens if someone takes two drugs at the same time? What effect will this have? Will one block the other, or will the combination cause undesirable effects? To answer such questions, you can apply machine learning on top of knowledge graphs to predict relationships between diseases, symptoms, and drugs. Moreover, a similar approach can be used to find new treatments for known diseases by repurposing existing drugs.
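To show what "a knowledge graph of drugs and diseases" can look like in code, here is a minimal sketch that stores facts as (head, relation, tail) triples. All entities are placeholders, not real drugs or diseases. In place of the machine learning mentioned above, it uses a plain path rule: propose a drug for a disease when the drug targets a protein that the disease is associated with and the drug is not already known to treat it. Learned models generalize this idea by scoring many path patterns at once.

```python
from collections import defaultdict

# Hypothetical knowledge graph as (head, relation, tail) triples; all entities are placeholders.
triples = [
    ("drug_1", "treats", "disease_X"),
    ("drug_1", "targets", "protein_P"),
    ("drug_2", "targets", "protein_Q"),
    ("drug_2", "treats", "disease_Z"),
    ("disease_X", "associated_with", "protein_P"),
    ("disease_Y", "associated_with", "protein_P"),
    ("disease_Y", "associated_with", "protein_Q"),
]


def index(triples):
    """Group tails by (head, relation) for fast lookups."""
    out = defaultdict(set)
    for head, rel, tail in triples:
        out[(head, rel)].add(tail)
    return out


def repurposing_candidates(triples):
    """Propose (drug, disease) pairs connected by the path
    drug -targets-> protein <-associated_with- disease,
    skipping pairs where the drug already treats the disease."""
    idx = index(triples)
    drugs = {h for h, r, _ in triples if r == "targets"}
    diseases = {h for h, r, _ in triples if r == "associated_with"}
    candidates = {}
    for drug in sorted(drugs):
        for disease in sorted(diseases):
            if disease in idx[(drug, "treats")]:
                continue
            shared = idx[(drug, "targets")] & idx[(disease, "associated_with")]
            if shared:
                candidates[(drug, disease)] = sorted(shared)
    return candidates


print(repurposing_candidates(triples))
```

On this toy data, both drugs are proposed for disease_Y (via protein_P and protein_Q respectively), while drug_1 is not re-proposed for disease_X because that link already exists.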
7. Comment on how mathematical optimization can integrate into graph analytics
Mathematical optimization is about the theory, techniques, and tools used to find the best possible solution to a problem expressed as an objective function and a set of constraints.
One way it integrates into graph analytics is within the larger framework of network-related decision problems. Generally speaking, one seeks to optimize network performance by taking at least one of the following actions:
- Modify the cardinality (i.e., add nodes or links, e.g., where should we place new charging stations)
- Change the topology (in other words, decide what nodes should be connected by links, e.g., how should we structure a team to maximize productivity)
- Change the properties of nodes or links (e.g., by increasing the capacity of links in transportation or telecommunication networks)
- Modify the way the network operates and reacts (e.g., devise policies for maintenance and protection)
In all cases, it is assumed that each action has a cost (in time and money) and that resources are limited.
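As a small worked instance of such a decision problem, the sketch below combines the first action type (adding links) with a limited budget. The objective is to minimize the network’s average shortest-path length, and the constraint is that the total cost of new links must not exceed the budget. The chain network, candidate links, costs, and budget are all hypothetical, and the exhaustive search over subsets is only viable for tiny instances; realistic problems are usually formulated as integer programs or tackled with heuristics.

```python
from collections import deque
from itertools import combinations

# Hypothetical network: a chain of five sites; data is illustrative only.
base_links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
# Hypothetical candidate links that could be built, each with a cost.
candidates = {("A", "E"): 4, ("A", "C"): 2, ("C", "E"): 2, ("B", "D"): 2}
BUDGET = 4


def avg_path_length(links):
    """Average shortest-path length over ordered pairs of connected nodes (BFS from each node)."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def best_additions(base, candidates, budget):
    """Try every affordable subset of candidate links; keep the one that minimizes
    average shortest-path length. Exponential in the number of candidates."""
    best = ((), avg_path_length(base))
    options = sorted(candidates)
    for r in range(1, len(options) + 1):
        for subset in combinations(options, r):
            if sum(candidates[link] for link in subset) > budget:
                continue  # violates the budget constraint
            score = avg_path_length(base + list(subset))
            if score < best[1]:
                best = (subset, score)
    return best


subset, score = best_additions(base_links, candidates, BUDGET)
print(subset, score)  # (('A', 'C'), ('C', 'E')) 1.4
```

Here two cheap shortcuts beat the single expensive link that closes the loop (average path length 1.4 versus 1.5), illustrating why the cost of each action has to be part of the formulation.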
8. How do you think graph analytics ties into other deep learning/machine learning applications?
The quality of a machine learning model largely depends on how the data is represented. One challenge of working with graphs is representing the problem so that the information encoded in the graph’s topology doesn’t get lost. This challenge motivated the development of Graph Neural Networks (GNNs).
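The core idea behind many GNNs is message passing: each node updates its representation by aggregating its neighbors’ representations. The sketch below shows a single untrained layer with mean aggregation, loosely in the spirit of graph convolutional layers. The graph, node features, and weight matrix `W` are hypothetical; in a real GNN the weights are learned, and libraries such as PyTorch Geometric provide optimized implementations.

```python
# A minimal, untrained message-passing layer in pure Python (a sketch, not a GNN library).
# Assumptions: undirected graph as adjacency lists, 2-D node features, and a fixed,
# hypothetical weight matrix W that a real GNN would learn from data.

graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.0, 0.0]}
W = [[0.5, -1.0],
     [1.0, 0.5]]  # rows = output dimensions, columns = input dimensions


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def relu(vec):
    return [max(0.0, x) for x in vec]


def message_passing_layer(graph, features, weights):
    """For each node: average its own and its neighbors' features (aggregate),
    then apply a linear map followed by ReLU (update)."""
    new = {}
    for node, nbrs in graph.items():
        group = [features[node]] + [features[n] for n in nbrs]
        mean = [sum(vals) / len(group) for vals in zip(*group)]
        new[node] = relu(matvec(weights, mean))
    return new


h1 = message_passing_layer(graph, features, W)  # each node now reflects its 1-hop neighborhood
h2 = message_passing_layer(graph, h1, W)        # ...and after two layers, its 2-hop neighborhood
print(h1[1])  # [0.0, 0.75]
```

Stacking k such layers lets each node’s representation depend on its k-hop neighborhood, which is how the graph’s structure, and not just each node’s own attributes, ends up in the features a downstream model sees.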
9. Why is Mosaic so well-positioned to help with graph analytics use cases? What sets Mosaic apart?
I think Mosaic’s strength is having a team that is well-versed in many different aspects of data analytics and data science projects.
10. Where do you see the future of graph analytics going? How does Mosaic play into this?
Graph analytics is not part of the typical toolset of data science practitioners. As more organizations consider graph analytics, they will need help determining whether a graph-based approach is the right one and how to move forward with graph-related projects. Mosaic is the right kind of partner to help on this journey.