In an era where businesses need precise and actionable information, GraphRAG offers a significant leap forward in information retrieval. Unlike traditional retrieval-augmented generation (RAG), which can struggle to connect related but dispersed data, GraphRAG uses knowledge graphs to deliver more contextual, relevant, and actionable responses. This technology enables organizations to unlock deeper insights from their data, improving operational efficiency and decision-making without sacrificing precision.
The Limitations of Conventional RAG for Corporate AI Applications
Retrieval-Augmented Generation (RAG) is a framework that has become a mainstay for overcoming some of the conventional shortcomings of Large Language Models (LLMs). LLMs are not all-knowing: they return incorrect answers when a question requires knowledge they do not possess, such as information contained only in non-public documents, or when they do not understand the specific terminology used in the question. In the same way, LLMs struggle to give up-to-date answers because the newest data simply isn’t available to the model. Retraining or fine-tuning a foundation model like an LLM is neither simple nor cheap, which makes it impractical for most use cases. As such, RAG has become a powerful tool for overcoming some of these limitations.
However, RAG has shortcomings in connecting the dots when answering queries that require traversing information across separate documents connected by higher-level or conceptual attributes. This results in retrieval depth and breadth limitations, often leading to unsatisfactory or incomplete answers.
To overcome some of the shortcomings of conventional RAG, researchers and engineers at the cutting edge have proposed the integration of knowledge graphs as part of the RAG workflow – a framework known as GraphRAG. In a saturated market where RAG solutions have become commonplace but fall short in many applications, innovations like Smarter AI with GraphRAG are essential for addressing complex, multi-faceted queries that traditional RAG applications struggle with.
You might be asking yourself, what is a knowledge graph?
A knowledge graph is a means of organizing information as entities and relationships:
- Entities (the “nodes” of the knowledge graph) represent real-world concepts of interest, such as people, events, objects, and situations.
- Relationships (the “edges” of the graph) represent how two entities are conceptually connected.
For example, a knowledge graph about technology companies might represent tech companies or important individuals as entities in the graph. The entities “Jensen Huang” and “Nvidia” might be connected by the relationship “is CEO of” to encode the information that Jensen Huang is the CEO of Nvidia. Similarly, the entities “Nvidia” and “Amazon Web Services” might be connected by the relationship “is a supplier to” to represent that connection between the two companies. For more information on knowledge graphs, see our whitepaper on Graph Neural Networks.
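The example above can be written down as a small data structure. The following is a minimal sketch in Python, assuming the graph is stored as subject-relation-object triples; the names `Triple`, `TRIPLES`, and `build_adjacency` are illustrative choices, not part of any particular GraphRAG product.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the knowledge graph: subject --relation--> object."""
    subject: str
    relation: str
    obj: str


# The two facts used as examples in the text above.
TRIPLES = [
    Triple("Jensen Huang", "is CEO of", "Nvidia"),
    Triple("Nvidia", "is a supplier to", "Amazon Web Services"),
]


def build_adjacency(triples):
    """Index outgoing edges by subject so a node's relationships are one lookup away."""
    adjacency = defaultdict(list)
    for t in triples:
        adjacency[t.subject].append((t.relation, t.obj))
    return adjacency


graph = build_adjacency(TRIPLES)
for relation, target in graph["Nvidia"]:
    print(f"Nvidia --{relation}--> {target}")
# prints: Nvidia --is a supplier to--> Amazon Web Services
```

A production system would more likely keep these triples in a graph database, but the entity-and-relationship shape of the data stays the same.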
Organizing data into a knowledge graph gives a structured representation of an external dataset while preserving the ability to trace each fact back to the underlying unstructured text that the graph represents. This extra layer of structured information, which can be understood and reasoned over, offers several ways to augment or supplant the more traditional approach of retrieving relevant context via similarity search over vector embeddings of external unstructured text.
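One way to keep that link back to the source text is to store a reference to the originating passage on every edge. The sketch below assumes each fact was extracted from exactly one passage; the chunk IDs, the passage text, and the `supporting_text` helper are hypothetical and used only for illustration.

```python
from dataclasses import dataclass

# Hypothetical source passages, keyed by an illustrative chunk ID.
CHUNKS = {
    "doc1#p3": "Jensen Huang is the CEO of Nvidia.",
    "doc7#p1": "Nvidia is a supplier to Amazon Web Services.",
}


@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    obj: str
    source_chunk: str  # ID of the passage this fact was extracted from


EDGES = [
    Edge("Jensen Huang", "is CEO of", "Nvidia", "doc1#p3"),
    Edge("Nvidia", "is a supplier to", "Amazon Web Services", "doc7#p1"),
]


def supporting_text(entity):
    """Return (fact, passage) pairs for every edge that touches `entity`."""
    results = []
    for e in EDGES:
        if entity in (e.subject, e.obj):
            fact = f"{e.subject} {e.relation} {e.obj}"
            results.append((fact, CHUNKS[e.source_chunk]))
    return results


for fact, passage in supporting_text("Nvidia"):
    print(f"{fact}  <-  {passage!r}")
```

With this layout, any structured fact handed to the LLM can be accompanied by the passage that supports it.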
Unlike traditional retrieval methods that rely solely on vector embeddings, knowledge graphs create deeper, more meaningful connections between data points, allowing for more accurate and context-rich retrieval. GraphRAG also enables contextual AI in business applications.
How to Enable Smarter AI with GraphRAG?
The first step in building a GraphRAG tool is constructing a knowledge graph. Various approaches exist, each valuable depending on the application. Careful consideration should be given to the entities, relationships, and properties defined in your graph to maximize downstream utility.
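To illustrate what defining entities, relationships, and properties up front can look like, here is a minimal sketch of a schema-checked graph. The entity types, the relation schema, and the `KnowledgeGraph` class are assumptions made for this example; a real project would choose its own schema based on the application.

```python
from dataclasses import dataclass, field

# Hypothetical schema: which entity types exist, and which
# (subject type, object type) pair each relation may connect.
ENTITY_TYPES = {"Person", "Company"}
RELATION_SCHEMA = {
    "is CEO of": ("Person", "Company"),
    "is a supplier to": ("Company", "Company"),
}


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)  # name -> {"type": ..., plus properties}
    edges: list = field(default_factory=list)     # (subject, relation, object) tuples

    def add_entity(self, name, entity_type, **properties):
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.entities[name] = {"type": entity_type, **properties}

    def add_edge(self, subject, relation, obj):
        # Both endpoints must already be registered (KeyError otherwise).
        if relation not in RELATION_SCHEMA:
            raise ValueError(f"unknown relation: {relation}")
        expected = RELATION_SCHEMA[relation]
        actual = (self.entities[subject]["type"], self.entities[obj]["type"])
        if actual != expected:
            raise ValueError(f"{relation!r} expects {expected}, got {actual}")
        self.edges.append((subject, relation, obj))


kg = KnowledgeGraph()
kg.add_entity("Jensen Huang", "Person")
kg.add_entity("Nvidia", "Company", sector="semiconductors")
kg.add_entity("Amazon Web Services", "Company")
kg.add_edge("Jensen Huang", "is CEO of", "Nvidia")
kg.add_edge("Nvidia", "is a supplier to", "Amazon Web Services")
print(len(kg.entities), "entities,", len(kg.edges), "edges")
```

Rejecting edges that violate the schema at load time is one simple way to keep extraction errors from propagating into retrieval.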
The second step is deciding how to leverage the knowledge graph to provide context to a RAG pipeline. It’s important to understand that a knowledge graph doesn’t need to replace conventional vector similarity retrieval; it can instead serve as a supplementary tool to enrich context. There are multiple ways to integrate a knowledge graph into the context enrichment process of a RAG workflow, and the chosen approach can greatly influence the quality and correctness of the final output. Options include graph traversal and graph data mining as part of the retrieval process. The approach to retrieving or deriving enriched context from the knowledge graph should be tailored to the specific use case to maximize performance within a GraphRAG framework.
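As one possible illustration of the supplementary approach, the sketch below first picks seed chunks by vector similarity and then uses graph traversal to pull in chunks that mention one-hop neighbors of the seed entities. Everything here is an assumption for demonstration: the chunks, the entity annotations, and in particular `embed`, which is a toy bag-of-words stand-in for a real embedding model.

```python
import math
from collections import Counter

# Hypothetical text chunks and the entities mentioned in each.
CHUNKS = {
    "c1": "Jensen Huang is the CEO of Nvidia.",
    "c2": "Nvidia is a supplier to Amazon Web Services.",
    "c3": "Amazon Web Services sells cloud computing capacity.",
    "c4": "Microsoft develops the Windows operating system.",
}
CHUNK_ENTITIES = {
    "c1": {"Jensen Huang", "Nvidia"},
    "c2": {"Nvidia", "Amazon Web Services"},
    "c3": {"Amazon Web Services"},
    "c4": {"Microsoft"},
}
# Undirected view of the knowledge graph edges.
EDGES = [("Jensen Huang", "Nvidia"), ("Nvidia", "Amazon Web Services")]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_search(query, k=1):
    """Conventional RAG step: rank chunks by similarity to the query."""
    q = embed(query)
    ranked = sorted(CHUNKS, key=lambda cid: cosine(q, embed(CHUNKS[cid])), reverse=True)
    return ranked[:k]


def graph_expand(seed_chunks):
    """Add chunks that mention a seed entity or one of its one-hop graph neighbors."""
    seeds = set().union(*(CHUNK_ENTITIES[c] for c in seed_chunks))
    neighbors = set(seeds)
    for a, b in EDGES:
        if a in seeds:
            neighbors.add(b)
        if b in seeds:
            neighbors.add(a)
    return [cid for cid in CHUNKS if CHUNK_ENTITIES[cid] & neighbors]


seeds = vector_search("Who is the CEO of Nvidia?")
context = graph_expand(seeds)
print(seeds, context)
# prints: ['c1'] ['c1', 'c2', 'c3']
```

Similarity search alone returns only the chunk about the CEO; the graph step adds the supplier relationship and the chunk about the connected company, while the unrelated chunk stays out of the context.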
Benefits of Smarter AI with GraphRAG
GraphRAG represents a significant evolution in AI-powered information retrieval, combining the strengths of retrieval-augmented generation (RAG) with the advanced capabilities of graph-based data structures. Leveraging knowledge graphs offers a deeper contextual understanding, improved handling of complex queries, and more cost-effective decision-making. This section explores the key benefits of utilizing smarter AI through GraphRAG, highlighting its advantages over traditional RAG models.
Deeper Contextual Understanding
GraphRAG is particularly useful where standard RAG falls short because traditional vector search retrieves text with too narrow a context. When data relationships are complex, or when answering requires connecting information from disparate sources, GraphRAG draws on the structured connections between entities discussed in the underlying text, or in supplementary text supplied specifically to give context for the primary text of interest. Traditional RAG relies entirely on semantic-similarity vector search, so it may overlook deeper connections or fail to return enough context to answer the question adequately. GraphRAG instead uses the knowledge graph structure to capture and exploit relationships within the data, which lets it handle queries that need more than surface-level, similarity-based retrieval.
Handling Complex Queries
Standard RAG may struggle when the required answer is not directly available in a single part of a document or when the answer needs to be synthesized from multiple sources. GraphRAG’s ability to navigate and retrieve connected information across a graph structure provides more accurate and comprehensive answers. This is especially valuable in cases where the answer lies in the intersection of multiple data points rather than being explicitly stated in one place. For instance, answering questions like “How many times does X occur throughout the documents?” or “What are the top 5 most discussed concepts?” requires understanding and linking different data points. Standard RAG might retrieve relevant documents but struggle to integrate and aggregate them meaningfully. GraphRAG’s approach allows it to map out these relationships, providing more accurate and contextually relevant answers by connecting the necessary concepts and entities.
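Aggregate questions such as “What are the top 5 most discussed concepts?” become simple counting once mentions are stored as graph edges. The following sketch assumes a hypothetical list of (chunk ID, concept) mention edges; the data and the `top_concepts` helper are illustrative only.

```python
from collections import Counter

# Hypothetical "mentions" edges linking a text chunk to a concept entity.
MENTIONS = [
    ("hr1#p1", "sick leave"),
    ("hr1#p2", "parental leave"),
    ("hr2#p1", "sick leave"),
    ("hr2#p4", "bereavement leave"),
    ("hr3#p2", "sick leave"),
    ("hr3#p3", "parental leave"),
]


def top_concepts(mentions, k=5):
    """Rank concepts by the number of distinct chunks that mention them."""
    distinct = set(mentions)  # ignore duplicate (chunk, concept) pairs
    counts = Counter(concept for _chunk, concept in distinct)
    return counts.most_common(k)


print(top_concepts(MENTIONS, k=2))
# prints: [('sick leave', 3), ('parental leave', 2)]
```

A similarity search has no natural way to produce such a count, because it returns the passages closest to the query rather than a tally across the whole corpus.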
Improved Decision Making Through Greater Interpretability
Another key benefit of GraphRAG is that it is more interpretable than traditional RAG. The graph-based approach allows for a more transparent reasoning process, as the connections between nodes can be visualized and understood. This is particularly valuable in scenarios where understanding how an answer was derived is as crucial as the answer itself. The ability to inspect and traverse the graph used for context provides tangible feedback, making the decision-making process more interpretable and trustworthy. GraphRAG can also reduce retrieval times and operational costs through efficient context management, which makes it a cost-effective option for complex queries.
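To make the idea of an inspectable reasoning path concrete, the sketch below uses a breadth-first search to return the chain of relationships that connects two entities. It is a minimal illustration over hypothetical edges, assuming directed relationships; `explain_path` is a name introduced here and not an API of any GraphRAG library.

```python
from collections import deque

EDGES = [
    ("Jensen Huang", "is CEO of", "Nvidia"),
    ("Nvidia", "is a supplier to", "Amazon Web Services"),
]


def explain_path(start, goal, edges):
    """Breadth-first search that returns the chain of edges linking start to goal."""
    outgoing = {}
    for s, r, o in edges:
        outgoing.setdefault(s, []).append((r, o))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt in outgoing.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None  # no directed path exists


for s, r, o in explain_path("Jensen Huang", "Amazon Web Services", EDGES):
    print(f"{s} --{r}--> {o}")
```

The returned path can be shown alongside the generated answer so that a reviewer can see which relationships were used to connect the entities.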
Lower Costs Through Efficient Token Usage
GraphRAG has the potential to be more cost-effective in terms of token usage – effectively the amount of text that is passed to the more costly and computationally demanding LLM part of the pipeline – although this depends on how the prompting and context formatting is engineered. Efficiently organizing and retrieving only the most relevant information through graph traversal may reduce the need for extensive context that consumes large amounts of tokens. This can lead to cheaper operations, especially in scenarios involving large-scale or complex queries, making GraphRAG an attractive option for cost-sensitive or computationally constrained applications.
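A rough sketch of how context selection can be kept within a token budget is shown below. It assumes each candidate fact already carries a relevance score (for example, derived from graph distance to the query entity) and approximates token counts by splitting on whitespace; a real tokenizer would give different numbers, and the facts and scores are hypothetical.

```python
def approx_tokens(text):
    """Crude token estimate (whitespace split); a real tokenizer would differ."""
    return len(text.split())


def pack_context(scored_facts, budget):
    """Greedily keep the highest-scoring facts that fit within the token budget."""
    selected, used = [], 0
    for _score, fact in sorted(scored_facts, reverse=True):
        cost = approx_tokens(fact)
        if used + cost <= budget:
            selected.append(fact)
            used += cost
    return selected, used


# Hypothetical (relevance score, fact) pairs produced by a graph traversal.
FACTS = [
    (0.9, "Jensen Huang is CEO of Nvidia"),
    (0.7, "Nvidia is a supplier to Amazon Web Services"),
    (0.2, "Amazon Web Services sells cloud computing capacity"),
]

context, used = pack_context(FACTS, budget=14)
print(used, context)
# prints: 14 ['Jensen Huang is CEO of Nvidia', 'Nvidia is a supplier to Amazon Web Services']
```

Passing compact facts rather than whole passages is what may reduce token usage, although, as noted above, the actual savings depend on how the prompt and context are engineered.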
Simulating GraphRAG Performance Gains
In an early blog post discussing the GraphRAG architecture, Microsoft described an implementation built over a dataset of current event news articles and research papers on the Ukraine conflict. The system linked entities such as individuals, events, and geopolitical locations, providing richer and more accurate responses to complex questions that required cross-referencing information from various sources.
Question: What has Novorossiya done?
RAG Response: The text does not provide specific information on what Novorossiya has done.
GraphRAG Response: Novorossiya, a political movement, has been involved in a series of destructive activities targeting various entities in Ukraine, including attacks on infrastructure and financial institutions. The knowledge graph links multiple reports and entities, including the Ukrainian government’s acknowledgment of Novorossiya’s activities, providing a deeper understanding of its actions.
GraphRAG connects reports from different sources, in this case, extracting meaningful information about Novorossiya’s activities that RAG alone could not provide. The graph allows for the traversal of relationships between entities (e.g., people and events) and results in more comprehensive answers.
For further details, including full responses and a more in-depth explanation of the knowledge graph used, refer to Microsoft’s blog post “GraphRAG: Unlocking LLM discovery on narrative private data” (Microsoft Research).
How GraphRAG enables contextual AI using Mosaic’s Neural Search
Mosaic has incorporated GraphRAG into its state-of-the-art Neural Search and information retrieval system. Neural Search empowers organizations to extract actionable insights from complex datasets, such as HR manuals or legal documents, by navigating the structured relationships in the data.
For example, when running GraphRAG over a set of HR manuals, the system can retrieve precise policy details by linking them to related data points, ensuring a comprehensive response connecting individual policies to broader organizational guidelines.
Question: What are the top 5 most discussed policies in this dataset?
RAG Response:
1. Date the policy was prepared (including day, month, and year)
2. Date any changes were made to the policy
3. Topics covered by the ESA
4. Date any public health official issued quarantine or isolation information
5. Date must include the day, month, and year
GraphRAG Response: “The top 5 most discussed policies in this dataset are pregnancy leave, parental leave, sick leave, family responsibility leave, and bereavement leave.”
GraphRAG is already proving its worth in various industries. In financial services, for example, it helps auditors link various factors affecting revenue, such as operational costs, market demand, and pricing strategies. In industrial sectors, it assists in synthesizing operating and repair documents, processes, and logs by connecting findings across events and regulations. These applications demonstrate how GraphRAG enables contextual AI to transform how businesses tackle complex challenges.
The Future of Smarter AI with GraphRAG
GraphRAG is positioned to be a game-changer in AI-driven knowledge retrieval as businesses adopt AI for decision-making. By enabling contextual AI, it provides the deep insights necessary to stay competitive in today’s data-driven world. Companies that leverage Smarter AI with GraphRAG can benefit from more accurate and complete answers as well as cost efficiencies and faster response times.
If your organization faces complex decision-making challenges requiring more than surface-level data retrieval, Smarter AI with GraphRAG is the solution. This cutting-edge technology enables businesses to unlock deeper insights, handle more complex queries, and make smarter decisions, all while optimizing costs and resources.
If you have made it this far in the whitepaper, we should talk.
To learn more about how Smarter AI with GraphRAG can transform your business, contact Mosaic Data Science for a customized consultation on implementing GraphRAG in your organization.