Mosaic developed a question-and-answer (QA) platform powered by advanced AI document review for regulatory compliance to assist the customer’s personnel in managing the process more efficiently.
Take Our Content to Go
Introduction
The oil and gas industry, along with related sectors that assist in the transportation of natural resources (e.g., pipeline management, and logistics), is increasingly embracing digital transformation to manage vast amounts of data and streamline operations. Managing this unstructured data can pose a significant challenge, as it often contains critical information necessary for operational efficiency and regulatory compliance.
The complexities of regulatory compliance further complicate data management. The natural resources sector faces stringent regulatory requirements that mandate comprehensive documentation and timely responses to inquiries. McKinsey reports that inefficiencies in data management can lead to significant productivity losses, with oil and gas employees spending up to 20% of their workweek searching for and gathering information. Furthermore, a report by JPT highlights that up to 80% of losses in the natural resources sector are preventable with better data management and predictive analytics.
Problem
One of North America’s largest oil pipeline operators was undergoing an application process with a national energy regulator to expand its operations. This process required them to review over 40,000 documents and respond to numerous questions about the pipeline project within a short timeframe. The traditional manual review process was labor-intensive, error-prone, and time-consuming, creating a significant bottleneck in meeting regulatory deadlines.
Solution
Mosaic Data Science developed an advanced Question and Answer (QA) platform to assist the oil pipeline company’s personnel in managing the document review process more efficiently. The solution integrated Mosaic’s award-winning Neural Search Engine solution framework, wielding the power of large language models (LLMs) to automate and streamline the search and retrieval of relevant information from vast document repositories.
Mosaic Neural Search, recognized as the top insight engine of 2024 by CIO Review, transcends the limitations of off-the-shelf Generative AI (GenAI) tools. This tailored, scalable, and multi-modal next-generation search solution unlocks insights from diverse data formats. By integrating seamlessly with existing systems, Neural Search ensures security, flexibility, and the avoidance of vendor lock-in. This approach gives businesses complete control over their data and the freedom to select the best AI tools for their unique needs.
Process
The project commenced with a kickoff meeting where Mosaic Data Science and the pipeline operator set out to define clear objectives and establish a detailed roadmap. Following the kickoff, Mosaic embarked on developing a data pipeline to extract relevant PDFs from the national regulator’s website. This involved designing a sophisticated web scraper capable of navigating the national regulator’s site recursively, downloading the necessary files, and extracting metadata such as file URLs and authors. The extracted data was then parsed, meticulously joined with the metadata, and vectorized for efficient storage in a vector database.
Fig. 1: Neural Search Solution Architecture Developed for Customer’s Q&A Platform
With the data pipeline in place, Mosaic proceeded to develop an end-to-end NLP and LLM search engine prototype. This prototype underwent rigorous testing, with models being refined and evaluated against a wide range of queries to ensure they could accurately and efficiently retrieve relevant documents. The objective was to create a robust prototype that could handle the complexities of the document review process with high precision.
Once the prototype proved effective, Mosaic deployed a LLaMA-2 chat model within the pipeline company’s Azure environment. The decision to use LLaMA-2 was driven by its high performance, open-source nature, and compatibility with Azure’s ecosystem. Mosaic configured the prompt engine to interface seamlessly with the LLM via an API endpoint, enabling real-time, natural language responses to user queries.
Fig. 2: Screenshot of Neural Search Q&A Platform Interface
Next, comprehensive documentation detailing the solution’s architecture and features was prepared to facilitate smooth knowledge transfer. Mosaic also conducted extensive training sessions for the customer’s personnel, designed to equip the staff with the necessary skills to use and manage the QA platform effectively and ensure they could fully leverage its capabilities.
Results
The implementation of the Neural Search Engine significantly enhanced the efficiency of the pipeline company’s document review process. Key highlights included:
- Speed and Efficiency: The system could search over tens of thousands of documents and retrieve relevant information within seconds, significantly reducing the time required for manual searches.
- Enhanced Accuracy: The AI-driven search minimized errors and ensured that critical details were not overlooked, leading to a more thorough and reliable review process.
- Cost Efficiency: Improved productivity and reduced operational overhead resulted in lower costs and higher efficiency in managing the document review process.
- Unlocking Hidden Value: The ability to efficiently extract and summarize critical information helped identify and address key regulatory requirements, maximizing the potential for successful project approval.
Conclusion
The collaboration between Mosaic Data Science and the pipeline operator demonstrated the power of Neural Search technology in automating and enhancing the efficiency of complex document review processes. The QA platform, using AI document review for regulatory compliance, not only saved time and resources but also improved accuracy and cost-efficiency, ensuring that the company could meet regulatory deadlines effectively. Mosaic’s tailored, scalable, and multi-modal search solution showcased how advanced AI technologies could address the limitations of traditional data management methods, providing a robust and flexible tool to meet the unique needs of the pipeline management industry.