Taking GenAI Beyond the Hype With Advanced Neural Search - Q&A With Sean Osis & Juan Rebollo


If you’ve been keeping up with the latest news in tech, you’ve probably seen the term “Generative AI” tossed around with the rise of ChatGPT. But is it all just hype? That’s what we aimed to uncover during our dual Q&A with Senior Principal Data Scientist Sean Osis and Principal Data Scientist Juan Rebollo.

Mosaic has recently released a solution framework for delivering contextual results on any set of documents. Our Neural Search Engine framework is built to help companies go beyond Robotic Process Automation and return highly relevant results from documents containing images, structured table text, unstructured text, and audio. The Neural Search Engine framework allows organizations to take advantage of the latest LLM innovations securely behind their firewalls, without sending any sensitive information to an externally hosted model.

During the Q&A session, we hit on hot-button topics like GenAI while trying to uncover what is hype and what is worth keeping an eye on as an organization looking to automate and innovate with Artificial Intelligence.  Read up on how Mosaic is tying this technology to our work with transformer architectures, delivering a breakthrough, custom Neural Search Engine that promises to save organizations countless hours sifting through documents.

1. What are your titles and responsibilities at Mosaic Data Science?

Juan: I am a Principal Data Scientist at Mosaic, and I have been working here for about 10 years. I have been involved in a large variety of projects, including some government work. On the commercial side, I’ve worked with clients in industries like oil and gas, aviation, and retail. My role in these projects is to work with the client and the business to execute data science projects that provide value to their organization. I’m often involved in guiding other data scientists while also doing hands-on work.  

Sean: I am a Senior Principal Data Scientist at Mosaic and have been with the company for 6 years. I have several roles, including business development and leading project work across a variety of domains, and I am one of the data scientists currently developing our Advanced Neural Search Engine capability. I’ve worked with our Canadian customers and clients from industries like energy, utilities, and healthcare to develop advanced analytics solutions that benefit them.

2. From your perspective, why do you think generative AI has been so transformative?

Sean: Up until only a few years ago, AI model development was very dependent upon finding the right training data set for your particular purpose, and this was a huge part of how early language models were developed. Data scientists had to have a deep understanding of their use case and then find the right data to support training models to solve problems.  

With generative AI, we now have these pre-trained models that are becoming so comprehensive that they are already embedded with knowledge and understanding to tackle a wide range of use cases with little to no training. And it’s not just language models; now we have models with a vast knowledge of art, photographs, music, and even technical content like engineering plans. 

There is also a kind of magical quality in how these models can interpret these diverse requests and provide content in a natural, conversational fashion. So, the wide scope of these technologies and their ease of use are what, in my opinion, is really transformative.   

Juan: The field of generative AI has evolved quickly over the last few years, especially since the transformer architecture was released by Google about 6 years ago – leading to a significant leap in language and vision applications. We are at a point where it is hard to differentiate between human and AI-generated content, which is getting a lot of attention in the news.  

I think generative AI models are so significant due to their flexibility and ability to solve complex tasks. For example, large language models can now take the definition of a problem alongside some context and provide an answer that makes sense to a human. There is no need to spend days or weeks training a model for your specific problem. In many cases, you can just describe the problem with a few examples, and the large language model is able to learn from them. I find these capabilities fascinating, and they are evolving very quickly.

3. There is a lot of hype around GenAI. In your estimation, what is warranted, and what is overhyped?

Sean: For any day-to-day tasks that involve finding and integrating basic knowledge, these generative AI tools will be a powerful way to augment skills and increase speed and efficiency for finding answers. I think the hype there is quite real.

However, one blind spot with these tools is the legitimacy of the information being generated. There have been many documented cases of the tools producing random information that is neither true nor sensible. It is therefore critical to maintain some skepticism about the answers that generative AI provides and to ensure that the right subject matter experts engage with the information before it is put into practice.

4. What are the capabilities of Mosaic’s Neural Search Engine? 

Juan: Mosaic’s Neural Search Engine combines traditional search with generative AI and other deep learning models while leveraging AI capabilities on specific enterprise data for our customers.  

The search engine extracts information from text, tables, images, and other content to answer user questions. It supports multiple modes and can be easily tailored to different use cases, from answering questions within tables and searching text within images to using a large language model to generate a response to a given question based on answers found in enterprise knowledge stores. We leverage open-source tools and support many large language models, including open-source LLMs, which can be deployed internally to avoid sending out sensitive data.

5. What data science techniques and technologies are applied to power the Neural Search Engine? How do they work within the tool?

Juan: We leverage state-of-the-art NLP techniques. The enterprise knowledge is vectorized and stored in a database, which allows us to find the most relevant content for a given user question by matching the question vector to the vectors in the database. This approach is more robust than traditional keyword matching.  
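To make the matching step concrete, here is a minimal sketch of vector retrieval by cosine similarity. It is not Mosaic's implementation. The `embed` function is a deliberately simple word-count stand-in so the example runs with only the standard library; a real system would swap in a learned embedding model and a vector database, which is where the robustness over keyword matching comes from. All function names and the sample passages are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a neural encoder (assumption): a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(passages):
    """Vectorize every passage once, mimicking storage in a vector database."""
    return [(passage, embed(passage)) for passage in passages]


def search(index, question, top_k=2):
    """Return the top_k (score, passage) pairs most similar to the question."""
    question_vec = embed(question)
    scored = [(cosine(question_vec, vec), passage) for passage, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Invented sample passages, used only to exercise the ranking step.
passages = [
    "The feedwater pump is rated for a maximum pressure of 150 bar.",
    "Employees accrue vacation days monthly according to the benefits policy.",
    "The turbine hall must be inspected every six months.",
]
index = build_index(passages)
results = search(index, "What is the maximum pressure of the feedwater pump?", top_k=1)
print(results[0][1])
```

With these sample passages, the pump specification ranks first because it shares the most terms with the question. With a learned encoder, the same ranking code could also match passages that express the idea in different words.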

Once relevant content is found, we use transformer-based question-answering models for text or tables, summarization models, or large language models to find specific answers to questions. More importantly, we carefully engineer the prompts fed into the LLM to ensure the models provide answers using the data ingested in the search engine, not general knowledge or false information.
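One common way to keep an LLM anchored to retrieved content is to place that content in the prompt and instruct the model to answer only from it. The sketch below shows that idea under stated assumptions: the function name, the prompt wording, and the sample passage are hypothetical, and the actual prompts used in the Neural Search Engine are not described in this interview.

```python
def build_grounded_prompt(question, passages):
    """Assemble a prompt that restricts the model to the supplied passages.

    The wording is illustrative only (assumption), not a production prompt.
    """
    # Number each passage so the model (and a reviewer) can refer back to it.
    context = "\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered context passages below.\n"
        "If the context does not contain the answer, reply \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Invented passage, standing in for content returned by the retrieval step.
prompt = build_grounded_prompt(
    "What is the maximum pressure of the feedwater pump?",
    ["The feedwater pump is rated for a maximum pressure of 150 bar."],
)
print(prompt)
```

The resulting string would then be sent to whichever LLM is deployed. The explicit "I don't know" fallback is one simple guard against the model filling gaps with general knowledge.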

6. What are some of the real-world applications of the Neural Search Engine? Who can benefit? 

Sean: In our modern world, we are constantly being bombarded with huge volumes of information. Many of us have mountains of documentation that come along with our electronic devices, the software we use, the machinery at our job, or even for things like agreements, insurance documents, and benefits packages. There has been a big push to make all of this electronic, which is great because it at least makes it searchable. However, traditional search still has huge limitations, and you still have to know things like keywords in order to find what you’re looking for.  

Mosaic’s Neural Search capability is leveraging the power of language models and generative AI to make sorting through this information much more conversational, where you can ask questions of your documents and get real answers that are returned to you in a way that provides the right context. Large organizations stand to benefit tremendously from this type of tool, as they often have such a wealth of information that is locked away in digital documents which are hard to search, and many times they don’t even know what they have. Neural Search will transform how people find information by creating an assistant that operates at the speed of modern computing for improved efficiency.

Juan: There are many applications for this tool. I would say any institution with large amounts of data that often finds itself spending long hours looking for answers within documents can benefit greatly. Our tool can quickly find the top relevant documents and point the user to key sections with critical information. For example, engineers in a power plant who are looking for system specs could quickly get to the right answers, avoiding browsing through potentially hundreds of pages and wasting countless hours in review time.

7. How can businesses leverage the latest innovations in the Open-Source community and GenAI to benefit their internal use cases for dealing with unstructured data?

Sean: We believe there is a huge opportunity here to take advantage of open-source technologies. One of the large concerns starting to materialize with generative AI and online language models is the privacy and confidentiality of information that is being passed back and forth to the models. If an organization wants to mitigate this risk, there are many tools and technologies that can be leveraged to create a custom solution that they can own and control. Open-source languages like Python and the frameworks built on them make it easier to stay on the cutting edge, and open-source versions of language models can be ported to on-premises environments. We believe these methods can be used to create secure, customized versions of generative AI tools.

8. Why is Mosaic so well-positioned to help with GenAI use cases? What sets Mosaic apart?

Sean: Mosaic has been part of the evolution of language models, generative AI, and the field of data science for nearly a decade. We’ve been creating the basic building blocks behind machine learning and AI models for years, and as such, we’ve developed a keen understanding of their capabilities and limitations. In my opinion, we have the right blend of technical know-how and healthy skepticism to implement these technologies in a way that maximizes value and mitigates risk. 

9. Where do you see the future of GenAI going? How does Mosaic play into this?

Sean: The future of generative AI is going to be incredible. I see these technologies being integrated into everything. In much the same way that the “internet-of-things” brought connectivity for our devices and hardware, I think we’ll see the rise of “intelligence-of-things” that brings technologies like generative AI into our devices, our software, and our digital experiences. It’s an exciting time to be a part of Mosaic, as we have a front-row seat to the action, and I see us developing many of the specific use cases where this intelligence can be embedded.
