RAG-based solution for trustworthy GenAI agents and assistants in collaboration with Intel
Hallucinations in AI models, particularly large language models (LLMs), are instances where the model generates output that sounds coherent but is actually false or misleading given the input. Typical symptoms include incorrect facts, nonsensical statements, and fabricated details. Tackling hallucinations is critical for trustworthy AI solutions, especially in large-scale enterprise deployments. In a recent webinar with Intel, experts from Vectara discussed how their agentic Retrieval Augmented Generation (RAG) platform and hallucination detection model help tackle hallucinations in enterprise LLMs at scale.
Webinar Topics:
- The concept of hallucinations in LLMs, why they occur, and ways to mitigate them,
- How Vectara’s RAG-as-a-service platform and the Hughes Hallucination Evaluation Model (HHEM) help address hallucinations, and
- How to check for LLM hallucinations in real-world use cases.
This blog will give you highlights of the webinar and an overview of Vectara’s efforts to enhance enterprise LLMs.
Watch the complete webinar recording: Tackle LLM Hallucinations at Scale in the Enterprise
Vectara's Vision: Reliable, Secure, and Explainable AI
Vectara provides an AI framework platform that enables the development of reliable and responsible AI agents and AI assistants. It aims to ensure:
- Accurate and high-precision results,
- Security of the obtained results, and
- Explainability, i.e., the ability to show how the AI model arrived at the generated conclusions.
LLMs are susceptible to prompt attacks; much like humans, they can be manipulated. For instance, a crafted prompt that supplies the PIN code of a bank account can trick an LLM into revealing somebody’s salary or other personal information that should not be disclosed.
In addition to posing such security threats, LLMs also tend to hallucinate, i.e., generate irrelevant or incorrect responses based on whatever the model learned from its training data. Here’s a simple example of how an LLM may hallucinate: suppose you ask an LLM, ‘What is the interest rate for a 20-year fixed mortgage?’. Based on the facts it was trained on, the model may respond with a specific x% interest rate. However, a correct answer should state that mortgage interest rates vary based on market conditions, credit scores, and other factors. Such misleading responses that deviate from real-world conditions are undesirable in large-scale AI solutions in the industry.
Read on to learn how Vectara tackles these challenges to deliver enhanced enterprise LLMs at scale.
Mitigate LLM Hallucinations with Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation (RAG) is a technique that enables LLMs to generate accurate, relevant responses by retrieving information from external sources, such as a database, before generating an output. Because the model is grounded in up-to-date information, it produces more reliable output instead of relying only on its original training data.
Vectara provides a RAG-as-a-service platform that helps reduce or eliminate hallucinations in small as well as large-scale enterprise LLMs. It lets you ground the LLM in your own data, which helps curb hallucinations. The platform enforces strict role-based and entity-based access controls that prevent data leakage and hence ensure data security. Moreover, it provides citations or references showing where each response came from, adding a layer of explainability to the LLM.
As shown in Fig. 1 below, your input data is first sent to a data store (text or vector database). During the information retrieval step, the LLM responds to the user query by grounding the response on facts in the data store. Instead of making assumptions based on fixed training data, the LLM responds based on real-time information to ensure accurate results.
Fig.1: How Vectara’s RAG-as-a-service works
Watch the webinar from [00:06:45] to dive deeper into Vectara’s RAG-based platform for handling LLM hallucinations.
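To make the retrieve-then-generate flow concrete, here is a minimal, self-contained Python sketch. The in-memory data store, keyword-overlap retriever, and prompt template are illustrative assumptions only, not Vectara’s actual API; the platform handles ingestion, retrieval, and citation for you as a managed service.

```python
# Conceptual sketch of the RAG flow described above: retrieve facts from a
# data store, then ground the LLM prompt in those facts. All names here are
# illustrative assumptions, not Vectara's API.

DATA_STORE = [
    "Fixed mortgage rates vary with market conditions and credit score.",
    "Rates are updated daily by lenders and differ by region.",
    "A larger down payment can lower the offered interest rate.",
]

def retrieve(query: str, store: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(store, key=lambda doc: -len(q_terms & set(doc.lower().split())))
    return ranked[:top_k]

def build_grounded_prompt(query: str, facts: list[str]) -> str:
    """Assemble a prompt that instructs the LLM to answer only from the facts."""
    context = "\n".join(f"[{i + 1}] {fact}" for i, fact in enumerate(facts))
    return (
        "Answer the question using ONLY the facts below and cite them by number.\n"
        f"Facts:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

query = "What is the interest rate for a 20-year fixed mortgage?"
prompt = build_grounded_prompt(query, retrieve(query, DATA_STORE))
print(prompt)  # This grounded prompt would then be sent to the LLM of your choice.
```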
Detection of LLM Hallucinations Made Easy
From [00:13:35] in the webinar, the speaker demonstrates how difficult it is to detect LLM hallucinations and how hallucinations can be categorized (‘unwanted’/‘intrinsic’, ‘benign’, or ‘questionable’) based on the ambiguity of the generated response.
Vectara has developed its own Hughes Hallucination Evaluation Model (HHEM), available on Hugging Face*, for detecting LLM hallucinations. The HHEM model series is particularly suited to detecting hallucinations in RAG applications that summarize facts: the model checks the consistency of the output summary with the input facts. On the HHEM leaderboard, popular LLMs are evaluated on a common dataset to measure their hallucination rates. Intel’s neural-chat-7b model shows a notably low hallucination rate and hence ranks high on the leaderboard.
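As a quick illustration of that consistency check, the open HHEM model can be loaded from Hugging Face and scored on (source, summary) pairs. The sketch below follows the usage documented on the model card at the time of writing; verify the exact interface on the card, as it may change between releases.

```python
# Sketch of hallucination detection with Vectara's open HHEM model on Hugging Face.
# The model scores how well a generated summary (hypothesis) is supported by the
# source text (premise); scores near 1 mean consistent, near 0 mean hallucinated.
from transformers import AutoModelForSequenceClassification

pairs = [
    # (source facts given to the LLM, LLM-generated summary to verify)
    ("Mortgage rates vary with market conditions and credit scores.",
     "The interest rate for a 20-year fixed mortgage is 5%."),
    ("Mortgage rates vary with market conditions and credit scores.",
     "Mortgage rates depend on the market and on your credit score."),
]

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)
scores = model.predict(pairs)  # one consistency score per (premise, hypothesis) pair
print(scores)  # the first pair should score low (unsupported), the second high
```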
Leveraging Agentic RAG for Enterprise AI
The major GenAI use cases of Vectara’s platform include AI assistants (conversational AI and question-answering) and AI agents. AI agents can be categorized as follows:
- Agentic RAG that can perform complex tasks such as breaking down the input query into multiple questions, answering each of them, and merging all the responses.
- Action engines that can directly perform an action for you, such as publishing pages on a website, sending emails on your behalf, etc.
From [00:36:30] in the webinar, the presenter explains how agentic RAG techniques work on Vectara’s platform. You can implement agentic RAG using the open-source vectara-agentic Python* package. Watch the webinar from [00:38:05], where the speaker demonstrates building a legal assistant using agentic RAG, and then try it yourself by diving into the implementation details of the legal assistant application on Hugging Face*.
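To illustrate the query-decomposition pattern behind agentic RAG, here is a conceptual Python sketch. The decompose, answer_with_rag, and merge helpers are hypothetical placeholders for the planner, retrieval, and synthesis steps; the vectara-agentic package wraps these steps behind its own API, so refer to its documentation for real usage.

```python
# Conceptual sketch of agentic RAG: break a complex query into sub-questions,
# answer each one against the data store, then merge the grounded answers.
# The helper functions below are hypothetical placeholders, not vectara-agentic APIs.

def decompose(query: str) -> list[str]:
    """Planner step: split a complex question into simpler sub-questions.
    In a real agent, an LLM produces this plan; here it is hard-coded."""
    return [
        "What does the lease say about early termination?",
        "What notice period does the lease require?",
    ]

def answer_with_rag(sub_question: str) -> str:
    """Retrieval + generation step: answer one sub-question from the data store.
    A real implementation would call the RAG service and return a cited answer."""
    return f"[grounded answer to: {sub_question}]"

def merge(sub_answers: list[str]) -> str:
    """Synthesis step: combine the grounded sub-answers into one response."""
    return "\n".join(sub_answers)

query = "Can my client terminate the lease early, and how much notice is needed?"
final_answer = merge([answer_with_rag(q) for q in decompose(query)])
print(final_answer)
```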
What’s Next?
Check out the full webinar recording to understand how Vectara’s solutions help tackle hallucinations in both simple and complex commercial LLM deployments. Jump-start AI development with Intel’s extensive AI software portfolio – a vast collection of optimized tools, libraries, and frameworks powered by the oneAPI programming model for accelerated and enhanced AI development.
Apart from installing developer resources from our AI Tools Selector, you can also experiment with our AI software optimizations on the latest accelerated hardware such as CPUs, GPUs, AI PC NPUs, and the Intel® Gaudi® AI accelerator available on the Intel® Tiber™ AI Cloud platform.