Businesses face mounting market pressure to jump on the generative AI bandwagon, and with good reason. But there are barriers to adoption, especially for enterprise companies. As Intel® Liftoff member Prediction Guard puts it, large language models (LLMs) exhibit unclear reliability and structure. They return blobs of unstructured text that engineers can’t use to build robust systems. These integrations also worry corporate counsel and security professionals for a wide range of reasons: variable output, gaps in compliance, leaked IP/PII, and the possibility of prompt “injection” vulnerabilities.
So, enterprise teams need platforms that provide access to state-of-the-art LLMs through an API that lets them put controls on LLM outputs. Going further, these platforms need to assess output against metrics like factuality, toxicity, and consistency.
Prediction Guard Bridges the Gap
Prediction Guard’s platform enables companies to adopt the latest wave of AI models, like those used in ChatGPT, without compromising on privacy or security. It also improves the reliability of these LLMs, paving the way for more consistently structured output free of the kinds of hallucinations we’re all familiar with.
The Prediction Guard platform provides an easy-to-use API for text completion and chat that will feel familiar to anyone who has prototyped applications on top of OpenAI for tasks like data extraction, content creation, and chat response generation. However, the Prediction Guard API also provides game-changing features that allow enterprise users to:
- Get output from multiple state-of-the-art LLMs (beyond the GPT family);
- Score the factuality and toxicity of model output;
- Check the consistency of model output;
- Enforce validated types (integer, categorical, etc.) and structures (e.g., JSON); and
- Filter out PII and sensitive information.
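To make these controls a little more concrete, here is a minimal Python sketch of three of them: coercing raw LLM text into a validated type, scoring consistency by sampling the same prompt several times and measuring how often the answers agree, and redacting obvious PII with regular expressions. The helper names (`enforce_type`, `consistency`, `redact_pii`) and the regex patterns are illustrative assumptions, not Prediction Guard’s API, and production PII filtering needs far more than two patterns.

```python
import json
import re
from collections import Counter

# Hypothetical helpers illustrating the output controls listed above.
# They are NOT Prediction Guard's API; they only show the general idea.

def enforce_type(raw: str, output_type: str, categories=None):
    """Coerce raw LLM text into a validated type, or raise ValueError."""
    text = raw.strip()
    if output_type == "integer":
        return int(text)  # raises ValueError on non-integer text
    if output_type == "categorical":
        if categories is None or text not in categories:
            raise ValueError(f"{text!r} is not one of {categories}")
        return text
    if output_type == "json":
        return json.loads(text)  # json.JSONDecodeError subclasses ValueError
    raise ValueError(f"unsupported output type: {output_type}")

def consistency(outputs):
    """Return the most common answer and the fraction of outputs that agree with it."""
    counts = Counter(o.strip() for o in outputs)
    answer, n = counts.most_common(1)[0]
    return answer, n / len(outputs)

# Deliberately rough patterns (emails and US-style phone numbers) for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def redact_pii(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(enforce_type(" 42 ", "integer"))                       # 42
print(consistency(["yes", "yes", "no", "yes"]))              # ('yes', 0.75)
print(redact_pii("Contact jane@example.com or 555-123-4567"))  # Contact [EMAIL] or [PHONE]
```

In practice, a low agreement score (below some chosen threshold) could be used to reject or retry a response instead of passing it downstream.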
Obviously, we were really excited to see what Prediction Guard would achieve when they joined our recent hackathon as part of the Intel® Liftoff program for startups.
The team deployed their entire platform in a secure and performant environment accelerated by Intel® Data Center Max GPUs - with some extraordinary results.
Integrating Prediction Guard with Intel®: Their Hackathon Goals
Prediction Guard is currently integrating more than 15 state-of-the-art LLMs, including MPT, Falcon, Dolly, and others. Models of this size must either be hosted on GPUs or optimized to run on CPUs, and the team is dedicated to doing this effectively. In their enterprise engagements, they are also responsible for overseeing client-specific fine-tuning.
The recent LLM hackathon in collaboration with Intel® Liftoff provided Prediction Guard with the opportunity to work with Intel's state-of-the-art Data Center Max GPUs. During this event, their team had a clear set of goals aimed at leveraging the potential of these powerful resources.
Prediction Guard’s goals for the hackathon were as follows:
Goal #1 - Deploy Prediction Guard accelerated by Intel® CPUs and/or GPUs (without fine-tuning, using SOTA LLMs):
Minimum Requirements:
- Run Camel-5B and Dolly-3B inference servers on Intel® Xeon.
- Operate Camel-5B and Dolly-3B inference servers on Intel® Data Center Max GPUs.
- Launch the Prediction Guard REST API on Intel® bare metal instances on the Intel Developer Cloud.
- Integrate and test the Prediction Guard REST API connected to the inference servers.
- Run the Prediction Guard LLM playground UI on Intel® bare metal instances.
Additional Ideal Implementation Features:
- Provide more than one model option in the Chat UI (e.g., MPT, WizardCoder, etc.)
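Goal #1 places the REST API in front of separate per-model inference servers. As a rough sketch of that routing layer, the Python below builds a request to whichever server hosts the requested model. The server URLs, the `/generate` path, and the payload fields are hypothetical placeholders, not details of Prediction Guard’s deployment.

```python
import json
import urllib.request

# Hypothetical mapping from model name to the inference server hosting it.
# Hostnames, ports, and the "/generate" path are placeholders.
MODEL_SERVERS = {
    "Camel-5B": "http://localhost:8001/generate",
    "Dolly-3B": "http://localhost:8002/generate",
}

def build_completion_request(model: str, prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request routed to the model's inference server."""
    if model not in MODEL_SERVERS:
        raise KeyError(f"unknown model: {model}")
    payload = {"prompt": prompt, "max_new_tokens": max_tokens}  # hypothetical request schema
    return urllib.request.Request(
        MODEL_SERVERS[model],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("Dolly-3B", "Summarize this email thread.")
# urllib.request.urlopen(req) would send it once a server is listening at that URL.
```

Keeping the model-to-server map in the API layer is what makes it cheap to offer several models in the UI: adding one is a new entry rather than a new client.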
Goal #2 - Operate Prediction Guard on Intel® CPUs and/or GPUs with a fine-tuned model (trained using distributed training on Intel® GPUs):
Minimum Requirements:
- Fine-tune Camel-5B on Intel® Data Center Max series GPUs using Salesforce’s QAConv dataset.
- Deploy the fine-tuned model in an inference server.
- Integrate and test the Prediction Guard REST API connected to the inference server.
- Run a simple demo UI on Intel® bare metal.
- Integrate and test the simple demo UI with the Prediction Guard REST API.
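Fine-tuning a causal LLM like Camel-5B on conversational QA data typically starts by flattening each example into a prompt/completion pair. The sketch below shows one way to do that; the record field names (`conversation`, `question`, `answer`) and the prompt template are assumptions for illustration and may not match the actual QAConv schema or the format the team used.

```python
# Hypothetical record; field names are assumptions and may not match the QAConv release.
record = {
    "conversation": [
        ("Alice", "The shipment arrives Tuesday."),
        ("Bob", "Great, the price is $450 per ton."),
    ],
    "question": "What is the price per ton?",
    "answer": "$450",
}

def to_training_example(rec: dict) -> dict:
    """Flatten a conversational QA record into a prompt/completion pair for causal-LM fine-tuning."""
    context = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in rec["conversation"])
    prompt = (
        f"### Conversation:\n{context}\n\n"
        f"### Question:\n{rec['question']}\n\n"
        "### Answer:\n"
    )
    return {"prompt": prompt, "completion": rec["answer"]}

example = to_training_example(record)
print(example["prompt"] + example["completion"])
```

Ending the prompt with a fixed answer marker lets the same template be reused at inference time, so the fine-tuned model sees the format it was trained on.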
For these tasks, they used PyTorch; Intel® Extension for PyTorch; Hugging Face Accelerate; a forked version of the Transformers library (https://github.com/rahulunair/transformers_xpu/) with XPU support; and Optimum.
Leveraging Intel® Technology to Optimize Across Three Customer Verticals
The Prediction Guard team deployed both their Prediction Guard API and their custom model servers on an Intel® server with 4th generation Xeon CPUs and 4x Data Center Max 1550 GPUs.
The API served controlled and trustworthy inferences from a variety of pre-trained LLMs. In addition, the team fine-tuned an LLM (Camel-5B) for information extraction use cases on the Intel® GPUs. Both the fine-tuned model and the pre-trained models ran concurrently across multiple Intel® GPUs.
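Running several models side by side usually means pinning each one to a specific device. A minimal sketch, assuming one model per device slot and the PyTorch-style `xpu:N` device strings used with Intel® Extension for PyTorch, could assign devices round-robin across the four GPUs. The model list is hypothetical, and each model would then be loaded onto its device (for example with `model.to(device)`) inside its own inference server process.

```python
from itertools import cycle

def assign_devices(models, num_gpus=4):
    """Round-robin each model onto an XPU device string such as "xpu:0"."""
    devices = cycle(f"xpu:{i}" for i in range(num_gpus))
    return {model: next(devices) for model in models}

# Hypothetical mix of the fine-tuned model and pre-trained models.
placement = assign_devices(["Camel-5B (fine-tuned)", "Camel-5B", "Dolly-3B", "MPT-7B", "Falcon-7B"])
for model, device in placement.items():
    print(f"{model} -> {device}")
```

Round-robin ignores model size; a real deployment would more likely place models by memory footprint.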
Their results illustrated the utility of such a deployment in various verticals with example input from three of their current customers:
Supply Chain
The first of these inputs came from Contango, a company helping large agriculture companies optimize their supply chains. Prediction Guard aimed to create an accurate language model chain to extract fertilizer bids from email threads. The extracted information is then used to optimize the supply chain.
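A minimal sketch of the extraction step, assuming the model is asked to return JSON: build a prompt from the email thread, then validate the model’s answer into typed records before anything reaches the supply chain optimizer. The `FertilizerBid` fields, the prompt wording, and the sample output are hypothetical; Contango’s actual schema isn’t described here.

```python
import json
from dataclasses import dataclass

@dataclass
class FertilizerBid:
    supplier: str
    product: str
    price_per_ton: float
    quantity_tons: float

# Hypothetical extraction prompt; the keys it asks for are illustrative assumptions.
PROMPT_TEMPLATE = (
    "Extract every fertilizer bid from the email thread below. "
    'Respond with a JSON list of objects with keys "supplier", "product", '
    '"price_per_ton" and "quantity_tons".\n\nEmail thread:\n{thread}\n\nJSON:'
)

def parse_bids(llm_output: str) -> list[FertilizerBid]:
    """Validate the model's JSON output and convert it into typed bid records."""
    items = json.loads(llm_output)
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of bids")
    return [
        FertilizerBid(
            supplier=str(item["supplier"]),
            product=str(item["product"]),
            price_per_ton=float(item["price_per_ton"]),
            quantity_tons=float(item["quantity_tons"]),
        )
        for item in items
    ]

thread = "From: Acme Ag\nWe can offer urea at $455.50/ton for 200 tons."
prompt = PROMPT_TEMPLATE.format(thread=thread)
# Suppose the LLM returned this text for the prompt above:
fake_output = '[{"supplier": "Acme Ag", "product": "urea", "price_per_ton": "455.50", "quantity_tons": 200}]'
bids = parse_bids(fake_output)
print(bids)
```

Any malformed answer raises an exception at `parse_bids`, which is the point: bad extractions fail loudly instead of silently feeding the optimizer.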
Results:
- Enterprise pilots are being arranged using the functionality developed at the hackathon
- The data extraction capabilities are expected to help save around $2M in operational expenses per company
eCommerce
Next up: Antique Candle Co, an ecommerce customer. The team used the Prediction Guard system to extract past promotional copy (e.g., “Limited time only”) from 5 years of marketing campaigns. This extracted information was then used in a “promotion planning” tool to simulate the sales uplift of new candidate offers.
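The “promotion planning” idea can be illustrated with a deliberately naive estimator: average the historical uplift for each extracted promotion type, then scale an expected baseline by that average. This is a hypothetical stand-in (both the data and the method are ours), not the tool the team built for Antique Candle Co.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history: (promotion type extracted from copy, baseline sales, sales during promo)
history = [
    ("limited time only", 10_000, 12_000),
    ("limited time only", 8_000, 9_200),
    ("free shipping", 10_000, 11_000),
]

def average_uplift(records):
    """Mean fractional uplift (promo_sales / baseline - 1) for each promotion type."""
    by_type = defaultdict(list)
    for promo_type, baseline, promo_sales in records:
        by_type[promo_type].append(promo_sales / baseline - 1)
    return {t: mean(u) for t, u in by_type.items()}

def simulate(promo_type, expected_baseline, uplifts):
    """Naive forecast: expected baseline scaled by the historical mean uplift (0 if unseen)."""
    return expected_baseline * (1 + uplifts.get(promo_type, 0.0))

uplifts = average_uplift(history)
print(uplifts)
print(simulate("free shipping", 20_000, uplifts))
```

A real planner would control for seasonality and overlapping campaigns; the sketch only shows why extracting promotion types from past copy is the prerequisite step.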
Results:
- Prototyped a system that extracts historical promotions and forecasts promotion uplift, used for a flash sale two weeks after the hackathon (the simulated and optimized promotions resulted in an additional $160k in revenue!)
- The team is currently working with this customer to switch over to the fine-tuned model (created during the Intel Liftoff LLM hackathon)
Healthcare
Finally, they focused on a use case from a hospice charity helping patients facing a life-limiting illness. This use case involved extracting patient information from transcriptions of medical interviews between a caregiver and the patient. The extracted information can be used to help capacity constrained caregivers fill out tedious paperwork.
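For form filling, one reasonable pattern is to accept only the fields the model extracted with the expected type and route everything else to a caregiver for review. The sketch below assumes the model returns JSON; the field names and types are hypothetical, not the charity’s actual paperwork.

```python
import json

# Hypothetical intake-form schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "patient_name": str,
    "primary_diagnosis": str,
    "pain_level": int,  # e.g. a 0-10 score mentioned in the interview
    "medications": list,
}

def fill_form(llm_output: str):
    """Split extracted JSON into validated form fields and fields needing human review."""
    data = json.loads(llm_output)
    form, needs_review = {}, []
    for field, expected in REQUIRED_FIELDS.items():
        value = data.get(field)
        if isinstance(value, expected):
            form[field] = value
        else:
            needs_review.append(field)  # missing or wrong type: flag for the caregiver
    return form, needs_review

form, review = fill_form(
    '{"patient_name": "J. Doe", "pain_level": "high", "medications": ["acetaminophen"]}'
)
print(form)    # validated fields
print(review)  # fields the caregiver should check
```

This keeps the caregiver in the loop only where the extraction is uncertain, rather than on every field.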
Results:
- Major reduction in transcript parsing time from 20 min (prototyped with OpenAI) to less than 1 min (with the Prediction Guard API and the fine-tuned model)
Building Safer, More Efficient LLMs with Intel®
The results from Prediction Guard clearly showcased how Intel® GPUs have enabled them to effectively manage production LLM workloads. The system developed during the hackathon now offers a private, enterprise-ready LLM API. It delivers safe and trustworthy LLM outputs that are validated, structured, and type-checked. Importantly, it's designed to scale in high-throughput enterprise environments.
Prediction Guard's collaboration with Intel® Liftoff for Startups is more than a technological alliance; it's a partnership aimed at fostering growth and enhancing trust. Here's how Daniel Whitenack, founder of Prediction Guard, describes the shared mission:
"We want to be at the forefront of compute, such that our users can unleash the full capabilities of LLMs in trustworthy enterprise applications. Intel® leads the way in terms of security and performance, and this partnership through the Intel® Liftoff program is giving us the ability to sustain long-term growth and scaling."
The Intel® Liftoff program helps innovators like Prediction Guard turn their most ambitious projects into reality. Apply to the program today to discover how far your business could go with access to Intel® accelerated computing solutions.