
Scaling GenAI: How Intel IT Is Building an Enterprise-Ready Generative AI Framework


Author: John Hansen, Generative AI Platform Architect, Intel IT

Intel IT has developed a new platform aimed at democratizing Intel’s use of generative AI (GenAI). As described in the IT@Intel white paper Democratizing the Use and Development of Generative AI Across Intel, the platform empowers employees at every level of technical skill to use these tools to improve their day-to-day productivity and unlock new efficiencies in our business processes.

To make GenAI work at an enterprise level, we determined that we need three key ingredients in the mix: retrieval-augmented generation (RAG) capability, the ability to host or consume large language and embedding models, and the ability to orchestrate between the RAG data and the models, whether those models are provided by cloud vendors or hosted centrally. Without all three ingredients (RAG, models, and orchestration), we couldn’t develop a good GenAI platform.

Many enterprises begin their GenAI journey by experimenting with OpenAI models or other foundation models, such as those from Anthropic, to develop solutions specific to a certain data domain, such as a coding environment for product quality assurance (QA) or customer data for support. These initial solutions typically combine small amounts of customer or product data, an orchestration engine (perhaps Semantic Kernel, LangChain, or Haystack), and large language models hosted either locally or by large cloud providers.
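
To make that starting pattern concrete, here is a minimal sketch in Python. It is an illustration only: embed() is a toy bag-of-words stand-in for a real embedding model, and complete_with_llm() is a hypothetical wrapper around whichever hosted or cloud model a team happens to use.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def complete_with_llm(prompt: str) -> str:
    # Hypothetical call to a locally hosted or cloud-provided LLM.
    raise NotImplementedError("wire this to your model endpoint")

# A small, domain-specific document set, as in a typical first solution.
documents = [
    "QA checklist: run the full regression suite before each release.",
    "Support policy: respond to priority-1 customer tickets within one hour.",
]
index = [(doc, embed(doc)) for doc in documents]

def answer(question: str) -> str:
    # Retrieve the most relevant chunk, then ground the model's prompt in it.
    q_vec = embed(question)
    context, _ = max(index, key=lambda item: cosine(q_vec, item[1]))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return complete_with_llm(prompt)
```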

But GenAI can do a lot more than combine data with models to generate answers. To expand these domain-specific solutions into something usable at an enterprise level, we realized we needed to scale the data, model, and orchestration components so that users can configure their own experiences and move beyond the chat interfaces that many enterprises start by building.

At Intel, we are building a GenAI framework that uses (or will use in the future) models hosted on our own Intel® Xeon® processor and Intel® Gaudi® AI accelerator hardware, as well as cloud-based models. We’re creating an abstraction layer, an inferencing or API layer on top of those models, so they can be consumed in a standardized way. Advanced teams that understand how to integrate with APIs and different conversational interfaces don’t really need an abstraction layer. However, less advanced teams looking to add GenAI to their existing applications need those abstraction interfaces to easily consume and switch between models.
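
A minimal sketch of that abstraction idea, assuming an HTTP inferencing endpoint in front of each model; the endpoint URLs and model names below are invented for illustration, and the actual request is left as a stub:

```python
from dataclasses import dataclass
from typing import Protocol

class InferenceBackend(Protocol):
    # The one interface applications code against.
    def complete(self, prompt: str) -> str: ...

@dataclass
class HttpBackend:
    base_url: str
    model: str

    def complete(self, prompt: str) -> str:
        # In practice this would POST to the inferencing/API layer;
        # left as a stub to keep the sketch self-contained.
        raise NotImplementedError(f"POST {self.base_url} (model={self.model})")

# Hypothetical registry: one entry per hosted or cloud model.
BACKENDS: dict[str, InferenceBackend] = {
    "cloud-large": HttpBackend("https://api.cloud-vendor.example/v1",
                               "large-commercial-model"),
    "onprem-tuned": HttpBackend("https://genai.intel-internal.example/v1",
                                "tuned-smaller-model"),
}

def get_backend(name: str) -> InferenceBackend:
    # Teams switch models through configuration, not application code.
    return BACKENDS[name]
```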

Large commercial models provide wonderful capabilities, but smaller models applied to targeted solutions with data can provide equally good responses if properly tuned. This gives us the ability to host our own models and protect Intel’s data by keeping it on premises, in our own data centers.

Every enterprise has data stored in various places. Enterprises can choose to build individual solutions to extract, chunk, create embeddings, and store those embeddings in vector databases. However, we do not consider this the most efficient or effective way to bring enterprise data into the GenAI space. We’re working on a more efficient method called the crawlers framework. These programmatic crawlers connect to enterprise data sources, extract the data in a standardized way, and determine the entitlements required to protect that data, along with the internal classification, tagging, and other attributes the enterprise requires to secure it. They convert unstructured data into a standardized format, which can then be ingested into pipelines and vectorized according to project needs to meet accuracy requirements.
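
As an illustration, a crawler’s standardized output could look something like the record below. The field names and the crawl_wiki_page() helper are ours for this sketch, not Intel’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class CrawledRecord:
    source: str              # e.g., "wiki", "doc-repo" (illustrative)
    content: str             # unstructured text, normalized to one format
    entitlements: list[str]  # access groups allowed to see this data
    classification: str      # e.g., "internal", "confidential"
    tags: list[str] = field(default_factory=list)
    subject: str = ""

def crawl_wiki_page(page: dict) -> CrawledRecord:
    # Hypothetical crawler for one source type: extract the text and map
    # the source system's permissions into the standardized metadata.
    return CrawledRecord(
        source="wiki",
        content=page["body"],
        entitlements=page["allowed_groups"],
        classification=page.get("classification", "internal"),
        tags=page.get("labels", []),
        subject=page.get("title", ""),
    )
```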

The approach is simple: data source, crawler, standardized data output, and associated metadata. Each piece of extracted data carries metadata containing entitlements, classifications, tags, and subjects, and that metadata protects each chunk of data. When data is ingested, chunked, vectorized, and stored in a vector database, each record retains its entitlement metadata. Retrievers can then ensure that the entitlements match the user’s access groups, so only data the user is entitled to see is returned.
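
In code, that entitlement check can be as simple as the filter below; the record shape is illustrative, and a real retriever would push this filter down into the vector database query rather than scanning in application code:

```python
from dataclasses import dataclass

@dataclass
class VectorRecord:
    vector: list[float]
    text: str
    entitlements: list[str]  # access groups copied from the source chunk

def similarity(a: list[float], b: list[float]) -> float:
    # Dot product keeps the sketch short; cosine similarity is typical.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vector: list[float], records: list[VectorRecord],
             user_groups: list[str], top_k: int = 3) -> list[VectorRecord]:
    # Filter first: records the user is not entitled to see are never ranked,
    # so they can never leak into the model's context.
    visible = [r for r in records if set(r.entitlements) & set(user_groups)]
    ranked = sorted(visible, key=lambda r: similarity(query_vector, r.vector),
                    reverse=True)
    return ranked[:top_k]
```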

As a large enterprise, Intel draws its data from many sources: data-sharing systems, wiki systems, custom systems for storing documentation, and so on. These systems always require certain access rights to view the data, and those access rights must persist into the vectors, or the vector representation of that data. This means that when we build our AI orchestration layer and run the question-and-answer process for a specific user, one user may get a different response than another user, even when using the same data source, due to different entitlements.

Note that this does not mean that we—or any enterprise—should store all data in a single vector store. This is more of a conceptual method, and enterprises would need to split the data as per their requirements. Some enterprises also require that data be physically separated for various reasons, and that's perfectly fine.

Now that we have our data vectorized and stored with entitlements on each individual vector, we can start building the AI workflow and pulling in the data each user needs. The challenge is finding all this data and bringing it into the context of an AI workflow. We believe this can be improved by centralizing the semantic descriptions of the data sources and using a semantic similarity search to find and bring back the right data to help answer questions or take actions.
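
A rough sketch of that routing idea, reusing the embed() and cosine() stand-ins from the first example; the source names and descriptions here are invented for illustration:

```python
# Central registry of semantic descriptions, one per data source.
SOURCE_DESCRIPTIONS = {
    "product-qa-docs": "test plans, regression results, product quality data",
    "support-kb": "customer tickets, troubleshooting guides, support policies",
    "engineering-wiki": "design documents, architecture notes, runbooks",
}

def route_question(question: str) -> str:
    # Embed the question and pick the source whose description is most
    # semantically similar; retrieval then runs against that source.
    q_vec = embed(question)
    return max(SOURCE_DESCRIPTIONS,
               key=lambda name: cosine(q_vec, embed(SOURCE_DESCRIPTIONS[name])))
```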

This is something we are still conceiving and designing. By looking at the data, RAG pipelines, models, and AI workflow as a holistic system, and by utilizing the power of different tools in the generative AI space, we are building an overarching platform that supports enterprise requirements for security and scalability.

For more information, read the IT@Intel white paper Democratizing the Use and Development of Generative AI Across Intel.
