Deploying Scalable Enterprise RAG on Kubernetes with Ansible Automation

MichalProstko · 07-07-2025

Authored by: Michał Prostko and Izabella Raulin

In the rapidly evolving landscape of generative AI, Retrieval-Augmented Generation (RAG) has emerged as a powerful method to bridge the gap between large language models (LLMs) and business-specific knowledge. Companies are adopting RAG because it enables LLMs to "speak the language" of their business. Its benefits are clear: whether used to support internal teams or to improve customer interactions, RAG offers a secure and efficient way to deliver relevant, up-to-date answers grounded in trusted content, and its use cases are broad and growing. To make RAG easier to adopt, we developed Intel® AI for Enterprise RAG, a solution that enables you to deploy RAG on Intel® Xeon® processors and Intel® Gaudi® AI accelerators.

We’re excited to release version 1.3.0. In this release, our focus extends beyond user-facing features to significantly improving the deployment experience, because even the most powerful solution can be underutilized if it’s difficult to set up or manage. That’s why we’ve taken a strong admin perspective, emphasizing simple deployment, intuitive configuration, and seamless ongoing management of users, monitoring, and scaling. Ultimately, it’s about how fast your team can iterate and deliver real business value: true RAG time-to-value.

Automating RAG Deployment

Deploying complex, enterprise-grade solutions often involves orchestrating multiple components, managing configurations, ensuring consistent environments, and complying with industry security standards. When we initially developed Enterprise RAG, it supported only a handful of features, so the architecture was simple. As we added features and the project matured, the architecture grew significantly more complex. Our deployment process relied heavily on custom scripts that became harder and harder to maintain over time.

We needed a more robust and proven solution and chose Ansible as our path forward due to its declarative approach, rich ecosystem, and ability to handle complex deployments across diverse environments.

Advantages of deployment with Ansible

The shift to Ansible significantly improved the maintainability, scalability, and operational clarity of our deployments. Below are some of the key improvements:

Centralized config file: Scattered environment variables were replaced with a single config.yaml file that serves as the source of truth for our entire deployment. This configuration drives all dependent modules consistently and provides a clear user interface.
Modular architecture: The monolithic deployment script was separated into distinct, conditionally deployable Ansible roles. Each component became an independent module that could be turned on or off through configuration flags.
Feature toggling: Users can easily turn features on or off as needed. For example, if your environment already includes telemetry and monitoring, you can turn off the built-in monitoring and save infrastructure resources.
Multi-node deployment: Ansible's agentless architecture extends automation beyond a single node. You can orchestrate deployments across multiple hosts, install Kubernetes clusters, handle prerequisite installation, and configure nodes consistently across environments.
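To illustrate how a single configuration file can drive conditionally deployed roles, here is a minimal Python sketch. It is not the project’s actual code: in Enterprise RAG this logic lives in Ansible, and every key and role name below (`install_kubernetes`, `features.monitoring`, `rag_pipeline`, and so on) is hypothetical. The configuration is a plain dict standing in for the parsed contents of config.yaml.

```python
from typing import Any, Optional

# Hypothetical stand-in for the parsed contents of config.yaml.
# Key names are illustrative only, not the real Enterprise RAG schema.
EXAMPLE_CONFIG: dict[str, Any] = {
    "install_kubernetes": False,  # a cluster already exists
    "features": {
        "monitoring": False,      # reuse the existing telemetry stack
        "hpa": True,
        "storage_rbac": True,
    },
}

# Hypothetical roles in deployment order, each paired with the flag that
# gates it; None means the role always runs.
ROLES: list[tuple[str, Optional[str]]] = [
    ("kubernetes", "install_kubernetes"),
    ("prerequisites", None),
    ("monitoring", "features.monitoring"),
    ("rag_pipeline", None),
    ("hpa", "features.hpa"),
    ("storage_rbac", "features.storage_rbac"),
]


def flag_enabled(config: dict[str, Any], dotted_key: str) -> bool:
    """Resolve a dotted key such as 'features.hpa'; missing keys count as off."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return bool(node)


def select_roles(config: dict[str, Any]) -> list[str]:
    """Return the roles to run, keeping deployment order and skipping disabled ones."""
    return [role for role, flag in ROLES if flag is None or flag_enabled(config, flag)]


if __name__ == "__main__":
    print(select_roles(EXAMPLE_CONFIG))
```

With the example configuration, the sketch selects only the prerequisites, RAG pipeline, HPA, and storage RBAC roles, skipping cluster installation and built-in monitoring. Treating a missing flag as disabled keeps partial configs safe, though a real deployment might prefer to fail fast on missing or unknown keys.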

Other significant features

Horizontal Pod Autoscaler

To keep our infrastructure responsive and efficient under varying workloads, we’ve implemented the Kubernetes Horizontal Pod Autoscaler (HPA). HPA automatically adjusts the number of pod replicas in a deployment based on real-time metrics. By integrating HPA with our internal metrics servers, we’ve established a dynamic scaling mechanism that responds to spikes in demand.

Whether handling intensive data processing tasks or accommodating surges in user queries, our systems scale automatically to maintain performance and reliability. This approach enhances operational agility and optimizes resource utilization across our platform.
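For intuition about what HPA does on each evaluation cycle, the Python sketch below reproduces the core scaling rule from the upstream Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default) of 1.0. The article does not say which metrics or targets Enterprise RAG scales on, so the replica bounds and example values are hypothetical, and stabilization windows and scaling policies are left out.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the core Kubernetes HPA rule for a single metric.

    Stabilization windows, scaling policies, and pod readiness
    handling are intentionally omitted from this sketch.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica range.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(3, 90, 60))
```

For example, 3 replicas at a metric value of 90 against a target of 60 give a ratio of 1.5, so the sketch requests ceil(4.5) = 5 replicas; a sharp spike is capped by max_replicas rather than growing without bound.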

Storage RBAC

Data access is foundational in any retrieval-augmented generation system; ours is no exception. Our architecture relies on data accessible via the well-known S3 protocol, which is then processed and stored in a dedicated vector database for search and retrieval. However, this vector database operates independently of user permissions and lacks native awareness of access controls tied to the original data sources.

We’ve implemented Storage Role-Based Access Control (RBAC) to address this. By integrating directly with the underlying storage layer’s RBAC policies, we validate user access to specific buckets before executing retrieval queries. This ensures that only data the user is authorized to access is included in the context passed to the LLM.
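The Python sketch below shows this order of operations: resolve which buckets the user can read, then search only within those buckets, so unauthorized chunks never reach the LLM context. Everything here is hypothetical. `can_read_bucket` stands in for a call to the storage layer’s RBAC policies, the in-memory list with cosine similarity stands in for the vector database, and the policy table and chunks are made-up example data.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    text: str
    bucket: str                   # S3 bucket the source document came from
    embedding: tuple[float, ...]  # vector stored in the stand-in vector database


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_for_user(
    user: str,
    query_embedding: tuple[float, ...],
    chunks: list[Chunk],
    can_read_bucket: Callable[[str, str], bool],
    top_k: int = 3,
) -> list[Chunk]:
    """Return the top_k most similar chunks from buckets the user may read."""
    # 1. Check bucket permissions (once per bucket) before retrieval.
    buckets = {c.bucket for c in chunks}
    allowed = {b for b in buckets if can_read_bucket(user, b)}
    # 2. Rank only chunks that come from permitted buckets.
    candidates = [c for c in chunks if c.bucket in allowed]
    candidates.sort(key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return candidates[:top_k]


# Hypothetical policy data standing in for the storage layer's RBAC rules.
EXAMPLE_POLICY: dict[str, set[str]] = {
    "alice": {"hr-docs", "public"},
    "bob": {"public"},
}


def example_can_read(user: str, bucket: str) -> bool:
    return bucket in EXAMPLE_POLICY.get(user, set())


EXAMPLE_CHUNKS = [
    Chunk("Salary bands", "hr-docs", (1.0, 0.0)),
    Chunk("Office opening hours", "public", (0.8, 0.6)),
]

if __name__ == "__main__":
    for name in ("alice", "bob"):
        hits = retrieve_for_user(name, (1.0, 0.0), EXAMPLE_CHUNKS, example_can_read)
        print(name, [c.text for c in hits])
```

Filtering before ranking, rather than after, means the top_k results are always drawn from permitted data, so unauthorized chunks cannot crowd authorized ones out of the results. In a real vector database, the allowed-bucket set would more likely be passed as a metadata filter on the query.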

Experience Enterprise RAG yourself

We invite you to explore our Enterprise RAG solution firsthand. While the underlying architecture might seem complex, we’ve prioritized simplicity and flexibility in deployment to ensure a seamless experience for enterprise teams.

You can deploy Enterprise RAG across a variety of environments, including:

Intel® Tiber™ AI Cloud
IBM Cloud with Intel® Gaudi® 3 AI Accelerators
Bring Your Own Infrastructure – Whether you're running on Intel® Gaudi® AI accelerators or standard Intel® Xeon® CPU-based instances, our solution is optimized to perform efficiently. While accelerators are recommended for peak performance, they are not required.

This flexibility allows you to deploy and integrate Enterprise RAG into your existing stack with minimal friction, empowering your teams to harness the power of retrieval-augmented generation securely and at scale.

For full installation instructions, please refer to the official documentation: Enterprise RAG – Installation Guide.

Browse our AI Software Catalog to find more solutions tailored to your business needs.

Notices & Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.