Artificial Intelligence (AI)
Discuss current events in AI and technological innovations with Intel® employees

Document Summarization: Transforming Enterprise Content with Intel® AI for Enterprise RAG

IzabellaRaulin

Authored by @aalbersk

In today's information-rich business environment, organizations face an overwhelming challenge: extracting meaningful insights from large volumes of documents. Whether analyzing financial reports, research papers, or technical documentation, professionals spend countless hours reading through lengthy texts to identify key points and make informed decisions. This is where automated document summarization becomes mission-critical for enterprise productivity. 

We're excited to announce that Intel® AI for Enterprise RAG 2.0.0 introduces a powerful new capability: the Document Summarization (DocSum) pipeline. This feature extends our proven solution to include intelligent document processing, enabling organizations to automatically generate concise, accurate summaries from their content.

 

Core Functionalities

Intel® AI for Enterprise RAG Document Summarization provides comprehensive capabilities through an intuitive web interface and powerful backend processing: 

Document Processing 

  • Multi-format Support: Process PDF, DOCX, TXT, and other document formats through a unified pipeline 
  • Web Interface: Drag-and-drop upload with real-time processing status at your configured FQDN (e.g., "erag.com") 
  • Flexible Summarization: Three summarization strategies (map-reduce, stuff, refine) optimized for different document sizes and quality requirements

User Experience 

  • Keycloak Authentication: Secure, role-based access control for admin and user roles 
  • Output Management: Automatic summarization, saving, export, and organization 
  • Admin Panel (admin users only) 
  • Control Plane: Interactive pipeline visualization for instant service status verification 
  • Observability: Direct access to Grafana dashboards for pipeline health monitoring and the Keycloak admin panel for user management


Figure 1. Document summarization main page

 

Document Summarization Pipeline Architecture

The Document Summarization pipeline in Intel® AI for Enterprise RAG follows a carefully orchestrated sequence of microservices, each optimized for specific aspects of document processing and summary generation. 


Figure 2. Document Summarization Admin Panel

The pipeline consists of the following stages: 

Stage 1: Text Extraction & Splitting 

The TextExtractor microservice handles the initial extraction of text from documents, supporting multiple formats (PDF, DOCX, TXT, and others), while the TextSplitter microservice divides large documents into manageable chunks using configurable parameters such as chunk size and chunk overlap.
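As an illustration of the splitting step, a character-based chunker with configurable size and overlap can be sketched in a few lines of Python. The function and parameter names here are assumptions for illustration; the actual TextSplitter microservice exposes its own configuration.

```python
# Illustrative sketch of overlapping chunking; not the TextSplitter
# microservice's actual implementation.

def split_text(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each chunk starts `step` chars after the last
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap carries a little context from the end of one chunk into the start of the next, which helps the downstream summarizer stay coherent across chunk boundaries.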

Stage 2: Summary Generation 

The DocSum microservice serves as the orchestration layer for the summarization process, leveraging LangChain's summarization chains to generate intelligent, context-aware summaries. It supports three distinct summarization strategies, each optimized for different document characteristics:

  • Map-Reduce (default): Generates individual summaries for each chunk in parallel (map phase), then combines these summaries into a final comprehensive summary (reduce phase). Ideal for large documents that exceed the context window, providing comprehensive coverage while maintaining processing efficiency.  
  • Stuff: Concatenates all chunks into a single prompt and generates one summary in a single LLM call. Suitable for small documents that fit within the model's context window. Simple and fast, but limited by the maximum token count for a single LLM request. 
  • Refine: Processes chunks sequentially, iteratively refining the summary with each new chunk. Starts with a summary of the first chunk, then updates it based on the second chunk, and so on. Recommended for maintaining context and coherence across multiple related documents, though slower than map-reduce due to its sequential nature. 
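The control flow of the three strategies can be sketched in plain Python, with a stand-in `summarize` function in place of the real LLM call. This is an illustrative sketch only; the actual pipeline uses LangChain's summarization chains rather than these hand-rolled functions.

```python
# Schematic control flow for the three strategies. `summarize` stands in
# for an LLM call; here it just truncates its input to 40 characters.

def summarize(text: str) -> str:
    return text[:40]  # stand-in for an LLM summarization call

def stuff(chunks: list[str]) -> str:
    # One call over the concatenated chunks; limited by the context window.
    return summarize(" ".join(chunks))

def map_reduce(chunks: list[str]) -> str:
    # Map: summarize each chunk independently (parallelizable).
    partials = [summarize(c) for c in chunks]
    # Reduce: combine the partial summaries into a final summary.
    return summarize(" ".join(partials))

def refine(chunks: list[str]) -> str:
    # Start from the first chunk, then fold each subsequent chunk
    # into the running summary sequentially.
    summary = summarize(chunks[0])
    for chunk in chunks[1:]:
        summary = summarize(summary + " " + chunk)
    return summary
```

The sketch makes the trade-offs visible: the map phase of `map_reduce` parallelizes across chunks, `stuff` makes a single call bounded by the context window, and `refine` is inherently sequential because each step depends on the previous summary.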

DocSum orchestrates the summarization by invoking the LLM Service, which in turn communicates with the vLLM model serving backend. This layered architecture provides:

  • LLM Service: An abstraction layer that handles prompt construction, streaming responses, and output formatting through an OpenAI-compatible API. It manages the connection to the underlying model serving infrastructure.
  • vLLM: The model serving backend, optimized for Intel® Xeon® processors (CPU) or Intel® Gaudi® AI accelerators (HPU), ensuring efficient inference performance.
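Because the LLM Service exposes an OpenAI-compatible API, a summarization request follows the familiar chat-completion shape. The sketch below only constructs such a request body; the model name and prompt wording are assumptions for illustration, not the deployed service's actual configuration.

```python
# Build an OpenAI-compatible chat-completion request body for a
# summarization call. Model name and prompt are illustrative assumptions.
import json

def build_summary_request(document_text: str, model: str = "example-model") -> str:
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "Summarize the following document."},
            {"role": "user", "content": document_text},
        ],
        "stream": True,  # the LLM Service supports streaming responses
    }
    return json.dumps(payload)

# Sending it would be an HTTP POST to the service's chat-completions
# endpoint, e.g. via urllib or any OpenAI-compatible client.
```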

This architecture ensures that summaries of lengthy documents remain coherent and comprehensive, with the flexibility to choose the appropriate summarization strategy based on document characteristics and quality requirements.

 

Getting Started with Deployment

Intel® AI for Enterprise RAG Document Summarization supports deployment on both Intel® Xeon® processors (CPU) and Intel® Gaudi® AI accelerators (HPU). Both options share the same pipeline configuration and user experience, with hardware-specific optimizations handled automatically. 

Following the deployment automation philosophy established in our previous release, Document Summarization leverages Ansible playbooks for streamlined installation and configuration. 

Quick Start Steps

1. Clone Repository and Setup Environment

git clone https://github.com/opea-project/Enterprise-RAG.git -b release-2.0.0 
cd Enterprise-RAG/deployment 

sudo apt-get install python3-venv 
python3 -m venv erag-venv 
source erag-venv/bin/activate 
pip install --upgrade pip 
pip install -r requirements.txt 
ansible-galaxy collection install -r requirements.yaml --upgrade

2. Configure Deployment Settings

cp -r inventory/sample inventory/test-cluster

 Edit inventory/test-cluster/config_docsum.yaml with your settings: 

  • Kubernetes cluster connectivity (`kubeconfig`) 
  • Domain configuration (`FQDN`) 
  • Pipeline selection (CPU via docsum/reference-cpu.yaml or HPU via docsum/reference-hpu.yaml) 
  • Model parameters 
  • Proxy settings (if needed) 
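To make the settings above concrete, a minimal `config_docsum.yaml` might look roughly like the following. This is an illustrative sketch only; key names and structure beyond `kubeconfig` and `FQDN` are assumptions, so consult the sample inventory for the authoritative schema.

```yaml
# Illustrative only — check inventory/sample for the real key names.
kubeconfig: /home/ubuntu/.kube/config   # Kubernetes cluster connectivity
FQDN: erag.com                          # domain the web interface is served from
pipeline: docsum/reference-cpu.yaml     # or docsum/reference-hpu.yaml for Gaudi
```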

3. Deploy the Complete Stack

ansible-playbook -u $USER -K playbooks/application.yaml \
  --tags configure,install \
  -e @inventory/test-cluster/config_docsum.yaml

 

Enterprise Features and Performance

Document Summarization leverages Intel® AI for Enterprise RAG's production-ready capabilities for demanding workloads. 

Automated Resource Scheduling

The Automated Balloons Policy provides intelligent CPU resource allocation on Intel® Xeon® platforms: 

  • Automatic NUMA Discovery: Detects CPU layout and NUMA configuration without manual intervention 
  • Topology-Aware Scheduling: Distributes inference pods across NUMA nodes to minimize memory access latency 
  • CPU Pinning: Dedicates CPU cores to vLLM pods, eliminating context switching overhead 
  • Throughput Optimization: Calculates maximum pod density while respecting NUMA boundaries 

Additionally, Horizontal Pod Autoscaling scales services independently based on their utilization. To enable this feature, update the config_docsum.yaml file as follows:

balloons: 
  enabled: true 
  throughput_mode: true 
hpaEnabled: true 

Intel® Hardware Acceleration 

  • Intel® AMX: Automatic detection and utilization for vLLM on 4th Gen Xeon® and newer processors
  • AVX-512 Support: Optimized vector operations for the services on compatible processors 
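On Linux, you can verify whether a host advertises these instruction sets by inspecting the `flags` line of `/proc/cpuinfo` (`amx_tile` and `avx512f` are the standard kernel flag names). This is only an illustrative check; the product detects these capabilities automatically.

```python
# Check for a CPU feature flag in /proc/cpuinfo text (Linux).
# Illustrative helper; the deployment performs its own detection.

def has_cpu_flag(flag: str, cpuinfo: str) -> bool:
    """Return True if `flag` appears in a cpuinfo 'flags' line."""
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            return flag in line.split(":", 1)[1].split()
    return False

# Example usage on a live system:
# with open("/proc/cpuinfo") as f:
#     info = f.read()
# print(has_cpu_flag("amx_tile", info), has_cpu_flag("avx512f", info))
```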

Observability

  • Grafana Dashboards: Pre-configured dashboards for pipeline health, throughput, and latency tracking 
  • Centralized Logging: Aggregated logs with querying through Loki

Figure 3. Example Grafana Dashboard

Figure 4. Example Grafana logging page

These features combine to deliver a scalable, high-performance Document Summarization pipeline with the security and observability required for production deployments.

 

Experience Document Summarization Yourself 

We invite you to explore Document Summarization in Intel® AI for Enterprise RAG 2.0. Whether you're processing legal contracts, financial reports, or technical documentation, this pipeline delivers the intelligent automation needed to transform enterprise content into actionable insights. 

Please refer to the official documentation: Intel® AI for Enterprise RAG - Installation Guide for full installation instructions. 

Browse our Software Catalog to find more solutions tailored to your business needs.

 

Notices & Disclaimers

Intel technologies may require enabled hardware, software or service activation. 

No product or component can be absolutely secure. 

Your costs and results may vary. 

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.