
Tackling Network Security: AI Agents at the Edge with Red Hat AI on Intel® Processors and Graphics

Authors: Mrittika Ganguli, PE, Architect, Intel, NEX; David Kypuros, Principal AI Architect, Red Hat

Introduction: The Strategic Advantage of AI in Network Security

Modern networks generate massive amounts of data every second, making manual monitoring and analysis virtually impossible. AI agents offer a revolutionary solution by automating complex security tasks while providing the intelligence needed to identify emerging threats before they can cause damage.

Key Network Security Use Cases

  1. Application Identification and Classification

One of the fundamental challenges in network security is understanding what applications are running on your network. AI agents excel at identifying and categorizing applications within network traffic, giving organizations far deeper visibility into how network resources are used. That visibility is critical for enforcing policies, detecting anomalies, and optimizing resources, and it enables security controls that ensure only authorized applications can access sensitive resources. AI models trained on traffic metadata and encrypted packet patterns can identify applications without relying on payload inspection; a minimal sketch of this approach follows the list below.

  • Value: Visibility into encrypted traffic without DPI.
  • Use cases: Micro-segmentation, access control, SASE policy enforcement.
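As a rough illustration of metadata-only classification, here is a minimal sketch using a scikit-learn random forest over flow features. The feature names and CSV layout are hypothetical stand-ins, not artifacts of the demo described in this post.

```python
# Hypothetical sketch: classify applications from flow metadata only
# (no payload inspection). Feature names and the CSV layout are
# illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Assumed schema: one row per flow, with side-channel features
# (packet sizes, timings) that survive encryption.
flows = pd.read_csv("flows.csv")
features = ["pkt_count", "mean_pkt_len", "std_pkt_len",
            "mean_iat_ms", "dst_port", "tls_record_count"]
X, y = flows[features], flows["app_label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```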

  2. Advanced Anomaly and Threat Analysis

Perhaps the most critical application of AI in network security is its ability to detect unusual patterns and behaviors in network activity. Unlike traditional rule-based systems that rely on known signatures, AI agents can identify subtle anomalies that might indicate potential security threats and vulnerabilities. This proactive approach allows organizations to implement defensive measures before attacks can succeed, rather than simply reacting to incidents after they occur.

AI agents can learn what “normal” traffic looks like and flag outliers—potentially identifying zero-day attacks, lateral movement, or data exfiltration attempts. The sketch after the list below illustrates the idea with a simple unsupervised outlier detector.

  • Value: Faster detection of evolving and hidden threats.
  • Use cases: Threat scoring, breach prevention, vulnerability exploitation detection.
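A hedged sketch of this baseline-and-outlier idea, using scikit-learn's IsolationForest on synthetic flow statistics (the features and values are illustrative, not from the demo):

```python
# Sketch: learn "normal" flow behavior, then flag outliers.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Stand-in baseline features: [bytes/s, pkts/s, mean inter-arrival ms]
normal = rng.normal(loc=[5e4, 40, 25], scale=[5e3, 5, 3], size=(5000, 3))

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# A burst that could indicate exfiltration: high rate, tiny inter-arrival.
suspect = np.array([[9e5, 600, 1.0]])
print(detector.predict(suspect))        # -1 => flagged as anomaly
print(detector.score_samples(suspect))  # lower score => more anomalous
```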

The Edge Computing Revolution in Security

Deploying AI at the edge represents a paradigm shift in network security architecture. By processing data closer to where it's generated, organizations can achieve several critical advantages.

Opportunities at the Edge - Real-Time Processing and Response

Edge deployment enables real-time data processing and analysis, dramatically reducing the time between threat detection and response. This immediate analysis capability is crucial in today's fast-paced threat landscape, where delays of even seconds can mean the difference between successful threat mitigation and a successful attack.

Opportunities

  • Real-time Processing: AI agents can process traffic at the point of capture, enabling instant responses.
  • Latency Reduction: Eliminates round-trip delay to the cloud for inference.
  • Privacy Protection: Sensitive traffic doesn’t leave the premises, preserving compliance.

Challenges

  • Resource Limitations: CPUs at the edge must balance multiple workloads.
  • Model Reliability: AI must be robust to noisy or low-volume data environments.
  • Power Efficiency: Sustained operation in power-constrained locations is a must.

The Case for CPU-Based AI Inference

While GPUs and accelerators are excellent for training, CPUs remain the most available and practical compute platform for inference at the edge—especially in networking and telecom environments.

Key Benefits

  • Cost Efficiency: Leverages existing infrastructure—no need for additional accelerators.
  • Deployment Flexibility: Portable across servers, gateways, and even laptops.
  • Energy Efficiency: Intel CPUs are optimized for continuous, power-efficient inference via IPEX, OpenVINO, and quantized model support.

Intel Optimizations

  • IPEX (Intel Extension for PyTorch): Enhances PyTorch model inference speed on Intel® Xeon® processors.
  • Quantization: Reduces model size and increases inference throughput.
  • XPU Choice: Seamless fallback between CPU and integrated/discrete GPUs like Intel® Arc™ (see the sketch after this list).
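A minimal sketch of the CPU/XPU fallback pattern with IPEX is shown below. It assumes an IPEX build that exposes PyTorch's "xpu" device; on a CPU-only build it simply stays on CPU, and the tiny model is a placeholder.

```python
# Sketch: prefer an Intel GPU ("xpu") when available, else stay on CPU.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(torch.nn.Linear(256, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 8)).eval()

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
model = model.to(device)

# ipex.optimize applies operator fusion and dtype-aware optimizations.
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.autocast(device_type=device, dtype=torch.bfloat16):
    out = model(torch.randn(1, 256, device=device))
print(out.shape, "on", device)
```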

Processing sensitive security data locally rather than sending it to centralized cloud systems significantly enhances privacy and data security. This approach keeps sensitive information within the organization's direct control while still leveraging the power of AI for threat detection and analysis. By eliminating the need to transmit data to remote processing centers, edge AI delivers dramatically improved response times. This reduced latency is particularly critical for security applications where immediate action may be required to prevent or minimize damage.

While edge deployment offers significant benefits, it also presents unique challenges that must be carefully managed. Edge devices typically have limited computational resources compared to centralized data centers. This constraint requires careful optimization of AI models to ensure they can operate effectively within these limitations while maintaining accuracy and performance.

Ensuring AI models maintain their accuracy and reliability across diverse edge environments is crucial. This requires robust testing and validation processes to ensure consistent performance regardless of the specific deployment environment.

Organizations can leverage their existing Intel-based infrastructure for AI deployment, using Intel CPUs and client Intel® Arc™ A770 GPUs to avoid the significant costs associated with specialized hardware. This approach of using Intel processors and Intel graphics makes AI-powered security accessible to organizations of all sizes, not just those with extensive technology budgets.

Intel processors offer exceptional scalability across various devices and platforms, from laptops to enterprise servers. This flexibility allows organizations to deploy consistent AI-powered security solutions across their entire infrastructure while maintaining the ability to integrate with existing network security tools.

Practical Implementation: 5G SecOps Demo Agentic Workflow Architecture

The integration of AI agents in network security is exemplified through advanced 5G SecOps implementations that demonstrate the practical application of these technologies. Modern implementations leverage sophisticated agentic workflows built on cutting-edge technologies including MCP (Model Context Protocol), Next.js, and NestJS with TypeScript for full-stack application development. This architecture provides the foundation for seamless integration between Intel hardware and Red Hat AI inference capabilities.

Figure 1: AI Agent workflow in Red Hat AI

 

Advanced Traffic Analysis Capabilities - Encrypted Traffic Classification

One of the most challenging aspects of network security is analyzing encrypted traffic without compromising privacy. AI agents can perform sophisticated PCAP (Packet Capture) analysis to classify encrypted traffic patterns, providing security insights while maintaining data privacy. A feature-extraction sketch follows the figure and series link below.

Figure 2: Traffic analysis pipeline

Encrypted traffic analysis is covered in depth in the multi-part blog series Practical Deployment of LLMs for Network Traffic Classification - Part 1 on the Intel Community.
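For readers who want to experiment, the following is an illustrative sketch of payload-free feature extraction from a PCAP with scapy; the file name and chosen features are assumptions, not the demo's exact pipeline.

```python
# Sketch: derive per-flow statistics from a PCAP without touching payloads.
from collections import defaultdict
from scapy.all import rdpcap, IP, TCP

flows = defaultdict(lambda: {"pkts": 0, "bytes": 0, "times": []})
for pkt in rdpcap("capture.pcap"):  # placeholder file name
    if IP in pkt and TCP in pkt:
        key = (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport)
        flows[key]["pkts"] += 1
        flows[key]["bytes"] += len(pkt)
        flows[key]["times"].append(float(pkt.time))

# Emit (flow, packet count, byte count, throughput) per flow.
for key, f in flows.items():
    dur = (max(f["times"]) - min(f["times"])) or 1e-6
    print(key, f["pkts"], f["bytes"], round(f["bytes"] / dur, 1))
```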

 

Threat Detection and Vulnerability Analysis

AI systems can process vast datasets of vulnerability information, extracting critical vendor information and identifying potential threats. Through fine-tuned models specifically trained for security applications, these systems can perform both training and inference operations to continuously improve threat detection capabilities.

Figure 3: Vulnerability Training and Inference flow

 

For this process we used an example CVE database of vulnerabilities, with the data stored in Arrow DB. This illustrates an AI-powered vulnerability analysis workflow that processes CVE data from the National Vulnerability Database using fine-tuned language models. The system analyzes unstructured vulnerability descriptions and extracts critical information—vendor details, product names, and version numbers—into structured key-value pairs in JSONL format. Leveraging the Intel IPEX-LLM framework with GPU acceleration and Hugging Face infrastructure, the model performs both training and real-time inference operations, processing vulnerability data and delivering structured output in fractions of a second. This enables security teams to rapidly assess and prioritize threats across their infrastructure based on specific vendor and product information.
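A hedged sketch of the structured-output step is shown below: each CVE description becomes one JSON object per line (JSONL). The llm_extract function is a hypothetical stand-in for the fine-tuned model call; only the JSONL mechanics are illustrated.

```python
# Sketch: write one structured record per CVE description in JSONL form.
import json

def llm_extract(description: str) -> dict:
    # Hypothetical placeholder for the fine-tuned model's extraction.
    return {"vendor": "ExampleVendor", "product": "WidgetServer",
            "version": "2.3.1"}

cves = [
    "A buffer overflow in ExampleVendor WidgetServer 2.3.1 allows "
    "remote attackers to execute arbitrary code.",
]

with open("cve_extracted.jsonl", "w") as f:
    for description in cves:
        record = llm_extract(description)
        record["source_text"] = description
        f.write(json.dumps(record) + "\n")
```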

Figure 4: Instruction-Input-Output Inference with Gemma

Gemma 2B is a compact 2.6 billion parameter model designed for efficient on-device deployment, delivering impressive performance in text generation, question answering, and conversational AI while operating in resource-constrained environments. Its key advantages include flexible deployment across data centers, cloud, workstations, and edge devices with minimal computational requirements, enabling local inference without cloud dependencies. The model's efficiency also reduces the carbon footprint of AI systems, making it an environmentally conscious choice for organizations implementing AI-powered security solutions.
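A minimal sketch of local Gemma 2B inference using ipex-llm's low-bit loading is below. It assumes the ipex-llm package, access to the google/gemma-2b-it checkpoint, and optionally an Intel GPU exposed as the "xpu" device; otherwise it falls back to CPU.

```python
# Sketch: on-device Gemma 2B inference via ipex-llm low-bit loading.
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "google/gemma-2b-it"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
model = model.to(device)

inputs = tokenizer("Summarize CVE-2024-0001 in one sentence.",
                   return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```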

  

Policy Management using RAG-Based Code and API Integration

Ultimately, the network engineer applies a policy action to the network device where the vulnerability is discovered. Described here is an API and code generation chatbot that utilizes a sophisticated RAG (Retrieval-Augmented Generation) agent built on the LangGraph framework. This architecture enables seamless integration of Gorilla API RAG models in a plug-and-play configuration, providing flexibility and scalability for diverse security applications.

LangGraph-Based Code, APIs, and RAG Agent Framework

  • Implements a Retrieval-Augmented Generation (RAG) workflow for code and threat knowledge.
  • Uses Gorilla API RAG to access pre-indexed knowledge of APIs, CVEs, and detection patterns.
  • Agents are plug-and-play, enabling developers to inject domain-specific models without rearchitecting the system (a skeletal sketch follows this list).
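Here is a skeletal LangGraph sketch of the retrieve-then-generate loop; the retriever and generator bodies are placeholders where the real system plugs in Gorilla API RAG models and domain-specific agents.

```python
# Sketch: a two-node retrieve -> generate graph in LangGraph.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: AgentState) -> AgentState:
    # Placeholder: look up pre-indexed API docs, CVEs, detection patterns.
    state["context"] = "POST /routes {dst, next_hop} configures a route."
    return state

def generate(state: AgentState) -> AgentState:
    # Placeholder: an LLM would draft code/API calls from the context.
    state["answer"] = f"Based on: {state['context']}"
    return state

graph = StateGraph(AgentState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
app = graph.compile()

print(app.invoke({"question": "Add a route to block the vulnerable host",
                  "context": "", "answer": ""})["answer"])
```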

Figure 5: API generated for a Network route

 

Figure 6: API request example

 

Performance Analysis and Multi-Platform Validation

Model Selection Criteria

  • Inference latency
  • Accuracy on encrypted traffic
  • Support for quantization

Acceleration Techniques

  • INT8/FP16 quantization
  • Multi-threading with Intel OpenMP
  • Operator fusion with IPEX

The integration leverages Intel® Extension for PyTorch (IPEX) and supports both CPU and XPU (Intel GPU) configurations. This flexibility allows organizations to optimize performance based on their specific hardware capabilities and requirements. A hedged quantization sketch follows.
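As a simple illustration of the quantization step, the sketch below uses PyTorch's built-in dynamic INT8 quantization on a toy model; IPEX additionally provides static INT8 recipes and operator fusion on Xeon, which this sketch does not cover.

```python
# Sketch: dynamic INT8 quantization of Linear layers with stock PyTorch.
import torch

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(),
                            torch.nn.Linear(512, 64)).eval()

# Weights become INT8; activations are quantized on the fly at runtime.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(model(x).shape, qmodel(x).shape)  # same shape, smaller weights
```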

Benchmark Platforms

  • Laptop: Intel® Core™ i7/i9 processor with 32GB RAM (future target; not covered in this blog)
  • Server: Intel® Xeon® Scalable processor, with and without Intel® Arc™ graphics
  • Metrics collected: tokens/sec, MBps, latency per inference, power draw (a simple measurement harness is sketched below)
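An illustrative harness for the metrics above (tokens/sec and per-inference latency) might look like the following; run_inference is a hypothetical stand-in for the deployed model call.

```python
# Sketch: measure tokens/sec and P90 latency over repeated inferences.
import time

def run_inference(prompt: str) -> str:
    time.sleep(0.05)       # placeholder for real model latency
    return "token " * 128  # placeholder 128-token completion

latencies, tokens = [], 0
for _ in range(20):
    t0 = time.perf_counter()
    out = run_inference("classify this flow")
    latencies.append(time.perf_counter() - t0)
    tokens += len(out.split())

total = sum(latencies)
p90 = sorted(latencies)[int(0.9 * len(latencies))]
print(f"tokens/sec: {tokens / total:.1f}")
print(f"p90 latency: {p90 * 1000:.1f} ms")
```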

  

Real-World Performance Metrics

Performance characterization includes detailed analysis of both CPU and GPU performance, providing organizations with the data needed to make informed decisions about their specific deployment requirements.

Figure 7: Intel® Arc™ performs 20x

 

  • The inference-heavy workload achieves accuracy (F1 score) >= 0.965.
  • Low-cost Intel® Arc™ GPUs free up CPU cores and make such solutions viable on laptops and edge servers.

                       

Figure 8: QLoRA optimized inference with Gemma

 

Gemma 2B's compact size makes it suitable for deployment on resource-constrained devices such as laptops and even edge devices, opening up possibilities for on-device AI processing. A generic QLoRA-style loading sketch follows.
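The following is a generic QLoRA-style inference sketch: a 4-bit quantized base model with LoRA adapters attached via peft. The adapter path is hypothetical, and bitsandbytes 4-bit loading generally assumes a CUDA GPU; on Intel hardware, the ipex-llm low-bit path shown earlier plays the analogous role.

```python
# Sketch: 4-bit base model + LoRA adapters (generic QLoRA-style pattern).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "google/gemma-2b-it"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb)
model = PeftModel.from_pretrained(base, "./gemma-cve-lora")  # hypothetical path

inputs = tokenizer("Extract vendor and product from this CVE: ...",
                   return_tensors="pt").to(base.device)
out = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```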

 

Summary and Next Steps

AI agents for network security offer a scalable, cost-efficient way to safeguard distributed infrastructures, especially at the edge. With Red Hat® AI and Intel's CPU-optimized toolchain, organizations can deploy real-time inference engines across diverse environments.

 

@EdwinVerplanke @RUI3 @VishakhNair @mici 

 

 

 

Workloads and configurations. Results may vary.
8526Y: 1-node, 2x INTEL(R) XEON(R) PLATINUM 8592+, 64 cores, 350W TDP, HT On, Turbo On, Total Memory 1024GB (16x64GB DDR5 5600 MT/s [5600 MT/s]), BIOS 2.3, microcode 0x21000240, 2x Ethernet Controller 10-Gigabit X540-AT2, 1x 1.7T SAMSUNG MZ1L21T9HCLS-00A07, Ubuntu 24.04.1 LTS, 6.8.0-47-generic. Test by Intel as of May 2025.
Gemma2B, GPT2-124M, Qwen2.5-0.5B & Llama3.2-1B, ModernBERT149M: Intel Model Zoo Optimized Benchmark, Docker 24.0.7; Pytorch/IPEX 2.6.0.dev20241016+cpu, Python 3.10.15. Llama-3.1-8B: ipex-llm container v2.50+cpu Nov 2024, Pytorch/IPEX 2.5.0, Python 3.10.15. 1 instance per NUMA node; 2nd token P90 latency < 100ms, Chatbot: input token 128, output token 128. Summarization: input token 1024, output token 128. BSX, INT4, FP16
8480+: 1-node, 2x INTEL(R) XEON(R) PLATINUM 8592+, 64 cores, 350W TDP, HT On, Turbo On, Total Memory 1024GB (16x64GB DDR5 5600 MT/s [5600 MT/s]), BIOS 2.3, microcode 0x21000240, 2x Ethernet Controller 10-Gigabit X540-AT2, 1x 1.7T SAMSUNG MZ1L21T9HCLS-00A07, Ubuntu 24.04.1 LTS, 6.8.0-47-generic. Test by Intel as of October 2024.
Gemma2B, GPT2-124M, Qwen2.5-0.5B & Llama3.2-1B, ModernBERT149M: Intel Model Zoo Optimized Benchmark, Docker 24.0.7; Pytorch/IPEX 2.6.0.dev20241016+cpu, Python 3.10.15. Llama-3.1-8B: ipex-llm container v2.50+cpu Nov 2024, Pytorch/IPEX 2.5.0, Python 3.10.15. 1 instance per NUMA node; 2nd token P90 latency < 100ms, Chatbot: input token 128, output token 128. Summarization: input token 1024, output token 128. BSX, INT4, FP16