
Tackling Network Security: AI Agents at the Edge with Red Hat AI on Intel® Processors and Graphics

Authors: Mrittika Ganguli, PE, Architect, Intel, NEX; David Kypuros, Principal AI Architect, Red Hat

Introduction: The Strategic Advantage of AI in Network Security

Modern networks generate massive amounts of data every second, making manual monitoring and analysis virtually impossible. AI agents offer a revolutionary solution by automating complex security tasks while providing the intelligence needed to identify emerging threats before they can cause damage.

Key Network Security Use Cases

  1. Application Identification and Classification

One of the fundamental challenges in network security is understanding what applications are running on your network. AI agents excel at identifying and categorizing applications within network traffic, giving organizations far deeper visibility into how network resources are used. That visibility is critical for enforcing policies, detecting anomalies, and optimizing resources, and it enables security controls that ensure only authorized applications can access sensitive resources. AI models trained on traffic metadata and encrypted packet patterns can identify applications without relying on payload inspection; a minimal sketch of this approach follows the list below.

  • Value: Visibility into encrypted traffic without DPI.
  • Use cases: Micro-segmentation, access control, SASE policy enforcement.
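As a rough illustration of metadata-only classification, here is a minimal sketch using a scikit-learn random forest over flow features. The feature names and CSV layout are hypothetical stand-ins, not artifacts of the demo described in this post.

```python
# Hypothetical sketch: classify applications from flow metadata only
# (no payload inspection). Feature names and the CSV layout are
# illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Assumed schema: one row per flow, with side-channel features
# (packet sizes, timings) that survive encryption.
flows = pd.read_csv("flows.csv")
features = ["pkt_count", "mean_pkt_len", "std_pkt_len",
            "mean_iat_ms", "dst_port", "tls_record_count"]
X, y = flows[features], flows["app_label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```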

  2. Advanced Anomaly and Threat Analysis

Perhaps the most critical application of AI in network security is its ability to detect unusual patterns and behaviors in network activity. Unlike traditional rule-based systems that rely on known signatures, AI agents can identify subtle anomalies that might indicate potential security threats and vulnerabilities. This proactive approach allows organizations to implement defensive measures before attacks can succeed, rather than simply reacting to incidents after they occur.

AI agents can learn what “normal” traffic looks like and flag outliers—potentially identifying zero-day attacks, lateral movement, or data exfiltration attempts. The sketch after the list below illustrates the idea with a simple unsupervised outlier detector.

  • Value: Faster detection of evolving and hidden threats.
  • Use cases: Threat scoring, breach prevention, vulnerability exploitation detection.
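A hedged sketch of this baseline-and-outlier idea, using scikit-learn's IsolationForest on synthetic flow statistics (the features and values are illustrative, not from the demo):

```python
# Sketch: learn "normal" flow behavior, then flag outliers.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Stand-in baseline features: [bytes/s, pkts/s, mean inter-arrival ms]
normal = rng.normal(loc=[5e4, 40, 25], scale=[5e3, 5, 3], size=(5000, 3))

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# A burst that could indicate exfiltration: high rate, tiny inter-arrival.
suspect = np.array([[9e5, 600, 1.0]])
print(detector.predict(suspect))        # -1 => flagged as anomaly
print(detector.score_samples(suspect))  # lower score => more anomalous
```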

The Edge Computing Revolution in Security

Deploying AI at the edge represents a paradigm shift in network security architecture. By processing data closer to where it's generated, organizations can achieve several critical advantages.

Opportunities at the Edge - Real-Time Processing and Response

Edge deployment enables real-time data processing and analysis, dramatically reducing the time between threat detection and response. This immediate analysis capability is crucial in today's fast-paced threat landscape, where delays of even seconds can mean the difference between successful threat mitigation and a successful attack.

Opportunities

  • Real-time Processing: AI agents can process traffic at the point of capture, enabling instant responses.
  • Latency Reduction: Eliminates round-trip delay to the cloud for inference.
  • Privacy Protection: Sensitive traffic doesn’t leave the premises, preserving compliance.

Challenges

  • Resource Limitations: CPUs at the edge must balance multiple workloads.
  • Model Reliability: AI must be robust to noisy or low-volume data environments.
  • Power Efficiency: Sustained operation in power-constrained locations is a must.

The Case for CPU-Based AI Inference

While GPUs and accelerators are excellent for training, CPUs remain the most available and practical compute platform for inference at the edge—especially in networking and telecom environments.

Key Benefits

  • Cost Efficiency: Leverages existing infrastructure—no need for additional accelerators.
  • Deployment Flexibility: Portable across servers, gateways, and even laptops.
  • Energy Efficiency: Intel CPUs are optimized for continuous, power-efficient inference via IPEX, OpenVINO, and quantized model support.

Intel Optimizations

  • IPEX (Intel Extension for PyTorch): Enhances PyTorch model inference speed on Intel® Xeon® processors.
  • Quantization: Reduces model size and increases inference throughput.
  • XPU Choice: Seamless fallback between CPU and integrated/discrete GPUs like Intel® Arc™ (see the sketch after this list).
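A minimal sketch of the CPU/XPU fallback pattern with IPEX is shown below. It assumes an IPEX build that exposes PyTorch's "xpu" device; on a CPU-only build it simply stays on CPU, and the tiny model is a placeholder.

```python
# Sketch: prefer an Intel GPU ("xpu") when available, else stay on CPU.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(torch.nn.Linear(256, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 8)).eval()

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
model = model.to(device)

# ipex.optimize applies operator fusion and dtype-aware optimizations.
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.autocast(device_type=device, dtype=torch.bfloat16):
    out = model(torch.randn(1, 256, device=device))
print(out.shape, "on", device)
```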

Processing sensitive security data locally rather than sending it to centralized cloud systems significantly enhances privacy and data security. This approach keeps sensitive information within the organization's direct control while still leveraging the power of AI for threat detection and analysis. By eliminating the need to transmit data to remote processing centers, edge AI delivers dramatically improved response times. This reduced latency is particularly critical for security applications where immediate action may be required to prevent or minimize damage.

While edge deployment offers significant benefits, it also presents unique challenges that must be carefully managed. Edge devices typically have limited computational resources compared to centralized data centers. This constraint requires careful optimization of AI models to ensure they can operate effectively within these limitations while maintaining accuracy and performance.

Ensuring AI models maintain their accuracy and reliability across diverse edge environments is crucial. This requires robust testing and validation processes to ensure consistent performance regardless of the specific deployment environment.

Organizations can leverage their existing Intel-based infrastructure for AI deployment, using Intel CPUs and client Intel® Arc™ A770 GPUs to avoid the significant costs associated with specialized hardware. This approach of using Intel processors and Intel graphics makes AI-powered security accessible to organizations of all sizes, not just those with extensive technology budgets.

Intel processors offer exceptional scalability across various devices and platforms, from laptops to enterprise servers. This flexibility allows organizations to deploy consistent AI-powered security solutions across their entire infrastructure while maintaining the ability to integrate with existing network security tools.

Practical Implementation: 5G SecOps Demo Agentic Workflow Architecture

The integration of AI agents in network security is exemplified through advanced 5G SecOps implementations that demonstrate the practical application of these technologies. Modern implementations leverage sophisticated agentic workflows built on cutting-edge technologies including MCP (Model Context Protocol), Next.js, and NestJS with TypeScript for full-stack application development. This architecture provides the foundation for seamless integration between Intel hardware and Red Hat AI inference capabilities.

Figure 1: AI Agent workflow in Red Hat AI

 

Advanced Traffic Analysis Capabilities - Encrypted Traffic Classification

One of the most challenging aspects of network security is analyzing encrypted traffic without compromising privacy. AI agents can perform sophisticated PCAP (Packet Capture) analysis to classify encrypted traffic patterns, providing security insights while maintaining data privacy. A feature-extraction sketch follows the figure and series link below.

Figure 2: Traffic analysis pipeline

Encrypted traffic analysis is covered in depth in the multi-part blog series Practical Deployment of LLMs for Network Traffic Classification - Part 1 on the Intel Community.
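For readers who want to experiment, the following is an illustrative sketch of payload-free feature extraction from a PCAP with scapy; the file name and chosen features are assumptions, not the demo's exact pipeline.

```python
# Sketch: derive per-flow statistics from a PCAP without touching payloads.
from collections import defaultdict
from scapy.all import rdpcap, IP, TCP

flows = defaultdict(lambda: {"pkts": 0, "bytes": 0, "times": []})
for pkt in rdpcap("capture.pcap"):  # placeholder file name
    if IP in pkt and TCP in pkt:
        key = (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport)
        flows[key]["pkts"] += 1
        flows[key]["bytes"] += len(pkt)
        flows[key]["times"].append(float(pkt.time))

# Emit (flow, packet count, byte count, throughput) per flow.
for key, f in flows.items():
    dur = (max(f["times"]) - min(f["times"])) or 1e-6
    print(key, f["pkts"], f["bytes"], round(f["bytes"] / dur, 1))
```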

 

Threat Detection and Vulnerability Analysis

AI systems can process vast datasets of vulnerability information, extracting critical vendor information and identifying potential threats. Through fine-tuned models specifically trained for security applications, these systems can perform both training and inference operations to continuously improve threat detection capabilities.

Figure 3: Vulnerability Training and Inference flow

 

For this process we used an example CVE database of vulnerabilities, with the data stored in Arrow DB. This illustrates an AI-powered vulnerability analysis workflow that processes CVE data from the National Vulnerability Database using fine-tuned language models. The system analyzes unstructured vulnerability descriptions and extracts critical information—vendor details, product names, and version numbers—into structured key-value pairs in JSONL format. Leveraging the Intel IPEX-LLM framework with GPU acceleration and Hugging Face infrastructure, the model performs both training and real-time inference operations, processing vulnerability data and delivering structured output in fractions of a second. This enables security teams to rapidly assess and prioritize threats across their infrastructure based on specific vendor and product information.
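A hedged sketch of the structured-output step is shown below: each CVE description becomes one JSON object per line (JSONL). The llm_extract function is a hypothetical stand-in for the fine-tuned model call; only the JSONL mechanics are illustrated.

```python
# Sketch: write one structured record per CVE description in JSONL form.
import json

def llm_extract(description: str) -> dict:
    # Hypothetical placeholder for the fine-tuned model's extraction.
    return {"vendor": "ExampleVendor", "product": "WidgetServer",
            "version": "2.3.1"}

cves = [
    "A buffer overflow in ExampleVendor WidgetServer 2.3.1 allows "
    "remote attackers to execute arbitrary code.",
]

with open("cve_extracted.jsonl", "w") as f:
    for description in cves:
        record = llm_extract(description)
        record["source_text"] = description
        f.write(json.dumps(record) + "\n")
```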

Figure 4: Instruction-Input-Output Inference with Gemma

Gemma 2B is a compact 2.6 billion parameter model designed for efficient on-device deployment, delivering impressive performance in text generation, question answering, and conversational AI while operating in resource-constrained environments. Its key advantages include flexible deployment across data centers, cloud, workstations, and edge devices with minimal computational requirements, enabling local inference without cloud dependencies. The model's efficiency also reduces the carbon footprint of AI systems, making it an environmentally conscious choice for organizations implementing AI-powered security solutions.
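A minimal sketch of local Gemma 2B inference using ipex-llm's low-bit loading is below. It assumes the ipex-llm package, access to the google/gemma-2b-it checkpoint, and optionally an Intel GPU exposed as the "xpu" device; otherwise it falls back to CPU.

```python
# Sketch: on-device Gemma 2B inference via ipex-llm low-bit loading.
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "google/gemma-2b-it"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
model = model.to(device)

inputs = tokenizer("Summarize CVE-2024-0001 in one sentence.",
                   return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```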

  

Policy Management using RAG-Based Code and API Integration

Ultimately, the network engineer applies a policy action to the network device where the vulnerability is discovered. Described here is an API and code generation chatbot that utilizes a sophisticated RAG (Retrieval-Augmented Generation) agent built on the LangGraph framework. This architecture enables seamless integration of Gorilla API RAG models in a plug-and-play configuration, providing flexibility and scalability for diverse security applications.

LangGraph-Based Code, APIs, and RAG Agent Framework

  • Implements a Retrieval-Augmented Generation (RAG) workflow for code and threat knowledge.
  • Uses Gorilla API RAG to access pre-indexed knowledge of APIs, CVEs, and detection patterns.
  • Agents are plug-and-play, enabling developers to inject domain-specific models without rearchitecting the system (a skeletal sketch follows this list).
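Here is a skeletal LangGraph sketch of the retrieve-then-generate loop; the retriever and generator bodies are placeholders where the real system plugs in Gorilla API RAG models and domain-specific agents.

```python
# Sketch: a two-node retrieve -> generate graph in LangGraph.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: AgentState) -> AgentState:
    # Placeholder: look up pre-indexed API docs, CVEs, detection patterns.
    state["context"] = "POST /routes {dst, next_hop} configures a route."
    return state

def generate(state: AgentState) -> AgentState:
    # Placeholder: an LLM would draft code/API calls from the context.
    state["answer"] = f"Based on: {state['context']}"
    return state

graph = StateGraph(AgentState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
app = graph.compile()

print(app.invoke({"question": "Add a route to block the vulnerable host",
                  "context": "", "answer": ""})["answer"])
```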

Figure 5: API generated for a Network route

 

Figure 6: API request example

 

Performance Analysis and Multi-Platform Validation

Model Selection Criteria

  • Inference latency
  • Accuracy on encrypted traffic
  • Support for quantization

Acceleration Techniques

  • INT8/FP16 quantization
  • Multi-threading with Intel OpenMP
  • Operator fusion with IPEX

The integration leverages Intel® Extension for PyTorch (IPEX) and supports both CPU and XPU (Intel GPU) configurations. This flexibility allows organizations to optimize performance based on their specific hardware capabilities and requirements. A hedged quantization sketch follows.
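As a simple illustration of the quantization step, the sketch below uses PyTorch's built-in dynamic INT8 quantization on a toy model; IPEX additionally provides static INT8 recipes and operator fusion on Xeon, which this sketch does not cover.

```python
# Sketch: dynamic INT8 quantization of Linear layers with stock PyTorch.
import torch

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(),
                            torch.nn.Linear(512, 64)).eval()

# Weights become INT8; activations are quantized on the fly at runtime.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(model(x).shape, qmodel(x).shape)  # same shape, smaller weights
```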

Benchmark Platforms

  • Laptop: Intel® Core™ i7/i9 processor with 32GB RAM (future target; not covered in this blog)
  • Server: Intel® Xeon® Scalable processor, with and without Intel® Arc™ graphics
  • Metrics collected: tokens/sec, MBps, latency per inference, power draw (a simple measurement harness is sketched below)
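An illustrative harness for the metrics above (tokens/sec and per-inference latency) might look like the following; run_inference is a hypothetical stand-in for the deployed model call.

```python
# Sketch: measure tokens/sec and P90 latency over repeated inferences.
import time

def run_inference(prompt: str) -> str:
    time.sleep(0.05)       # placeholder for real model latency
    return "token " * 128  # placeholder 128-token completion

latencies, tokens = [], 0
for _ in range(20):
    t0 = time.perf_counter()
    out = run_inference("classify this flow")
    latencies.append(time.perf_counter() - t0)
    tokens += len(out.split())

total = sum(latencies)
p90 = sorted(latencies)[int(0.9 * len(latencies))]
print(f"tokens/sec: {tokens / total:.1f}")
print(f"p90 latency: {p90 * 1000:.1f} ms")
```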

  

Real-World Performance Metrics

Performance characterization includes detailed analysis of both CPU and GPU performance, providing organizations with the data needed to make informed decisions about their specific deployment requirements.

Figure 7: Intel® Arc™ performs 20x

 

  • The inference-heavy workload achieves accuracy (F1 score) >= 0.965.
  • Low-cost Intel® Arc™ GPUs free up CPU cores and make such solutions viable on laptops and edge servers.

                       

Figure 8: QLoRA optimized inference with Gemma

 

Gemma 2B's compact size makes it suitable for deployment on resource-constrained devices such as laptops and even edge devices, opening up possibilities for on-device AI processing. A generic QLoRA-style loading sketch follows.
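The following is a generic QLoRA-style inference sketch: a 4-bit quantized base model with LoRA adapters attached via peft. The adapter path is hypothetical, and bitsandbytes 4-bit loading generally assumes a CUDA GPU; on Intel hardware, the ipex-llm low-bit path shown earlier plays the analogous role.

```python
# Sketch: 4-bit base model + LoRA adapters (generic QLoRA-style pattern).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "google/gemma-2b-it"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb)
model = PeftModel.from_pretrained(base, "./gemma-cve-lora")  # hypothetical path

inputs = tokenizer("Extract vendor and product from this CVE: ...",
                   return_tensors="pt").to(base.device)
out = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```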

 

Summary and Next Steps

AI agents for network security offer a scalable, cost-efficient way to safeguard distributed infrastructures, especially at the edge. With Red Hat® AI and Intel's CPU-optimized toolchain, organizations can deploy real-time inference engines across diverse environments.

 

@EdwinVerplanke @RUI3 @VishakhNair @mici 

 

 

 

Workloads and configurations. Results may vary.
8526Y: 1-node, 2x INTEL(R) XEON(R) PLATINUM 8592+, 64 cores, 350W TDP, HT On, Turbo On, Total Memory 1024GB (16x64GB DDR5 5600 MT/s [5600 MT/s]), BIOS 2.3, microcode 0x21000240, 2x Ethernet Controller 10-Gigabit X540-AT2, 1x 1.7T SAMSUNG MZ1L21T9HCLS-00A07, Ubuntu 24.04.1 LTS, 6.8.0-47-generic. Test by Intel as of May 2025.
Gemma2B, GPT2-124M, Qwen2.5-0.5B & Llama3.2-1B, ModernBERT149M: Intel Model Zoo Optimized Benchmark, Docker 24.0.7; Pytorch/IPEX 2.6.0.dev20241016+cpu, Python 3.10.15. Llama-3.1-8B: ipex-llm container v2.50+cpu Nov 2024, Pytorch/IPEX 2.5.0, Python 3.10.15. 1 instance per NUMA node; 2nd token P90 latency < 100ms, Chatbot: input token 128, output token 128. Summarization: input token 1024, output token 128. BSX, INT4, FP16
8480+: 1-node, 2x INTEL(R) XEON(R) PLATINUM 8592+, 64 cores, 350W TDP, HT On, Turbo On, Total Memory 1024GB (16x64GB DDR5 5600 MT/s [5600 MT/s]), BIOS 2.3, microcode 0x21000240, 2x Ethernet Controller 10-Gigabit X540-AT2, 1x 1.7T SAMSUNG MZ1L21T9HCLS-00A07, Ubuntu 24.04.1 LTS, 6.8.0-47-generic. Test by Intel as of October 2024.
Gemma2B, GPT2-124M, Qwen2.5-0.5B & Llama3.2-1B, ModernBERT149M: Intel Model Zoo Optimized Benchmark, Docker 24.0.7; Pytorch/IPEX 2.6.0.dev20241016+cpu, Python 3.10.15. Llama-3.1-8B: ipex-llm container v2.50+cpu Nov 2024, Pytorch/IPEX 2.5.0, Python 3.10.15. 1 instance per NUMA node; 2nd token P90 latency < 100ms, Chatbot: input token 128, output token 128. Summarization: input token 1024, output token 128. BSX, INT4, FP16