Next-Gen AI Inference: Intel® Xeon® Processors Power Vision, NLP, and Recommender Workloads

Author: Nithya Rao, System and Software Optimization Engineer, Intel

Artificial intelligence has evolved from an experimental technology into an essential business capability. Whether it's analyzing visual data at the edge, understanding human language in real time, or delivering hyper-personalized recommendations at scale, today’s systems demand more: more speed, more efficiency, and greater adaptability.

Intel® Xeon® processors with Intel® Advanced Matrix Extensions (Intel® AMX), combined with a highly optimized AI software stack, offer a CPU-first platform built for modern AI workloads without added complexity or overhead. Intel AMX performs matrix multiplication directly within each CPU core. As a result, this integrated accelerator can raise efficiency and lower latency for CPU-based AI inference and training, substantially increasing throughput for key deep learning workloads. In computer vision, it accelerates object detection, image classification, and image segmentation, which are critical in domains such as medical imaging, industrial automation, and autonomous navigation. Intel AMX can also speed up natural language processing (NLP) tasks such as text classification and embedding generation. It can also help recommender systems personalize user experiences in e-commerce, social media, and personalized banking.

With Intel's AI ecosystem tools, developers can optimize and deploy deep learning models more efficiently across Xeon-based platforms. These libraries and tools integrate tightly with popular AI frameworks, including TensorFlow and PyTorch. They enable automatic quantization, operator fusion, and multi-threaded execution, all optimized for Intel architecture.

  • Optimized Frameworks: Intel upstreams optimizations into popular deep learning frameworks like PyTorch and TensorFlow, so default distributions automatically leverage the built-in AI accelerators of Xeon CPUs.
  • Intel® oneAPI: This comprehensive suite of development tools and libraries allows developers to maximize application performance across different Intel architectures, including specific libraries for AI and machine learning.
  • Intel® Extension for PyTorch/Scikit-learn/XGBoost: These extensions offer advanced, pre-release optimizations for popular AI libraries before they are integrated into the main open-source distributions.
  • OpenVINO™ Toolkit: This toolkit helps optimize and deploy AI inference models from various frameworks (TensorFlow, PyTorch, etc.) on Intel hardware, including Xeon processors, for edge and data center use cases.
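Of the optimizations these tools apply automatically, INT8 quantization is the easiest to show in isolation. The sketch below implements symmetric per-tensor quantization, which is one common scheme, in plain Python. The function names are hypothetical, and this is not the API of Intel Extension for PyTorch, OpenVINO, or any other library. Those tools choose their own scales and calibration strategies.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: one scale maps floats onto [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q)  # [50, -127, 0, 100]
# Each restored value is within half a quantization step of the original.
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2)  # True
```

Storing weights this way cuts memory per value from 4 bytes to 1. Production toolchains also calibrate activation ranges and may use per-channel scales, which this sketch omits.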

Real-World Impact: Comparative Analysis of Intel Xeon Processors and AMD EPYC

With AI reshaping industries, it’s essential to understand how Intel’s advanced hardware and software work together to improve power efficiency and resource utilization. Together, they enable better performance in computer vision, NLP, and recommender systems.

Computer vision image generation: Traditional diffusion-based image models build each image through many iterative denoising steps. This makes them slow and computationally heavy, often pushing the limits of standard computers. Latent Consistency Models (LCMs) change that: they operate in a compact latent space and need only a few inference steps, producing high-quality images quickly with lower memory overhead.

Intel Xeon 6980P processors deliver up to 3.8x higher LCM inference performance than AMD EPYC 9755 processors, enabling near-instant visual creation. The tested Intel system ran its memory at 8800 MT/s, versus 6000 MT/s on the AMD system, and this higher memory bandwidth helps large AI models run smoothly. As a result, teams can quickly turn ideas into visuals for faster design and planning.

 Figure 1: Intel Xeon 6980P delivers up to 3.8x higher LCM performance versus AMD EPYC 9755.

NLP text classification and sentiment analysis: Deep learning models, especially large transformer models like BERT, rely heavily on matrix multiplication, which repeatedly multiplies large tables of numbers together. At scale, these operations can cause memory bottlenecks and limit the batch sizes used during training and inference, especially on resource-constrained hardware.
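Counting the work shows why matrix multiplication dominates. The sketch below estimates the floating-point operations in BERT-Large's matrix products using the model's published dimensions: hidden size 1024, feed-forward size 4096, and 24 encoder layers. The 384-token sequence length is an assumption chosen for illustration. The count deliberately ignores biases, softmax, and layer normalization.

```python
def matmul_flops(m, k, n):
    """FLOPs for an (m x k) @ (k x n) product: one multiply and one add per term."""
    return 2 * m * k * n


HIDDEN = 1024        # BERT-Large hidden size
INTERMEDIATE = 4096  # BERT-Large feed-forward (intermediate) size
LAYERS = 24          # BERT-Large encoder layers
SEQ_LEN = 384        # assumed sequence length, for illustration only


def encoder_layer_flops(seq_len):
    # Q, K, V, and attention-output projections: four (seq x hidden) @ (hidden x hidden) products.
    projections = 4 * matmul_flops(seq_len, HIDDEN, HIDDEN)
    # Q @ K^T and scores @ V, summed across all heads (head dimensions add up to HIDDEN).
    attention = 2 * matmul_flops(seq_len, HIDDEN, seq_len)
    # Feed-forward block: expand to INTERMEDIATE, then project back to HIDDEN.
    ffn = matmul_flops(seq_len, HIDDEN, INTERMEDIATE) + matmul_flops(seq_len, INTERMEDIATE, HIDDEN)
    return projections + attention + ffn


total = LAYERS * encoder_layer_flops(SEQ_LEN)
print(f"~{total / 1e9:.0f} GFLOPs per {SEQ_LEN}-token sequence")  # ~246 GFLOPs per 384-token sequence
```

Every term here is a dense matrix product, and multiplying by the batch size shows how quickly the work grows.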

Intel AMX speeds up matrix multiplication with dedicated on-core hardware. A set of two-dimensional tile registers holds blocks of matrix data, and a tile matrix multiply unit (TMUL) operates on them. Because this work runs inside the CPU instead of being offloaded to a discrete accelerator, it can provide a significant performance boost. Intel AMX also supports low-precision BF16 (for both training and inference) and INT8 (for inference). These formats fit more values into each tile, so larger matrix blocks can be processed in a single operation.
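The tile-based data flow can be emulated in plain Python. The sketch below first rounds inputs to BF16 by keeping the top 16 bits of the FP32 encoding with round-to-nearest-even. It then computes C = A @ B one block at a time. The block shapes mirror the largest AMX BF16 tiles: each tile holds up to 16 rows of 64 bytes, which gives 16x32 A blocks, 32x16 B blocks, and a 16x16 FP32 result block. This is an illustration only. It does not execute AMX instructions, it accumulates in double rather than FP32, and it does not handle NaN.

```python
import struct

# Block shapes mirroring the largest AMX BF16 tiles (16 rows x 64 bytes each):
# A: 16 x 32 BF16 values, B: 32 x 16 BF16 values, C: 16 x 16 FP32 accumulators.
TILE_M, TILE_K, TILE_N = 16, 32, 16


def to_bf16(x):
    """Round a float to the nearest bfloat16 (round-to-nearest-even); NaN is not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias for the 16 bits being dropped
    bits = (bits >> 16) << 16            # keep sign, 8 exponent bits, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def tiled_matmul_bf16(a, b):
    """C = A @ B with BF16-rounded inputs, computed one tile-sized block at a time."""
    m, k, n = len(a), len(b), len(b[0])
    a16 = [[to_bf16(v) for v in row] for row in a]
    b16 = [[to_bf16(v) for v in row] for row in b]
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, TILE_M):
        for j0 in range(0, n, TILE_N):
            for k0 in range(0, k, TILE_K):
                # One block step: C[i0:i0+16, j0:j0+16] += A[i0:i0+16, k0:k0+32] @ B[k0:k0+32, j0:j0+16]
                for i in range(i0, min(i0 + TILE_M, m)):
                    for j in range(j0, min(j0 + TILE_N, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + TILE_K, k)):
                            acc += a16[i][kk] * b16[kk][j]
                        c[i][j] = acc  # Python floats are doubles; AMX accumulates in FP32
    return c


print(to_bf16(1.0 + 2**-10))  # 1.0: the extra bit falls below BF16 precision
print(to_bf16(3.14159))       # 3.140625
```

Each 64-byte tile row holds 32 BF16 values but only 16 FP32 values. That is why the lower-precision format lets each block operation cover more of the matrix.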

By combining Intel AMX tile hardware, low-precision data formats, and highly efficient software libraries, the Intel Xeon 6980P delivers up to 3x higher BERT Large inference performance than AMD EPYC 9755. This enables faster content moderation and real-time customer feedback analysis. (See Figure 2 for detailed performance metrics.)

Figure 2: Intel Xeon 6980P delivers up to 3x higher BERT Large performance versus AMD EPYC 9755.

Recommender systems personalized ranking: Recommender systems are memory-intensive because of their massive embedding tables and compute-intensive because of the deep neural networks that process them. These models often deal with categorical features such as user IDs, item IDs, or product categories, which can have an extremely large number of unique values. Each value is represented by a dense vector, or embedding, in a large table. Recommendation systems with millions of items therefore require substantial memory to store these embedding tables. In addition, the deep neural networks require substantial computational resources to process these embeddings and generate recommendations.
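The memory pressure is easy to quantify, and the lookup itself is simple. The sketch below sizes a hypothetical table of 10 million item IDs with 128-dimensional embeddings. It then gathers and sum-pools the rows for a handful of IDs, similar in spirit to PyTorch's `torch.nn.EmbeddingBag`, which the reference DLRM implementation uses. The table sizes and the `embedding_bag` helper are illustrative assumptions, not measurements from the tested systems.

```python
def embedding_table_bytes(num_rows, dim, bytes_per_value=4):
    """Memory for a dense embedding table: one dim-length vector per categorical value."""
    return num_rows * dim * bytes_per_value


def embedding_bag(table, ids, mode="sum"):
    """Gather the embedding rows for a list of categorical IDs and pool them into one vector."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for idx in ids:
        for d, value in enumerate(table[idx]):
            pooled[d] += value
    if mode == "mean" and ids:
        pooled = [v / len(ids) for v in pooled]
    return pooled


# Hypothetical table size: 10 million items, 128 dimensions.
print(embedding_table_bytes(10_000_000, 128) / 1e9, "GB in FP32")     # 5.12 GB in FP32
print(embedding_table_bytes(10_000_000, 128, 1) / 1e9, "GB in INT8")  # 1.28 GB in INT8

# Tiny 4-row table with 3-dimensional embeddings; pool the rows for items 0 and 2.
table = [[0.1, 0.2, 0.3], [1.0, 1.0, 1.0], [0.5, 0.0, -0.5], [2.0, 2.0, 2.0]]
print(embedding_bag(table, [0, 2]))
```

Real models have dozens of such tables, and each lookup touches rows scattered across memory. This is why capacity and memory bandwidth matter as much as raw compute.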

Intel Xeon processors leverage their large main memory capacity to store entire embedding tables. Combined with advanced optimization and parallelization techniques for deep learning operators and efficient memory management, this approach enables recommender systems to access frequently used embeddings quickly and reliably.

Intel Xeon 6980P processors can accelerate matrix factorization and deep learning models such as the open-source Deep Learning Recommendation Model (DLRMv2). For e-commerce and streaming workloads, they deliver up to 2.5x higher DLRMv2 inference performance than AMD EPYC 9755 processors (see Figure 3).
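The two model families named here score items differently. In matrix factorization, a user's affinity for an item is the dot product of their learned latent vectors. The original DLRM passes dense features through a bottom MLP. In its feature-interaction step, it then takes pairwise dot products between that output and each pooled embedding before a top MLP. The sketch below shows both scoring steps in plain Python with the MLPs omitted. The vectors and helper names are hypothetical.

```python
import heapq


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def mf_top_k(user_vec, item_vecs, k):
    """Matrix factorization: rank items by the dot product of user and item latent vectors."""
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


def dlrm_dot_interaction(dense_vec, pooled_embeddings):
    """DLRM-style interaction: dot products over every distinct pair of feature vectors,
    appended to the dense (bottom-MLP) vector as input for the top MLP."""
    feats = [dense_vec] + pooled_embeddings
    pairs = [dot(feats[i], feats[j]) for i in range(len(feats)) for j in range(i)]
    return dense_vec + pairs


user = [1.0, 0.0]
items = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}
print(mf_top_k(user, items, 2))  # ['a', 'c']
print(dlrm_dot_interaction([1.0, 2.0], [[3.0, 0.0], [0.0, 1.0]]))  # [1.0, 2.0, 3.0, 2.0, 0.0]
```

With n pooled embeddings, the interaction adds n(n+1)/2 features. The DLRMv2 benchmark configuration used in MLPerf replaces this dot interaction with DCN-v2 cross layers. Treat this sketch as an illustration of the interaction idea, not of the benchmarked model.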

Figure 3: Intel Xeon 6980P delivers up to 2.5x higher DLRM performance versus AMD EPYC 9755.

Reimagining Vision, Language, and Recommendations with AI

Intel doesn’t just deliver hardware; it delivers a full-stack ecosystem designed to accelerate, optimize, and simplify AI development at every stage. Whether you're tuning performance at the edge or deploying inference in the cloud, Intel’s tools and community are already in place to help you build faster and deploy smarter.

Learn how Intel Xeon 6 processors and our AI inference solutions can help your system see, understand, and personalize in real time.

Product and Performance Information

Hardware Configuration

Intel Xeon 6980P: 1-node, 2x Intel(R) Xeon(R) 6980P, 128 cores, HT On, Turbo On, Total Memory 1536GB (24x64GB DDR5 8800 MT/s [8800 MT/s]), BIOS 1.1, microcode 0x1000314, 2x Ethernet Controller X550, 1x 3.5T SAMSUNG MZWLJ3T8HBLS-00007, Ubuntu 24.04.1 LTS, 6.8.0-47-generic.

AMD EPYC 9755: 1-node, 2x AMD EPYC 9755 128-Core Processor, SMT On, Boost On, Total Memory 1536GB (24x64GB DDR5 6400 MT/s [6000 MT/s]), BIOS 1.1, microcode 0xb002116, 2x Ethernet Controller X710 for 10GBASE-T, 1x Ethernet Controller E810-C for QSFP, 1x 3.5T SAMSUNG MZWLJ3T8HBLS-00007, 4x 3.5T KIOXIA KCD8XPUG3T84, Ubuntu 24.04.1 LTS, 6.8.0-47-generic.

AMD EPYC 9965: 1-node, 2x AMD EPYC 9965 192-Core Processor, SMT On, Boost On, Total Memory 1536GB (24x64GB DDR5 6400 MT/s [6000 MT/s]), BIOS 1.4, microcode 0xb101047, 2x MT2910 Family, 2x BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller, 1x 1.7T Micron_7450_MTFDKBG1T9TFR, 4x 3.5T KIOXIA KCD8XPUG3T84, Ubuntu 24.04 LTS, 6.8.0-47-generic.

Tested by Intel as of July 2025, using physical cores only. Your results may vary. Intel technologies may require enabled hardware, software, or service activation.

Software Configuration

LCM: LCM inference, INT8, multi-instance, BS1. AMD: ZenDNN 5.1, Python 3.10.17, PyTorch 2.8.0a0+git5616fa4. Intel: 2025_ww02 DL Boost container, Python 3.10.14, PyTorch 2.6.0.dev20241124+cpu, IPEX 2.6.0+gitc5a2330.
BERT Large: BERT Large inference, BSX INT8, multi-instance, batched. AMD: ZenDNN 5.1, Python 3.10.17, PyTorch 2.8.0a0+git5616fa4. Intel: 2024_ww42 DL Boost container, Python 3.10.14, PyTorch 2.5.0.dev20240903+cpu, IPEX 2.5.0+gitf5417a3.
DLRM: DLRM v2 inference, BSX INT8, multi-instance. AMD: ZenDNN 5.1, Python 3.10.17, PyTorch 2.8.0a0+git5616fa4. Intel: 2024_ww42 DL Boost container, Python 3.10.14, PyTorch 2.5.0.dev20240903+cpu, IPEX 2.5.0+gitf5417a3.

Notices and Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
