In the world of large language models (LLMs), performance optimization is key to unlocking the full potential of AI applications. Intel® Extension for PyTorch (IPEX) offers a powerful solution for enhancing the performance of LLMs on Dell platforms, particularly for Elasticsearch solutions. This blog post explores the performance gains achieved by using IPEX with the Llama 3 8B model, demonstrating how easy it is to boost your AI workloads.
Benchmarking Llama 3 8B with IPEX
The Llama 3 8B model, a state-of-the-art LLM, was benchmarked using Intel® Extension for PyTorch v2.4.0+cpu. The tests were conducted on a Dell PowerEdge R760 with 5th Gen Intel® Xeon® processors, focusing on Elasticsearch solutions. The benchmarking process involved averaging results from three runs, each consisting of 10 warm-up iterations followed by 100 iterations to ensure accuracy and consistency.
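The warm-up-plus-measurement loop described above can be sketched in plain Python. Note that the workload below is a stand-in: in the actual benchmark each timed iteration would be a model generation call, and the function name `benchmark` is illustrative rather than part of any IPEX API.

```python
import time

def benchmark(fn, warmup=10, iters=100, runs=3):
    """Average per-iteration latency over several runs, discarding warm-up iterations."""
    run_means = []
    for _ in range(runs):
        for _ in range(warmup):
            fn()                      # warm-up: prime caches and lazy initialization, not timed
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        run_means.append((time.perf_counter() - start) / iters)
    # Average the per-run means, mirroring "averaging results from three runs"
    return sum(run_means) / len(run_means)

# Stand-in workload; replace with a model inference call in practice
latency = benchmark(lambda: sum(i * i for i in range(1000)))
```

Discarding warm-up iterations matters for LLM inference in particular, since the first few calls typically pay one-time costs (weight prepacking, memory allocation, kernel selection) that would otherwise skew the average.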
Key Findings
The benchmark results, as illustrated in the accompanying chart, highlight the significant performance improvements achieved with IPEX optimizations:
- Input Token Size 32: int8 IPEX achieved a 3.54x speed-up and bfloat16 IPEX a 1.97x speed-up over the baseline bfloat16 performance.
- Input Token Size 128: int8 IPEX achieved a 3.38x speed-up and bfloat16 IPEX a 2.01x speed-up over the baseline bfloat16 performance.
- Input Token Size 1024: int8 IPEX achieved a 3.30x speed-up and bfloat16 IPEX a 2.49x speed-up over the baseline bfloat16 performance.
- Input Token Size 2048: int8 IPEX recorded the highest speed-up of 3.47x, showcasing the effectiveness of IPEX for larger input sizes.
These results demonstrate that Intel® Extension for PyTorch v2.4.0+cpu can significantly enhance the performance of LLMs, making it an invaluable tool for AI practitioners looking to optimize their models on Dell platforms.
Hardware Configurations
| Component | Value |
| --- | --- |
| System | Dell PowerEdge R760 |
| Baseboard | Dell 0VRV9X |
| Chassis | Dell Inc Rack Mount Chassis |
| CPU Model | Intel® Xeon® Platinum 8562Y+ |
| Microarchitecture | EMR_MCC |
| Sockets | 2 |
| Cores per Socket | 32 |
| Hyperthreading | Enabled |
| CPUs | 128 |
| Intel Turbo Boost | Enabled |
| Base Frequency | 2.8 GHz |
| All-Core Maximum Frequency | 3.8 GHz |
| Maximum Frequency | 4.1 GHz |
| NUMA Nodes | 2 |
| Prefetchers | L2 HW: Enabled, L2 Adj.: Enabled, DCU HW: Enabled, DCU IP: Enabled, AMP: Disabled, Homeless: Enabled, LLC: Enabled |
| PPINs | |
| Accelerators | DLB 2 [0], DSA 2 [0], IAA 2 [0], QAT 2 [0] |
| Installed Memory | 2048 GB (16x 128 GB DDR5 5600 MT/s) |
| Hugepagesize | 2048 kB |
| Transparent Huge Pages | madvise |
| Automatic NUMA Balancing | Enabled |
| NIC | 4x NetXtreme BCM5720 Gigabit Ethernet PCIe, 2x Intel® Ethernet Controller E810-C for QSFP |
| Disk | 1x 447.1 GB Dell BOSS-N1, 6x 3.5T Dell NVMe PM1743 RI E3.S 3.84 TB |
| BIOS | 2.2.7 |
| Microcode | 0x21000240 |
| OS | Ubuntu 24.04 LTS |
| Kernel | 6.8.0-41-generic |
| TDP | 300 W |
| Power & Perf Policy | Normal (6) |
| Frequency Governor | schedutil |
| Frequency Driver | intel_cpufreq |
| Max C-State | 9 |
| Vulnerability | All OK |
System Profile
| System Profile | Current Value |
| --- | --- |
| Optimized Power Mode | Disabled |
| CPU Power Management | Maximum Performance |
| Memory Frequency | Maximum Performance |
| Turbo Boost | Enabled |
| Energy Efficient Turbo | Disabled |
| C1E | Disabled |
| C-States | Disabled |
| Memory Patrol Scrub | Standard |
| Memory Refresh Rate | 1x |
| Uncore Frequency | Maximum |
| Dynamic Load Line Switch | Enabled |
| Energy Efficient Policy | Performance |
| Monitor/Mwait | Enabled |
| Workload Profile | Not Configured |
| CPU Interconnect Bus Link Power Management | Disabled |
| PCI ASPM L1 Link Power Management | Disabled |
| Workload Configuration | Balance |
Software Configuration
| Item | Value |
| --- | --- |
| Workload | Benchmark |
| Application | Intel® Extension for PyTorch v2.4.0+cpu |
| Model | meta-llama/Meta-Llama-3-8B |
| Middleware, Framework, Runtimes | Intel® Extension for PyTorch v2.4.0+cpu |
| Containers and Virtualization | Run in Docker container |
| Input Tokens | 32, 128, 1024, 2048 |
| Output Tokens | 32 |
Why Use Intel® Extension for PyTorch?
Intel® Extension for PyTorch provides several benefits that make it an attractive choice for optimizing LLMs:
- Seamless Integration: IPEX integrates smoothly with existing PyTorch workflows, allowing developers to leverage its optimizations without extensive code modifications.
- Enhanced Performance: By utilizing advanced optimizations, IPEX boosts the performance of LLMs, reducing inference times and improving throughput.
- Scalability: IPEX is designed to scale efficiently across different hardware configurations, making it suitable for a wide range of applications and platforms.
- Ease of Use: With straightforward installation and configuration, IPEX enables developers to quickly enhance their models' performance.
Implementing IPEX on Dell PowerEdge R760
To take advantage of IPEX optimizations on Dell PowerEdge R760, follow these steps:
1. Install Intel® Extension for PyTorch: Ensure you have the latest version of IPEX installed in your environment. You can find installation instructions in the official documentation.
2. Integrate IPEX into your workflow: Modify your PyTorch scripts to incorporate IPEX optimizations. This typically involves importing IPEX and applying its optimizations to your model and data loaders.
3. Benchmark and optimize: Run benchmarks to measure the performance improvements achieved with IPEX. Use the results to fine-tune your model and optimize its performance further.
Conclusion
Intel® Extension for PyTorch offers a powerful and easy-to-use solution for enhancing the performance of LLMs on Dell platforms. By leveraging IPEX optimizations, developers can achieve significant speed-ups, making their AI applications more efficient and responsive. Whether you're working with Elasticsearch solutions or other AI workloads, IPEX provides the tools you need to unlock the full potential of your models.
For more information and to get started with IPEX, visit the Intel® Extension for PyTorch GitHub repository and explore the possibilities of optimized AI performance today!
Notices and Disclaimers
Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.