
Boosting LLM Performance with Intel® Extension for PyTorch on Dell R760

ShawnaMR

In the world of large language models (LLMs), performance optimization is key to unlocking the full potential of AI applications. Intel® Extension for PyTorch (IPEX) offers a powerful solution for enhancing the performance of LLMs on Dell platforms, particularly for Elasticsearch solutions. This blog post explores the performance gains achieved by using IPEX with the Llama 3 8B model, demonstrating how easy it is to boost your AI workloads.

Benchmarking Llama 3 8B with IPEX

The Llama 3 8B model, a state-of-the-art LLM, was benchmarked using Intel® Extension for PyTorch v2.4.0+cpu. The tests were conducted on a Dell PowerEdge R760 with 5th Gen Intel® Xeon® processors, focusing on Elasticsearch solutions. The benchmarking process involved averaging results from three runs, each consisting of 10 warm-up iterations followed by 100 iterations to ensure accuracy and consistency.
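
To make that methodology concrete, here is a minimal sketch of such a measurement loop in Python. The helper name, the iteration counts, and the commented generate() call are illustrative assumptions; the post does not publish the actual benchmark harness.

    import statistics
    import time

    def benchmark(generate_fn, warmup_iters=10, measured_iters=100):
        """Average latency of generate_fn: warm up first, then time the measured runs."""
        for _ in range(warmup_iters):       # warm-up iterations, not timed
            generate_fn()
        latencies = []
        for _ in range(measured_iters):     # timed iterations
            start = time.perf_counter()
            generate_fn()
            latencies.append(time.perf_counter() - start)
        return statistics.mean(latencies)

    # Average three full runs, as described above. `model` and `inputs` stand in for
    # the optimized model and a pre-tokenized prompt of the target input length
    # (32, 128, 1024, or 2048 tokens), generating 32 output tokens per iteration.
    # runs = [benchmark(lambda: model.generate(**inputs, max_new_tokens=32)) for _ in range(3)]
    # print(f"Mean latency over three runs: {statistics.mean(runs):.3f} s")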

Key Findings

The benchmark results, as illustrated in the accompanying chart, highlight the significant performance improvements achieved with IPEX optimizations:

  • Input Token Size 32: int8 IPEX achieved a 3.54x speed-up and bfloat16 IPEX a 1.97x speed-up compared to the baseline bfloat16 performance.
  • Input Token Size 128: int8 IPEX achieved a 3.38x speed-up and bfloat16 IPEX a 2.01x speed-up compared to the baseline bfloat16 performance.
  • Input Token Size 1024: int8 IPEX achieved a 3.30x speed-up and bfloat16 IPEX a 2.49x speed-up compared to the baseline bfloat16 performance.
  • Input Token Size 2048: The highest speed-up of 3.47x with int8 IPEX was recorded, showcasing the effectiveness of IPEX for larger input sizes.

[Chart: Llama 3 8B inference speed-up with IPEX int8 and bfloat16 optimizations versus the bfloat16 baseline, by input token size]

These results demonstrate that Intel® Extension for PyTorch v2.4.0+cpu can significantly enhance the performance of LLMs, making it an invaluable tool for AI practitioners looking to optimize their models on Dell platforms.

Hardware Configurations

System: Dell PowerEdge R760
Baseboard: Dell 0VRV9X
Chassis: Dell Inc Rack Mount Chassis
CPU Model: Intel® Xeon® Platinum 8562Y+
Microarchitecture: EMR_MCC
Sockets: 2
Cores per Socket: 32
Hyperthreading: Enabled
CPUs: 128
Intel Turbo Boost: Enabled
Base Frequency: 2.8 GHz
All-Core Maximum Frequency: 3.8 GHz
Maximum Frequency: 4.1 GHz
NUMA Nodes: 2
Prefetchers: L2 HW: Enabled, L2 Adj.: Enabled, DCU HW: Enabled, DCU IP: Enabled, AMP: Disabled, Homeless: Enabled, LLC: Enabled
PPINs:
Accelerators: DLB 2 [0], DSA 2 [0], IAA 2 [0], QAT 2 [0]
Installed Memory: 2048 GB (16x128 GB DDR5 5600 MT/s [5600 MT/s])
Hugepagesize: 2048 kB
Transparent Huge Pages: madvise
Automatic NUMA Balancing: Enabled
NIC: 4x NetXtreme BCM5720 Gigabit Ethernet PCIe, 2x Intel® Ethernet Controller E810-C for QSFP
Disk: 1 x 447.1G Dell BOSS-N1, 6 x 3.5T Dell NVMe PM1743 RI E3.S 3.84 TB
BIOS: 2.2.7
Microcode: 0x21000240
OS: Ubuntu 24.04 LTS
Kernel: 6.8.0-41-generic
TDP: 300W
Power & Perf Policy: Normal (6)
Frequency Governor: schedutil
Frequency Driver: intel_cpufreq
Max C-State: 9
Vulnerability: All OK

 

System Profile

Optimized Power Mode: Disabled
CPU Power Management: Maximum Performance
Memory Frequency: Maximum Performance
Turbo Boost: Enabled
Energy Efficient Turbo: Disabled
C1E: Disabled
C-States: Disabled
Memory Patrol Scrub: Standard
Memory Refresh Rate: 1x
Uncore Frequency: Maximum
Dynamic Load Line Switch: Enabled
Energy Efficient Policy: Performance
Monitor/Mwait: Enabled
Workload Profile: Not Configured
CPU Interconnect Bus Link Power Management: Disabled
PCI ASPM L1 Link Power Management: Disabled
Workload Configuration: Balance

 

Software Configuration

Workload: Benchmark
Application: Intel® Extension for PyTorch v2.4.0+cpu
Model: meta-llama/Meta-Llama-3-8B
Middleware, Framework, Runtimes: Intel® Extension for PyTorch v2.4.0+cpu
Containers and Virtualization: Run in Docker Container
Input Tokens: 32, 128, 1024, 2048
Output Tokens: 32

 

Why Use Intel® Extension for PyTorch?

Intel® Extension for PyTorch provides several benefits that make it an attractive choice for optimizing LLMs:

  1. Seamless Integration: IPEX integrates smoothly with existing PyTorch workflows, allowing developers to leverage its optimizations without extensive code modifications.
  2. Enhanced Performance: By utilizing advanced optimizations, IPEX boosts the performance of LLMs, reducing inference times and improving throughput.
  3. Scalability: IPEX is designed to scale efficiently across different hardware configurations, making it suitable for a wide range of applications and platforms.
  4. Ease of Use: With straightforward installation and configuration, IPEX enables developers to quickly enhance their models' performance.

Implementing IPEX on Dell PowerEdge R760

To take advantage of IPEX optimizations on Dell PowerEdge R760, follow these steps:

  1. Install Intel® Extension for PyTorch: Ensure you have the latest version of IPEX installed in your environment. You can find installation instructions in the official documentation.
  2. Integrate IPEX into Your Workflow: Modify your PyTorch scripts to incorporate IPEX optimizations. This typically involves importing IPEX and applying its optimizations to your model and data loaders; a minimal sketch follows these steps.
  3. Benchmark and Optimize: Run benchmarks to measure the performance improvements achieved with IPEX. Use the results to fine-tune your model and optimize its performance further.
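
As a rough sketch of what step 2 can look like for a model such as Llama 3 8B, the snippet below loads the model with Hugging Face Transformers and applies IPEX's LLM optimization path in bfloat16. The model ID, prompt, generation settings, and the use of ipex.llm.optimize follow the public IPEX documentation for recent releases rather than the exact scripts behind this benchmark, so treat it as a starting point.

    # Prerequisites (assumed): pip install torch intel-extension-for-pytorch transformers
    import torch
    import intel_extension_for_pytorch as ipex
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B"  # gated model; requires Hugging Face access
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()

    # Apply IPEX's LLM-specific optimizations (operator fusion, optimized attention, etc.)
    # in bfloat16; for non-LLM models, ipex.optimize(model, dtype=torch.bfloat16) plays
    # the same role.
    model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

    prompt = "Explain what Intel Extension for PyTorch does."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
        output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

The int8 results in the chart build on the same workflow through IPEX's quantization front end; the exact int8 recipe used for this benchmark is not detailed in this post.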

Conclusion

Intel® Extension for PyTorch offers a powerful and easy-to-use solution for enhancing the performance of LLMs on Dell platforms. By leveraging IPEX optimizations, developers can achieve significant speed-ups, making their AI applications more efficient and responsive. Whether you're working with Elasticsearch solutions or other AI workloads, IPEX provides the tools you need to unlock the full potential of your models.

For more information and to get started with IPEX, visit the Intel® Extension for PyTorch GitHub repository and explore the possibilities of optimized AI performance today!

 

Notices and Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.