In the world of large language models (LLMs), performance optimization is key to unlocking the full potential of AI applications. Intel® Extension for PyTorch (IPEX) offers a powerful solution for enhancing the performance of LLMs on Dell platforms, particularly for Elasticsearch solutions. This blog post explores the performance gains achieved by using IPEX with the Llama 3 8B model, demonstrating how easy it is to boost your AI workloads.
Benchmarking Llama 3 8B with IPEX
The Llama 3 8B model, a state-of-the-art LLM, was benchmarked using Intel® Extension for PyTorch v2.4.0+cpu. The tests were conducted on a Dell PowerEdge R760 with 5th Gen Intel® Xeon® processors, focusing on Elasticsearch solutions. The benchmarking process involved averaging results from three runs, each consisting of 10 warm-up iterations followed by 100 iterations to ensure accuracy and consistency.
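The warm-up-plus-measurement loop described above can be sketched in plain Python. Note that the workload below is a stand-in: in the actual benchmark each timed iteration would be a model generation call, and the function name `benchmark` is illustrative rather than part of any IPEX API.

```python
import time

def benchmark(fn, warmup=10, iters=100, runs=3):
    """Average per-iteration latency over several runs, discarding warm-up iterations."""
    run_means = []
    for _ in range(runs):
        for _ in range(warmup):
            fn()                      # warm-up: prime caches and lazy initialization, not timed
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        run_means.append((time.perf_counter() - start) / iters)
    # Average the per-run means, mirroring "averaging results from three runs"
    return sum(run_means) / len(run_means)

# Stand-in workload; replace with a model inference call in practice
latency = benchmark(lambda: sum(i * i for i in range(1000)))
```

Discarding warm-up iterations matters for LLM inference in particular, since the first few calls typically pay one-time costs (weight prepacking, memory allocation, kernel selection) that would otherwise skew the average.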
Key Findings
The benchmark results, as illustrated in the accompanying chart, highlight the significant performance improvements achieved with IPEX optimizations:
- Input Token Size 32: int8 IPEX achieved a 3.54x speed-up and bfloat16 IPEX a 1.97x speed-up over the baseline bfloat16 performance.
- Input Token Size 128: int8 IPEX achieved a 3.38x speed-up and bfloat16 IPEX a 2.01x speed-up over the baseline bfloat16 performance.
- Input Token Size 1024: int8 IPEX achieved a 3.30x speed-up and bfloat16 IPEX a 2.49x speed-up over the baseline bfloat16 performance.
- Input Token Size 2048: int8 IPEX recorded the highest speed-up of 3.47x, showcasing the effectiveness of IPEX for larger input sizes.
These results demonstrate that Intel® Extension for PyTorch v2.4.0+cpu can significantly enhance the performance of LLMs, making it an invaluable tool for AI practitioners looking to optimize their models on Dell platforms.
Hardware Configurations
| Component | Value |
| --- | --- |
| System | Dell PowerEdge R760 |
| Baseboard | Dell 0VRV9X |
| Chassis | Dell Inc Rack Mount Chassis |
| CPU Model | Intel® Xeon® Platinum 8562Y+ |
| Microarchitecture | EMR_MCC |
| Sockets | 2 |
| Cores per Socket | 32 |
| Hyperthreading | Enabled |
| CPUs | 128 |
| Intel Turbo Boost | Enabled |
| Base Frequency | 2.8 GHz |
| All-Core Maximum Frequency | 3.8 GHz |
| Maximum Frequency | 4.1 GHz |
| NUMA Nodes | 2 |
| Prefetchers | L2 HW: Enabled, L2 Adj.: Enabled, DCU HW: Enabled, DCU IP: Enabled, AMP: Disabled, Homeless: Enabled, LLC: Enabled |
| PPINs | |
| Accelerators | DLB 2 [0], DSA 2 [0], IAA 2 [0], QAT 2 [0] |
| Installed Memory | 2048 GB (16x 128 GB DDR5 5600 MT/s) |
| Hugepagesize | 2048 kB |
| Transparent Huge Pages | madvise |
| Automatic NUMA Balancing | Enabled |
| NIC | 4x NetXtreme BCM5720 Gigabit Ethernet PCIe, 2x Intel® Ethernet Controller E810-C for QSFP |
| Disk | 1x 447.1 GB Dell BOSS-N1, 6x 3.5T Dell NVMe PM1743 RI E3.S 3.84 TB |
| BIOS | 2.2.7 |
| Microcode | 0x21000240 |
| OS | Ubuntu 24.04 LTS |
| Kernel | 6.8.0-41-generic |
| TDP | 300 W |
| Power & Perf Policy | Normal (6) |
| Frequency Governor | schedutil |
| Frequency Driver | intel_cpufreq |
| Max C-State | 9 |
| Vulnerability | All OK |
System Profile
| System Profile | Current Value |
| --- | --- |
| Optimized Power Mode | Disabled |
| CPU Power Management | Maximum Performance |
| Memory Frequency | Maximum Performance |
| Turbo Boost | Enabled |
| Energy Efficient Turbo | Disabled |
| C1E | Disabled |
| C-States | Disabled |
| Memory Patrol Scrub | Standard |
| Memory Refresh Rate | 1x |
| Uncore Frequency | Maximum |
| Dynamic Load Line Switch | Enabled |
| Energy Efficient Policy | Performance |
| Monitor/Mwait | Enabled |
| Workload Profile | Not Configured |
| CPU Interconnect Bus Link Power Management | Disabled |
| PCI ASPM L1 Link Power Management | Disabled |
| Workload Configuration | Balance |
Software Configuration
| Item | Value |
| --- | --- |
| Workload | Benchmark |
| Application | Intel® Extension for PyTorch v2.4.0+cpu |
| Model | meta-llama/Meta-Llama-3-8B |
| Middleware, Framework, Runtimes | Intel® Extension for PyTorch v2.4.0+cpu |
| Containers and Virtualization | Run in Docker container |
| Input Tokens | 32, 128, 1024, 2048 |
| Output Tokens | 32 |
Why Use Intel® Extension for PyTorch?
Intel® Extension for PyTorch provides several benefits that make it an attractive choice for optimizing LLMs:
- Seamless Integration: IPEX integrates smoothly with existing PyTorch workflows, allowing developers to leverage its optimizations without extensive code modifications.
- Enhanced Performance: By utilizing advanced optimizations, IPEX boosts the performance of LLMs, reducing inference times and improving throughput.
- Scalability: IPEX is designed to scale efficiently across different hardware configurations, making it suitable for a wide range of applications and platforms.
- Ease of Use: With straightforward installation and configuration, IPEX enables developers to quickly enhance their models' performance.
Implementing IPEX on Dell PowerEdge R760
To take advantage of IPEX optimizations on Dell PowerEdge R760, follow these steps:
1. Install Intel® Extension for PyTorch: Ensure you have the latest version of IPEX installed in your environment. You can find installation instructions in the official documentation.
2. Integrate IPEX into your workflow: Modify your PyTorch scripts to incorporate IPEX optimizations. This typically involves importing IPEX and applying its optimizations to your model and data loaders.
3. Benchmark and optimize: Run benchmarks to measure the performance improvements achieved with IPEX. Use the results to fine-tune your model and optimize its performance further.
Conclusion
Intel® Extension for PyTorch offers a powerful and easy-to-use solution for enhancing the performance of LLMs on Dell platforms. By leveraging IPEX optimizations, developers can achieve significant speed-ups, making their AI applications more efficient and responsive. Whether you're working with Elasticsearch solutions or other AI workloads, IPEX provides the tools you need to unlock the full potential of your models.
For more information and to get started with IPEX, visit the Intel® Extension for PyTorch GitHub repository and explore the possibilities of optimized AI performance today!
Notices and Disclaimers
Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.