Author: Jimmy Leon, Engineering Manager, Intel
Contributors:
Joel Schuetze, Software Enabling and Optimization Engineer, Intel
Brian Will, Principal Engineer, Intel
Sabarinath Mukundu Subramanian, Software Engineer, HCL Technologies
Data compression is an essential task performed by data center central processing units (CPUs) to save storage space, reduce bandwidth and transfer times over networks, and create more efficient backups and file archives. However, compression tasks can consume significant CPU cycles, potentially limiting the system’s ability to perform other concurrent workloads. Intel® QuickAssist Technology (Intel® QAT), an integrated workload acceleration feature offered in 4th Gen Intel® Xeon® processors through the latest Intel Xeon 6 series, accelerates data compression, decompression, and cryptographic operations by offloading these compute-intensive tasks from the CPU cores. This can reduce CPU overhead, enhancing overall system performance and efficiency, and minimizing the impact on other critical workloads.
Common compression methods, such as gzip, bzip2, XZ, ZIP, and Zstandard (zstd), employ different algorithms offering varying levels of compression efficiency and speed. Compression requires the underlying algorithms to analyze data to detect and eliminate redundancy. Many lossless compression algorithms are based on a two-pass methodology. In the first pass, the LZ77 algorithm identifies repetition and replaces it with references to past data. In the second pass, encoding transforms bytes into a more compact bit representation based on the frequency of symbols encountered. Traversing the data, identifying and replacing repetition, and encoding the result demand substantial computational resources.
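To make the first pass concrete, here is a minimal sketch of a greedy LZ77-style match finder in Python. It is a toy for illustration only, not the implementation used by zstd or by Intel QAT; the function names, window size, and match-length limits are assumptions chosen for readability. The inner search loop shows why repetition discovery is so compute-intensive: every position is compared against many earlier positions.

```python
from collections import Counter


def lz77_tokens(data: bytes, window: int = 4096,
                min_match: int = 3, max_match: int = 255):
    """Toy greedy LZ77: emit ('lit', byte) or ('ref', distance, length) tokens.

    Hypothetical helper for illustration; real compressors use hash chains
    or similar structures instead of this brute-force search.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the sliding window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append(("ref", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decode(tokens) -> bytes:
    """Rebuild the original bytes from the token stream."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            # Copy byte by byte so overlapping references work.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)


sample = b"abcabcabcabcxyz"
tokens = lz77_tokens(sample)
print(tokens)

# Second pass (sketch): count symbol frequencies. An entropy coder would
# assign shorter bit codes to the more frequent symbols.
literal_counts = Counter(tok[1] for tok in tokens if tok[0] == "lit")
print(literal_counts.most_common(3))
```

In this sketch the repeated "abc" runs collapse into a single back-reference, and the decoder reproduces the input exactly, which is the lossless property the article relies on.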
Comparative Analysis with AMD EPYC Processors
We evaluated the compression performance of Intel Xeon 6 compared to 5th Gen AMD EPYC processors using the Zstandard algorithm. This fast, lossless data compression algorithm offers a range of compression speeds and ratios for real-time applications and general data storage. For Intel processors, we used the Intel QAT ZSTD Plugin, which integrates with zstd to offload the LZ77 compression phase (repetition discovery and replacement). For AMD processors, we ran the AOCL-Compression framework, which is used with AMD Zen-based CPUs.
Testing was performed using the Silesia corpus, a standard compression benchmark dataset consisting of 211 MB of mixed file types (text, XML, images, executables, and binary data) combined into a single file. The dataset was processed using 64 KB chunks to optimize performance across both platforms. We tested two compression scenarios, each normalized to a matching compression ratio between the Intel QAT ZSTD Plugin and AMD AOCL-Compression to ensure a fair performance comparison, because the compression ratio directly impacts both storage efficiency and processing requirements in real-world deployments.
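The chunked measurement described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the benchmark used for the published results: it uses zlib from the Python standard library as a stand-in for zstd, the sample data is synthetic rather than the Silesia corpus, and it assumes the compression ratio is expressed as compressed size divided by original size, in percent (so a lower value means more space saved).

```python
import zlib

CHUNK_SIZE = 64 * 1024  # 64 KB chunks, as in the test setup


def chunked_ratio(data: bytes, level: int = 6,
                  chunk_size: int = CHUNK_SIZE) -> float:
    """Compress data chunk by chunk and return compressed/original in percent.

    Hypothetical helper; zlib stands in for zstd here.
    """
    if not data:
        raise ValueError("data must not be empty")
    compressed_bytes = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Each chunk is compressed independently of the others.
        compressed_bytes += len(zlib.compress(chunk, level))
    return 100.0 * compressed_bytes / len(data)


# Synthetic, highly repetitive sample data (not the Silesia corpus).
sample = b"timestamp=2025-09-01 level=INFO msg=ok\n" * 10000
for level in (1, 9):
    print(f"level {level}: {chunked_ratio(sample, level):.2f}% of original size")
```

Normalizing two compressors to the same ratio then amounts to choosing, for each one, the level whose measured ratio lands closest to the target before comparing throughput.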
We employed the following two test scenarios for compression:
- Speed-optimized compression test. Target: 34% compression ratio optimized for faster processing. Intel QAT ZSTD Plugin: Level 1. AMD AOCL-Compression: Level 5. Use case: Real-time compression where throughput is prioritized.
- Space-optimized compression test. Target: 32% compression ratio optimized for better space savings. Intel QAT ZSTD Plugin: Level 12. AMD AOCL-Compression: Level 11. Use case: Archival storage where compression efficiency is prioritized.
When using zstd with the Intel QAT ZSTD Plugin and AMD AOCL-Compression, higher compression levels yield a better compression ratio at lower throughput, while lower compression levels prioritize throughput at the expense of compression ratio. Figure 1 shows compression bandwidth performance under the speed-optimized compression test scenario using 2-socket Intel Xeon 6980P 128-core processors (256 total cores) with eight Intel QAT hardware accelerators compared to the 2-socket AMD EPYC 9755 128-core processors (256 total cores) and the 2-socket AMD EPYC 9965 192-core processors (384 total cores).
The Intel Xeon 6980P platform with eight Intel QAT hardware devices achieved maximum performance of 40,840 MB/s while utilizing only 80 cores, leaving 176 cores available for other workloads. Intel Xeon 6980P outperformed AMD EPYC 9755 by up to 1.31x and AMD EPYC 9965 by up to 1.35x at a system level, while maintaining approximately 34% compression ratio. AMD EPYC 9755 achieved maximum performance using 224 cores, leaving only 32 cores available. AMD EPYC 9965 achieved maximum performance using 232 cores, leaving 152 cores available for other workloads.
For another view of the performance, when using 80 cores in each system, Intel Xeon 6980P delivers up to 2.7x better performance for zstd compression compared to AMD EPYC 9755 and AMD EPYC 9965.
Figure 1: In the speed-optimized compression test scenario, Intel Xeon 6980P outperformed AMD EPYC 9755 by up to 1.31x and AMD EPYC 9965 by up to 1.35x.
Figure 2 shows the comparison of the compression bandwidth performance under the space-optimized compression scenario using the same processor configurations. The Intel Xeon 6980P platform with eight Intel QAT hardware devices achieved maximum performance of 24,093 MB/s while utilizing 56 cores, leaving 200 cores available for other workloads. Intel Xeon 6980P outperformed AMD EPYC 9755 by up to 2.17x and AMD EPYC 9965 by up to 1.86x. Both AMD EPYC processors utilized all of their available cores to achieve their maximum throughput, leaving zero cores available for other workloads. With 56 cores of Intel Xeon 6980P, the system outperforms 256 cores of AMD EPYC 9755 and 384 cores of AMD EPYC 9965. To deliver the same performance as Intel QAT in Intel Xeon 6980P, you would need additional servers for either AMD platform.
Figure 2: In the space-optimized compression test scenario, Intel Xeon 6980P outperformed AMD EPYC 9755 by up to 2.17x and AMD EPYC 9965 by up to 1.86x.
In summary, when optimizing for compression speed, the dual-socket 128C Intel Xeon 6980P with 8 QAT devices outperformed the dual-socket 128C AMD EPYC 9755 by up to 1.31x utilizing 64% fewer cores and the dual-socket 192C AMD EPYC 9965 by up to 1.35x utilizing 65% fewer cores. When optimizing for compression ratio, the dual-socket 128C Intel Xeon 6980P with 8 QAT devices outperformed the dual-socket 128C AMD EPYC 9755 by up to 2.17x utilizing 78% fewer cores and the dual-socket 192C AMD EPYC 9965 by up to 1.86x utilizing 85% fewer cores. This offloading allows the Intel Xeon 6 processor to focus on its primary workload, which can result in higher overall system performance, increased power efficiency, faster backup/restore operations, and greater storage savings.
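The "fewer cores" percentages follow directly from the core counts reported for each platform at peak throughput. The short Python sketch below reproduces that arithmetic; the function name and table layout are illustrative assumptions, while the core counts are the ones quoted in the results above (80 versus 224 and 232 cores in the speed-optimized test, 56 versus 256 and 384 cores in the space-optimized test).

```python
def fewer_cores_pct(intel_cores: int, amd_cores: int) -> float:
    """Percentage reduction in cores used by Intel relative to AMD."""
    return 100.0 * (1 - intel_cores / amd_cores)


# (scenario, Intel cores used, AMD platform, AMD cores used)
results = [
    ("speed-optimized", 80, "AMD EPYC 9755", 224),
    ("speed-optimized", 80, "AMD EPYC 9965", 232),
    ("space-optimized", 56, "AMD EPYC 9755", 256),
    ("space-optimized", 56, "AMD EPYC 9965", 384),
]

for scenario, intel_cores, amd_name, amd_cores in results:
    pct = fewer_cores_pct(intel_cores, amd_cores)
    print(f"{scenario}: {intel_cores} vs {amd_cores} cores "
          f"({amd_name}) -> {pct:.1f}% fewer cores")
```

The computed values agree with the summary figures of 64%, 65%, 78%, and 85% to within rounding.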
Putting Intel QAT Compression to Work to Reduce CPU Overhead
Intel QAT's lossless compression acceleration is most valuable for workloads that process massive amounts of structured or text data continuously while requiring perfect data integrity.
Here are ways you can put Intel QAT compression to work in your business:
Storage Systems
- All-flash arrays: Use lossless compression for databases, file systems, and application data.
- Backup/archival systems: Preserve data integrity exactly while processing terabytes daily.
- Distributed storage (Ceph and more): Compress file system data, database blocks, and VM images.
- File servers/network attached storage (NAS): Compress documents, code repositories, and structured files.
Database Systems
- Online transaction processing (OLTP) databases: Compress table data, indexes, and transaction logs without any data loss.
- Analytics databases: Compress structured data while helping to preserve exact values for queries.
- Time-series databases: Handle massive Internet of Things (IoT)/monitoring data streams with precision.
Web/Network Infrastructure
- Web servers: Compress text-based content (HTML, CSS, JavaScript, JSON, XML) on-the-fly.
- Application programming interface (API) gateways: Compress JSON/XML responses for millions of requests.
- Proxy/cache servers: Compress cacheable text content.
- Wide area network (WAN) optimization: Compress data traversing expensive network links (text protocols).
Big Data and Analytics
- Log aggregation: Compress text logs and JSON events from distributed systems.
- Data pipelines: Compress structured data between processing stages.
- Archive systems: Long-term storage of business-critical data.
- Version control systems: Compress code repositories and development assets.
Email and Communication Systems
- Email servers: Compress mailbox data and document attachments.
- Document management: Compress office files, PDFs, and structured documents.
Intel Xeon 6 has Intel QAT built into its architecture, and enabling this feature is simple. Learn how to get started using Intel QAT for compression.
Product and Performance Information
Hardware Configurations
Intel Xeon 6980P: 1-node, 2x Intel(R) Xeon(R) 6980P 128-Core Processor, 128 cores, 500W TDP, HT On, Turbo On, Total Memory 1536GB (24x64GB DDR5 6400 MT/s [6400 MT/s]), 8 Total QAT devices, microcode 0x10003a5, 2x Ethernet Controller X710 for 10GBASE-T, 2x Ethernet Controller E810-C for QSFP, 1x 1.7T SAMSUNG MZWLJ1T9HBJR-00007, Ubuntu 24.04 LTS, 6.8.0-78-generic. Test by Intel as of September 2025.
AMD EPYC 9755: 1-node, 2x AMD EPYC 9755 128-Core Processor, 128 cores, 500W TDP, SMT On, Boost On, Total Memory 1536GB (24x64GB DDR5 6400 MT/s [6000 MT/s]), microcode 0xb002116, 2x Ethernet Controller E810-C for QSFP, 2x Ethernet Controller X710 for 10GBASE-T, 1x 1.7T Micron_7450_MTFDKBG1T9TFR, Ubuntu 24.04 LTS, 6.8.0-78-generic. Test by Intel as of September 2025.
AMD EPYC 9965: 1-node, 2x AMD EPYC 9965 192-Core Processor, 192 cores, 500W TDP, SMT On, Boost On, Total Memory 1536GB (24x64GB DDR5 6400 MT/s [6000 MT/s]), microcode 0xb101021, 2x Ethernet Controller E810-C for QSFP, 2x Ethernet Controller 10-Gigabit X540-AT2, 1x 3.5T SAMSUNG MZWLJ3T8HBLS-00007, Ubuntu 24.04 LTS, 6.8.0-79-generic. Test by Intel as of September 2025.
Software Configurations
Intel Xeon: Zstd v1.5.5, Intel QAT driver QAT20.L.1.2.30-00090, Silesia corpus dataset, chunk size 64K, 50 loops; Intel QAT ZSTD Plugin, taskset -c <XX> benchmark -t <XX> -l<XX> -c64K -E0 -L <XX> -m1 silesia.concat
AMD EPYC: Zstd v1.5.5, AOCC compiler v5.0, AOCL-Compression v5.1, Silesia corpus dataset, chunk size 64K, 50 loops; AOCL-Compression measured with Intel QAT ZSTD Plugin benchmark.c3, taskset -c <XX> benchmark -t <XX> -l<XX> -c64K -E0 -L <XX> -m0 silesia.concat
t = threads
l = loops
c = chunk size
L = compression level
m = 0 for software (sw), 1 for Intel QAT hardware (qathw)
E = search for ExternalRepcode, set to default (auto)
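For readers who want to script runs across several configurations, the command template above can be assembled programmatically. The Python sketch below only builds and prints the command line; it does not execute anything. The function name is hypothetical, the flag spellings are copied from the template as printed, and the example values (CPU list 0-79, 80 threads) are placeholders assumed for illustration, since the configurations list these values only as <XX>.

```python
import shlex


def build_benchmark_cmd(cpu_list: str, threads: int, loops: int,
                        level: int, use_qat: bool,
                        corpus: str = "silesia.concat") -> list[str]:
    """Assemble the benchmark command from the template in the configuration.

    Hypothetical helper: flag spellings follow the published template
    (taskset -c <XX> benchmark -t <XX> -l<XX> -c64K -E0 -L <XX> -m<0|1>).
    """
    return [
        "taskset", "-c", cpu_list,        # pin the run to the given CPUs
        "benchmark",
        "-t", str(threads),               # t = threads
        f"-l{loops}",                     # l = loops
        "-c64K",                          # c = chunk size
        "-E0",                            # E = ExternalRepcode search setting
        "-L", str(level),                 # L = compression level
        f"-m{1 if use_qat else 0}",       # m = 1 for QAT hardware, 0 for software
        corpus,
    ]


# Placeholder values for illustration: 80 threads on CPUs 0-79, 50 loops,
# level 1 with Intel QAT offload enabled.
cmd = build_benchmark_cmd("0-79", threads=80, loops=50, level=1, use_qat=True)
print(shlex.join(cmd))
```

Keeping the command as a list of arguments avoids shell-quoting problems if it is later passed to a process launcher.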
Notices and Disclaimers
Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.