Data Center

Improve your HPC and AI workload performance by increasing server memory bandwidth with CXL memory

Anil_Godbole
Employee

High-Performance Computing (HPC) and Artificial Intelligence (AI) workloads typically demand substantial memory bandwidth and, to a lesser degree, memory capacity. CXL™ memory expansion modules, also known as CXL "type-3" devices, add both memory capacity and memory bandwidth to server systems by using the CXL protocol, which runs over the processor's PCIe interfaces.

This paper presents experimental findings on increasing memory bandwidth for HPC and AI workloads using Micron's CXL modules. To my knowledge, this is the first study to report measured results from eight CXL E3.S (x8) Micron CZ122 devices on the Intel® Xeon® 6 processor 6900P (previously codenamed Granite Rapids AP) with 128 cores, alongside Micron DDR5 memory operating at 6400 MT/s on each of the CPU's 12 DRAM channels.

The eight CXL modules were set up as a unified NUMA configuration, and a software-based page-level interleaving mechanism, available in Linux kernel v6.9+, was used to spread pages between the DDR5 and CXL memory nodes and improve overall system bandwidth. This bandwidth expansion pays off directly for HPC and AI workloads: in our experiments, CXL memory expansion boosts read-only bandwidth by 24% and mixed read/write bandwidth by up to 38%, and across the HPC and AI workloads tested, the geometric mean performance speedup is 24%.
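To illustrate the kind of setup involved, below is a minimal sketch of configuring the kernel's weighted page-level interleaving between DRAM and CXL NUMA nodes. The node numbers (0 for DDR5, 2 for CXL), the interleave weights, and the application name are hypothetical placeholders; this assumes Linux kernel v6.9+ (which exposes the weighted-interleave sysfs interface) and a recent numactl that supports `--weighted-interleave`, not the exact configuration used in the white paper.

```shell
# Inspect the NUMA topology: CXL type-3 memory typically appears
# as a CPU-less memory-only NUMA node.
numactl --hardware

# Kernel v6.9+ exposes per-node interleave weights under sysfs.
# Hypothetical weights: 3 pages on DRAM node 0 for every 1 page
# on CXL node 2, roughly matching their bandwidth ratio.
echo 3 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node2

# Launch the workload with pages weighted-interleaved across both
# nodes (./my_hpc_app is a placeholder for the actual binary).
numactl --weighted-interleave=0,2 ./my_hpc_app
```

Weighting the interleave (rather than splitting pages 1:1) matters because DDR5 channels deliver more bandwidth than a CXL link, so proportional placement keeps both memory tiers saturated without bottlenecking on the slower one.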

Read the white paper for more details.