
Breaking the Memory Wall with Compute Express Link (CXL)

Anil_Godbole

Increased data processing, the widespread use of virtualization, and the rise of in-memory computing have driven exponential growth in demand for CPU-attached memory in servers. Modern workloads like AI, machine learning (ML), big data, and analytics intensify the memory challenges that data center managers face. Training large language models (LLMs) like GPT-4, Llama 2, and PaLM 2 requires enormous memory capacity and compute capability.

As processor core counts continue to rise, enabling faster and more complex computations, the demand for memory rises with them. CXL memory can provide that expanded capacity. The latest Intel® Xeon® processors, which support both the latest-generation DDR and CXL memory, give customers the flexibility to optimize configurations to best meet their workload’s needs.

Additionally, DRAM costs in terms of dollars per gigabit ($/Gbit) are not decreasing with next-generation memory nodes as one might expect. Since memory can make up more than 50 percent of the cost of a server, there’s a huge opportunity to optimize costs and use memory resources efficiently.[1]

As a technical innovator, Intel has a successful history of developing new input/output (I/O), memory, and storage standards. In 2019, Intel announced the development of Compute Express Link (CXL), a new cache- and memory-coherent interconnect protocol for processors, memory expansion, and accelerators. The CXL Consortium was formed the same year with members Alibaba, Cisco, Dell Technologies, Meta, Google, HPE, Huawei, and Microsoft. Today, just four years later, the consortium has more than 250 member companies and continues to grow.

CXL Overview

Like PCI Express (PCIe), CXL is a protocol for connecting devices to the CPU or any other compute element, such as a GPU. CXL runs over the same physical layer (PHY) links as PCIe. However, CXL differs from PCIe in that it enables coherent memory sharing with the attached device. One use case for CXL is attaching a coherent memory accelerator device. Another popular use case is simply adding more memory to a server. Clearly, CXL can be crucial in enhancing memory bandwidth and capacity.

Using PCIe’s high-speed communication capability, CXL significantly improves the data transfer rate between the CPU and connected devices, including memory expansion modules. The first generation of CXL links runs at the same speed as PCIe Gen5 at 32 GT/s (gigatransfers per second) or up to 64 gigabytes per second (GB/s) in each direction over a 16-lane link.[2]   
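As a rough, back-of-the-envelope check on that figure (a sketch only: it assumes the standard PCIe Gen5 128b/130b line encoding and ignores CXL/PCIe protocol overhead, so real-world throughput will be somewhat lower):

```python
# Rough per-direction bandwidth of an x16 CXL 1.x link at PCIe Gen5 speed.
# Assumes 128b/130b line encoding; actual throughput is lower once protocol
# overhead (flits, headers, CRC) is accounted for.
SIGNALING_RATE_GT_S = 32           # 32 GT/s per lane
LANES = 16                         # x16 link
ENCODING_EFFICIENCY = 128 / 130    # PCIe Gen5 128b/130b encoding

bits_per_second = SIGNALING_RATE_GT_S * 1e9 * LANES * ENCODING_EFFICIENCY
gigabytes_per_second = bits_per_second / 8 / 1e9
print(f"~{gigabytes_per_second:.0f} GB/s per direction")  # prints ~63 GB/s
```

The commonly quoted "up to 64 GB/s" is the raw signaling figure before encoding and protocol overhead are subtracted.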

Memory expansion

CXL provides a new way to increase a server’s memory capacity with CXL memory expanders. Inside a CXL memory expander, an application-specific integrated circuit (ASIC) known as the CXL controller manages the attached DRAM, translating incoming CXL commands into the corresponding DRAM read/write commands.
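As a concrete, if hedged, illustration of how such an expander appears to software: on a Linux system with the CXL driver stack loaded, each expander is exposed as a memory device ("memdev") under sysfs. The sketch below simply enumerates those devices and reports their volatile and persistent capacity; the paths and attribute names follow the upstream Linux CXL sysfs ABI, but availability depends on your kernel version and hardware.

```python
# Minimal sketch: enumerate CXL memory devices (memdevs) exposed by the
# Linux CXL core driver under /sys/bus/cxl/devices. Kernel- and
# hardware-dependent; run on a system with CXL expanders attached.
from pathlib import Path

CXL_BUS = Path("/sys/bus/cxl/devices")

def list_cxl_memdevs() -> None:
    if not CXL_BUS.exists():
        print("No CXL bus found (driver not loaded or no CXL hardware).")
        return
    for dev in sorted(CXL_BUS.glob("mem*")):
        # ram/size and pmem/size report volatile and persistent capacity in bytes.
        for attr in ("ram/size", "pmem/size"):
            node = dev / attr
            if node.exists():
                size_bytes = int(node.read_text().strip(), 0)
                print(f"{dev.name}: {attr} = {size_bytes / 2**30:.1f} GiB")

if __name__ == "__main__":
    list_cxl_memdevs()
```

In practice, the cxl utility from the ndctl project reports the same information in a friendlier form.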

All major memory suppliers are part of the expanding CXL ecosystem and have roadmaps to offer CXL memory expanders with up to 512 GB of DDR5 DRAM memory.[3] Data center managers can use CXL memory expanders to expand their server memory capacity to tens of terabytes while achieving memory bandwidths of several terabytes per second.

The operating system (OS) typically treats the added CXL memory as a second tier: native DRAM is the “near” memory, and CXL memory is the “far” memory. The Linux OS has evolved over the last few years to hide the latency differences between the two tiers. It does this with a “hot/cold” page migration technique, whereby frequently accessed (“hot”) pages in CXL memory are promoted to near memory while “cold” pages are simultaneously demoted from near memory to CXL memory. There is no need to modify the user application when using CXL memory expanders.
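On recent Linux kernels, a CXL memory expander typically shows up as a CPU-less NUMA node, and the hot/cold migration described above is driven by the kernel’s memory-tiering support. The short sketch below (an illustration only; the sysfs paths assume a kernel built with NUMA memory tiering, and exact knobs can vary by version) checks whether demotion to the far tier is enabled and which nodes carry memory but no CPUs:

```python
# Sketch: inspect Linux memory-tiering state on a system with CXL "far" memory.
# Assumes a kernel with NUMA memory-tiering support; paths may vary by version.
from pathlib import Path

def read_sysfs(path: str) -> str:
    p = Path(path)
    return p.read_text().strip() if p.exists() else "n/a"

# Whether cold pages may be demoted from near (DRAM) nodes to far (CXL) nodes.
print("demotion_enabled:", read_sysfs("/sys/kernel/mm/numa/demotion_enabled"))

# A node that has memory but no CPUs is typically CXL-attached memory.
print("nodes with memory:", read_sysfs("/sys/devices/system/node/has_memory"))
print("nodes with CPUs:  ", read_sysfs("/sys/devices/system/node/has_cpu"))
```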

Memory TCO savings

Another benefit of using CXL to add memory is the possibility of using less expensive memory behind the CXL memory buffer ASIC. Micron, for example, provides 128 GB CXL modules built with DDR4, an older-generation DRAM. Starting with Intel® Xeon® 6 processors, Intel plans to offer a unique hardware-controlled memory tiering feature called Intel® Flat Memory mode, which migrates data between the two memory tiers without any dependency on the OS. (See “Orchestrating Memory Disaggregation with Compute Express Link (CXL).”)

Also, CXL v2.0 and later offer support for persistent memory. This is significant because CXL-based persistent memory is expected to be less expensive than DRAM. This memory is on the roadmap of select memory suppliers.

Memory bandwidth expansion

Adding CXL memory expands the total system memory bandwidth since it creates more channels for accessing data. On average, a x16 CXL link has 2x the bandwidth of a DDR5 memory channel. The system’s memory bandwidth can be further increased using “memory interleaving.” This can be a big boon to bandwidth-hungry workloads like those in the HPC and AI/ML domains.

The Linux OS is expected to offer this feature starting with kernel v6.9. The 5th Gen Intel® Xeon® processors offer Hetero Interleaving, a unique hardware-controlled memory interleaving feature. (See “Orchestrating Memory Disaggregation with Compute Express Link (CXL).”) Hardware-controlled memory interleaving can be very convenient for cloud service providers (CSPs) and OEMs because their systems then do not depend on the OS for memory interleaving.
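As a sketch of the OS-controlled alternative (assuming a v6.9+ kernel with the weighted-interleave memory policy, root privileges, and a CXL expander visible as NUMA node 2; the node numbers and weights here are purely illustrative), per-node weights can be set through sysfs so that allocations are spread across DRAM and CXL roughly in proportion to their bandwidth:

```python
# Sketch: set Linux weighted-interleave ratios (kernel v6.9+) so that page
# allocations are spread across DRAM and CXL NUMA nodes in a chosen proportion.
# Node numbers and weights are illustrative; requires root privileges.
from pathlib import Path

WEIGHTS = {0: 3, 2: 1}  # e.g., DRAM node 0 receives 3x the pages of CXL node 2

base = Path("/sys/kernel/mm/mempolicy/weighted_interleave")
for node, weight in WEIGHTS.items():
    (base / f"node{node}").write_text(f"{weight}\n")

# Applications then opt in via the MPOL_WEIGHTED_INTERLEAVE memory policy
# (set_mempolicy()/mbind()), or a NUMA-aware launcher that sets it for them.
```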

Memory pooling

Memory pooling allows sharing of memory resources across multiple devices within a computing system. It enables different accelerators, such as GPUs, FPGAs, and other specialized processors, to access and utilize a common pool of memory resources (typically DRAM) as if it were local memory. Though still in the proof-of-concept stage, CXL memory pooling will improve resource efficiency, save costs, enhance scalability and performance, and simplify programming.

Reap the benefits of CXL

Memory-intensive workloads like AI, virtual desktop infrastructure (VDI), and in-memory databases dominate the computing landscape today, but increasing memory capacity by adding CPU-attached DRAM is very expensive. The CXL protocol, running over existing PCIe links, allows for augmenting system memory at a lower cost and is supported by a broad consortium of technology companies and industry leaders.

To learn more about how CXL technology can serve your data center memory needs, visit the following resources:

 

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

 

[1] Intel. “Orchestrating memory disaggregation with Compute Express Link (CXL).” March 2024. https://www.intel.com/content/www/us/en/content-details/817889/orchestrating-memory-disaggregation-with-compute-express-link.html.

[2] Rambus. “Compute Express Link (CXL): All you need to know.” January 2024. https://www.rambus.com/blogs/compute-express-link/.

[3] Samsung. “Expanding the Limits of Memory Bandwidth and Density: Samsung’s CXL Memory Expander.” July 2022. https://semiconductor.samsung.com/news-events/tech-blog/expanding-the-limits-of-memory-bandwidth-and-density-samsungs-cxl-dram-memory-expander/.