A look into VMware's production Tanzu on Intel Optane tiered memory

John_Hubbard · 05-10-2023

Understanding Memory Utilization Patterns Can Be a Game Changer for Containerized Environments

Cloud-native. Microservices. DevOps. Kubernetes. These are the buzzwords (and technologies) driving the growth of containerized environments, both on-premises and in the cloud. But while some things are changing, others haven’t changed a bit. For example, whether it’s an old-school VM or a shiny new container, you still need to provide enough compute and memory resources for your workloads to perform as required. Allocate too little, and you risk slowdowns (or crashes). Overprovision, and you pay for hardware that sits idle.

Historically, we didn’t talk much about how memory was being used, that is, its usage characteristics and behaviors. Back in the day, a discussion of memory might have revolved around two simple concepts: total memory and free memory. But there’s more to it in the modern data center. Do you know what your memory is doing? Are there tools to help? What opportunities for server consolidation and cost savings are you missing?

Let’s find out.

Revisiting Computing Concepts

The arrival of the x86 hypervisor, pioneered by VMware, was an initial step toward a better understanding of memory utilization. This middle layer between the virtual hardware and the physical hardware decides whether a VM can use a resource. As hypervisors matured, we data center nerds started taking notes on memory requests. As with any endeavor, if we keep our eyes and ears open, we learn things. What we learned over time is that memory has other important characteristics; it’s not just free or not free. There’s also “active” memory, the amount of memory that is actually being accessed on a regular basis (every few seconds). For many workloads, active memory is much smaller than “consumed memory,” the modern term for “not free” memory.
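VMware’s documentation describes ESXi estimating active memory by statistical sampling rather than by tracking every page. The sketch below is a simplified illustration of that idea, not ESXi’s implementation: it assumes we already know which pages were touched during the interval, and the function name, page counts, and sample size are all hypothetical.

```python
import random

def estimate_active_fraction(touched_pages, total_pages, sample_size, rng):
    """Estimate the fraction of pages touched in one interval by checking
    only a random sample of pages (hypothetical, simplified model)."""
    sample = rng.sample(range(total_pages), sample_size)
    hits = sum(1 for page in sample if page in touched_pages)
    return hits / sample_size

# Hypothetical VM: 1,000,000 consumed pages, 10% of them touched recently.
total_pages = 1_000_000
rng = random.Random(42)
touched = set(rng.sample(range(total_pages), total_pages // 10))

# Checking 1,000 pages instead of 1,000,000 still lands close to 10%.
estimate = estimate_active_fraction(touched, total_pages, sample_size=1000, rng=rng)
print(f"estimated active: {estimate:.1%} of consumed memory")
```

The point of sampling is cost: the estimate’s error depends on the sample size, not on how much memory the VM has consumed.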

Let’s take a step back and answer a basic question. Why do we have memory in our servers, anyway?

In the beginning, memory was loaded painstakingly by hand. Storage, if you can call it that, came in the form of punch cards. Thankfully, magnetic storage technologies followed, from tape to hard disks to floppy disks, and “permanent” digital storage was born. In the last decade or so, solid-state technology has gained ground thanks to its amazing performance, held back primarily by cost and secondarily by capacity. The latest advancement is phase change memory, which fuels Intel® Optane™ technology.

From megabytes to terabytes, the computing model has not changed: data is loaded from a slower technology into a faster one. In a “perfect world,” memory would be so plentiful that we wouldn’t need storage. Too bad memory is so much more expensive. At the time of this writing, a 2 TB client M.2 SSD costs only about US$130.[1] 2 TB of DRAM is… more expensive.

DRAM is no longer the only type of memory. Intel Optane persistent memory (PMem) is a unique medium with characteristics of both storage and memory. You can use Intel Optane PMem to build a tiered memory system (sort of like the two-tiered storage area network you’re already familiar with).

Tiered memory uses a relatively small amount of DRAM (for example, 512 GB) as a memory cache and a different type of memory, such as Intel Optane PMem, as main system memory. The advantage of a tiered memory system is that it can offer far more capacity (think TBs) per server for a fraction of the cost of a DRAM-only system. And VMs and containers never know the difference: the tiering is invisible to the applications and the OS. And yes… “tiering” is a bit of a misnomer, since what we’re really discussing is caching, but add marketing to the equation and here we are.
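To see why a small DRAM cache can front a much larger memory tier, here is a toy model. It assumes a plain LRU replacement policy purely for simplicity; the actual hardware cache organization is not modeled, and the page counts and access pattern are hypothetical.

```python
from collections import OrderedDict

def miss_rate(accesses, cache_pages):
    """Replay page accesses through an LRU cache of `cache_pages` entries
    (standing in for the DRAM tier) and return the fraction of misses,
    i.e. accesses that would be served by the slower PMem tier."""
    cache = OrderedDict()
    misses = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)  # mark as most recently used
        else:
            misses += 1
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used
    return misses / len(accesses)

# Hypothetical workload: 1,000 pages consumed, but only 100 "hot" pages
# are accessed repeatedly after a one-time pass over 900 "cold" pages.
hot = list(range(100))
cold = list(range(100, 1000))
accesses = cold + hot * 200

print(miss_rate(accesses, cache_pages=128))  # hot set fits: about 4.8% misses
print(miss_rate(accesses, cache_pages=64))   # hot set doesn't fit: every access misses
```

A cache covering only about 13% of consumed memory keeps misses under 5%, because the misses that remain are first-time touches. Shrink the cache below the hot set, though, and this cyclic pattern makes LRU miss on every access: the cache has to cover active memory, not consumed memory.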

Capacity Planning

Capacity planning for a tiered memory system may seem daunting, but tools from VMware take much of the guesswork out of it. The active and consumed memory metrics in vCenter are your friends: they reveal how much of your current memory is actually active, versus merely “in use” (consumed).
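Once you have active and consumed figures per VM, the key number is the ratio between them. The sketch below assumes you have already exported those figures (in GB) from vCenter’s charts; the VM names and values are made up, and no vCenter API is called.

```python
# Hypothetical per-VM figures in GB (made-up values for illustration).
vms = {
    "vm-a": {"active": 6.0, "consumed": 64.0},
    "vm-b": {"active": 2.5, "consumed": 32.0},
    "vm-c": {"active": 12.0, "consumed": 96.0},
}

def active_ratio(vms):
    """Fraction of consumed memory that is active, across all VMs."""
    total_active = sum(v["active"] for v in vms.values())
    total_consumed = sum(v["consumed"] for v in vms.values())
    return total_active / total_consumed

for name, v in vms.items():
    print(f"{name}: {v['active'] / v['consumed']:.0%} of consumed memory is active")
print(f"all VMs: {active_ratio(vms):.0%} of consumed memory is active")
```

Summing before dividing weights each VM by its size, so one large, mostly idle VM counts for more than several small busy ones.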

For those of you on vSphere 7.0U3 and above, you have access to additional metrics. Intel recommends leveraging vSphere Memory Monitoring and Remediation (vMMR) to characterize your hardware environment. You can use the information to decide which workloads are good candidates for tiered memory. Here’s a summary of useful metrics:

Active memory. The hottest data. The size of active memory determines what size memory cache you need.
Consumed memory. Memory that is not free. The size represents the minimum amount of memory needed to support existing workloads.
Bandwidth. The speedometer. You probably know the factory specification bandwidth for your installed memory. But just because you’re driving on a highway that has a speed limit of 70 mph doesn’t mean all cars are traveling that fast. vMMR can help identify how much bandwidth is being used. For tiered memory systems, the bandwidth of each tier is available.
Cache miss rate. For tiered memory systems, the cache miss rate tells you if you sized the cache correctly. A cache miss rate higher than 10% typically indicates you need a larger cache.
CPU utilization. While not a direct memory measurement, if your CPUs are underutilized, server consolidation may be possible.
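The metrics above can be combined into a simple per-host check. This is a hypothetical sketch: the class, the function names, the metric values, and the 50% CPU threshold are assumptions; only the 10% miss-rate rule of thumb comes from the list above. The cache and system sizes match the modernized cluster described in footnote [2] (1 TB of DRAM cache, 4 TB of PMem system memory). For bandwidth, the helper computes the theoretical DDR4 peak per socket, assuming each 64-bit channel moves 8 bytes per transfer.

```python
from dataclasses import dataclass

@dataclass
class HostMetrics:
    """Hypothetical container for one host's memory metrics."""
    active_gb: float        # hottest data: drives the DRAM cache size
    consumed_gb: float      # memory that is not free: floor for system memory
    cache_miss_rate: float  # fraction of accesses that miss the DRAM cache
    cpu_util: float         # average CPU utilization, 0.0 to 1.0

def peak_bandwidth_gbs(channels, mts):
    """Theoretical peak DDR4 bandwidth per socket in GB/s."""
    return channels * mts * 8 / 1000

def review(m, cache_gb, system_gb, miss_threshold=0.10, cpu_low=0.5):
    """Return a list of findings for one host (cpu_low is an assumed threshold)."""
    findings = []
    if m.active_gb > cache_gb:
        findings.append("active memory exceeds the DRAM cache")
    if m.consumed_gb > system_gb:
        findings.append("consumed memory exceeds system memory")
    if m.cache_miss_rate > miss_threshold:
        findings.append(f"cache miss rate above {miss_threshold:.0%}; consider a larger cache")
    if m.cpu_util < cpu_low:
        findings.append("CPUs underutilized; consolidation may be possible")
    return findings

# Hypothetical host: 300 GB active, 2.8 TB consumed, 4% cache misses, 35% CPU.
host = HostMetrics(active_gb=300, consumed_gb=2800, cache_miss_rate=0.04, cpu_util=0.35)
print(review(host, cache_gb=1024, system_gb=4096))
print(peak_bandwidth_gbs(channels=8, mts=3200))  # 8 channels/socket @ 3200 MT/s
```

Measured bandwidth from vMMR can then be read against that theoretical peak, the same way the speedometer is read against the speed limit.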

So, you’re thinking… all this sounds well and good… but is it real? I assure you, it is. VMware recently characterized memory utilization in its own on-premises Tanzu container environment and determined that, because active memory was a small percentage (only about 10%) of consumed memory, a tiered memory system was a good fit. VMware replaced 27 legacy servers with just NINE servers powered by 3rd Gen Intel® Xeon® Scalable processors and a tiered memory system with 4 TB of system memory (Intel Optane PMem) per server. That’s a 66% server consolidation and more than 10x greater memory capacity per server for resource-hungry containerized workloads, while simultaneously reducing memory costs by up to 33%.[2]
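The headline numbers can be checked against the configuration and pricing disclosures in footnote [2]. This small calculation uses only those disclosed figures; the variable names are illustrative.

```python
# Figures from the configuration and pricing disclosures in footnote [2].
legacy_servers, new_servers = 27, 9
legacy_mem_gb = 12 * 32   # 384 GB DRAM per legacy server
new_mem_gb = 16 * 256     # 4,096 GB (4 TB) PMem system memory per new server
dram_only_cost = 135_849  # DRAM-only memory subsystem cost, USD
tiered_cost = 90_810      # tiered memory cost, USD

server_reduction = 1 - new_servers / legacy_servers
capacity_ratio = new_mem_gb / legacy_mem_gb  # per server
cluster_ratio = (new_servers * new_mem_gb) / (legacy_servers * legacy_mem_gb)
cost_reduction = 1 - tiered_cost / dram_only_cost

print(f"server count reduced by {server_reduction:.1%}")
print(f"memory per server: {capacity_ratio:.1f}x")
print(f"memory per cluster: {cluster_ratio:.1f}x")
print(f"memory cost reduced by {cost_reduction:.1%}")
```

This works out to roughly 66.7%, 10.7x, and 33.2%. Note that the 10x figure is a per-server comparison; across the whole cluster, nine 4 TB servers provide about 3.6x the memory of 27 servers with 384 GB each.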

If you’d like to achieve this sort of cost savings and server consolidation in your data center, I encourage you to read the white paper, “Tiered Memory in VMware’s Production Tanzu Environment.”

[1] Pricing as of April 25, 2023. Source: https://www.amazon.com/Crucial-Plus-PCIe-NAND-6600MB/dp/B098WKQRDL?th=1.

[2] Performance varies by use, configuration, and other factors. Learn more at intel.com/PerformanceIndex. Performance results are based on testing by VMware as of October 28, 2022 and may not reflect all publicly available security updates. Intel technologies may require enabled hardware, software, or service activation. See configuration disclosures for details. No product or component can be absolutely secure. Your costs and results may vary.

Legacy cluster: 27x Dell PowerEdge M630 blade servers, each with 2x Intel® Xeon® processor E5-2680v4 (14 cores, 2.4/3.3 GHz), 384 GB memory (12x 32 GB DRAM) – 4 channels/socket @ 2400 MT/s, SAN storage = QLogic QME2662 16 Gbps Fibre Channel, network = Intel® Ethernet Server Adapter X520 Dual Port 10 GbE.

Modernized cluster: 9x Dell PowerEdge R650 (rack 1U), each with 2x Intel Xeon Gold 6338 processor (32 cores, 2.0/3.2 GHz), 4 TB system memory (16x 256 GB Intel® Optane™ persistent memory, cached by 16x 64 GB DRAM) – 8 channels/socket @ 3200 MT/s, vSAN storage = 1x Intel Optane SSD P5800X 800 GB + 4x Solidigm P5500 3.84 TB, network = Intel® Ethernet Network Adapter E810-XXVDA2 Dual Port 10/25 GbE.

Pricing information obtained from the Intel® Optane™ PMem TCO Calculator as of February 3, 2023:

DRAM-based solution: DDR4 DRAM cost estimate based on list pricing for DDR4 DIMMs integrated in OEM systems captured on February 3, 2023. DRAM-only memory subsystem cost: $135,849.

The default option uses an average of list prices from three OEMs: HPE, Lenovo, and Dell.

Intel® Optane™ Persistent Memory and CPU pricing is based on Intel recommended customer pricing (RCP), available at www.ark.intel.com. Tiered memory cost: $90,810.

Intel® Optane™ PMem pricing shown is provided for guidance and planning purposes only and does not constitute a final offer. Pricing guidance is subject to change and may revise up or down based on market dynamics. Please contact your OEM/Distributor for actual pricing.

Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. © Intel Corporation