Solved: Haswell memory bandwidth

TPtac · 01-01-2017

Hi,

Before measuring memory bandwidth with PCM, I think I need to understand the maximum (theoretical) memory bandwidth. I thought I had it figured out, but now I have a processor where I don't understand how the maximum numbers make sense.

Here's an example I think I understand: Xeon E5-2630 v3 (Haswell-EP). The maximum memory bandwidth (according to ARK) is 59 GB/s. It has 4 memory channels and supports up to DDR4-1866 DIMMs. The peak transfer rate of a DDR4-1866 DIMM is 14933 MB/s, and 14933 * 4 = 59732 MB/s, so this adds up.

What I don't understand: Xeon E7-4830 v3 (Haswell-EX). The maximum memory bandwidth is 102 GB/s. But it also supports up to DDR4-1866 and has 4 memory channels! So how does it get 102 GB/s?

One theory is that the E7-4830 v3 has two memory controllers. While cpu-world confirms this, it also says that each controller has 2 memory channels, so it still doesn't add up.

I'd appreciate any help from the experts over here. Is the number of memory controllers documented by Intel anywhere? I couldn't find it.

Thanks in advance!
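For anyone who wants to redo the E5-2630 v3 arithmetic above, here is a minimal sketch. The function name is made up for illustration, and it assumes the nominal DDR4-1866 rate of 1866.67 MT/s and an 8-byte (64-bit) data path per channel. It lands about 1 MB/s above the 59732 figure only because the per-DIMM number in the post was rounded before multiplying.

```python
def peak_bandwidth_mb_s(channels, rate_mt_s, bytes_per_transfer=8):
    """Theoretical peak in MB/s: channels * transfer rate * bytes per transfer.

    Hypothetical helper; assumes a 64-bit (8-byte) data path per channel.
    """
    return channels * rate_mt_s * bytes_per_transfer


# Assumption: DDR4-1866 runs at a nominal 1866.67 MT/s (5600/3).
DDR4_1866_MT_S = 5600 / 3

per_dimm = peak_bandwidth_mb_s(1, DDR4_1866_MT_S)  # one channel
e5_total = peak_bandwidth_mb_s(4, DDR4_1866_MT_S)  # four channels

print(f"per channel: {per_dimm:.0f} MB/s")
print(f"4 channels:  {e5_total:.0f} MB/s")
```

This is only the theoretical ceiling; what PCM reports under a real workload will be lower.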

McCalpinJohn · 01-03-2017

The Xeon E7 processors use a buffer chip between the processor and the DIMMs. This buffer chip has two channels on the DIMM side and one interface on the processor side. Under some circumstances, the buffer-to-processor interface can run at 2x the frequency of the buffer-to-DIMM interface.

In this case the bandwidth comes from running the DIMMs at a slightly slower speed, which then allows the buffer-to-processor interface to run at the 2x rate. It looks like the bandwidth comes from:

Buffer-to-processor: 4 channels * (2 * 1.6 GT/s) * 8 B = 102.4 GB/s
Buffer-to-DIMM: 8 channels * 1.6 GT/s * 8 B = 102.4 GB/s
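The two lines above can be checked with a short sketch. The helper name and variable names are made up for illustration; the inputs (1.6 GT/s DIMM speed, 2x rate on the processor side, 8 B per transfer, 4 processor-side channels feeding 8 DIMM-side channels) are the ones given in this reply.

```python
def peak_gb_s(links, rate_gt_s, bytes_per_transfer=8):
    """Theoretical peak in GB/s: links * transfer rate * bytes per transfer.

    Hypothetical helper for illustration only.
    """
    return links * rate_gt_s * bytes_per_transfer


DIMM_RATE_GT_S = 1.6        # DIMMs run at DDR4-1600 in this mode
PROCESSOR_CHANNELS = 4      # buffer-to-processor interfaces per socket
DIMM_CHANNELS_PER_BUFFER = 2

# Processor side runs at 2x the DIMM-side rate.
processor_side = peak_gb_s(PROCESSOR_CHANNELS, 2 * DIMM_RATE_GT_S)
# Each buffer fans out to two DIMM channels at the 1x rate.
dimm_side = peak_gb_s(PROCESSOR_CHANNELS * DIMM_CHANNELS_PER_BUFFER, DIMM_RATE_GT_S)

print(f"buffer-to-processor: {processor_side:.1f} GB/s")
print(f"buffer-to-DIMM:      {dimm_side:.1f} GB/s")
```

Both sides come out equal, which is the point: doubling the link rate on the processor side matches the doubled channel count on the DIMM side.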

TPtac · 01-03-2017

Hi John,

Thanks, that explains it! Do you know if the existence of this memory buffer is documented anywhere? It looks like once you know it exists, you can Google some presentations and articles discussing it, but I haven't really seen it mentioned in Intel datasheets or the optimization manuals.

Thomas_W_Intel · 01-04-2017

Yes, I agree that the memory buffers are often not discussed as prominently as other features of the platform. The datasheet for the C112 and C114 memory buffers is located here: http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/c112-c114-scalable-memory-buffer-datasheet.pdf

They are also listed on ARK: http://ark.intel.com/products/series/99059/Intel-Scalable-Memory-Buffers