Intel Memory Latency Checker gives vastly different memory latency figures for different frequencies

futurewassomewhere · 03-06-2022

I am using Intel Memory Latency Checker v3.9a to measure the idle memory latency of my system, and I have noticed that the tool reports vastly different values at different core clock frequencies. For example, it shows 86 ns idle latency at 2 GHz vs 54.5 ns at 4 GHz. Shouldn't the memory latency remain more or less the same at different CPU frequencies? Am I misunderstanding something?

McCalpinJohn · 03-07-2022

Most of the latency of a memory access is spent in the frequency domains of the core (including the private caches) and the "uncore" (including the shared cache and the on-chip ring or mesh).

An old discussion that is still mostly relevant is at https://sites.utexas.edu/jdm4372/2011/03/10/memory-latency-components/

On server processors it is possible to control the "core" and "uncore" frequencies independently, with no restrictions on the range of either. With enough measurements it is sometimes possible to derive an equation that accurately captures the latency in terms of core clocks, uncore clocks, and DRAM clocks.

For client processors my impression is that the uncore frequency is less easily controlled. According to volume 4 of the Intel SW Developer's Manual, in recent generations of Core processors, MSR 0x394 enables/disables the uncore fixed function (cycle) counter, while MSR 0x395 contains the 44-bit uncore cycle count.
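As a rough illustration of how that cycle counter could be used to check the uncore frequency, here is a minimal Python sketch. It assumes Linux with the `msr` kernel module loaded, root privileges, and that the counter has already been enabled through MSR 0x394 (the enable bits are not shown here; check volume 4 of the SDM for your processor). The helper names `read_msr`, `uncore_ghz`, and `sample_uncore_ghz` are made up for this example.

```python
import os
import time

UNC_FIXED_CTR = 0x395          # uncore fixed (cycle) counter MSR, as cited above
COUNTER_MASK = (1 << 44) - 1   # the cycle count is 44 bits wide


def read_msr(msr, cpu=0):
    """Read one 64-bit MSR through the Linux msr driver (assumed setup, needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR address.
        return int.from_bytes(os.pread(fd, 8, msr), "little")
    finally:
        os.close(fd)


def uncore_ghz(count_start, count_end, elapsed_s):
    """Average uncore frequency from two counter samples, tolerating one wrap."""
    delta = (count_end - count_start) & COUNTER_MASK
    return delta / elapsed_s / 1e9


def sample_uncore_ghz(interval_s=1.0):
    """Sample the counter twice and return the average frequency in GHz."""
    c0 = read_msr(UNC_FIXED_CTR) & COUNTER_MASK
    t0 = time.perf_counter()
    time.sleep(interval_s)
    c1 = read_msr(UNC_FIXED_CTR) & COUNTER_MASK
    t1 = time.perf_counter()
    return uncore_ghz(c0, c1, t1 - t0)
```

Calling `sample_uncore_ghz()` while the latency test runs at each core frequency would show whether the uncore clock is tracking the core clock. If the counter reads the same value twice, it is probably not enabled.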

In some processors (e.g., Sandy Bridge EP, Haswell EP), I observed that with the default settings the uncore frequency would match the highest core frequency. If that is the case on your system, then the numbers are not unreasonable. If we assume that one part of the latency is a fixed number of ns at both frequencies and another part is a fixed number of core cycles, then your observations correspond to a fixed latency of 23 ns plus a variable latency of 126 core cycles.
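The 23 ns and 126 cycle figures follow from solving two equations of the form latency = fixed + cycles / frequency, one per measurement. A small Python sketch of that arithmetic (the function name `split_latency` is just for illustration, and the two-component model is the simplifying assumption stated above):

```python
def split_latency(f1_ghz, lat1_ns, f2_ghz, lat2_ns):
    """Solve lat = fixed_ns + cycles / f_ghz from two (frequency, latency) points."""
    # Cycles divided by GHz gives ns, so the latency difference isolates the cycle term.
    cycles = (lat1_ns - lat2_ns) / (1.0 / f1_ghz - 1.0 / f2_ghz)
    fixed_ns = lat1_ns - cycles / f1_ghz
    return fixed_ns, cycles


# Measurements from the original question: 86 ns at 2 GHz, 54.5 ns at 4 GHz.
fixed_ns, cycles = split_latency(2.0, 86.0, 4.0, 54.5)
print(f"fixed: {fixed_ns:.1f} ns, variable: {cycles:.0f} core cycles")
# prints "fixed: 23.0 ns, variable: 126 core cycles"
```

With only two data points the fit is exact by construction, so it cannot tell you whether the model is right. Measuring at a third frequency and comparing against the prediction would be a simple sanity check.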

It is also possible that at the lower frequency the latency increases enough that the DRAM idle page timer activates and closes the DRAM page before the next access. This will increase the DRAM part of the latency by another 12 ns or so, and may partially account for the large difference you are seeing....