Hello, I originally posted this question in the Processors section of the forum, but I believe this is a more appropriate place for it.
I have noticed that two of the systems I have access to are exhibiting abnormally high RAM latencies (around 500 ns), but only when the system is idle. This leads to the counterintuitive behavior that an application's memory performance improves when another application is running concurrently on the system.
A given thread sees lower latency when all the other cores in the system are performing memory accesses at maximum throughput than when that thread runs alone.
This can be observed while running the Intel® Memory Latency Checker loaded latency test.
For one of the systems, the output is this:
Inject    Latency   Bandwidth
Delay     (ns)      (MB/sec)
=============================
 00000    380.24    68920.1
 00002    380.22    68886.4
 00008    380.38    68881.2
 00015    380.11    68846.8
 00050    376.15    68501.7
 00100    372.86    68304.9
 00200    282.19    69399.9
 00300    115.94    51590.0
 00400     96.13    39300.5
 00500     92.62    31755.3
 00700     87.91    23047.5
 01000     85.88    16441.6
 01300     84.75    12858.1
 01700     83.99    10033.5
 02500     83.41     7084.0
 03500     85.19     5267.4
 05000     92.23     3857.7
 09000    122.13     2283.1
 20000    219.69     1083.2
As expected, the latency decreases as the load from the rest of the system decreases, but once the load drops below a certain point the latency climbs back up. When the load is close to nonexistent, the latency is at its highest.
I also observed this high latency when running my own custom microbenchmarks. In these benchmarks, I noticed that fully random accesses that cross pages have a latency of almost one microsecond.
This happens on two of the dual-socket Xeon systems to which I have access (Silver 4114 and Silver 4214). Both systems are from the same supplier, and I can't reproduce the behavior on the other systems, which were bought from other vendors.
I've attached the full outputs for the latency checker (mlc) and dmidecode.
Does someone have a clue about what's going on here? I suspect it's a configuration issue, but I don't currently have access to the BIOS of these systems and can't check memory timings. It feels as if the memory goes to sleep after one access and stays awake only if more accesses keep arriving, but I don't know what might be causing this behaviour.
From the dmidecode output, this system is very poorly configured for memory bandwidth. Each of the two sockets supports 6 DDR4 DRAM channels, but in this system only 2 of the 6 channels in each socket have installed DRAM. This limits the system to approximately 1/3 of the potential memory bandwidth.
The specific observation of high latency at low load looks like the sort of result I have seen from overly aggressive power-saving settings. I have never tried to dig into the details in such cases -- fiddling with BIOS energy-saving and performance options has always been enough to make it go away....
The MLC output shows ridiculous remote memory latency and ridiculous remote HitM latency when data is homed in the Writer's socket, but very reasonable remote HitM latency when data is homed in the Reader's socket (essentially identical to what I see on a 2-socket Xeon Gold 6142 or a 2s Xeon Gold 5120 system).
I suppose it is possible that at least part of the weird behavior is due to the partial memory configuration.