Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Idle latency difference between Windows and Linux

Pradeep_R_
Beginner

Hi all,

I have a dual-socket Xeon E5-2699 v4 server with Windows Server 2012 R2 and CentOS 6.6 installed in a dual-boot configuration. I was using MLC to measure idle latency on this system, and I noticed that the idle latency to DDR memory is lower on Linux than on Windows: for same-socket access it is lower by around 6 ns, and for cross-socket access by around 10 ns.

Is MLC's resolution accurate enough that a difference of a few nanoseconds can be trusted?

If so, is the higher latency on Windows expected? I disabled huge pages on Linux to see whether they were the cause, but the Linux latencies remained identical in that case too.
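
For reference, this is a minimal sketch of one way to confirm the transparent huge page setting on the Linux side (the sysfs path is an assumption; on a CentOS 6 kernel the control may be /sys/kernel/mm/redhat_transparent_hugepage/enabled instead):

/* Sketch: print the transparent huge page policy.  The bracketed entry
 * in the output is the active setting, e.g. "always madvise [never]".
 * Path is the standard THP control; older RHEL/CentOS 6 kernels may use
 * /sys/kernel/mm/redhat_transparent_hugepage/enabled instead. */
#include <stdio.h>

int main(void)
{
    char buf[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (!f) {
        perror("transparent_hugepage/enabled");
        return 1;
    }
    if (fgets(buf, sizeof buf, f))
        printf("THP policy: %s", buf);
    fclose(f);
    return 0;
}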

Thanks for your help in advance!

Pradeep.

McCalpinJohn
Honored Contributor III

The MLC measurements are certainly accurate enough for 6 ns or 10 ns differences to be significant.

The MLC idle latency measurement depends on being able to disable the hardware prefetchers: adding the "-e" option skips this step and gives much lower numbers, because the data is then accessed using a prefetchable pattern. I know the prefetcher disabling works on Linux, so if the Windows numbers are higher, it is probably working properly there too.
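
If you want to double-check what MLC actually did on the Linux side, the sketch below reads the prefetcher control register directly. MSR 0x1A4 is the documented per-core prefetcher control on these processors; the code assumes the Linux msr driver is loaded (modprobe msr) and root privileges, and is only an illustration, not something MLC requires:

/* Sketch: read MSR 0x1A4 (per-core prefetcher control) on CPU 0.
 * A set bit means that prefetcher is DISABLED. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    uint64_t val;
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }
    if (pread(fd, &val, sizeof val, 0x1a4) != sizeof val) {
        perror("pread MSR 0x1a4");
        return 1;
    }
    printf("MSR 0x1a4 = 0x%llx\n", (unsigned long long)val);
    printf("  L2 HW prefetcher        disabled: %d\n", (int)(val & 1));
    printf("  L2 adjacent-line pref.  disabled: %d\n", (int)((val >> 1) & 1));
    printf("  DCU streamer pref.      disabled: %d\n", (int)((val >> 2) & 1));
    printf("  DCU IP prefetcher       disabled: %d\n", (int)((val >> 3) & 1));
    close(fd);
    return 0;
}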

For the Xeon E5-2699 v4 there are at least four different snoop modes. In several of these modes the local and/or remote latency will depend on whether the C1E state is enabled in the BIOS and whether the OS keeps a thread running on each socket to prevent the C1E state from being entered. Linux and Windows could certainly differ in this regard. I don't see anything in the MLC documentation about whether it runs a thread on the alternate socket during latency tests.

On my older Xeon E5-2680 (Sandy Bridge EP) processors, if the other socket drops into C1E state the *local* latency increases by about 11 ns and the *remote* latency increases by a larger amount. I have not tested this on more recent systems.
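
As an illustration of that point, something as simple as the sketch below (one busy thread pinned to a core on each socket) is enough to keep both packages out of C1E while MLC runs in another window. The core numbers are an assumption about the socket-to-core mapping on a 2 x 22-core system and should be checked against the actual enumeration (e.g. in /proc/cpuinfo):

/* Sketch: keep one core on each socket busy so neither package can drop
 * into C1E during a latency measurement.  Cores 0 and 22 are an assumed
 * mapping (socket 0 = cores 0-21, socket 1 = cores 22-43).
 * Build with: gcc -O2 -pthread spin.c -o spin */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *spin(void *arg)
{
    int cpu = *(int *)arg;
    volatile unsigned long counter = 0;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    while (1)                /* busy loop, never sleeps */
        counter++;
    return NULL;
}

int main(void)
{
    static int cpus[2] = { 0, 22 };   /* one core per socket (assumed mapping) */
    pthread_t tid[2];
    int i;

    for (i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, spin, &cpus[i]);
    puts("Spinning on one core per socket; run MLC now, Ctrl-C to stop.");
    pthread_join(tid[0], NULL);       /* never returns */
    return 0;
}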

The OS can also influence the core and uncore frequencies, both directly, by managing the frequencies, and indirectly, through the "energy performance bias" settings. The default settings could easily differ between Linux and Windows.
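
As a quick way to see what the Linux side is doing, the sketch below dumps the cpufreq governor and current frequency for CPU 0. The paths assume a cpufreq driver such as acpi-cpufreq is loaded; the energy performance bias itself lives in MSR 0x1B0 (IA32_ENERGY_PERF_BIAS, 0 = performance, 15 = power save) and can be read the same way as the prefetcher MSR above:

/* Sketch: print the Linux cpufreq governor and current frequency for
 * CPU 0 so the frequency policy can be compared against whatever the
 * Windows power plan is doing.  Paths assume a cpufreq driver is loaded. */
#include <stdio.h>

static void show(const char *path)
{
    char buf[128];
    FILE *f = fopen(path, "r");
    if (!f) { printf("%s: not available\n", path); return; }
    if (fgets(buf, sizeof buf, f))
        printf("%s: %s", path, buf);
    fclose(f);
}

int main(void)
{
    show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor");
    show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
    return 0;
}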
