We just released v2.3 of Intel Memory Latency Checker (http://www.intel.com/software/mlc). This release adds support for Windows; previous versions already supported Linux. In addition, single-socket Xeon processors (E3) are now supported.
Intel Memory Latency Checker can be used to measure latencies and bandwidth on Intel Xeon processors.
I am seeing strange latency results with version 2.3. In particular, the result of the "--idle_latency" test (with no other options) is much higher than any of the values from the "--latency_matrix" test. If the "--idle_latency" test is given the "-c" and "-i" options to place the threads and data, the results seem fine.
E.g., on a Xeon E5-2680 with cores 0-7 in socket 0 and 8-15 in socket 1, I see:
c557-603:~/Stampede/IntelMemoryLatencyChecker:2014-11-11T13:28:08 $ ./mlc_2-3 --latency_matrix
Intel(R) Memory Latency Checker - v2.3
Command line parameters: --latency_matrix
Using buffer size of 200.000MB
Measuring idle latencies (in ns)...
          Memory node
Socket        0        1
     0     66.9    116.6
     1    116.6     66.9

c557-603:~/Stampede/IntelMemoryLatencyChecker:2014-11-11T13:27:54 $ ./mlc_2-3 --idle_latency
Intel(R) Memory Latency Checker - v2.3
Command line parameters: --idle_latency
Using buffer size of 200.000MB
Each iteration took 362.0 core clocks ( 134.1 ns)

c557-603:~/Stampede/IntelMemoryLatencyChecker:2014-11-11T13:27:24 $ ./mlc_2-3 --idle_latency -c4 -i4
Intel(R) Memory Latency Checker - v2.3
Command line parameters: --idle_latency -c4 -i4
Using buffer size of 200.000MB
Each iteration took 179.5 core clocks ( 66.5 ns)
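As a quick sanity check on the numbers above, the two "Each iteration took ..." lines can be parsed and compared. The sketch below is not part of MLC. The regular expression and the helper names (`parse_idle_latency`, `implied_ghz`) are my own assumptions, based only on the output format shown in this post. Dividing core clocks by nanoseconds gives the clock frequency the tool is implicitly using, which should come out the same for both runs if only the latency changed.

```python
import re

# The two result lines copied from the runs above.
SAMPLE = """\
Each iteration took 362.0 core clocks ( 134.1 ns)
Each iteration took 179.5 core clocks ( 66.5 ns)
"""

# Assumed pattern for the result line; derived from the output shown above,
# not from any MLC documentation.
PATTERN = re.compile(
    r"Each iteration took\s+([\d.]+)\s+core clocks\s+\(\s*([\d.]+)\s*ns\)"
)


def parse_idle_latency(text):
    """Return a list of (core_clocks, nanoseconds) pairs found in the text."""
    return [(float(c), float(n)) for c, n in PATTERN.findall(text)]


def implied_ghz(clocks, ns):
    """Clocks per nanosecond equals the implied core frequency in GHz."""
    return clocks / ns


results = parse_idle_latency(SAMPLE)
for clocks, ns in results:
    print(f"{clocks} clocks / {ns} ns -> {implied_ghz(clocks, ns):.2f} GHz")

# Ratio of the unbound run to the bound run (-c4 -i4).
slowdown = results[0][1] / results[1][1]
print(f"unbound run is {slowdown:.2f}x slower than the bound run")
```

Both runs imply roughly the same frequency, so the difference is in the measured latency itself rather than in the clock-to-time conversion, and the unbound run is about twice as slow as the local-memory value.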
Thanks for reporting this issue. With this release we moved the dummy threads to the first CPU in each socket. When you invoke --idle_latency without any parameters, we end up running both the dummy thread and the latency thread on the same core, which results in higher latencies. We normally take care not to schedule dummy threads and measurement threads on the same CPU, but we missed this case in testing because we typically expect the -c option to be specified when --idle_latency is used. I fixed the code to handle this case, and the next release should include the fix.
Are you seeing any other issues? We really appreciate your testing and feedback to make the tool better.
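To illustrate the kind of placement rule described here, below is a minimal sketch. It is purely hypothetical and is not MLC's actual code, which is not shown in this thread. The function name `place_dummy_threads`, the topology dictionary, and the assumption that the default measurement CPU is 0 are all mine. The idea is simply that each socket's dummy thread goes on the first CPU of that socket that is not the measurement CPU.

```python
def place_dummy_threads(topology, measurement_cpu):
    """Pick one dummy-thread CPU per socket, avoiding the measurement CPU.

    topology: dict mapping socket id -> iterable of CPU ids (hypothetical input).
    Returns a dict mapping socket id -> CPU chosen for the dummy thread.
    """
    placement = {}
    for socket, cpus in topology.items():
        # Prefer the lowest-numbered CPU, but skip the measurement CPU.
        candidates = [c for c in sorted(cpus) if c != measurement_cpu]
        if not candidates:
            raise ValueError(f"socket {socket} has no free CPU for a dummy thread")
        placement[socket] = candidates[0]
    return placement


# Topology from the report above: cores 0-7 in socket 0, 8-15 in socket 1.
topology = {0: range(0, 8), 1: range(8, 16)}

# Measurement on CPU 4 (as with -c4): dummy threads stay on the first CPUs.
print(place_dummy_threads(topology, measurement_cpu=4))

# Measurement on CPU 0 (assumed default): socket 0's dummy thread must move.
print(place_dummy_threads(topology, measurement_cpu=0))
```

In this sketch, the case that was missed corresponds to the second call: without the check, the dummy thread for socket 0 and the latency thread would both land on CPU 0.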
I thought that might be the problem -- I tried using the "-c" option to bind to each core on each socket and saw no degradation on any of the cores, but it looks like specifying the core activated the "collision avoidance" logic.
There are some funny numbers on one of my Haswell 2-socket boxes, but it looks like the DRAM configuration is not optimal for this 3-channel processor (Xeon E5-2603 v3).
The rest of the results look good -- thanks for supporting this tool!
The documentation of the Intel Memory Latency Checker states that the option -bXXX specifies the buffer size, for example to measure caches instead of DRAM. However, this option does not seem to be honored. Both the printed "Using buffer size of" message and the measured values indicate that it has no effect. For example, the command mlc --idle_latency –b3000 –c0 –t3 from the documentation does not work. Is there a workaround?