Software Tuning, Performance Optimization & Platform Monitoring
Discussion of monitoring and software tuning methodologies, the Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Intel Memory Latency Checker w/ Windows support released

Krishnaswa_V_Intel
738 Views

We just released v2.3 of Intel Memory Latency Checker (http://www.intel.com/software/mlc). This release adds support for Windows; previous versions already supported Linux. Single-socket Xeon processors (E3) are now supported as well.

Intel Memory Latency Checker can be used to measure latencies and bandwidth on Intel Xeon processors.

Vish
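For readers new to this kind of tool: latency checkers generally time a chain of dependent loads, where each load's address comes from the previous load, so the hardware cannot overlap or prefetch them. The Python sketch below is not MLC's implementation. It only shows one common way to build such a chain, Sattolo's algorithm, which yields a random permutation forming a single cycle, so a chase starting from any slot visits every slot before repeating. The names `build_pointer_chain` and `chase` are hypothetical.

```python
import random


def build_pointer_chain(n_slots: int, seed: int = 0) -> list[int]:
    """Return next_index such that following it from any slot visits all
    n_slots slots exactly once before returning to the start (one cycle).

    Sattolo's algorithm: a Fisher-Yates variant that picks j strictly below i,
    which guarantees a single cycle, so the chase never falls into a short loop.
    """
    rng = random.Random(seed)
    next_index = list(range(n_slots))
    for i in range(n_slots - 1, 0, -1):
        j = rng.randrange(i)  # j in [0, i), never equal to i
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index


def chase(next_index: list[int], start: int, steps: int) -> int:
    """Follow the chain for `steps` dependent lookups and return the final slot."""
    idx = start
    for _ in range(steps):
        idx = next_index[idx]  # each lookup depends on the previous result
    return idx
```

A real measurement would do the chase in a compiled language over a buffer much larger than the last-level cache and divide the elapsed time by the number of loads; in Python the interpreter overhead per step would hide the memory latency entirely.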

4 Replies
McCalpinJohn
Honored Contributor III

I am seeing strange latency results with version 2.3. In particular, the result of the "--idle_latency" test (with no other options) is much higher than any of the values from "--latency_matrix". If the "--idle_latency" test is given the "-c" and "-i" options to place the threads and data, the results seem fine.

E.g., on a Xeon E5-2680 with cores 0-7 in socket 0 and 8-15 in socket 1, I see:

c557-603:~/Stampede/IntelMemoryLatencyChecker:2014-11-11T13:28:08 $ ./mlc_2-3 --latency_matrix
Intel(R) Memory Latency Checker - v2.3
Command line parameters: --latency_matrix 

Using buffer size of 200.000MB
Measuring idle latencies (in ns)...
	Memory node
Socket	     0	     1	
     0	  66.9	 116.6	
     1	 116.6	  66.9	
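To make the comparison concrete, the small Python parser below turns the "--latency_matrix" table above into a dictionary keyed by (socket, memory node). It is a sketch that assumes the tab-separated layout shown in this output and that socket i's local memory is node i (true for this table, where the diagonal holds the low values). The function names are hypothetical. On this data the largest entry is 116.6 ns, still well below the 134.1 ns reported by the default "--idle_latency" run, and remote accesses cost about 1.74 times as much as local ones.

```python
# The matrix exactly as printed above (tabs and padding preserved).
SAMPLE = (
    "\tMemory node\n"
    "Socket\t     0\t     1\t\n"
    "     0\t  66.9\t 116.6\t\n"
    "     1\t 116.6\t  66.9\t\n"
)


def parse_latency_matrix(text):
    """Return {(socket, node): latency_ns} from a --latency_matrix table."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = next(i for i, fields in enumerate(rows) if fields[0] == "Socket")
    nodes = [int(tok) for tok in rows[header][1:]]
    matrix = {}
    for fields in rows[header + 1:]:
        if not fields[0].isdigit():
            break  # first non-numeric row ends the table
        socket = int(fields[0])
        for node, value in zip(nodes, fields[1:]):
            matrix[(socket, node)] = float(value)
    return matrix


def remote_to_local_ratio(matrix):
    """Mean remote latency divided by mean local latency (assumes node i is local to socket i)."""
    local = [v for (s, n), v in matrix.items() if s == n]
    remote = [v for (s, n), v in matrix.items() if s != n]
    return (sum(remote) / len(remote)) / (sum(local) / len(local))


if __name__ == "__main__":
    m = parse_latency_matrix(SAMPLE)
    print(max(m.values()))                     # 116.6
    print(round(remote_to_local_ratio(m), 2))  # 1.74
```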


c557-603:~/Stampede/IntelMemoryLatencyChecker:2014-11-11T13:27:54 $ ./mlc_2-3 --idle_latency 
Intel(R) Memory Latency Checker - v2.3
Command line parameters: --idle_latency 

Using buffer size of 200.000MB
Each iteration took 362.0 core clocks (	134.1	ns)


c557-603:~/Stampede/IntelMemoryLatencyChecker:2014-11-11T13:27:24 $ ./mlc_2-3 --idle_latency -c4 -i4
Intel(R) Memory Latency Checker - v2.3
Command line parameters: --idle_latency -c4 -i4 

Using buffer size of 200.000MB
Each iteration took 179.5 core clocks (	66.5	ns)
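Both "--idle_latency" lines report the same quantity in two units. Dividing core clocks by nanoseconds gives 362.0 / 134.1 ≈ 2.70 and 179.5 / 66.5 ≈ 2.70 cycles per ns, i.e. about 2.7 GHz, which is the nominal frequency of the Xeon E5-2680. How MLC obtains its frequency is not stated in this thread; the Python sketch below only checks the arithmetic from the printed lines. The regular expression assumes the exact line format shown above, and the function names are hypothetical.

```python
import re

# Matches lines such as "Each iteration took 362.0 core clocks (\t134.1\tns)"
ITERATION_RE = re.compile(
    r"Each iteration took\s+([\d.]+)\s+core clocks\s*\(\s*([\d.]+)\s*ns\)"
)


def implied_ghz(line):
    """Return the clock rate (GHz) implied by an --idle_latency result line."""
    match = ITERATION_RE.search(line)
    if match is None:
        raise ValueError(f"not an iteration line: {line!r}")
    clocks, nanoseconds = float(match.group(1)), float(match.group(2))
    return clocks / nanoseconds  # cycles per nanosecond == GHz


def clocks_to_ns(clocks, ghz):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return clocks / ghz


if __name__ == "__main__":
    for line in ("Each iteration took 362.0 core clocks (\t134.1\tns)",
                 "Each iteration took 179.5 core clocks (\t66.5\tns)"):
        print(round(implied_ghz(line), 2))  # 2.7 for both lines
```

Since the implied clock is the same in both runs, the gap between 134.1 ns and 66.5 ns comes from roughly twice as many core clocks per iteration, not from a different unit conversion.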

 

Krishnaswa_V_Intel

John,

Thanks for reporting this issue. In this release we moved the dummy threads to the first CPU in each socket. When you invoke --idle_latency without any other parameters, the dummy thread and the latency thread ended up running on the same core, which produced the higher latencies. We missed this case in testing because we typically expect the -c option to be specified with --idle_latency. We do take care not to schedule dummy threads and measurement threads on the same CPU, but we missed this one case. I have fixed the code, and the next release should include the fix.

Are you seeing any other issues? We really appreciate your testing and feedback in making the tool better.

Thanks

Vish
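The placement rule described in the reply above (one dummy thread on the first CPU of each socket, never sharing a CPU with the measurement thread) can be sketched as follows. This is a hypothetical Python illustration, not MLC's code: the topology format (a dict from socket id to CPU ids), the default of using the first CPU of the lowest socket when no "-c" is given, and the name `place_threads` are all assumptions. The example topology matches the E5-2680 system in this thread, with CPUs 0-7 on socket 0 and 8-15 on socket 1.

```python
def place_threads(topology, measure_cpu=None):
    """Return (measure_cpu, {socket: dummy_cpu}) with no dummy thread on the
    measurement CPU.

    topology: hypothetical dict mapping socket id -> list of CPU ids.
    measure_cpu: CPU requested for the latency thread (like -c), or None.
    """
    if measure_cpu is None:
        # Assumed default: first CPU of the lowest-numbered socket.
        measure_cpu = topology[min(topology)][0]
    dummy_cpus = {}
    for socket, cpus in topology.items():
        # First CPU in the socket, skipping the measurement CPU (collision avoidance).
        candidates = [c for c in cpus if c != measure_cpu]
        if not candidates:
            raise ValueError(f"socket {socket} has no free CPU for a dummy thread")
        dummy_cpus[socket] = candidates[0]
    return measure_cpu, dummy_cpus


TOPOLOGY = {0: list(range(0, 8)), 1: list(range(8, 16))}

if __name__ == "__main__":
    print(place_threads(TOPOLOGY))     # (0, {0: 1, 1: 8})
    print(place_threads(TOPOLOGY, 4))  # (4, {0: 0, 1: 8})
```

The bug in the default "--idle_latency" run corresponds to skipping the `c != measure_cpu` filter when no CPU is given. On Linux, a process could then be pinned to its chosen CPU with `os.sched_setaffinity(0, {cpu})`.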

McCalpinJohn
Honored Contributor III

I thought that might be the problem. I tried using the "-c" option to bind to each core on each socket and saw no degradation on any of the cores, but it looks like specifying the core activated the "collision avoidance" logic.

There are some funny numbers on one of my Haswell 2-socket boxes, but it looks like the DRAM configuration is not optimal for this 3-channel processor (Xeon E5-2603 v3).  

The rest of the results look good -- thanks for supporting this tool!

zeuch__Steffen
Beginner

Hello,

The documentation for Intel Memory Latency Checker states that the -bXXX option specifies the buffer size, for example to measure caches instead of DRAM. However, the option seems to be ignored: both the "Using buffer size of" message and the measured values indicate that it has no effect. For example, the command mlc --idle_latency -b3000 -c0 -t3, taken from the documentation, does not work. Is there a workaround?

Kind regards,
Steffen
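As background to this question: which level of the hierarchy a latency test exercises depends on how the buffer size compares with the cache capacities. The Python sketch below reads per-level cache sizes from Linux sysfs (`/sys/devices/system/cpu/cpuN/cache/indexM/{level,type,size}`) and reports the smallest level a buffer fits in. It works in bytes; whether MLC's -b value is given in KB should be checked against the MLC documentation. The 50% headroom factor is an arbitrary rule-of-thumb assumption, the shared L3 is treated like a private cache, and the function names are hypothetical.

```python
from pathlib import Path


def parse_size(text):
    """Convert a sysfs cache size string such as '32K' or '20480K' to bytes."""
    text = text.strip().upper()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def read_cache_sizes(cpu=0):
    """Return {level: size_bytes} for data/unified caches of one CPU (Linux only)."""
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cache")
    sizes = {}
    for index in sorted(base.glob("index*")):
        if (index / "type").read_text().strip() == "Instruction":
            continue  # latency tests load data, so skip the L1 instruction cache
        level = int((index / "level").read_text())
        sizes[level] = parse_size((index / "size").read_text())
    return sizes


def smallest_level_that_fits(buffer_bytes, cache_sizes, headroom=0.5):
    """Return the lowest cache level holding the buffer within `headroom` of
    its capacity, or None if the buffer should mostly come from DRAM."""
    for level in sorted(cache_sizes):
        if buffer_bytes <= headroom * cache_sizes[level]:
            return level
    return None


if __name__ == "__main__":
    # Example sizes for an E5-2680 core: 32 KB L1d, 256 KB L2, 20 MB shared L3.
    sizes = {1: parse_size("32K"), 2: parse_size("256K"), 3: parse_size("20480K")}
    print(smallest_level_that_fits(3000 * 1024, sizes))       # 3
    print(smallest_level_that_fits(200 * 1024 * 1024, sizes))  # None
```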
