
Intel Memory Latency Checker v2.0 released

Thomas_W_Intel
Employee

A new version, Intel Memory Latency Checker v2.0 (Intel MLC), has been posted at http://www.intel.com/software/mlc

In addition to unloaded memory latency, Intel MLC can now measure memory bandwidth and loaded latencies.

Krishnaswa_V_Intel

We just released version 2.1 of the Intel Memory Latency Checker tool. This version automatically launches spinner threads during bandwidth tests to ensure the best possible memory bandwidth for remote accesses. It also measures remote memory latencies correctly on newer Linux kernels where the NUMA balancing feature is enabled. Please give it a try and let us know if you have any feedback. Thanks!
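For anyone who wants to check whether automatic NUMA balancing is active on their Linux system before comparing results, here is a minimal Python sketch. It reads the standard `kernel.numa_balancing` sysctl file; the helper names `parse_numa_balancing` and `numa_balancing_enabled` are made up for illustration and are not part of Intel MLC.

```python
from pathlib import Path
from typing import Optional

# Standard Linux sysctl file for automatic NUMA balancing (kernel.numa_balancing).
NUMA_BALANCING_PATH = Path("/proc/sys/kernel/numa_balancing")


def parse_numa_balancing(text: str) -> bool:
    """Interpret the sysctl value: 0 means disabled, any other integer means enabled."""
    return int(text.strip()) != 0


def numa_balancing_enabled(path: Path = NUMA_BALANCING_PATH) -> Optional[bool]:
    """Return True or False, or None if the kernel does not expose the setting."""
    try:
        return parse_numa_balancing(path.read_text())
    except FileNotFoundError:
        return None
```

Calling `numa_balancing_enabled()` with no arguments checks the running system; a result of `None` means the file is absent, which is typical for kernels built without the feature.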

Ming_C_
Beginner

It is a great tool, but I got some strange results when I ran it on a 2-socket E5-2670 system. The idle latencies were not consistent between socket 0 and socket 1, although the memory bandwidth tests looked fine. What could cause the idle latencies to differ between socket 0 and socket 1? The results are below.

[root@localhost mlc]# ./mlc
Intel(R) Memory Latency Checker - v2.1

Using buffer size of 200.000MB
Measuring idle latencies (in ns)...
        Memory node
Socket       0       1
     0    88.4   150.8
     1    13.2     9.6

Measuring Peak Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using traffic with the following read-write ratios
ALL Reads        :      40054.3
3:1 Reads-Writes :      34445.4
2:1 Reads-Writes :      33157.9
1:1 Reads-Writes :      30032.6
Stream-triad like:      32967.6

Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)

Using Read-only traffic type
        Memory node
 Socket      0       1
     0  20022.8 11494.4
     1  11724.8 20030.0

Thanks!
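
A quick way to spot this kind of inconsistency automatically is to parse the idle latency matrix and compare matching cells. The Python sketch below is only a heuristic: it assumes a homogeneous system where memory node N belongs to socket N, so local latencies should roughly match each other and so should remote latencies. The function names and the 10% tolerance are arbitrary choices for illustration, not anything MLC provides.

```python
import re


def parse_latency_matrix(output: str) -> dict:
    """Return {socket: [latency to node 0, node 1, ...]} from MLC text output."""
    rows = {}
    in_matrix = False
    for line in output.splitlines():
        if "Measuring idle latencies" in line:
            in_matrix = True
            continue
        if not in_matrix:
            continue
        # A data row is a socket id followed by one or more decimal latencies.
        m = re.fullmatch(r"\s*(\d+)((?:\s+\d+\.\d+)+)\s*", line)
        if m:
            rows[int(m.group(1))] = [float(v) for v in m.group(2).split()]
        elif rows:
            break  # first non-matching line after the data rows ends the matrix
    return rows


def inconsistent_cells(rows: dict, tolerance: float = 0.10) -> list:
    """Flag cell pairs that differ by more than `tolerance` (relative).

    Compares local vs. local, (i, i) with (j, j), and remote vs. remote,
    (i, j) with (j, i), assuming node index == socket index.
    """
    flagged = []
    sockets = sorted(rows)
    for i in sockets:
        for j in sockets:
            if j <= i:
                continue
            for a, b in (((i, i), (j, j)), ((i, j), (j, i))):
                x = rows[a[0]][a[1]]
                y = rows[b[0]][b[1]]
                if abs(x - y) / max(x, y) > tolerance:
                    flagged.append((a, b))
    return flagged
```

Feeding the output above into `parse_latency_matrix` and then `inconsistent_cells` flags both the local pair (88.4 vs 9.6 ns) and the remote pair (150.8 vs 13.2 ns).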

 
Krishnaswa_V_Intel

Hi Ming, can you please try the following two commands and send me the output?

./mlc --latency_matrix -r -l128

./mlc --latency_matrix -l128 -v

Ming_C_
Beginner

Hi Vish,

Here are the results for the suggested commands. It seems the first command produced consistent latency numbers between socket 0 and socket 1.

[root@localhost mlc]# ./mlc --latency_matrix -r -l128
Intel(R) Memory Latency Checker - v2.1
Command line parameters: --latency_matrix -r -l128

Using buffer size of 200.000MB
Measuring idle latencies (in ns)...
        Memory node
Socket       0       1
     0    88.7   151.4
     1   151.0    88.5
[root@localhost mlc]# ./mlc --latency_matrix -l128 -v
Intel(R) Memory Latency Checker - v2.1
Command line parameters: --latency_matrix -l128 -v
OS core id:   0: Socket id:   0 Hyperthread id:   0
OS core id:   1: Socket id:   0 Hyperthread id:   1
OS core id:   2: Socket id:   0 Hyperthread id:   0
OS core id:   3: Socket id:   0 Hyperthread id:   1
OS core id:   4: Socket id:   0 Hyperthread id:   0
OS core id:   5: Socket id:   0 Hyperthread id:   1
OS core id:   6: Socket id:   0 Hyperthread id:   0
OS core id:   7: Socket id:   0 Hyperthread id:   1
OS core id:   8: Socket id:   0 Hyperthread id:   0
OS core id:   9: Socket id:   0 Hyperthread id:   1
OS core id:  10: Socket id:   0 Hyperthread id:   0
OS core id:  11: Socket id:   0 Hyperthread id:   1
OS core id:  12: Socket id:   0 Hyperthread id:   0
OS core id:  13: Socket id:   0 Hyperthread id:   1
OS core id:  14: Socket id:   0 Hyperthread id:   0
OS core id:  15: Socket id:   0 Hyperthread id:   1
OS core id:  16: Socket id:   1 Hyperthread id:   0
OS core id:  17: Socket id:   1 Hyperthread id:   1
OS core id:  18: Socket id:   1 Hyperthread id:   0
OS core id:  19: Socket id:   1 Hyperthread id:   1
OS core id:  20: Socket id:   1 Hyperthread id:   0
OS core id:  21: Socket id:   1 Hyperthread id:   1
OS core id:  22: Socket id:   1 Hyperthread id:   0
OS core id:  23: Socket id:   1 Hyperthread id:   1
OS core id:  24: Socket id:   1 Hyperthread id:   0
OS core id:  25: Socket id:   1 Hyperthread id:   1
OS core id:  26: Socket id:   1 Hyperthread id:   0
OS core id:  27: Socket id:   1 Hyperthread id:   1
OS core id:  28: Socket id:   1 Hyperthread id:   0
OS core id:  29: Socket id:   1 Hyperthread id:   1
OS core id:  30: Socket id:   1 Hyperthread id:   0
OS core id:  31: Socket id:   1 Hyperthread id:   1
Detected 2 sockets

Using buffer size of 200.000MB
Test running on 2600.00 MHZ processor(s)
Core 20 is running a busy loop to keep socket 1 from low frequency states
Core 4 is running a busy loop to keep socket 0 from low frequency states
Measuring idle latencies (in ns)...

Socket  0 (core   2) measuring latency to memory on socket  0 (allocated by core   2)..
Allocated 1600000 cache lines...
Initializing memory...memory initialized
Start loop for latency measurement...
Each iteration took 230.6 core clocks ( 88.7    ns)

Socket  0 (core   2) measuring latency to memory on socket  1 (allocated by core  18)..
Allocated 1600000 cache lines...
Initializing memory...memory initialized
Start loop for latency measurement...
Each iteration took 393.7 core clocks ( 151.4   ns)


Socket  1 (core  18) measuring latency to memory on socket  0 (allocated by core   2)..
Allocated 1600000 cache lines...
Initializing memory...memory initialized
Start loop for latency measurement...
Each iteration took 110.7 core clocks ( 42.6    ns)

Socket  1 (core  18) measuring latency to memory on socket  1 (allocated by core  18)..
Allocated 1600000 cache lines...
Initializing memory...memory initialized
Start loop for latency measurement...
Each iteration took 71.1 core clocks (  27.3    ns)

[root@localhost mlc]#

Thanks!
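
As a sanity check on the verbose output, the nanosecond figures can be reproduced from the reported core clocks and the 2600 MHz frequency MLC printed. The short Python sketch below does that arithmetic; `clocks_to_ns` is just an illustrative helper, and it assumes MLC converts at the nominal frequency shown in the log.

```python
def clocks_to_ns(core_clocks: float, freq_mhz: float) -> float:
    """Convert a core-clock count to nanoseconds at the given frequency."""
    return core_clocks * 1000.0 / freq_mhz


FREQ_MHZ = 2600.0  # from "Test running on 2600.00 MHZ processor(s)"

# (core clocks per iteration, ns reported by MLC) taken from the -v output
samples = [(230.6, 88.7), (393.7, 151.4), (110.7, 42.6), (71.1, 27.3)]

for clocks, reported_ns in samples:
    computed = clocks_to_ns(clocks, FREQ_MHZ)
    print(f"{clocks:6.1f} clocks -> {computed:6.1f} ns (reported {reported_ns})")
```

All four reported values agree with this conversion to within rounding, so the low socket 1 numbers come from the measured clock counts themselves rather than from the clocks-to-ns step.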

 

 
