Memory Latency Checker mlc 3.9 crashes with SIGBUS

drMikeT

I have been trying to run the latest MLC (v3.9) on a machine with 16 NUMA domains, but it crashes soon after it starts sampling the latencies between memory domains:

# /proc/sys/vm/nr_hugepages = 4000 
# /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages = 4000
# /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages = 64
/tmp/root/mlc ~/cs691/performance/analysis/systems/ccNUMA/Intel64/smpperf/MLC
# ./Linux/mlc -v
Intel(R) Memory Latency Checker - v3.9
OS id: 0 Core id: 0 Socket id: 0 Hyperthread id: 0
OS id: 1 Core id: 1 Socket id: 0 Hyperthread id: 0
OS id: 2 Core id: 2 Socket id: 0 Hyperthread id: 0
OS id: 3 Core id: 3 Socket id: 0 Hyperthread id: 0
OS id: 4 Core id: 4 Socket id: 0 Hyperthread id: 0
OS id: 5 Core id: 5 Socket id: 0 Hyperthread id: 0
OS id: 6 Core id: 6 Socket id: 0 Hyperthread id: 0
OS id: 7 Core id: 7 Socket id: 0 Hyperthread id: 0
OS id: 8 Core id: 8 Socket id: 0 Hyperthread id: 0
OS id: 9 Core id: 9 Socket id: 0 Hyperthread id: 0
OS id: 10 Core id: 10 Socket id: 0 Hyperthread id: 0
OS id: 11 Core id: 11 Socket id: 0 Hyperthread id: 0
OS id: 12 Core id: 12 Socket id: 0 Hyperthread id: 0
OS id: 13 Core id: 13 Socket id: 0 Hyperthread id: 0
OS id: 14 Core id: 14 Socket id: 0 Hyperthread id: 0
OS id: 15 Core id: 15 Socket id: 0 Hyperthread id: 0
OS id: 16 Core id: 16 Socket id: 0 Hyperthread id: 0
OS id: 17 Core id: 17 Socket id: 0 Hyperthread id: 0
OS id: 18 Core id: 18 Socket id: 0 Hyperthread id: 0
OS id: 19 Core id: 19 Socket id: 0 Hyperthread id: 0
OS id: 20 Core id: 20 Socket id: 0 Hyperthread id: 0
OS id: 21 Core id: 21 Socket id: 0 Hyperthread id: 0
OS id: 22 Core id: 22 Socket id: 0 Hyperthread id: 0
OS id: 23 Core id: 23 Socket id: 0 Hyperthread id: 0
OS id: 24 Core id: 24 Socket id: 0 Hyperthread id: 0
OS id: 25 Core id: 25 Socket id: 0 Hyperthread id: 0
OS id: 26 Core id: 26 Socket id: 0 Hyperthread id: 0
OS id: 27 Core id: 27 Socket id: 0 Hyperthread id: 0
OS id: 28 Core id: 28 Socket id: 0 Hyperthread id: 0
OS id: 29 Core id: 29 Socket id: 0 Hyperthread id: 0
OS id: 30 Core id: 30 Socket id: 0 Hyperthread id: 0
OS id: 31 Core id: 31 Socket id: 0 Hyperthread id: 0
OS id: 32 Core id: 0 Socket id: 1 Hyperthread id: 0
OS id: 33 Core id: 1 Socket id: 1 Hyperthread id: 0
OS id: 34 Core id: 2 Socket id: 1 Hyperthread id: 0
OS id: 35 Core id: 3 Socket id: 1 Hyperthread id: 0
OS id: 36 Core id: 4 Socket id: 1 Hyperthread id: 0
OS id: 37 Core id: 5 Socket id: 1 Hyperthread id: 0
OS id: 38 Core id: 6 Socket id: 1 Hyperthread id: 0
OS id: 39 Core id: 7 Socket id: 1 Hyperthread id: 0
OS id: 40 Core id: 8 Socket id: 1 Hyperthread id: 0
OS id: 41 Core id: 9 Socket id: 1 Hyperthread id: 0
OS id: 42 Core id: 10 Socket id: 1 Hyperthread id: 0
OS id: 43 Core id: 11 Socket id: 1 Hyperthread id: 0
OS id: 44 Core id: 12 Socket id: 1 Hyperthread id: 0
OS id: 45 Core id: 13 Socket id: 1 Hyperthread id: 0
OS id: 46 Core id: 14 Socket id: 1 Hyperthread id: 0
OS id: 47 Core id: 15 Socket id: 1 Hyperthread id: 0
OS id: 48 Core id: 16 Socket id: 1 Hyperthread id: 0
OS id: 49 Core id: 17 Socket id: 1 Hyperthread id: 0
OS id: 50 Core id: 18 Socket id: 1 Hyperthread id: 0
OS id: 51 Core id: 19 Socket id: 1 Hyperthread id: 0
OS id: 52 Core id: 20 Socket id: 1 Hyperthread id: 0
OS id: 53 Core id: 21 Socket id: 1 Hyperthread id: 0
OS id: 54 Core id: 22 Socket id: 1 Hyperthread id: 0
OS id: 55 Core id: 23 Socket id: 1 Hyperthread id: 0
OS id: 56 Core id: 24 Socket id: 1 Hyperthread id: 0
OS id: 57 Core id: 25 Socket id: 1 Hyperthread id: 0
OS id: 58 Core id: 26 Socket id: 1 Hyperthread id: 0
OS id: 59 Core id: 27 Socket id: 1 Hyperthread id: 0
OS id: 60 Core id: 28 Socket id: 1 Hyperthread id: 0
OS id: 61 Core id: 29 Socket id: 1 Hyperthread id: 0
OS id: 62 Core id: 30 Socket id: 1 Hyperthread id: 0
OS id: 63 Core id: 31 Socket id: 1 Hyperthread id: 0
Detected 2 sockets
parse_numanodes(): num numa nodes = 16
Number of groups in cpumap file in nodeid 0=8
Number of groups in cpumap file in nodeid 1=8
Number of groups in cpumap file in nodeid 10=8
Number of groups in cpumap file in nodeid 11=8
Number of groups in cpumap file in nodeid 12=8
Number of groups in cpumap file in nodeid 13=8
Number of groups in cpumap file in nodeid 14=8
Number of groups in cpumap file in nodeid 15=8
Number of groups in cpumap file in nodeid 2=8
Number of groups in cpumap file in nodeid 3=8
Number of groups in cpumap file in nodeid 4=8
Number of groups in cpumap file in nodeid 5=8
Number of groups in cpumap file in nodeid 6=8
Number of groups in cpumap file in nodeid 7=8
Number of groups in cpumap file in nodeid 8=8
Number of groups in cpumap file in nodeid 9=8
parse_numanodes() completed - 16 nodes present
Detected 16 numa nodes
num numa nodes=16....
numa[numa_node][offset]=logical_cpu_id
numa[0][0]=0
numa[0][1]=1
numa[0][2]=2
numa[0][3]=3
numa[1][0]=4
numa[1][1]=5
numa[1][2]=6
numa[1][3]=7
numa[2][0]=8
numa[2][1]=9
numa[2][2]=10
numa[2][3]=11
numa[3][0]=12
numa[3][1]=13
numa[3][2]=14
numa[3][3]=15
numa[4][0]=16
numa[4][1]=17
numa[4][2]=18
numa[4][3]=19
numa[5][0]=20
numa[5][1]=21
numa[5][2]=22
numa[5][3]=23
numa[6][0]=24
numa[6][1]=25
numa[6][2]=26
numa[6][3]=27
numa[7][0]=28
numa[7][1]=29
numa[7][2]=30
numa[7][3]=31
numa[8][0]=32
numa[8][1]=33
numa[8][2]=34
numa[8][3]=35
numa[9][0]=36
numa[9][1]=37
numa[9][2]=38
numa[9][3]=39
numa[10][0]=40
numa[10][1]=41
numa[10][2]=42
numa[10][3]=43
numa[11][0]=44
numa[11][1]=45
numa[11][2]=46
numa[11][3]=47
numa[12][0]=48
numa[12][1]=49
numa[12][2]=50
numa[12][3]=51
numa[13][0]=52
numa[13][1]=53
numa[13][2]=54
numa[13][3]=55
numa[14][0]=56
numa[14][1]=57
numa[14][2]=58
numa[14][3]=59
numa[15][0]=60
numa[15][1]=61
numa[15][2]=62
numa[15][3]=63
1TPC num numa nodes=16....
numa[0][0]=0
numa[0][1]=1
numa[0][2]=2
numa[0][3]=3
numa[1][0]=4
numa[1][1]=5
numa[1][2]=6
numa[1][3]=7
numa[2][0]=8
numa[2][1]=9
numa[2][2]=10
numa[2][3]=11
numa[3][0]=12
numa[3][1]=13
numa[3][2]=14
numa[3][3]=15
numa[4][0]=16
numa[4][1]=17
numa[4][2]=18
numa[4][3]=19
numa[5][0]=20
numa[5][1]=21
numa[5][2]=22
numa[5][3]=23
numa[6][0]=24
numa[6][1]=25
numa[6][2]=26
numa[6][3]=27
numa[7][0]=28
numa[7][1]=29
numa[7][2]=30
numa[7][3]=31
numa[8][0]=32
numa[8][1]=33
numa[8][2]=34
numa[8][3]=35
numa[9][0]=36
numa[9][1]=37
numa[9][2]=38
numa[9][3]=39
numa[10][0]=40
numa[10][1]=41
numa[10][2]=42
numa[10][3]=43
numa[11][0]=44
numa[11][1]=45
numa[11][2]=46
numa[11][3]=47
numa[12][0]=48
numa[12][1]=49
numa[12][2]=50
numa[12][3]=51
numa[13][0]=52
numa[13][1]=53
numa[13][2]=54
numa[13][3]=55
numa[14][0]=56
numa[14][1]=57
numa[14][2]=58
numa[14][3]=59
numa[15][0]=60
numa[15][1]=61
numa[15][2]=62
numa[15][3]=63
NODE RESERVED....
numa_reserved[0][4]=0, 1
numa_reserved[1][4]=4, 5
numa_reserved[2][4]=8, 9
numa_reserved[3][4]=12, 13
numa_reserved[4][4]=16, 17
numa_reserved[5][4]=20, 21
numa_reserved[6][4]=24, 25
numa_reserved[7][4]=28, 29
numa_reserved[8][4]=32, 33
numa_reserved[9][4]=36, 37
numa_reserved[10][4]=40, 41
numa_reserved[11][4]=44, 45
numa_reserved[12][4]=48, 49
numa_reserved[13][4]=52, 53
numa_reserved[14][4]=56, 57
numa_reserved[15][4]=60, 61
OS id: 0 Core id: 0 Socket id: 0 Hyperthread id: 0 numa: 0
OS id: 1 Core id: 1 Socket id: 0 Hyperthread id: 0 numa: 0
OS id: 2 Core id: 2 Socket id: 0 Hyperthread id: 0 numa: 0
OS id: 3 Core id: 3 Socket id: 0 Hyperthread id: 0 numa: 0
OS id: 4 Core id: 4 Socket id: 0 Hyperthread id: 0 numa: 1
OS id: 5 Core id: 5 Socket id: 0 Hyperthread id: 0 numa: 1
OS id: 6 Core id: 6 Socket id: 0 Hyperthread id: 0 numa: 1
OS id: 7 Core id: 7 Socket id: 0 Hyperthread id: 0 numa: 1
OS id: 8 Core id: 8 Socket id: 0 Hyperthread id: 0 numa: 2
OS id: 9 Core id: 9 Socket id: 0 Hyperthread id: 0 numa: 2
OS id: 10 Core id: 10 Socket id: 0 Hyperthread id: 0 numa: 2
OS id: 11 Core id: 11 Socket id: 0 Hyperthread id: 0 numa: 2
OS id: 12 Core id: 12 Socket id: 0 Hyperthread id: 0 numa: 3
OS id: 13 Core id: 13 Socket id: 0 Hyperthread id: 0 numa: 3
OS id: 14 Core id: 14 Socket id: 0 Hyperthread id: 0 numa: 3
OS id: 15 Core id: 15 Socket id: 0 Hyperthread id: 0 numa: 3
OS id: 16 Core id: 16 Socket id: 0 Hyperthread id: 0 numa: 4
OS id: 17 Core id: 17 Socket id: 0 Hyperthread id: 0 numa: 4
OS id: 18 Core id: 18 Socket id: 0 Hyperthread id: 0 numa: 4
OS id: 19 Core id: 19 Socket id: 0 Hyperthread id: 0 numa: 4
OS id: 20 Core id: 20 Socket id: 0 Hyperthread id: 0 numa: 5
OS id: 21 Core id: 21 Socket id: 0 Hyperthread id: 0 numa: 5
OS id: 22 Core id: 22 Socket id: 0 Hyperthread id: 0 numa: 5
OS id: 23 Core id: 23 Socket id: 0 Hyperthread id: 0 numa: 5
OS id: 24 Core id: 24 Socket id: 0 Hyperthread id: 0 numa: 6
OS id: 25 Core id: 25 Socket id: 0 Hyperthread id: 0 numa: 6
OS id: 26 Core id: 26 Socket id: 0 Hyperthread id: 0 numa: 6
OS id: 27 Core id: 27 Socket id: 0 Hyperthread id: 0 numa: 6
OS id: 28 Core id: 28 Socket id: 0 Hyperthread id: 0 numa: 7
OS id: 29 Core id: 29 Socket id: 0 Hyperthread id: 0 numa: 7
OS id: 30 Core id: 30 Socket id: 0 Hyperthread id: 0 numa: 7
OS id: 31 Core id: 31 Socket id: 0 Hyperthread id: 0 numa: 7
OS id: 32 Core id: 0 Socket id: 1 Hyperthread id: 0 numa: 8
OS id: 33 Core id: 1 Socket id: 1 Hyperthread id: 0 numa: 8
OS id: 34 Core id: 2 Socket id: 1 Hyperthread id: 0 numa: 8
OS id: 35 Core id: 3 Socket id: 1 Hyperthread id: 0 numa: 8
OS id: 36 Core id: 4 Socket id: 1 Hyperthread id: 0 numa: 9
OS id: 37 Core id: 5 Socket id: 1 Hyperthread id: 0 numa: 9
OS id: 38 Core id: 6 Socket id: 1 Hyperthread id: 0 numa: 9
OS id: 39 Core id: 7 Socket id: 1 Hyperthread id: 0 numa: 9
OS id: 40 Core id: 8 Socket id: 1 Hyperthread id: 0 numa: 10
OS id: 41 Core id: 9 Socket id: 1 Hyperthread id: 0 numa: 10
OS id: 42 Core id: 10 Socket id: 1 Hyperthread id: 0 numa: 10
OS id: 43 Core id: 11 Socket id: 1 Hyperthread id: 0 numa: 10
OS id: 44 Core id: 12 Socket id: 1 Hyperthread id: 0 numa: 11
OS id: 45 Core id: 13 Socket id: 1 Hyperthread id: 0 numa: 11
OS id: 46 Core id: 14 Socket id: 1 Hyperthread id: 0 numa: 11
OS id: 47 Core id: 15 Socket id: 1 Hyperthread id: 0 numa: 11
OS id: 48 Core id: 16 Socket id: 1 Hyperthread id: 0 numa: 12
OS id: 49 Core id: 17 Socket id: 1 Hyperthread id: 0 numa: 12
OS id: 50 Core id: 18 Socket id: 1 Hyperthread id: 0 numa: 12
OS id: 51 Core id: 19 Socket id: 1 Hyperthread id: 0 numa: 12
OS id: 52 Core id: 20 Socket id: 1 Hyperthread id: 0 numa: 13
OS id: 53 Core id: 21 Socket id: 1 Hyperthread id: 0 numa: 13
OS id: 54 Core id: 22 Socket id: 1 Hyperthread id: 0 numa: 13
OS id: 55 Core id: 23 Socket id: 1 Hyperthread id: 0 numa: 13
OS id: 56 Core id: 24 Socket id: 1 Hyperthread id: 0 numa: 14
OS id: 57 Core id: 25 Socket id: 1 Hyperthread id: 0 numa: 14
OS id: 58 Core id: 26 Socket id: 1 Hyperthread id: 0 numa: 14
OS id: 59 Core id: 27 Socket id: 1 Hyperthread id: 0 numa: 14
OS id: 60 Core id: 28 Socket id: 1 Hyperthread id: 0 numa: 15
OS id: 61 Core id: 29 Socket id: 1 Hyperthread id: 0 numa: 15
OS id: 62 Core id: 30 Socket id: 1 Hyperthread id: 0 numa: 15
OS id: 63 Core id: 31 Socket id: 1 Hyperthread id: 0 numa: 15
Test running on 2499.93 MHZ processor(s)

Using buffer size of 600.000MiB
Core 32 is running a busy loop to keep socket 1 from low frequency states
Core 0 is running a busy loop to keep socket 0 from low frequency states
Measuring idle latencies (in ns)...

Numa node 0 (core 1) measuring latency to memory on numa node 0 ..
Free memory on node 0 = 28439524
AllocAndInitMemoryForLatencyThread: buf_len 629145600,NUMA_NODE 2 nodeId 0
GrabMemoryFromNumaNode(): Allocate 631242752 bytes from numa node0
Allocated 4915200 cache lines...
./MLC: line 75: 259154 Bus error (core dumped) ./Linux/mlc

$ numactl -H
available: 16 nodes (0-15)
node 0 cpus: 0 1 2 3
node 0 size: 31870 MB
node 0 free: 27774 MB
node 1 cpus: 4 5 6 7
node 1 size: 32252 MB
node 1 free: 29175 MB
node 2 cpus: 8 9 10 11
node 2 size: 32253 MB
node 2 free: 24189 MB
node 3 cpus: 12 13 14 15
node 3 size: 32252 MB
node 3 free: 30816 MB
node 4 cpus: 16 17 18 19
node 4 size: 32253 MB
node 4 free: 24258 MB
node 5 cpus: 20 21 22 23
node 5 size: 32252 MB
node 5 free: 30487 MB
node 6 cpus: 24 25 26 27
node 6 size: 32237 MB
node 6 free: 25173 MB
node 7 cpus: 28 29 30 31
node 7 size: 32240 MB
node 7 free: 29880 MB
node 8 cpus: 32 33 34 35
node 8 size: 32253 MB
node 8 free: 23845 MB
node 9 cpus: 36 37 38 39
node 9 size: 32252 MB
node 9 free: 23924 MB
node 10 cpus: 40 41 42 43
node 10 size: 32253 MB
node 10 free: 25374 MB
node 11 cpus: 44 45 46 47
node 11 size: 32252 MB
node 11 free: 31179 MB
node 12 cpus: 48 49 50 51
node 12 size: 32253 MB
node 12 free: 23326 MB
node 13 cpus: 52 53 54 55
node 13 size: 32252 MB
node 13 free: 27264 MB
node 14 cpus: 56 57 58 59
node 14 size: 32253 MB
node 14 free: 22291 MB
node 15 cpus: 60 61 62 63
node 15 size: 32252 MB
node 15 free: 31307 MB
node distances:
node 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0: 10 11 12 12 12 12 12 12 32 32 32 32 32 32 32 32
1: 11 10 12 12 12 12 12 12 32 32 32 32 32 32 32 32
2: 12 12 10 11 12 12 12 12 32 32 32 32 32 32 32 32
3: 12 12 11 10 12 12 12 12 32 32 32 32 32 32 32 32
4: 12 12 12 12 10 11 12 12 32 32 32 32 32 32 32 32
5: 12 12 12 12 11 10 12 12 32 32 32 32 32 32 32 32
6: 12 12 12 12 12 12 10 11 32 32 32 32 32 32 32 32
7: 12 12 12 12 12 12 11 10 32 32 32 32 32 32 32 32
8: 32 32 32 32 32 32 32 32 10 11 12 12 12 12 12 12
9: 32 32 32 32 32 32 32 32 11 10 12 12 12 12 12 12
10: 32 32 32 32 32 32 32 32 12 12 10 11 12 12 12 12
11: 32 32 32 32 32 32 32 32 12 12 11 10 12 12 12 12
12: 32 32 32 32 32 32 32 32 12 12 12 12 10 11 12 12
13: 32 32 32 32 32 32 32 32 12 12 12 12 11 10 12 12
14: 32 32 32 32 32 32 32 32 12 12 12 12 12 12 10 11
15: 32 32 32 32 32 32 32 32 12 12 12 12 12 12 11 10
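
A rough sanity check on the huge page pool, using only numbers from the output above (4000 pages of 2048 kB, 16 nodes, a 631242752-byte allocation on node 0). This is a sketch under two assumptions that the logs do not confirm: that MLC backs this buffer with 2 MiB huge pages, and that the kernel spread the pool evenly across the nodes. The helper names are hypothetical; the sysfs path is the standard Linux per-node huge page counter.

```python
from pathlib import Path

HUGEPAGE_BYTES = 2 * 1024 * 1024  # 2048 kB pages, matching hugepages-2048kB above


def hugepages_needed(alloc_bytes, page_bytes=HUGEPAGE_BYTES):
    """Number of huge pages required to back alloc_bytes (rounded up)."""
    return -(-alloc_bytes // page_bytes)


def even_share_per_node(total_pages, num_nodes):
    """Pages per node IF the pool were split evenly (an assumption)."""
    return total_pages // num_nodes


def free_hugepages_by_node(sysfs_root="/sys/devices/system/node", size_kb=2048):
    """Read the kernel's actual per-node free huge page counts (Linux only)."""
    counts = {}
    for node_dir in Path(sysfs_root).glob("node[0-9]*"):
        counter = node_dir / "hugepages" / f"hugepages-{size_kb}kB" / "free_hugepages"
        if counter.is_file():
            counts[int(node_dir.name[4:])] = int(counter.read_text())
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    needed = hugepages_needed(631242752)  # allocation size from the verbose log
    share = even_share_per_node(4000, 16)  # nr_hugepages and node count from the post
    print(f"pages needed: {needed}, even share per node: {share}")
    print(free_hugepages_by_node())  # empty dict if the sysfs tree is absent
```

Under those assumptions the buffer needs 301 pages while an even split leaves 250 per node, so comparing against the real per-node counts printed by `free_hugepages_by_node()` may be worthwhile before suspecting anything else.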

Any suggestions?


Thank you!

Michael

drMikeT

Update: the failing option is "--latency_matrix".

# ./Linux/mlc --latency_matrix 
Intel(R) Memory Latency Checker - v3.9
Command line parameters: --latency_matrix

Using buffer size of 600.000MiB
Measuring idle latencies (in ns)...
Numa node
Numa node 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 ./MLC: line 75: 13831 Bus error (core dumped) ./Linux/mlc

Bernard

The output says a core dump was collected, so gdb can be used to reveal more in-depth information.

SIGBUS (signal 7) is rarely caused by a hardware issue; in your case it might be triggered by referencing an uninitialized portion of a memory-mapped file.
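
To illustrate the mechanism described above, here is a minimal sketch (not MLC's code, and not a claim about what MLC does) that provokes SIGBUS on Linux by touching a page of a shared file mapping whose backing storage has been removed. The function name and the child script are hypothetical; the crash happens in a child process so the parent can report the signal.

```python
import signal
import subprocess
import sys
import textwrap

# Script run in a child process: map one page of a file, then truncate the
# file so the mapped page no longer has any backing storage.
CHILD = textwrap.dedent("""
    import mmap, os, tempfile
    fd, path = tempfile.mkstemp()
    os.unlink(path)                      # fd stays valid; nothing left on disk
    os.ftruncate(fd, mmap.PAGESIZE)      # give the file one page
    mm = mmap.mmap(fd, mmap.PAGESIZE)    # shared mapping of that page
    os.ftruncate(fd, 0)                  # remove the page behind the mapping
    mm[0]                                # this access is expected to raise SIGBUS
""")


def sigbus_demo_returncode():
    """Run the child and return its exit status (negative signal number if killed)."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode


if __name__ == "__main__":
    rc = sigbus_demo_returncode()
    print("child killed by SIGBUS:", rc == -signal.SIGBUS)
```

The same kind of fault can occur with any mapping that the kernel cannot back with a page at the moment it is first touched, which is why the `mmap` call itself can succeed and the crash only appears later.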

drMikeT

SIGBUS is usually caused by accessing improperly aligned data items. 

Our system does not actually generate core dumps, so I cannot examine a core file.

regards

Michael

Bernard

This line indicates that a core dump was produced (unless I am missing something):

Bus error (core dumped) ./Linux/mlc

The EFLAGS register has a bit (bit 18, the AC flag, if I remember correctly) that enables alignment checking. It would be interesting to see the EFLAGS content preserved in the core dump.
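
If an EFLAGS value can be obtained (for example from gdb's `info registers eflags`), testing the alignment-check bit is a one-line mask. A small sketch; the helper name and the two sample values are hypothetical and are not taken from this system.

```python
AC_BIT = 18  # position of the AC (alignment check) flag in x86 EFLAGS


def alignment_check_enabled(eflags: int) -> bool:
    """Return True if the AC flag is set in the given EFLAGS value."""
    return bool(eflags & (1 << AC_BIT))


if __name__ == "__main__":
    print(alignment_check_enabled(0x246))    # False: bit 18 clear
    print(alignment_check_enabled(0x40246))  # True: bit 18 set
```

Note that the AC flag only has an effect when the operating system has also enabled alignment checking (CR0.AM) and the code runs in user mode.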

drMikeT

Our OS is configured not to leave core dumps, so I cannot debug one.
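
For anyone wanting to confirm how core dumps are configured on a Linux machine, the two usual places to look are the process core-size limit and the kernel's `core_pattern`. The sketch below reads both; the function name is hypothetical, and it only reports settings without changing anything.

```python
import resource
from pathlib import Path


def core_dump_settings(pattern_file="/proc/sys/kernel/core_pattern"):
    """Report the core-size limits of this process and the kernel core pattern."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    path = Path(pattern_file)
    # core_pattern is Linux-specific; report None where it is unavailable.
    pattern = path.read_text().strip() if path.is_file() else None
    return {"soft_limit": soft, "hard_limit": hard, "core_pattern": pattern}


if __name__ == "__main__":
    settings = core_dump_settings()
    # A soft limit of 0 disables core files; resource.RLIM_INFINITY means unlimited.
    print(settings)
```

A `core_pattern` that begins with `|` pipes dumps to a helper program, in which case the shell may still print "(core dumped)" even though no core file appears in the working directory.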

I was hoping that Intel would check the MLC source code and find that there is a bug when the number of NUMA domains is insanely large (e.g., 16 or more).

thanks

Michael
