Hi,
I have come across a bug in Intel MPI when testing in a Docker container with no NUMA support. It appears that the no-NUMA case is not handled correctly: MPI_Init segfaults. More details below.
Thanks,
Jamil
icc --version
icc (ICC) 17.0.6 20171215
gcc --version
gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
uname -a
Linux centos7dev 4.9.60-linuxkit-aufs #1 SMP Mon Nov 6 16:00:12 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
bug.c
#include "mpi.h"
int main (int argc, char *argv[])
{
MPI_Init(&argc,&argv);
}
I_MPI_CC=gcc mpicc -g bug.c -o bug
gdb ./bug
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b64f45 in __I_MPI___intel_sse2_strtok () from /opt/intel/compilers_and_libraries_2017.6.256/linux/mpi/intel64/lib/libmpifort.so.12
Missing separate debuginfos, use: debuginfo-install libgcc-4.8.5-16.el7_4.2.x86_64 numactl-devel-2.0.9-6.el7_2.x86_64
(gdb) bt
#0 0x00007ffff7b64f45 in __I_MPI___intel_sse2_strtok () from /opt/intel/compilers_and_libraries_2017.6.256/linux/mpi/intel64/lib/libmpifort.so.12
#1 0x00007ffff70acab1 in MPID_nem_impi_create_numa_nodes_map () at ../../src/mpid/ch3/src/mpid_init.c:1355
#2 0x00007ffff70ad994 in MPID_Init (argc=0x1, argv=0x7ffff72a2268, requested=-148233624, provided=0x1, has_args=0x0, has_env=0x0)
at ../../src/mpid/ch3/src/mpid_init.c:1733
#3 0x00007ffff7043ebb in MPIR_Init_thread (argc=0x1, argv=0x7ffff72a2268, required=-148233624, provided=0x1) at ../../src/mpi/init/initthread.c:717
#4 0x00007ffff70315bb in PMPI_Init (argc=0x1, argv=0x7ffff72a2268) at ../../src/mpi/init/init.c:253
#5 0x00000000004007e8 in main (argc=1, argv=0x7fffffffcd58) at bug.c:6
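The backtrace shows the crash inside a strtok call made from MPID_nem_impi_create_numa_nodes_map. The Intel MPI source is not shown in this thread, so the following is only a guess at the failure mode: if the node list being tokenized is null or empty when no NUMA nodes exist, handing it to strtok unguarded could fault (a null string on the first strtok call is undefined behavior). A minimal sketch of the kind of guard that would avoid this, where `count_node_entries` and the comma-separated list format are hypothetical and not taken from Intel MPI:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT Intel MPI code: counts comma-separated entries
   in a node list such as "0,1". Returns 0 for a null or empty list instead
   of passing it to strtok, and -1 if memory allocation fails. */
int count_node_entries(const char *list)
{
    if (list == NULL || *list == '\0')
        return 0;                      /* no NUMA nodes reported */

    size_t len = strlen(list) + 1;
    char *copy = malloc(len);          /* strtok modifies its input */
    if (copy == NULL)
        return -1;
    memcpy(copy, list, len);

    int count = 0;
    for (char *tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ","))
        count++;

    free(copy);
    return count;
}
```

With this guard, `count_node_entries(NULL)` and `count_node_entries("")` both return 0, while `count_node_entries("0,1")` returns 2.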
Hi Jamil,
Can you show the output of:
$ numactl -H
Dmitry
Hi Dmitry,
In a CentOS 6 Docker container:
> numactl -H
available: 0 nodes ()
libnuma: Warning: Cannot parse distance information in sysfs: No such file or directory
No distance information available.
In CentOS 7 the output is: numa is not supported on this system
Jamil
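For anyone trying to reproduce this, the numactl output above suggests that the container exposes no usable NUMA node information in sysfs. A minimal sketch of checking for that from C, assuming the usual Linux layout in which NUMA nodes appear as directories such as /sys/devices/system/node/node0. The helper name `numa_node0_present` is hypothetical, and the check is only a heuristic, not what Intel MPI or libnuma actually does:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if <sysfs_root>/node0 exists and is a
   directory, 0 otherwise (including on a null or overly long path). */
int numa_node0_present(const char *sysfs_root)
{
    char path[4096];
    struct stat st;

    if (sysfs_root == NULL)
        return 0;

    int n = snprintf(path, sizeof path, "%s/node0", sysfs_root);
    if (n < 0 || (size_t)n >= sizeof path)
        return 0;                      /* formatting error or path too long */

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller would pass "/sys/devices/system/node" as `sysfs_root` and treat a return value of 0 as "no NUMA information available" before attempting any NUMA-specific setup.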
Hi Dmitry, Jamil,
I just ran into the same issue today. The system I compile on has no NUMA support:
numactl --show
physcpubind: 0 1 2 3 4 5 6 7
No NUMA support available on this system.
When running a program only containing MPI_Init, I similarly get the segfault:
[0,1] (mpigdb) run
[0,1] Continuing.
[0,1]
[0,1] Program received signal SIGSEGV, Segmentation fault.
[0] 0x00007f4876dc0805 in __I_MPI___intel_sse2_strtok ()
[1] 0x00007fe059bc0805 in __I_MPI___intel_sse2_strtok ()
[0,1] from /opt/intel/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpi.so.12
[0,1] (mpigdb) bt
[0] #0 0x00007f4876dc0805 in __I_MPI___intel_sse2_strtok ()
[1] #0 0x00007fe059bc0805 in __I_MPI___intel_sse2_strtok ()
[0,1] from /opt/intel/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpi.so.12
[0] #1 0x00007f4876c7ce91 in MPID_nem_impi_create_numa_nodes_map ()
[1] #1 0x00007fe059a7ce91 in MPID_nem_impi_create_numa_nodes_map ()
[0,1] at ../../src/mpid/ch3/src/mpid_init.c:1355
[0] #2 0x00007f4876c7dd74 in MPID_Init (argc=0x1, argv=0x7f4876e2bdb4,
[1] #2 0x00007fe059a7dd74 in MPID_Init (argc=0x1, argv=0x7fe059c2bdb4,
[0] requested=1994571188, provided=0x1, has_args=0x0, has_env=0x2)
[1] requested=1505934772, provided=0x1, has_args=0x0, has_env=0x2)
[0,1] at ../../src/mpid/ch3/src/mpid_init.c:1760
[0] #3 0x00007f4876c1eaeb in MPIR_Init_thread (argc=0x1, argv=0x7f4876e2bdb4,
[1] #3 0x00007fe059a1eaeb in MPIR_Init_thread (argc=0x1, argv=0x7fe059c2bdb4,
[0] required=1994571188, provided=0x1) at ../../src/mpi/init/initthread.c:717
[1] required=1505934772, provided=0x1) at ../../src/mpi/init/initthread.c:717
[0] #4 0x00007f4876c0c07b in PMPI_Init (argc=0x1, argv=0x7f4876e2bdb4)
[1] #4 0x00007fe059a0c07b in PMPI_Init (argc=0x1, argv=0x7fe059c2bdb4)
[0,1] at ../../src/mpi/init/init.c:253
Is there any way to manually disable NUMA during compilation?
Kind regards,
Mick
Any updates on this? We are seeing the same issue under WSL (used for local testing).