Hi
I have come across a bug in Intel MPI when testing in a Docker container with no NUMA support. The case of no NUMA support does not appear to be handled correctly; more details below.
Thanks
Jamil
icc --version
icc (ICC) 17.0.6 20171215
gcc --version
gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
uname -a
Linux centos7dev 4.9.60-linuxkit-aufs #1 SMP Mon Nov 6 16:00:12 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
bug.c
#include "mpi.h"

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);   /* segfaults here when NUMA is unavailable */
    MPI_Finalize();
    return 0;
}
I_MPI_CC=gcc mpicc -g bug.c -o bug
gdb ./bug
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b64f45 in __I_MPI___intel_sse2_strtok () from /opt/intel/compilers_and_libraries_2017.6.256/linux/mpi/intel64/lib/libmpifort.so.12
Missing separate debuginfos, use: debuginfo-install libgcc-4.8.5-16.el7_4.2.x86_64 numactl-devel-2.0.9-6.el7_2.x86_64
(gdb) bt
#0 0x00007ffff7b64f45 in __I_MPI___intel_sse2_strtok () from /opt/intel/compilers_and_libraries_2017.6.256/linux/mpi/intel64/lib/libmpifort.so.12
#1 0x00007ffff70acab1 in MPID_nem_impi_create_numa_nodes_map () at ../../src/mpid/ch3/src/mpid_init.c:1355
#2 0x00007ffff70ad994 in MPID_Init (argc=0x1, argv=0x7ffff72a2268, requested=-148233624, provided=0x1, has_args=0x0, has_env=0x0)
at ../../src/mpid/ch3/src/mpid_init.c:1733
#3 0x00007ffff7043ebb in MPIR_Init_thread (argc=0x1, argv=0x7ffff72a2268, required=-148233624, provided=0x1) at ../../src/mpi/init/initthread.c:717
#4 0x00007ffff70315bb in PMPI_Init (argc=0x1, argv=0x7ffff72a2268) at ../../src/mpi/init/init.c:253
#5 0x00000000004007e8 in main (argc=1, argv=0x7fffffffcd58) at bug.c:6
Hi Jamil,
Can you show the output of
$ numactl -H
?
Dmitry
Hi Dmitry
In a CentOS 6 Docker container:
> numactl -H
available: 0 nodes ()
libnuma: Warning: Cannot parse distance information in sysfs: No such file or directory
No distance information available.
In CentOS 7 the output is "numa is not supported on this system".
Jamil
Hi Dmitry, Jamil,
Just ran into the same issue today. The system I compile on has no NUMA support:
> numactl --show
physcpubind: 0 1 2 3 4 5 6 7
No NUMA support available on this system.
When running a program containing only MPI_Init, I get the same segfault:
[0,1] (mpigdb) run
[0,1] Continuing.
[0,1]
[0,1] Program received signal SIGSEGV, Segmentation fault.
[0] 0x00007f4876dc0805 in __I_MPI___intel_sse2_strtok ()
[1] 0x00007fe059bc0805 in __I_MPI___intel_sse2_strtok ()
[0,1]    from /opt/intel/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpi.so.12
[0,1] (mpigdb) bt
[0] #0  0x00007f4876dc0805 in __I_MPI___intel_sse2_strtok ()
[1] #0  0x00007fe059bc0805 in __I_MPI___intel_sse2_strtok ()
[0,1]    from /opt/intel/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpi.so.12
[0] #1  0x00007f4876c7ce91 in MPID_nem_impi_create_numa_nodes_map ()
[1] #1  0x00007fe059a7ce91 in MPID_nem_impi_create_numa_nodes_map ()
[0,1]    at ../../src/mpid/ch3/src/mpid_init.c:1355
[0] #2  0x00007f4876c7dd74 in MPID_Init (argc=0x1, argv=0x7f4876e2bdb4,
[1] #2  0x00007fe059a7dd74 in MPID_Init (argc=0x1, argv=0x7fe059c2bdb4,
[0]     requested=1994571188, provided=0x1, has_args=0x0, has_env=0x2)
[1]     requested=1505934772, provided=0x1, has_args=0x0, has_env=0x2)
[0,1]    at ../../src/mpid/ch3/src/mpid_init.c:1760
[0] #3  0x00007f4876c1eaeb in MPIR_Init_thread (argc=0x1, argv=0x7f4876e2bdb4,
[1] #3  0x00007fe059a1eaeb in MPIR_Init_thread (argc=0x1, argv=0x7fe059c2bdb4,
[0]     required=1994571188, provided=0x1) at ../../src/mpi/init/initthread.c:717
[1]     required=1505934772, provided=0x1) at ../../src/mpi/init/initthread.c:717
[0] #4  0x00007f4876c0c07b in PMPI_Init (argc=0x1, argv=0x7f4876e2bdb4)
[1] #4  0x00007fe059a0c07b in PMPI_Init (argc=0x1, argv=0x7fe059c2bdb4)
[0,1]    at ../../src/mpi/init/init.c:253
Is there any way to manually disable NUMA during compilation?
Kind regards,
Mick
Any updates on this? We are seeing the same issue under WSL (used for local testing).