Jamil_A_
Beginner
444 Views

Intel MPI segmentation fault bug

 Hi

   I have come across a bug in Intel MPI while testing in a Docker container without NUMA support. The case of no NUMA support does not appear to be handled correctly. Details below.

 Thanks

  Jamil

icc --version
icc (ICC) 17.0.6 20171215

gcc --version
gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)

uname -a
Linux centos7dev 4.9.60-linuxkit-aufs #1 SMP Mon Nov 6 16:00:12 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

     bug.c

#include "mpi.h"

int main (int argc, char *argv[])
{
   MPI_Init(&argc,&argv);   /* segfaults here when NUMA is unavailable */
   MPI_Finalize();
   return 0;
}

I_MPI_CC=gcc mpicc -g bug.c -o bug

gdb ./bug

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b64f45 in __I_MPI___intel_sse2_strtok () from /opt/intel/compilers_and_libraries_2017.6.256/linux/mpi/intel64/lib/libmpifort.so.12
Missing separate debuginfos, use: debuginfo-install libgcc-4.8.5-16.el7_4.2.x86_64 numactl-devel-2.0.9-6.el7_2.x86_64
(gdb) bt
#0  0x00007ffff7b64f45 in __I_MPI___intel_sse2_strtok () from /opt/intel/compilers_and_libraries_2017.6.256/linux/mpi/intel64/lib/libmpifort.so.12
#1  0x00007ffff70acab1 in MPID_nem_impi_create_numa_nodes_map () at ../../src/mpid/ch3/src/mpid_init.c:1355
#2  0x00007ffff70ad994 in MPID_Init (argc=0x1, argv=0x7ffff72a2268, requested=-148233624, provided=0x1, has_args=0x0, has_env=0x0)
    at ../../src/mpid/ch3/src/mpid_init.c:1733
#3  0x00007ffff7043ebb in MPIR_Init_thread (argc=0x1, argv=0x7ffff72a2268, required=-148233624, provided=0x1) at ../../src/mpi/init/initthread.c:717
#4  0x00007ffff70315bb in PMPI_Init (argc=0x1, argv=0x7ffff72a2268) at ../../src/mpi/init/init.c:253
#5  0x00000000004007e8 in main (argc=1, argv=0x7fffffffcd58) at bug.c:6
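
The backtrace places the crash inside a strtok routine called from MPID_nem_impi_create_numa_nodes_map. One plausible cause, and this is only an assumption since the Intel MPI source is not shown here, is that the NUMA node list is NULL or empty on a system without NUMA support and is tokenized without a check. The sketch below shows a guarded tokenizer; count_numa_nodes and the comma-separated list format are hypothetical and are not taken from Intel MPI.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT Intel MPI code: counts the comma-separated
 * entries in a node list such as "0,1". Returns 0 for a NULL or empty
 * list instead of handing a NULL pointer to strtok(), and -1 if the
 * working copy cannot be allocated. */
int count_numa_nodes(const char *node_list)
{
    if (node_list == NULL || node_list[0] == '\0')
        return 0;   /* no NUMA information: treat as zero nodes */

    /* strtok() modifies its argument, so tokenize a private copy. */
    size_t len = strlen(node_list) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, node_list, len);

    int count = 0;
    for (char *tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ","))
        count++;

    free(copy);
    return count;
}
```

With this guard, a missing node list is reported as zero nodes, and the caller can fall back to a non-NUMA code path rather than crashing.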

5 Replies
Dmitry_S_Intel
Employee

Hi Jamil,

Can you show

$ numactl -H

?

Dmitry 

Jamil_A_
Beginner

 

 Hi Dmitry

 In a CentOS 6 Docker container:

  > numactl -H

available: 0 nodes ()
libnuma: Warning: Cannot parse distance information in sysfs: No such file or directory
No distance information available.

 In CentOS 7 the output is: numa is not supported on this system

 Jamil
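
The numactl output above reports zero NUMA nodes. A program can make a similar check itself before relying on NUMA information. The sketch below is a heuristic that assumes the usual Linux sysfs layout, where NUMA nodes appear as /sys/devices/system/node/nodeN; the function names dir_exists and numa_sysfs_present are made up for illustration.

```c
#include <sys/stat.h>

/* Returns 1 if `path` exists and is a directory, 0 otherwise. */
int dir_exists(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Heuristic (assumption): on Linux with NUMA support exposed, at least
 * node0 is present under /sys/devices/system/node. Returns 1 if that
 * directory exists, 0 otherwise. */
int numa_sysfs_present(void)
{
    return dir_exists("/sys/devices/system/node/node0");
}
```

In a container like the ones described here, numa_sysfs_present() would be expected to return 0, which matches the "available: 0 nodes" report.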

  

van_Duijn__Mick
Beginner

Hi Dmitry, Jamil,

I just ran into the same issue today. The system I compile on has no NUMA support:

numactl --show
physcpubind: 0 1 2 3 4 5 6 7
No NUMA support available on this system.

When running a program only containing MPI_Init, I similarly get the segfault:

[0,1] (mpigdb) run
[0,1]   Continuing.
[0,1]
[0,1]   Program received signal SIGSEGV, Segmentation fault.
[0]     0x00007f4876dc0805 in __I_MPI___intel_sse2_strtok ()
[1]     0x00007fe059bc0805 in __I_MPI___intel_sse2_strtok ()
[0,1]      from /opt/intel/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpi.so.12
[0,1] (mpigdb) bt
[0]     #0  0x00007f4876dc0805 in __I_MPI___intel_sse2_strtok ()
[1]     #0  0x00007fe059bc0805 in __I_MPI___intel_sse2_strtok ()
[0,1]      from /opt/intel/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpi.so.12
[0]     #1  0x00007f4876c7ce91 in MPID_nem_impi_create_numa_nodes_map ()
[1]     #1  0x00007fe059a7ce91 in MPID_nem_impi_create_numa_nodes_map ()
[0,1]       at ../../src/mpid/ch3/src/mpid_init.c:1355
[0]     #2  0x00007f4876c7dd74 in MPID_Init (argc=0x1, argv=0x7f4876e2bdb4,
[1]     #2  0x00007fe059a7dd74 in MPID_Init (argc=0x1, argv=0x7fe059c2bdb4,
[0]         requested=1994571188, provided=0x1, has_args=0x0, has_env=0x2)
[1]         requested=1505934772, provided=0x1, has_args=0x0, has_env=0x2)
[0,1]       at ../../src/mpid/ch3/src/mpid_init.c:1760
[0]     #3  0x00007f4876c1eaeb in MPIR_Init_thread (argc=0x1, argv=0x7f4876e2bdb4,
[1]     #3  0x00007fe059a1eaeb in MPIR_Init_thread (argc=0x1, argv=0x7fe059c2bdb4,
[0]         required=1994571188, provided=0x1) at ../../src/mpi/init/initthread.c:717
[1]         required=1505934772, provided=0x1) at ../../src/mpi/init/initthread.c:717
[0]     #4  0x00007f4876c0c07b in PMPI_Init (argc=0x1, argv=0x7f4876e2bdb4)
[1]     #4  0x00007fe059a0c07b in PMPI_Init (argc=0x1, argv=0x7fe059c2bdb4)
[0,1]       at ../../src/mpi/init/init.c:253

Is there any way to manually disable NUMA during compilation?

Kind regards,
Mick

Ben_Held
Beginner

Any updates on this? We are seeing the same issue under WSL (used for local testing).

Andriy
Beginner

Same issue here under WSL Ubuntu 18.04 and Intel MPI Library 2018 Update 4.

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
prog.exe           00000000032799D4  for__signal_handl  Unknown     Unknown
libpthread-2.27.s  00007FD178B92890  Unknown            Unknown     Unknown
prog.exe           000000000325A805  __I_MPI___intel_s  Unknown     Unknown
prog.exe           0000000003041B9C  Unknown            Unknown     Unknown
prog.exe           0000000003042A74  Unknown            Unknown     Unknown
prog.exe           0000000003010E23  Unknown            Unknown     Unknown
prog.exe           000000000300EAB6  PMPI_Init_thread   Unknown     Unknown
prog.exe           0000000002F3A0AE  PMPI_INIT_THREAD   Unknown     Unknown
prog.exe           0000000000AAF137  MAIN__             Unknown     Unknown
prog.exe           0000000000966F2E  main               Unknown     Unknown
libc-2.27.so       00007FD178401B97  __libc_start_main  Unknown     Unknown
prog.exe           0000000000966E2A  _start             Unknown     Unknown

The issue must be with Hydra or something related to MPI initialization on WSL, because the same executable runs fine on a regular Linux machine.

When linked against Intel MPI 2019 Update 1, the program runs fine under the same WSL, but startup time is very long.
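
Since the crash is reported under WSL but not on a regular Linux machine, a test harness may want to detect WSL and skip or flag the MPI run. The sketch below is a heuristic that assumes the WSL kernel release string contains "Microsoft" in some letter case (for example "4.4.0-17763-Microsoft"); the function name release_looks_like_wsl is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Heuristic (assumption): WSL kernels carry "Microsoft" or "microsoft"
 * in their release string. Returns 1 if `release` contains that word,
 * compared case-insensitively, and 0 otherwise (including for NULL). */
int release_looks_like_wsl(const char *release)
{
    static const char needle[] = "microsoft";
    const size_t n = sizeof needle - 1;

    if (release == NULL)
        return 0;

    for (const char *p = release; *p != '\0'; p++) {
        size_t i = 0;
        /* Compare the needle against the text starting at p. */
        while (i < n && p[i] != '\0' &&
               tolower((unsigned char)p[i]) == needle[i])
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}
```

The release string can be obtained from the release field that uname(2) fills in, which is the same value that uname -r prints.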