If I use Intel MPI 2019 Update 7 under Slurm with two tasks on two separate nodes (one core per node), I get a SIGFPE here (according to gdb on the generated core file):
#0 0x00000000004436ed in ipl_create_domains (pi=0x0, scale=4786482) at ../../../../../src/pm/i_hydra/../../intel/ipl/include/../src/ipl_service.c:2240
This happens only with mpirun / mpiexec.hydra, e.g. "mpirun -n 2 ./test".
I know of three workarounds, any of which lets me run this successfully (example commands after the list), but I thought you or others should know about this crash:
1. Set I_MPI_PMI_LIBRARY=libpmi2.so and use "srun -n 2 ./test" (with Slurm configured to use pmi2).
2. Use I_MPI_HYDRA_TOPOLIB=ipl
3. Use the "legacy" mpiexec.hydra.
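For concreteness, here is roughly how each workaround looks on the command line. This is a sketch; the PMI library path, the Slurm --mpi plugin, and the location of the legacy mpiexec.hydra are assumptions that depend on the installation:
# Workaround 1: use Slurm's PMI2 library and launch with srun
export I_MPI_PMI_LIBRARY=libpmi2.so        # a full path such as /usr/lib64/libpmi2.so may be needed
srun --mpi=pmi2 -n 2 ./test                # or rely on Slurm being configured with MpiDefault=pmi2
# Workaround 2: switch hydra's topology detection back to ipl
I_MPI_HYDRA_TOPOLIB=ipl mpirun -n 2 ./test
# Workaround 3: use the legacy hydra process manager
${I_MPI_ROOT}/intel64/bin/legacy/mpiexec.hydra -n 2 ./test   # assumed location of the legacy binary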
- Tags:
- Cluster Computing
- General Support
- Intel® Cluster Ready
- Message Passing Interface (MPI)
- Parallel Computing
Some more details:
OS: CentOS 7.7
Kernel: Linux blg8616.int.ets1.calculquebec.ca 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 GNU/Linux
CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (GenuineIntel)
If I run the test program (built with "mpiicc ${I_MPI_ROOT}/test/test.c -g -o test") with I_MPI_HYDRA_TOPOLIB=ipl set, I get this:
[oldeman@blg8616 test]$ I_MPI_DEBUG=5 mpirun -n 2 ./test
[0] MPI startup(): libfabric version: 1.10.0a1-impi
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 173711 blg8616.int.ets1.calculquebec.ca 37
[0] MPI startup(): 1 220728 blg8621.int.ets1.calculquebec.ca 29
[0] MPI startup(): I_MPI_CC=icc
[0] MPI startup(): I_MPI_CXX=icpc
[0] MPI startup(): I_MPI_FC=ifort
[0] MPI startup(): I_MPI_F90=ifort
[0] MPI startup(): I_MPI_F77=ifort
[0] MPI startup(): I_MPI_ROOT=/cvmfs/soft.computecanada.ca/easybuild/software/2019/avx2/Compiler/intel2020/intelmpi/2019.7.217
[0] MPI startup(): I_MPI_LINK=opt
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=ipl
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=slurm
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=5
Hello world: rank 0 of 2 running on blg8616.int.ets1.calculquebec.ca
Hello world: rank 1 of 2 running on blg8621.int.ets1.calculquebec.ca
But without that variable set, it gives me the following (the same also happens with just "hostname" as the program):
[oldeman@blg8616 test]$ I_MPI_DEBUG=5 mpirun -n 2 ./test
srun: error: blg8621: task 1: Floating point exception (core dumped)
srun: error: blg8616: task 0: Floating point exception (core dumped)
[mpiexec@blg8616.int.ets1.calculquebec.ca] wait_proxies_to_terminate (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:528): downstream from host blg8616 exited with status 136
[mpiexec@blg8616.int.ets1.calculquebec.ca] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:2114): assert (exitcodes != NULL) failed
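For reference, the backtrace quoted at the top of this thread comes from opening one of these core files in gdb. A minimal sketch, assuming core dumps are enabled and that the core was written by the hydra proxy that srun launched (the binary and core file names are assumptions and will differ per system):
ulimit -c unlimited                                           # make sure core files are written before reproducing
gdb ${I_MPI_ROOT}/intel64/bin/hydra_bstrap_proxy core.<pid>   # assumed binary; pass the actual core file
(gdb) bt                                                      # prints the ipl_create_domains frame shown above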
Hi Bart,
Thanks for reaching out to us.
We will investigate this issue further and will get back to you soon.
Thanks
Prasanth
Hi Bart,
Intel MPI Library 2019 Update 8 has just been released. Could you please rerun your experiments with mpiexec.hydra and report your findings?
Best regards,
Amar
Hi Bart,
Since we have not received your response for over a month, I am going ahead and closing this thread. Whenever the Intel MPI Library's native process manager is not used, we recommend setting the PMI library explicitly using the I_MPI_PMI_LIBRARY environment variable.
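For example, in a Slurm batch script this could look as follows (a minimal sketch; the PMI2 library path is an assumed, site-specific location):
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so   # path to Slurm's PMI2 library (assumed location)
srun ./test                                      # srun takes the task count from the allocation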
For more details, please refer to the following links -
[2] https://slurm.schedmd.com/mpi_guide.html
This issue will be treated as resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Best regards,
Amar
