<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: running on MPI cluster in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439830#M10162</link>
    <description>&lt;P&gt;Is&amp;nbsp;&lt;EM class="sub_section_element_selectors"&gt;"/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat" located on all systems in your cluster?&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM class="sub_section_element_selectors"&gt;Jim Dempsey&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 19 Dec 2022 18:40:10 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2022-12-19T18:40:10Z</dc:date>
    <item>
      <title>running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439584#M10155</link>
      <description>&lt;P&gt;Hello everyone.&lt;BR /&gt;&lt;BR /&gt;I installed a Linux cluster (Ubuntu 20.04.2 LTS) following this &lt;A href="https://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/" target="_self"&gt;web note&lt;/A&gt;.&lt;BR /&gt;I finally got a working setup with gfortran and &lt;STRIKE&gt;openmpi&lt;/STRIKE&gt; mpich.&lt;BR /&gt;The test program:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="fortran"&gt;program hello_mpi
USE MPI_f08
implicit none
integer num_procs, namelen,id
character *(MPI_MAX_PROCESSOR_NAME) procs_name

call MPI_INIT ()
call MPI_COMM_RANK (MPI_COMM_WORLD, id)
call MPI_COMM_SIZE (MPI_COMM_WORLD, num_procs)
call MPI_GET_PROCESSOR_NAME (procs_name, namelen)

write(*,'(A24,I2,A4,I2,A14,A15)') "Hello world from process", id, " of ", num_procs, &amp;amp;
" processes on ", procs_name

call MPI_FINALIZE ()
end program&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;! compilation command:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;mpifort -I /usr/lib/x86_64-linux-gnu/mpich/include/ MPI_test.f90&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;! launch command:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;mpirun -np 20 -hosts master,slave1 ./a.out&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;.....&lt;/P&gt;
&lt;P&gt;Hello world from process 2 of 20 processes on master &lt;BR /&gt;Hello world from process 1 of 20 processes on master &lt;BR /&gt;Hello world from process18 of 20 processes on slave1 &lt;BR /&gt;Hello world from process19 of 20 processes on slave1&lt;/P&gt;
&lt;P&gt;........&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now I am moving to the Intel compiler with oneAPI. I installed the toolkit on the four nodes, followed the install process given by Intel, and added this command to the .bashrc of all nodes:&lt;BR /&gt;source /opt/intel/oneapi/setvars.sh&lt;/P&gt;
&lt;P&gt;With the same program :&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;EM&gt;! compilation command:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;mpiifort MPI_test.f90&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;! launch command:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;mpirun -np 20 -hosts master,slave1 ./a.out&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;and I got :&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Image PC Routine Line Source &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;libpthread-2.31.s 00007F50EDB343C0 Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;librxm-fi.so 00007F5021F1A856 Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;....&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;librxm-fi.so 00007F5021F1D9C7 Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;libmpi.so.12.0.0 00007F50EE196EAE Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;....&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;libmpi.so.12.0.0 00007F50EDFFCD1B MPI_Init Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;libmpifort.so.12. 00007F50EF53C816 mpi_init_f08_ Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;a.out 00000000004041EB Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;a.out 000000000040419D Unknown Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;libc-2.31.so 00007F50ED8050B3 __libc_start_main Unknown Unknown&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;a.out 00000000004040BE Unknown Unknown Unknown&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;=================================================================&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;= RANK 17 PID 18095 RUNNING AT slave1&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;= KILLED BY SIGNAL: 9 (Killed)&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;=================================================================&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Note that the program is OK as long as it stays on a single node.&lt;BR /&gt;I was wondering whether&amp;nbsp;&lt;EM&gt;source /opt/intel/oneapi/setvars.sh&amp;nbsp;&lt;/EM&gt;is actually executed when MPI launches processes over ssh, since ssh runs without an interactive bash session (see the quick check sketched below). If so, how can we solve that?&lt;BR /&gt;Maybe the issue is something else.&lt;BR /&gt;&lt;BR /&gt;Some help would be appreciated.&lt;BR /&gt;Thank you,&lt;BR /&gt;Alexandre&lt;/P&gt;
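&lt;P&gt;For reference, here is a minimal way one could check this (a sketch only; the host name slave1 is just reused from the commands above):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Does a NON-interactive ssh session see the oneAPI environment?
ssh slave1 'echo I_MPI_ROOT=$I_MPI_ROOT; which mpiexec; which hydra_pmi_proxy'

# If these come back empty: the default Ubuntu ~/.bashrc returns early for
# non-interactive shells, so a "source /opt/intel/oneapi/setvars.sh" line added
# at the bottom of .bashrc is never reached over plain ssh; it would have to go
# above that early-return check (or into a file non-interactive shells do read).&lt;/LI-CODE&gt;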
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 16:30:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439584#M10155</guid>
      <dc:creator>Alexandre_fr</dc:creator>
      <dc:date>2022-12-19T16:30:13Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439587#M10156</link>
      <description>&lt;P&gt;Additional tests that left me even more lost than before...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;This following one is OK.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;mpiexec -n 2 -ppn 1 -hosts master,slave1 ./a.out&lt;BR /&gt;Hello world from process 0 of 2 processes on master&lt;BR /&gt;Hello world from process 1 of 2 processes on slave1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Those following ones are not OK.&lt;/STRONG&gt;&lt;BR /&gt;&lt;EM&gt;mpirun -n 4 -ppn 1 -hosts master,slave1,slave2,slave3 ./a.out &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 0 of 4 processes on master &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 2 of 4 processes on slave2 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 3 of 4 processes on slave3 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 1 of 4 processes on slave1 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Abort(810114063) on node 2 (rank 2 in comm 0): Fatal error in PMPI_Finalize: Other MPI error, error stack:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;PMPI_Finalize(220)...............: MPI_Finalize failed&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;PMPI_Finalize(164)...............: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPID_Finalize(1716)..............: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIDI_OFI_mpi_finalize_hook(1760): &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Reduce_intra_binomial(149)..: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIC_Send(129)...................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPID_Send(888)...................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIDI_send_unsafe(203)...........: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIDI_OFI_send_normal(252).......: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIDI_OFI_send_handler_vni(496)..: OFI tagged send failed (ofi_impl.h:496:MPIDI_OFI_send_handler_vni:Network is unreachable)&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;mpiexec -n 2 -ppn 1 -hosts master,slave2 ./a.out&lt;BR /&gt;&lt;EM&gt;Hello world from process 0 of 2 processes on master &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 1 of 2 processes on slave2 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Abort(810114063) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Finalize: Other MPI error, error stack:&lt;/EM&gt;&lt;BR /&gt;&lt;I&gt;......&lt;/I&gt;&lt;BR /&gt;&lt;EM&gt;MPIDI_OFI_send_handler_vni(496)..: OFI tagged send failed (ofi_impl.h:496:MPIDI_OFI_send_handler_vni:Network is unreachable)&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;mpiexec -n 2 -ppn 1 -hosts master,slave3 ./a.out&lt;BR /&gt;Hello world from process 1 of 2 processes on slave3 &lt;BR /&gt;Abort(810114063) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Finalize: Other MPI error, error stack:&lt;BR /&gt;...........&lt;BR /&gt;MPIDI_OFI_send_handler_vni(496)..: OFI tagged send failed (ofi_impl.h:496:MPIDI_OFI_send_handler_vni:Network is unreachable)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This seems to rule out the setvars.sh problem, since some tests do reach the end.&lt;BR /&gt;It says the network is unreachable, but the ssh connections were set up with ssh-keygen to link (see the quick check of the ssh links sketched below):&lt;BR /&gt;master &amp;lt;---&amp;gt; slave1&lt;BR /&gt;master &amp;lt;---&amp;gt; slave2&lt;/P&gt;
&lt;P&gt;master &amp;lt;---&amp;gt; slave3&lt;/P&gt;
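&lt;P&gt;For what it is worth, here is a quick check of the ssh links one could run (a sketch only, reusing the same host names; ssh-copy-id assumes the key pair already created with ssh-keygen):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# From the launching node (master): passwordless ssh to every slave
for h in slave1 slave2 slave3; do
    ssh-copy-id "$h"      # installs the existing public key on $h
    ssh "$h" hostname     # must print the host name without asking for a password
done&lt;/LI-CODE&gt;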
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;OK, now I am lost, with no idea what is happening...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Dec 2022 21:33:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439587#M10156</guid>
      <dc:creator>Alexandre_fr</dc:creator>
      <dc:date>2022-12-18T21:33:49Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439669#M10157</link>
      <description>&lt;P&gt;I am not all that familiar with the mechanics of MPI, but if I understand it correctly, from conversations with colleagues, MPI "environments" are specific to the compiler that you used. Could it be that the mpirun/mpiexec commands or the MPI background process are the GCC versions and not the Intel versions? That might explain the dramatic failure you observe.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 08:06:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439669#M10157</guid>
      <dc:creator>Arjen_Markus</dc:creator>
      <dc:date>2022-12-19T08:06:28Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439676#M10158</link>
      <description>&lt;P&gt;Thank you for your reply.&lt;BR /&gt;According to my tests, the choice of wrapper (mpirun or mpiexec) does not affect the results.&lt;BR /&gt;If the environment were the problem, how could I get a successful Intel test at all, even if only with 2 processes on master/slave1? On top of that, the test with one process on each node is OK until finalization.&lt;/P&gt;
&lt;P&gt;If I run the test in verbose mode, I first see that the environment looks OK (?).&lt;/P&gt;
&lt;P&gt;mpiifort MPI_test.f90&lt;BR /&gt;mpirun -np 20 -v -hosts master,slave1 ./a.out&lt;BR /&gt;&lt;EM&gt;[mpiexec@master] Launch arguments: /opt/intel/oneapi/mpi/2021.8.0//bin//hydra_bstrap_proxy --upstream-host&amp;nbsp;&lt;STRONG&gt;master&lt;/STRONG&gt;&amp;nbsp;--upstream-port 33591 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel/oneapi/mpi/2021.8.0//bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/intel/oneapi/mpi/2021.8.0//bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;[mpiexec@master] Launch arguments: /usr/bin/ssh -q -x&amp;nbsp;&lt;STRONG&gt;slave1&lt;/STRONG&gt;&amp;nbsp;/opt/intel/oneapi/mpi/2021.8.0//bin//hydra_bstrap_proxy --upstream-host master --upstream-port 33591 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel/oneapi/mpi/2021.8.0//bin/ --tree-width 16 --tree-level 1 --time-left -1 --launch-type 2 --debug --proxy-id 1 --node-id 1 --subtree-size 1 /opt/intel/oneapi/mpi/2021.8.0//bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Moreover, I was wondering whether MPI_FINALIZE() contains a barrier in mpich (I made a mistake in the first message: my working setup is gcc + mpich) but not in Intel MPI. Adding a barrier after the print and before finalization does not help, but it reveals a failure in the only test that was working:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;mpiifort MPI_test.f90&lt;BR /&gt;mpiexec -n 2 -ppn 1 -hosts master,slave1 ./a.out&lt;BR /&gt;&lt;EM&gt;Hello world from process 1 of 2 processes on slave1 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 0 of 2 processes on master&lt;/EM&gt;&lt;BR /&gt;... and the program hangs forever.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Here is the verbose output of the test that crashes immediately (with node slave2):&lt;/P&gt;
&lt;P&gt;mpiexec -n 2 -ppn 1 -v -hosts master,slave2 ./a.out&lt;/P&gt;
&lt;P&gt;... environment stuff .....&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=init pmi_version=1 pmi_subversion=1&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get_maxes&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get_appnum&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=appnum appnum=0&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get_my_kvsname&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=my_kvsname kvsname=kvs_16360_0&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get kvsname=kvs_16360_0 key=PMI_process_mapping&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get_maxes&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get_appnum&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=appnum appnum=0&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get_my_kvsname&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=my_kvsname kvsname=kvs_16360_0&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get kvsname=kvs_16360_0 key=PMI_process_mapping&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=barrier_in&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=barrier_in&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=barrier_out&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=barrier_out&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=put kvsname=kvs_16360_0 key=bc-1 value=mpi#0200A7DB0A2A003E0000000000000000$&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=put_result rc=0 msg=&lt;STRONG&gt;success&lt;/STRONG&gt;&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=barrier_in&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=put kvsname=kvs_16360_0 key=bc-0 value=mpi#02008F8B0AB80E970000000000000000$&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=put_result rc=0 msg=&lt;STRONG&gt;success&lt;/STRONG&gt;&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=barrier_in&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=barrier_out&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get kvsname=kvs_16360_0 key=bc-0&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=get_result rc=0 msg=success value=mpi#02008F8B0AB80E970000000000000000$&lt;BR /&gt;[proxy:0:0@master] pmi cmd from fd 6: cmd=get kvsname=kvs_16360_0 key=bc-1&lt;BR /&gt;[proxy:0:0@master] PMI response: cmd=get_result rc=0 msg=success value=mpi#0200A7DB0A2A003E0000000000000000$&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=barrier_out&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get kvsname=kvs_16360_0 key=bc-0&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=get_result rc=0 msg=success value=mpi#02008F8B0AB80E970000000000000000$&lt;BR /&gt;[proxy:0:1@slave2] pmi cmd from fd 4: cmd=get kvsname=kvs_16360_0 key=bc-1&lt;BR /&gt;[proxy:0:1@slave2] PMI response: cmd=get_result rc=0 msg=success value=mpi#0200A7DB0A2A003E0000000000000000$&lt;BR /&gt;Hello world from process 1 of 2 processes on slave2 &lt;BR 
/&gt;Hello world from process 0 of 2 processes on master &lt;BR /&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;BR /&gt;... memory location stuff ....&lt;/P&gt;
&lt;P&gt;It seems the communication is OK at start-up, but not at the end of the run.&lt;BR /&gt;What could it be?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 08:41:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439676#M10158</guid>
      <dc:creator>Alexandre_fr</dc:creator>
      <dc:date>2022-12-19T08:41:31Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439764#M10159</link>
      <description>&lt;P&gt;Alexandre,&lt;/P&gt;
&lt;P&gt;There was a post on this forum with a similar issue (which I am unable to locate). The issue involved an incompatibility amongst fabric selection(s). I think Ron Green provided the answer.&lt;/P&gt;
&lt;P&gt;You might want to experiment with fabric selections starting with the older generation methods.&lt;/P&gt;
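&lt;P&gt;For example, something along these lines could be tried (an untested sketch; the interface name eth0 is a placeholder for whatever interface your nodes actually use):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Force the fabric / provider explicitly and raise the debug level
mpiexec -n 2 -ppn 1 -hosts master,slave1 \
        -env I_MPI_DEBUG=5 \
        -env I_MPI_FABRICS=shm:ofi \
        -env I_MPI_OFI_PROVIDER=tcp ./a.out

# Pin the network interface used by the launcher and the tcp provider
mpiexec -n 2 -ppn 1 -hosts master,slave1 \
        -env I_MPI_HYDRA_IFACE=eth0 \
        -env FI_TCP_IFACE=eth0 ./a.out&lt;/LI-CODE&gt;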
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 14:49:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439764#M10159</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2022-12-19T14:49:08Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439775#M10160</link>
      <description>&lt;P&gt;Moved this MPI question over to the oneAPI HPC Toolkit Forum. That's the best source for MPI information.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 16:59:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439775#M10160</guid>
      <dc:creator>Barbara_P_Intel</dc:creator>
      <dc:date>2022-12-19T16:59:27Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439800#M10161</link>
      <description>&lt;P&gt;The good news is that I am learning a lot. The bad news is that it doesn't seem to change anything.&lt;BR /&gt;&lt;BR /&gt;First, debug levels 3 and higher do not work: the code freezes.&lt;BR /&gt;Level 2 provides the fabric information.&lt;BR /&gt;&lt;BR /&gt;mpiexec -n 2 -ppn 1 -hosts master,slave3 &lt;STRONG&gt;-env I_MPI_DEBUG=2&lt;/STRONG&gt; ./a.out&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.8 Build 20221129 (id: 339ec755a1)&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): library kind: release&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;/EM&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;EM&gt;[0] MPI startup(): libfabric provider: tcp;ofi_rxm&lt;/EM&gt;&lt;/STRONG&gt;&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat" not found&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm.dat"&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 0 of 2 processes on master &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 1 of 2 processes on slave3 &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;I also tried to change the fabric environment, but it doesn't help at all (I tried the options available according to Intel: shm, shm:ofi, ofi). See also the provider listing sketched below the log.&lt;/P&gt;
&lt;P&gt;mpiexec -n 2 -ppn 1 -hosts master,slave3 -env I_MPI_DEBUG=2 &lt;STRONG&gt;-env I_MPI_FABRICS=shm:ofi&lt;/STRONG&gt;&amp;nbsp;./a.out&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.8 Build 20221129 (id: 339ec755a1)&lt;BR /&gt;[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.&lt;BR /&gt;[0] MPI startup(): library kind: release&lt;BR /&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;BR /&gt;&lt;STRONG&gt;[0] MPI startup(): libfabric provider: tcp;ofi_rxm&lt;/STRONG&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat" not found&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm.dat"&lt;/FONT&gt;&lt;/P&gt;
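&lt;P&gt;As a side check, one could also list which providers libfabric actually detects on each node (a sketch only; fi_info is the standard libfabric utility and is normally on the PATH once setvars.sh has run, but its availability may vary):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# List the libfabric providers visible on this node
fi_info -l

# Show details for the tcp provider only
fi_info -p tcp

# Repeat the same check on a remote node
ssh slave3 'fi_info -l'&lt;/LI-CODE&gt;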
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Maybe the red lines are a problem?&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 16:56:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439800#M10161</guid>
      <dc:creator>Alexandre_fr</dc:creator>
      <dc:date>2022-12-19T16:56:58Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439830#M10162</link>
      <description>&lt;P&gt;Is&amp;nbsp;&lt;EM class="sub_section_element_selectors"&gt;"/opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat" located on all systems in your cluster?&lt;/EM&gt;&lt;/P&gt;
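&lt;P&gt;A quick way to check (a sketch only; host names as used earlier in the thread):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Look for both tuning files on every node
for h in master slave1 slave2 slave3; do
    echo "== $h =="
    ssh "$h" ls -l /opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat \
                   /opt/intel/oneapi/mpi/2021.8.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm.dat
done&lt;/LI-CODE&gt;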
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM class="sub_section_element_selectors"&gt;Jim Dempsey&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2022 18:40:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439830#M10162</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2022-12-19T18:40:10Z</dc:date>
    </item>
    <item>
      <title>Re: running on MPI cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439930#M10163</link>
      <description>&lt;P&gt;The file &lt;EM class="sub_section_element_selectors"&gt;tuning_skx_shm-ofi_tcp-ofi-rxm_1.dat&amp;nbsp;&lt;/EM&gt;is absent on all systems.&lt;BR /&gt;But its friend&amp;nbsp;&lt;SPAN&gt;tuning_skx_shm-ofi_tcp-ofi-rxm.dat is present on all of them.&lt;BR /&gt;The oneAPI toolkit was installed on the 4 nodes: the master from the online installer, the 3 slaves from the offline one. Maybe I should install the very same toolkit everywhere; I will try. But there is worse...&lt;BR /&gt;&lt;BR /&gt;During testing, I performed this run:&lt;BR /&gt;mpifort MPI_test.f90&lt;BR /&gt;mpirun -n 20 -hosts master,slave2 ./a.out&lt;BR /&gt;&lt;BR /&gt;And the installation I thought was good (gfortran + mpich) &lt;STRONG&gt;has a problem too&lt;/STRONG&gt;. It appears when I put the barrier call&lt;BR /&gt;(call MPI_BARRIER(MPI_COMM_WORLD)) after the hello world message.&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The hello world output looks OK:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Hello world from process 1 of 20 processes on master&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process 4 of 20 processes on master&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;........&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process18 of 20 processes on slave2&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Hello world from process16 of 20 processes on slave2&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;But the barrier fails...&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Fatal error in PMPI_Barrier: Unknown error class, error stack:&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;PMPI_Barrier(289).....................: MPI_Barrier(comm=MPI_COMM_WORLD) failed&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;PMPI_Barrier(275).....................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_impl(175)................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_intra_auto(110)..........: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_intra_smp(43)............: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_impl(175)................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_intra_auto(110)..........: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_intra_dissemination(49)..: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIDU_Complete_posted_with_error(1137): Process failed&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Barrier_intra_smp(59)............: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Bcast_impl(310)..................: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Bcast_intra_auto(223)............: &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;MPIR_Bcast_intra_binomial(182)........: Failure during collective&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;Fatal error in PMPI_Barrier: Unknown error class, error stack:&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;.....&lt;/EM&gt;&lt;/P&gt;
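&lt;P&gt;Since both MPI stacks now fail in cross-node collectives, one more thing worth checking is the plain TCP path between every pair of nodes (a sketch only; the point-to-point connections opened by the tcp provider go directly from node to node and do not pass through ssh):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Which interfaces and addresses does each node expose?
for h in master slave1 slave2 slave3; do
    echo "== $h =="
    ssh "$h" ip -brief address
done

# Can every node reach every other node by name?
for src in master slave1 slave2 slave3; do
    for dst in master slave1 slave2 slave3; do
        ssh "$src" ping -c 1 -W 2 "$dst"
    done
done&lt;/LI-CODE&gt;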
&lt;P&gt;With the barrier call, but staying within a single node, the program is OK.&lt;BR /&gt;Maybe Intel MPI is more sensitive than gfortran and mpich, which is why the problem shows up first with the Intel configuration. FYI, the firewalls were taken down. I do not own the router that connects the four nodes, so I now suspect the router. Is that possible?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Dec 2022 00:48:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/running-on-MPI-cluster/m-p/1439930#M10163</guid>
      <dc:creator>Alexandre_fr</dc:creator>
      <dc:date>2022-12-20T00:48:18Z</dc:date>
    </item>
  </channel>
</rss>

