Hello!
I'm trying to use one-sided communication for load balancing with MPI.
The algorithm steals some jobs from other MPI ranks. For this, it performs MPI_Win_lock, MPI_Get, some computation, MPI_Put, MPI_Win_unlock. For the rank that owns the window memory this works fine, but calling MPI_Put from the other ranks leads to
[vn01:3300889:0:3300889] ib_mlx5_log.c:177 Remote access on mlx5_bond_0:1/RoCE (synd 0x13 vend 0x88 hw_synd 0/0)
[vn01:3300889:0:3300889] ib_mlx5_log.c:177 RC QP 0x2f537 wqe[3]: RDMA_WRITE --- [rva 0x7fd2d7be90a8 rkey 0x2e76c7] [inl len 16] [rqpn 0x2f541 dlid=0 sl=0 port=1 src_path_bits=0 dgid=::ffff:10.152.0.10 sgid_index=3 traffic_class=0]
With a nice backtrace:
Image PC Routine Line Source
libpthread-2.28.s 00007FD0A06EECF0 Unknown Unknown Unknown
libc-2.28.so 00007FD09FBD7ACF gsignal Unknown Unknown
libc-2.28.so 00007FD09FBAAEA5 abort Unknown Unknown
libucs.so.0.0.0 00007FD09C6922E6 Unknown Unknown Unknown
libucs.so.0.0.0 00007FD09C6974F4 ucs_log_default_h Unknown Unknown
libucs.so.0.0.0 00007FD09C697814 ucs_log_dispatch Unknown Unknown
libuct_ib.so.0.0. 00007FD09BD314FA uct_ib_mlx5_compl Unknown Unknown
libuct_ib.so.0.0. 00007FD09BD483A0 Unknown Unknown Unknown
libuct_ib.so.0.0. 00007FD09BD32F9D uct_ib_mlx5_check Unknown Unknown
libuct_ib.so.0.0. 00007FD09BD463AA Unknown Unknown Unknown
libucp.so.0.0.0 00007FD09CC5282A ucp_worker_progre Unknown Unknown
libucp.so.0.0.0 00007FD09CC6A318 ucp_worker_flush Unknown Unknown
libmlx-fi.so 00007FD09CEDD50D Unknown Unknown Unknown
libmpi.so.12.0.0 00007FD0A0F5205F Unknown Unknown Unknown
libmpi.so.12.0.0 00007FD0A0F64E46 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FD0A0F43593 PMPI_Win_unlock Unknown Unknown
libmpifort.so.12. 00007FD0AAC4A01D mpi_win_unlock__ Unknown Unknown
a.out 0000000000405B9B Unknown Unknown Unknown
a.out 0000000000405DCC Unknown Unknown Unknown
a.out 00000000004052AD Unknown Unknown Unknown
libc-2.28.so 00007FD09FBC3D85 __libc_start_main Unknown Unknown
a.out 00000000004051CE Unknown Unknown Unknown
The same code works fine with OpenMPI, and replacing MPI_Put with MPI_Accumulate also works fine. You can try uncommenting lines 170 and 216 and removing the MPI_Put calls.
In the attachment you will find a not-minimal but relatively clear example that reproduces the failure. It fails for 2-5 MPI ranks because of the task scheduling.
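For reference, here is a stripped-down sketch of the lock/get/compute/put/unlock pattern (this is not the attached code; the names, the lock type, and the explicit MPI_Win_flush after MPI_Get are illustrative choices):

program steal_sketch
  use mpi
  implicit none
  integer, parameter :: n = 4
  integer :: ierr, rank, nranks, win, victim
  integer(kind=MPI_ADDRESS_KIND) :: winsize, disp
  integer :: queue(n), stolen(n)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  queue = rank                     ! each rank exposes a small "task queue"
  winsize = 4 * n                  ! window size in bytes (4-byte integers)
  call MPI_Win_create(queue, winsize, 4, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

  victim = mod(rank + 1, nranks)   ! steal from the next rank
  disp = 0

  call MPI_Win_lock(MPI_LOCK_EXCLUSIVE, victim, 0, win, ierr)
  call MPI_Get(stolen, n, MPI_INTEGER, victim, disp, n, MPI_INTEGER, win, ierr)
  call MPI_Win_flush(victim, win, ierr)  ! complete the get before using the data
  stolen = stolen + 1              ! "some computing" stands in here
  call MPI_Put(stolen, n, MPI_INTEGER, victim, disp, n, MPI_INTEGER, win, ierr)
  ! The workaround mentioned above: MPI_Accumulate with MPI_REPLACE instead of MPI_Put
  ! call MPI_Accumulate(stolen, n, MPI_INTEGER, victim, disp, n, MPI_INTEGER, MPI_REPLACE, win, ierr)
  call MPI_Win_unlock(victim, win, ierr)

  call MPI_Win_free(win, ierr)
  call MPI_Finalize(ierr)
end program steal_sketch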
compilation:
mpiifx put.f90 -cpp
running:
mpirun -n 4 ./a.out
I used Intel MPI 2021.13 and IFX 2024.2.1 (from the latest HPC toolkit).
Igor
Hi,
We could not reproduce this problem on our internal system.
Could you refer to the URL below and configure the 'I_MPI_PMI_LIBRARY' environment variable?
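For example (the path is a placeholder; point it at the PMI library that matches your process manager, e.g. Slurm's):
$ export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so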
Thanks.
Hi,
Here are some steps to troubleshoot and resolve this issue:
- Check OFI Providers: Ensure that the necessary OFI providers are installed on your system. You can check the available providers by running:
$ fi_info
This command should list the available fabric interfaces. If it returns "No data available," it means no suitable providers are found.
- Configure Intel MPI to Use a Specific Provider: Sometimes, specifying a particular provider can help. You can set the FI_PROVIDER environment variable to a specific provider that is available on your system. For example:
$ export FI_PROVIDER=sockets
You can add this line to your Slurm job script before the mpirun or srun command (see the example script after this list).
- Check Network Configuration: Ensure that the network interfaces on your nodes are properly configured and accessible. The OFI provider might be looking for specific high-performance network interfaces (like InfiniBand or Omni-Path) that are not configured or available.
- Intel MPI Configuration: Intel MPI can be configured to use different communication fabrics. You can try setting the I_MPI_FABRICS environment variable to use a different fabric. For example:
$ export I_MPI_FABRICS=shm:ofi
or
$ export I_MPI_FABRICS=shm:tcp
Add this line to your Slurm job script before the mpirun or srun command, as in the sketch below.
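For example, a minimal Slurm job script combining these settings might look like this (the SBATCH values and the provider name are placeholders; pick a provider that fi_info actually lists on your cluster):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2

# illustrative settings; adjust to your system
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=mlx

mpirun -n 4 ./a.out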
You can find more hints here: https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-13/ofi-capable-network-fabrics-control.html
Thanks
Hi!
In short, the problem occurs only with `I_MPI_FABRICS=ofi`. It looks like a problem with the choice of a default provider when that provider is not available.
On my machine, I have the following fi providers:
$ fi_info | grep provider | sort -u
provider: mlx
provider: psm3
provider: shm
provider: tcp
provider: tcp;ofi_rxm
provider: verbs
provider: verbs;ofi_rxm
I checked all of them, and all of them work fine.
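(Each provider can be pinned explicitly via FI_PROVIDER; the loop below is just an illustration of one way to test them all:)
$ for p in mlx psm3 shm tcp verbs; do FI_PROVIDER=$p mpirun -n 4 ./a.out; done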
Then, I just set `I_MPI_FABRICS=54543`, and I saw the following message:
MPI startup(): 54543 fabric is unknown or has been removed from the product, please use ofi or shm:ofi instead.
As you can see, I do not have an `ofi` (or `shm:ofi`) provider, yet MPI suggests using it.
So I set `I_MPI_FABRICS=ofi`, and then I saw my RDMA_WRITE error. At the same time, `I_MPI_FABRICS=shm:ofi` works fine.
Igor
