I am facing an issue with an MPI program hanging when using Intel MPI.
Characteristics of the system:
- Intel MPI version: 2018 Update 4 Build 20180823
- Network type: Mellanox InfiniBand HDR100
- Network topology: Dragonfly
- CPU: AMD EPYC 7742
When I use only 2 nodes (256 processes), the code works fine. But when I use 8 nodes, the behaviour is random: most of the time it hangs, but sometimes it segfaults instead.
The stack trace at the time of hanging shows that the processes are stuck at:
dapl_rc_vc_progress_short_msg_20() at ../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:483
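For reference, this is roughly how I collected the traces: on a node where the job is hung, I dump backtraces of all ranks with gdb (a.out stands in for my actual binary):

    # attach to every hung rank on this node and print all thread backtraces
    for pid in $(pgrep -u $USER a.out); do
        gdb -p $pid -batch -ex 'thread apply all bt'
    done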
However, if I enable the UD transport via export I_MPI_DAPL_UD=on, it works fine. With UD, the code works even on 10k processes.
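For completeness, here is the kind of launch line I use to toggle between the two transports; I_MPI_DEBUG makes the library print which fabric and provider it actually picked (binary name and rank counts match my 8-node case):

    # default RC path over DAPL -- hangs or segfaults on 8 nodes
    export I_MPI_FABRICS=shm:dapl
    export I_MPI_DEBUG=5
    mpirun -n 1024 -ppn 128 ./a.out

    # UD transport -- works, even at 10k processes
    export I_MPI_DAPL_UD=on
    mpirun -n 1024 -ppn 128 ./a.out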
My question is: how can I find out what causes the RC (RDMA) transport to hang (or segfault) the computation, and how can I fix it?
I would prefer to take advantage of RC up to at least 8 nodes, and then, for larger runs, switch to UD (if needed) to save memory.
Please note that I do not face this problem with Open MPI or MVAPICH2.
Thanks in advance.
The Intel MPI version you are using is old and no longer supported. For the list of supported versions, refer to Intel® Parallel Studio XE & Intel® oneAPI Toolkits...
Since IMPI 2019, the Intel® MPI Library has switched from the Open Fabrics Alliance* (OFA) framework to the Open Fabrics Interfaces* (OFI) framework.
Can you upgrade to the latest version? There have been many bug fixes and performance improvements since the 2018 version.
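If you do try a newer version, note that transport selection then goes through libfabric rather than DAPL; for example (the right provider depends on your software stack, so treat the values below as a sketch):

    # IMPI 2019+ uses OFI/libfabric; pick a provider explicitly if needed
    export FI_PROVIDER=verbs    # or mlx on Mellanox/UCX stacks
    export I_MPI_DEBUG=5
    mpirun -n 1024 -ppn 128 ./a.out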
Unfortunately, I am not the administrator of the machine, so I do not have control over it. I could install a newer version of Intel MPI in my home directory, but that is not a practical solution, given that the other MPI implementations already work with RC.
I wanted to know whether there are any (perhaps undocumented) RC/RDMA-related environment variables that could help fix this issue.
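For example, these are the DAPL-related knobs I found in the 2018 reference manual and plan to try; I do not know whether any of them addresses the RC hang, and the values below are only my guesses:

    # disable the DAPL memory-registration (translation) cache,
    # which the manual notes can be turned off if correctness issues appear
    export I_MPI_DAPL_TRANSLATION_CACHE=off

    # pin an explicit RC provider from /etc/dat.conf instead of the default
    # (the name below is an example; mine may differ)
    export I_MPI_DAPL_PROVIDER=ofa-v2-mlx5_0-1u

    # change the switch point between eager and direct-copy (rendezvous) paths
    export I_MPI_DAPL_DIRECT_COPY_THRESHOLD=65536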
Anyway, I will ask the system administrators why they recommend using only UD with Intel MPI; they must have run into the same problem.
The selection logic for UD versus RC depends on the number of ranks, the number of nodes, and the fabric provider being used. Generally, IMPI selects RC for small-scale runs and UD for large-scale runs.
You can refer to the article "Tuning the Intel® MPI Library: Advanced Techniques", which explains why UD is selected for large-scale runs and how to further tune DAPL for them.
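As a rough back-of-envelope illustration of the memory argument (assuming, simplistically, one RC connection per remote peer, which is how the connected transport scales):

    # RC: every rank keeps a connection to every other rank
    echo $(( 8 * 128 - 1 ))     # 1023 connections per rank on your 8 nodes
    echo $(( 80 * 128 - 1 ))    # 10239 connections per rank at ~10k ranks
    # UD: a single connectionless QP per rank, regardless of job size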
Let us know if you found this helpful.
If you still want to use RC for larger runs, let me know.
We are closing this thread assuming your issue has been resolved. We will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.