Hi All,
I would like to report a deadlock with nonblocking send/recv and waitall.
In the following code section, the model deadlocks (waits forever).
The exact number of MPI processes does not seem to matter, but about 8,000 are used, and if the run is repeated 50 times, the hang occurs randomly in about 1 or 2 of them. I don't see any obvious hardware issues, and runtime options such as I_MPI_HYDRA_BRANCH_COUNT previously helped with a synchronization issue at MPI_FINALIZE, so I would like to know whether there are any runtime options that could help in this case as well.
Please also let me know if you see any possible improvements to the synchronization handling in the code below.
DO i = 0,nproc-1
  IF (n_recvfrom(i) > 0) THEN
    CALL mpl_irecv(recv_array(1,i), n_recvfrom(i), send_type_interpolation, i, 100, &
                   GlobalComm, recv_reqs(n_recv_reqs), info)
    n_recv_reqs = n_recv_reqs + 1
  END IF
END DO

DO i = 0,nproc-1
  IF (n_sendto(i) > 0) THEN
    CALL mpl_isend(send_array(1,i), n_sendto(i), send_type_interpolation, i, 100, &
                   GlobalComm, send_reqs(n_send_reqs), info)
    n_send_reqs = n_send_reqs + 1
  END IF
END DO

IF (n_recv_reqs > 0) THEN
  CALL mpl_waitall(n_recv_reqs, recv_reqs, recv_istat, info)
END IF
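One thing I have wondered about: only the receive requests are waited on here; the send requests are never completed. If mpl_waitall is a thin wrapper around MPI_Waitall (and assuming a send_istat status array declared like recv_istat, which does not appear in the snippet above), completing the sends as well would look like this sketch:

```fortran
! Also complete the outstanding send requests, so that no request
! object is left pending when the exchange finishes (MPI requires
! every nonblocking operation to be completed by a wait/test call).
! send_istat is a hypothetical status array, sized like recv_istat.
IF (n_send_reqs > 0) THEN
  CALL mpl_waitall(n_send_reqs, send_reqs, send_istat, info)
END IF
```

Would completing the send requests this way be expected to affect the hang?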
Thank you in advance
Kihang
Hi,
Thanks for reaching out to us.
Could you please let us know the OS, CPU details, and Intel oneAPI version you are using?
>>The number of MPI processes is not important, but about 8,000 are used, and if it is repeated 50 times, it occurs randomly about 1 or 2 times.
What do you mean by "repeated 50 times"? Could you please elaborate on this statement?
Could you please provide us with the sample reproducer code along with the steps to reproduce the issue to investigate more on your issue?
If you have any logs, please share them with us. If not, could you please set I_MPI_DEBUG=30 and FI_LOG_LEVEL=debug when running? Please find an example command below:
I_MPI_DEBUG=30 FI_LOG_LEVEL=debug mpirun -n <no-of-proc> -ppn <proc-per-node> ./a.out
Also, could you let us know which OFI provider you are using?
Thanks & Regards,
Varsha
Hi Varsha,
Thank you for your response.
>>Could you please let us know the OS, CPU details, and Intel oneAPI version you are using?
OS: CentOS Linux 8.3.2011
CPU: Intel Xeon Platinum 8368Q
OneAPI: 2021.3.0
>>Could you please provide us with the sample reproducer code along with the steps to reproduce the issue to investigate more on your issue?
That is difficult right now because that part depends on several other source files. Let me see what I can do.
>>Also, let us know the OFI provider you are using?
I am using "mlx" provider.
Best Regards,
Kihang
Hi All,
Could you recommend any suggestions?
Or is the information I have provided about my situation not enough to clarify what the problem is?
Thanks,
Kihang
Hi,
>>It seems difficult right now because that part refers to several source codes. Let me figure out that.
As you mentioned in the previous reply, we are waiting for the complete reproducer code. It would be a great help if you could provide a complete reproducer so that we can investigate your issue further.
Thanks & Regards,
Varsha
Hi,
We have not heard back from you. Could you please provide us with the reproducer code so that we can investigate your issue further?
Thanks & Regards,
Varsha
Hi,
We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.
Thanks & Regards,
Varsha