Dear all,
I am testing my pure-MPI Fortran code, which performs calculations on partitioned mesh blocks in every iteration of the simulation. Rank-boundary data are therefore exchanged with the neighbouring ranks via non-blocking (asynchronous) send/recv, as shown below.
reqs = MPI_REQUEST_NULL
ir = 0
is = 0

! post non-blocking receives from all neighbouring ranks
LOOP_RECV: do i = 1, N
   nei      => neighbours(i)
   nei_recv => neighbours_recv(i)
   tg = 1
   ir = ir + 1
   call MPI_IRECV(nei_recv%data(1), order*ndim*nei%number_of_elements, &
                  MPI_DOUBLE_PRECISION, nei%dest, tg, MPI_COMM_WORLD, reqr(ir), ierr)
enddo LOOP_RECV

! post non-blocking sends to all neighbouring ranks
LOOP_SEND: do i = 1, N
   nei      => neighbours(i)
   nei_send => neighbours_send(i)
   tg = 1
   is = is + 1
   call MPI_ISEND(nei_send%data(1), order*ndim*nei%number_of_elements, &
                  MPI_DOUBLE_PRECISION, nei%dest, tg, MPI_COMM_WORLD, reqs(is), ierr)
enddo LOOP_SEND

! wait only for the receives to complete
call MPI_WAITALL(ir, reqr, statusr, ierr)
When I increase the size of the send/recv arrays (order is increased, so the arrays become two or three times larger), a deadlock occurs at a random iteration far from the start of the simulation; for order=1 the simulation finishes successfully.
This occurs both on my workstation (Intel i5-10400, 6 processes) and on the cluster (2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 20 cores each) with process counts ranging from 40 to 200.
In one of the cluster runs, the following messages were printed while the job was deadlocked:
dapl async_event CQ (0x22f6d70) ERR 0
dapl_evd_cq_async_error_callback (0x22c0d90, 0x22f6ed0, 0x7ff2de9f2bf0, 0x22f6d70)
dapl async_event QP (0x2275f00) Event 1
(On my PC I use Intel mpiifort 2021.11.1; on the cluster, version 19.1.0.)
Does anyone know what goes wrong and how to fix it?
Spiros
Got it. Thank you!
@Spiros sorry, that is a very old version of Intel MPI. Can you please try the latest version available?
I tried the 2024 mpiifx compiler overnight with 6 ranks, and unfortunately a deadlock still occurs after several hours.
However, I noticed from my system's process monitor that every process accumulates memory gradually: each starts at about 167 MB, and by the time the deadlock occurs every process is using 1.5-3 GB.
Also, the send/recv buffers are allocated before and deallocated after each data exchange.
I have found the solution to my problem.
For anyone interested: I was not calling the waitall subroutine for the send requests (I was only calling it for the recv requests), and this subroutine is what completes the operations, deallocates the requests, and sets the corresponding handles to MPI_REQUEST_NULL.
This created a memory leak which, after many iterations, resulted in some kind of MPI deadlock that I cannot interpret.
If anyone can give an insight into why a deadlock occurred instead of a crash from exceeding the per-rank memory limit (which is predefined in the batch script), it would be much appreciated.
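For completeness, this is roughly what the completion step looks like now (a minimal sketch; statuss is a second status array declared like statusr, and N is the number of neighbours, so any names beyond those in my snippet above are just illustrative):

use mpi
integer :: ierr, ir, is
integer :: reqr(N), reqs(N)
integer :: statusr(MPI_STATUS_SIZE, N), statuss(MPI_STATUS_SIZE, N)

! ... post the MPI_IRECVs into reqr(1:ir) and the MPI_ISENDs into reqs(1:is) as above ...

! complete the receives before reading the nei_recv%data buffers
call MPI_WAITALL(ir, reqr, statusr, ierr)
! complete the sends as well, so that every request is freed
! (this was the call missing from my original code)
call MPI_WAITALL(is, reqs, statuss, ierr)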
@Spiros glad that you found your problem. For such a simple code, I highly doubt that the error is related to the MPI implementation.
You can still use -check_mpi to check for such errors.
Another note: for performance reasons you don't want to deallocate/allocate the send/recv buffers every iteration; keeping them alive will give you performance benefits, as in the sketch below.
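Something along these lines, for example (just a sketch reusing the names from your snippet; iter and max_iterations are placeholders for your time loop):

! allocate the communication buffers once, before the time loop
do i = 1, N
   allocate(neighbours_send(i)%data(order*ndim*neighbours(i)%number_of_elements))
   allocate(neighbours_recv(i)%data(order*ndim*neighbours(i)%number_of_elements))
enddo

do iter = 1, max_iterations
   ! pack boundary data, post MPI_IRECV/MPI_ISEND,
   ! then MPI_WAITALL on both the recv and the send request arrays
enddo

! free the buffers only after the time loop has finished
do i = 1, N
   deallocate(neighbours_send(i)%data)
   deallocate(neighbours_recv(i)%data)
enddo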