Hello,
I am running a large MPI simulation on a distributed-memory supercomputer (FUJITSU Server PRIMERGY CX2550 M4 × 880), compiled with the Intel Fortran compiler (intel/2018.2.046).
I am running into deadlocks when using the Cluster FFT together with the auxiliary functions MKL_CDFT_ScatterData and MKL_CDFT_GatherData, and the simulation is also far too slow.
The simulation solves the Navier–Stokes equations on 3D (X, Y, Z) arrays.
Because the boundary conditions in the Y and Z directions are periodic, I apply a 2D Cluster FFT over those two directions and loop over the remaining direction, X, as shown below.
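For reference, each pass through the X loop applies the 2D DFT below to a single Y-Z plane I of size M × N. This assumes LENGTHS = (M, N), which is not shown in the code. It also uses MKL's default forward sign convention and a forward scale of 1. The wavenumber indices p and q are chosen here for illustration. Setting DFTI_BACKWARD_SCALE to 1/(N*M), as in the second loop, makes the backward transform the exact inverse of the forward one:

$$\hat{A}_I(p,q) = \sum_{j=1}^{M}\sum_{k=1}^{N} A(I,j,k)\,\exp\!\left[-2\pi\mathrm{i}\left(\frac{(j-1)(p-1)}{M}+\frac{(k-1)(q-1)}{N}\right)\right]$$

$$A(I,j,k) = \frac{1}{MN}\sum_{p=1}^{M}\sum_{q=1}^{N} \hat{A}_I(p,q)\,\exp\!\left[+2\pi\mathrm{i}\left(\frac{(j-1)(p-1)}{M}+\frac{(k-1)(q-1)}{N}\right)\right]$$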
==============================================
STATUS = DftiCreateDescriptorDM(MKL_COMM,DESC,DFTI_DOUBLE,DFTI_COMPLEX,2,LENGTHS)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_NX,NXX)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_X_START,START_X)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_NX,NX_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_X_START,START_X_OUT)
ALLOCATE(LOCAL(SIZE), WORK(SIZE), STAT=STATUS)
STATUS = DftiSetValueDM(DESC,DFTI_PLACEMENT,DFTI_NOT_INPLACE)
DO I = 1, Nx-1
   ALLOCATE(X_IN(M,N))
   DO K = 1, N
      DO J = 1, M
         X_IN(J,K) = DCMPLX(A(I,J,K),0d0)
      END DO
   END DO
   STATUS = DftiCommitDescriptorDM(DESC)
   STATUS = MKL_CDFT_SCATTERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,LOCAL)
   STATUS = DftiComputeForwardDM(DESC,LOCAL,WORK)
   STATUS = MKL_CDFT_GATHERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,WORK)
   DO K = 1, N
      DO J = 1, M
         T_F1(I,J,K) = X_IN(J,K)
      END DO
   END DO
   DEALLOCATE(X_IN)
END DO
DEALLOCATE(LOCAL, WORK)
~~~~~~~~~~~~~~~~~~~
<SOME CALCULATIONS>
~~~~~~~~~~~~~~~~~~~
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_NX,NX_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_X_START,START_X_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_NX,NXX)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_X_START,START_X)
ALLOCATE(LOCAL(SIZE), WORK(SIZE), STAT=STATUS)
SCALE = 1.0_8/(N*M)
STATUS = DftiSetValueDM(DESC,DFTI_BACKWARD_SCALE,SCALE)
DO I = 1, Nx-1
   ALLOCATE(X_IN(M,N))
   DO K = 1, N
      DO J = 1, M
         X_IN(J,K) = A(I,J,K)
      END DO
   END DO
   STATUS = DftiCommitDescriptorDM(DESC)
   STATUS = MKL_CDFT_SCATTERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,WORK)
   STATUS = DftiComputeBackwardDM(DESC,WORK,LOCAL)
   STATUS = MKL_CDFT_GATHERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,LOCAL)
   DO K = 1, N
      DO J = 1, M
         P(I,J,K) = REAL(X_IN(J,K))
      END DO
   END DO
   DEALLOCATE(X_IN)
END DO
DEALLOCATE(LOCAL, WORK)
STATUS = DftiFreeDescriptorDM(DESC)
==============================================
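One likely contributor to the slowness is visible in the loops above. DftiCommitDescriptorDM is called on every plane, and X_IN is allocated and freed on every plane, even though neither the transform size nor the buffer sizes change with I.

Below is a sketch of the forward pass with the descriptor created, configured and committed once. It is not a tested fix, and it rests on several assumptions:
- MODULE_MPI is the module in this code that defines MKL_CDFT_SCATTERDATA_D and MKL_CDFT_GATHERDATA_D. The name is inferred from the module_mpi_mp_... frame in the -check_mpi traceback.
- The helpers take the same argument lists as in the loop above.
- A and T_F1 are dimensioned (Nx, M, N), with T_F1 complex.
- FORWARD_PLANES is a hypothetical name.

The sketch also deliberately uses one communicator for both the descriptor and the helpers. The original uses MKL_COMM for one and COMM for the other.

```fortran
! Sketch only (not a tested fix): forward pass with the descriptor committed once.
! ASSUMED: MODULE_MPI provides MKL_CDFT_SCATTERDATA_D / MKL_CDFT_GATHERDATA_D with
! the argument lists used in the original loop; FORWARD_PLANES is a hypothetical name.
SUBROUTINE FORWARD_PLANES(COMM, ROOTRANK, ELEMENTSIZE, M, N, NX, A, T_F1)
  USE MKL_CDFT
  USE MODULE_MPI
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: COMM, ROOTRANK, ELEMENTSIZE, M, N, NX
  REAL(8), INTENT(IN) :: A(NX, M, N)
  COMPLEX(8), INTENT(INOUT) :: T_F1(NX, M, N)
  TYPE(DFTI_DESCRIPTOR_DM), POINTER :: DESC
  COMPLEX(8), ALLOCATABLE :: X_IN(:,:), LOCAL(:), WORK(:)
  INTEGER :: STATUS, NLOCAL, NXX, START_X, I
  INTEGER :: LENGTHS(2)

  LENGTHS = (/ M, N /)

  ! Same communicator for the descriptor and for the scatter/gather helpers.
  STATUS = DftiCreateDescriptorDM(COMM, DESC, DFTI_DOUBLE, DFTI_COMPLEX, 2, LENGTHS)
  STATUS = DftiGetValueDM(DESC, CDFT_LOCAL_SIZE, NLOCAL)
  STATUS = DftiGetValueDM(DESC, CDFT_LOCAL_NX, NXX)
  STATUS = DftiGetValueDM(DESC, CDFT_LOCAL_X_START, START_X)
  STATUS = DftiSetValueDM(DESC, DFTI_PLACEMENT, DFTI_NOT_INPLACE)
  ! Commit ONCE: the transform size does not change from plane to plane.
  STATUS = DftiCommitDescriptorDM(DESC)

  ! Allocate every buffer once, outside the plane loop.
  ALLOCATE(X_IN(M, N), LOCAL(NLOCAL), WORK(NLOCAL))

  DO I = 1, NX - 1
    X_IN = CMPLX(A(I, :, :), 0.0D0, KIND=8)
    ! Every rank of COMM must make both helper calls on every iteration,
    ! because they are built on MPI collectives (MPI_GATHER in the traceback).
    STATUS = MKL_CDFT_SCATTERDATA_D(COMM, ROOTRANK, ELEMENTSIZE, 2, LENGTHS, &
                                    X_IN, NXX, START_X, LOCAL)
    STATUS = DftiComputeForwardDM(DESC, LOCAL, WORK)
    STATUS = MKL_CDFT_GATHERDATA_D(COMM, ROOTRANK, ELEMENTSIZE, 2, LENGTHS, &
                                   X_IN, NXX, START_X, WORK)
    ! ASSUMED: the helper gathers to ROOTRANK, so X_IN is the full plane only there.
    T_F1(I, :, :) = X_IN
  END DO

  DEALLOCATE(X_IN, LOCAL, WORK)
  STATUS = DftiFreeDescriptorDM(DESC)
END SUBROUTINE FORWARD_PLANES
```

The backward pass can be restructured the same way. MKL requires a descriptor to be recommitted after any configuration change, so DFTI_BACKWARD_SCALE must be set before the single DftiCommitDescriptorDM call. The original loop recommits on every plane, which happens to satisfy this at a high cost.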
I based this code on the 'cdft_example_support' and 'dm_complex_2d_double_ex2' examples provided with Intel MKL.
After running with -check_mpi, I got the following errors during the first DO loop:
==============================================
[0] ERROR: no progress observed in any process for over 11:12 minutes, aborting application
[0] WARNING: starting premature shutdown
[0] ERROR: GLOBAL:DEADLOCK:HARD: fatal error
[0] ERROR: Application aborted because no progress was observed for over 11:12 minutes,
[0] ERROR: check for real deadlock (cycle of processes waiting for data) or
[0] ERROR: potential deadlock (processes sending data to each other and getting blocked
[0] ERROR: because the MPI might wait for the corresponding receive).
[0] ERROR: [0] no progress observed for over 11:12 minutes, process is currently in MPI call:
[0] ERROR: mpi_gather_(*sendbuf=0x762610, sendcount=2, sendtype=MPI_INTEGER, *recvbuf=0x2b9c9acc4b80, recvcount=2, recvtype=MPI_INTEGER, root=0, comm=MPI_COMM_WORLD, *ierr=0x7fffca56ca50)
[0] ERROR: module_mpi_mp_mkl_cdft_scatterdata_d_ (/home/~)
[0] ERROR: press_ffttdma_ (/home/~)
[0] ERROR: rk3_uvwpc_ (/home/~)
[0] ERROR: MAIN__ (/home/~)
[0] ERROR: main (/home/~)
[0] ERROR: __libc_start_main (/usr/lib64/libc-2.17.so)
[0] ERROR: (/home/~)
.
.
.
[0] INFO: GLOBAL:DEADLOCK:HARD: found 1 time (1 error + 0 warnings), 0 reports were suppressed
[0] INFO: Found 1 problem (1 error + 0 warnings), 0 reports were suppressed.
==============================================
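The traceback shows rank 0 blocked in MPI_GATHER inside MKL_CDFT_SCATTERDATA_D, gathering two integers per rank over MPI_COMM_WORLD. MPI_GATHER is a collective operation, so it completes only when every rank of that communicator makes the matching call. A rank that runs a different number of loop iterations, or that uses a different communicator, leaves the others waiting.

Below is a hypothetical diagnostic that checks both conditions. CHECK_PLANE_LOOP is an illustrative name, not part of MKL or the original code. It is meant to be called on every rank just before DO I = 1, Nx-1.

```fortran
! Hypothetical diagnostic, not part of MKL or the original code.
! Call it on every rank immediately before "DO I = 1, Nx-1".
SUBROUTINE CHECK_PLANE_LOOP(COMM, MKL_COMM, NX)
  USE MPI
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: COMM, MKL_COMM, NX
  INTEGER :: NX_MIN, NX_MAX, RANK, CMP, IERR

  CALL MPI_COMM_RANK(COMM, RANK, IERR)

  ! 1) All ranks must loop over the same number of planes; otherwise some rank
  !    stops calling the collective helpers early and the others hang.
  CALL MPI_ALLREDUCE(NX, NX_MIN, 1, MPI_INTEGER, MPI_MIN, COMM, IERR)
  CALL MPI_ALLREDUCE(NX, NX_MAX, 1, MPI_INTEGER, MPI_MAX, COMM, IERR)
  IF (NX_MIN /= NX_MAX) THEN
    IF (RANK == 0) PRINT *, 'Nx differs across ranks: min =', NX_MIN, ', max =', NX_MAX
    CALL MPI_ABORT(COMM, 1, IERR)
  END IF

  ! 2) The descriptor (MKL_COMM) and the scatter/gather helpers (COMM) should
  !    cover the same processes in the same rank order.
  CALL MPI_COMM_COMPARE(COMM, MKL_COMM, CMP, IERR)
  IF (CMP /= MPI_IDENT .AND. CMP /= MPI_CONGRUENT) THEN
    IF (RANK == 0) PRINT *, 'COMM and MKL_COMM differ, MPI_COMM_COMPARE result:', CMP
    CALL MPI_ABORT(COMM, 2, IERR)
  END IF
END SUBROUTINE CHECK_PLANE_LOOP
```

If both checks pass and the run still hangs, the next suspect is a rank leaving the plane loop early, for example through an error branch.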
I have been trying to solve the deadlock and the slowness for several weeks, but I cannot fix them.
I would greatly appreciate any help or insight on these problems.
Best regards
YU,
Jihang,
I will look into this.
Pamela
Dear Pamela,
Thank you, I desperately need your help.
I have now found that if Nx (in DO I = 1, Nx-1) is less than 600, the deadlock does not occur.
However, I need Nx to be larger than 900.
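Because the hang appears only above a certain Nx, it helps to know whether it always happens at the same plane or after roughly the same elapsed time. Below is a hypothetical progress report that rank 0 prints every 100 planes. REPORT_PROGRESS is an illustrative name, not part of MKL or the original code, and the sketch assumes the root is rank 0, as in the traceback (root=0). To use it, set T0 = MPI_WTIME() before the loop and call REPORT_PROGRESS(COMM, I, Nx-1, T0) at the end of each iteration.

```fortran
! Hypothetical helper, not part of MKL or the original code: rank 0 prints
! how far the X loop has got, so a hang can be located by plane and by time.
SUBROUTINE REPORT_PROGRESS(COMM, I, NPLANES, T0)
  USE MPI
  USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: OUTPUT_UNIT
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: COMM, I, NPLANES
  DOUBLE PRECISION, INTENT(IN) :: T0   ! value of MPI_WTIME() taken before the loop
  INTEGER :: RANK, IERR

  CALL MPI_COMM_RANK(COMM, RANK, IERR)
  IF (RANK == 0 .AND. (MOD(I, 100) == 0 .OR. I == NPLANES)) THEN
    WRITE(OUTPUT_UNIT, '(A,I7,A,I7,A,F10.2,A)') 'plane', I, ' of', NPLANES, &
         ' finished after', MPI_WTIME() - T0, ' s'
    FLUSH(OUTPUT_UNIT)   ! make the line visible even if the job hangs later
  END IF
END SUBROUTINE REPORT_PROGRESS
```

The helper makes no collective calls, so adding it cannot itself introduce a new deadlock.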
I have only just started learning MPI and using the library, and I can't find a way to solve the problem.
I would greatly appreciate any suggestions you could give me.
Best regards
YU,