Intel® oneAPI Math Kernel Library

Deadlock Problem when using the Cluster FFT

YU__Jihong
Beginner

Hello,

I have run a massive simulation using MPI on distributed memory supercomputers
(FUJITSU Server PRIMERGY CX2550 M4 × 880)
and compiled with the intel/2018.2.046 Fortran compiler.

I am running into a deadlock when using the Cluster FFT together with its auxiliary data-distribution functions
(MKL_CDFT_ScatterData and MKL_CDFT_GatherData), and the performance of the simulation is also far too slow.

The simulation solves the Navier–Stokes equations and requires 3D (X, Y, and Z) arrays.
Since the boundary conditions in the Y and Z directions are periodic,
I applied a 2D Cluster FFT in those two directions and iterated the calculation along the remaining direction, X, as shown below.

==============================================
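! Forward pass: for each X plane, scatter the (Y,Z) data across the ranks,
! run the 2D distributed FFT, and gather the transformed plane on ROOTRANK.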
STATUS = DftiCreateDescriptorDM(MKL_COMM,DESC,DFTI_DOUBLE,DFTI_COMPLEX,2,LENGTHS)

STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_NX,NXX)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_X_START,START_X)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_NX,NX_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_X_START,START_X_OUT)
ALLOCATE(LOCAL(SIZE), WORK(SIZE), STAT=STATUS)
STATUS = DftiSetValueDM(DESC,DFTI_PLACEMENT,DFTI_NOT_INPLACE)

DO I = 1, Nx-1

 ALLOCATE(X_IN(M,N))
 
 DO K = 1, N
  DO J = 1, M
   X_IN(J,K) = DCMPLX(A(I,J,K),0d0)
  END DO
 END DO

 STATUS = DftiCommitDescriptorDM(DESC)  ! re-committed on every plane (see the remark after this code block)
 ! The scatter/compute/gather calls are collective; every rank must reach them together.
 STATUS = MKL_CDFT_SCATTERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,LOCAL)
 STATUS = DftiComputeForwardDM(DESC,LOCAL,WORK)
 STATUS = MKL_CDFT_GATHERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,WORK)

 DO K = 1, N
  DO J = 1, M
   T_F1(I,J,K) = X_IN(J,K)
  END DO
 END DO
 
 DEALLOCATE(X_IN)

END DO

DEALLOCATE(LOCAL, WORK)

~~~~~~~~~~~~~~~~~~~
<SOME CALCULATIONS>
~~~~~~~~~~~~~~~~~~~
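! Backward pass: the input now has the forward transform's output layout
! (hence NXX/START_X are taken from the CDFT_LOCAL_OUT_* queries below).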

STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_NX,NX_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_X_START,START_X_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_NX,NXX)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_X_START,START_X)
ALLOCATE(LOCAL(SIZE), WORK(SIZE), STAT=STATUS)
SCALE = 1.0_8/(N*M)
STATUS = DftiSetValueDM(DESC,DFTI_BACKWARD_SCALE,SCALE)

DO I = 1, Nx-1

 ALLOCATE(X_IN(M,N))
 
 DO K = 1, N
  DO J = 1, M
   X_IN(J,K) = A(I,J,K)
  END DO
 END DO
 
 STATUS = DftiCommitDescriptorDM(DESC)
 STATUS = MKL_CDFT_SCATTERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,WORK)
 STATUS = DftiComputeBackwardDM(DESC,WORK,LOCAL)
 STATUS = MKL_CDFT_GATHERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,LOCAL)
 
 DO K = 1, N
  DO J = 1, M
   P(I,J,K) = REAL(X_IN(J,K))
  END DO
 END DO
 
 DEALLOCATE(X_IN)

END DO

DEALLOCATE(LOCAL, WORK)

STATUS = DftiFreeDescriptorDM(DESC)
==============================================
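A remark on the code above: as far as I can tell, none of the descriptor settings change between iterations of the I loops, so DftiCommitDescriptorDM should only need to run once before each loop rather than once per plane, and X_IN could be allocated once. A minimal, unverified sketch of the forward loop restructured that way (same variable names as above):

==============================================
STATUS = DftiSetValueDM(DESC,DFTI_PLACEMENT,DFTI_NOT_INPLACE)
STATUS = DftiCommitDescriptorDM(DESC)   ! commit once, outside the loop

ALLOCATE(X_IN(M,N))                     ! one buffer reused for every plane
DO I = 1, Nx-1
 DO K = 1, N
  DO J = 1, M
   X_IN(J,K) = DCMPLX(A(I,J,K),0d0)
  END DO
 END DO
 STATUS = MKL_CDFT_SCATTERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,LOCAL)
 STATUS = DftiComputeForwardDM(DESC,LOCAL,WORK)
 STATUS = MKL_CDFT_GATHERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,WORK)
 DO K = 1, N
  DO J = 1, M
   T_F1(I,J,K) = X_IN(J,K)
  END DO
 END DO
END DO
DEALLOCATE(X_IN)
==============================================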


I programmed this simulation based on the 'cdft_example_support' and 'dm_complex_2d_double_ex2' examples provided with Intel MKL.
After running with -check_mpi, I got the following errors during the first DO loop.


==============================================
[0] ERROR: no progress observed in any process for over 11:12 minutes, aborting application
[0] WARNING: starting premature shutdown

[0] ERROR: GLOBAL:DEADLOCK:HARD: fatal error
[0] ERROR:    Application aborted because no progress was observed for over 11:12 minutes,
[0] ERROR:    check for real deadlock (cycle of processes waiting for data) or
[0] ERROR:    potential deadlock (processes sending data to each other and getting blocked
[0] ERROR:    because the MPI might wait for the corresponding receive).
[0] ERROR:    [0] no progress observed for over 11:12 minutes, process is currently in MPI call:
[0] ERROR:       mpi_gather_(*sendbuf=0x762610, sendcount=2, sendtype=MPI_INTEGER, *recvbuf=0x2b9c9acc4b80, recvcount=2, recvtype=MPI_INTEGER, root=0, comm=MPI_COMM_WORLD, *ierr=0x7fffca56ca50)
[0] ERROR:       module_mpi_mp_mkl_cdft_scatterdata_d_ (/home/~)
[0] ERROR:       press_ffttdma_ (/home/~)
[0] ERROR:       rk3_uvwpc_ (/home/~)
[0] ERROR:       MAIN__ (/home/~)
[0] ERROR:       main (/home/~)
[0] ERROR:       __libc_start_main (/usr/lib64/libc-2.17.so)
[0] ERROR:       (/home/~)
.
.
.
[0] INFO: GLOBAL:DEADLOCK:HARD: found 1 time (1 error + 0 warnings), 0 reports were suppressed
[0] INFO: Found 1 problem (1 error + 0 warnings), 0 reports were suppressed.
==============================================

I have been trying to resolve this deadlock and the slow performance for several weeks, but I cannot fix them.
I would greatly appreciate any help or insight into these problems.

Best regards

YU,

Pamela_H_Intel
Moderator

Jihong,

I will look into this.

Pamela

YU__Jihong
Beginner

Dear Pamela,

Thank you; I desperately need your help.
I have now found that if Nx (in the DO I = 1, Nx-1 loops) is less than 600, the deadlock does not occur,
but I need Nx to be larger than 900.
I have just started learning MPI and this library, and I cannot find a way to solve the problem.
I would greatly appreciate it if you could give me some suggestions.

Best regards

YU,

Pamela_H_Intel
Moderator
Yu,

I need you to scale down the code; for example, create only 2 MPI processes and see if the problem persists.

Also, I wonder how long it was running? I see "no progress observed for over 11:12 minutes". If you don't already, capture the start and end time. If you are using a version of Linux, you can just wrap your call in the time command (for example: time ls; time sleep 2). This may help us discover whether there is a mistake in your code (it did nothing but wait) or whether things were working until a data value went bad.
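Alternatively, you can time the section from inside the code with MPI_Wtime. Here is a minimal, self-contained sketch (the section to be timed is left as a placeholder):

==============================================
      PROGRAM TIMING_SKETCH
      USE MPI
      IMPLICIT NONE
      INTEGER :: IERR, MY_RANK
      DOUBLE PRECISION :: T0, T1

      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)

      T0 = MPI_WTIME()
      ! <call the FFT / solver section you want to time here>
      T1 = MPI_WTIME()

      IF (MY_RANK == 0) PRINT '(A,F10.3,A)', 'Elapsed: ', T1 - T0, ' s'

      CALL MPI_FINALIZE(IERR)
      END PROGRAM TIMING_SKETCH
==============================================

I look forward to hearing what you find.

Pamela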