YU__Jihong
Beginner
76 Views

Deadlock Problem when using the Cluster FFT

Hello,

I am running a large simulation using MPI on a distributed-memory supercomputer
(FUJITSU Server PRIMERGY CX2550 M4 × 880),
compiled with the Intel 2018.2.046 Fortran compiler.

I am hitting deadlocks when using the Cluster FFT together with its auxiliary functions
(MKL_CDFT_ScatterData and MKL_CDFT_GatherData), and the performance of the simulation is far too slow.

The simulation solves the Navier–Stokes equations and requires 3D (X, Y, and Z) arrays.
Since the boundary conditions in the Y and Z directions are periodic,
I apply a 2D Cluster FFT in those two directions and iterate the calculation along the remaining direction, X, as shown below.

==============================================
STATUS = DftiCreateDescriptorDM(MKL_COMM,DESC,DFTI_DOUBLE,DFTI_COMPLEX,2,LENGTHS)

STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_NX,NXX)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_X_START,START_X)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_NX,NX_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_X_START,START_X_OUT)
ALLOCATE(LOCAL(SIZE), WORK(SIZE), STAT=STATUS)
STATUS = DftiSetValueDM(DESC,DFTI_PLACEMENT,DFTI_NOT_INPLACE)

DO I = 1, Nx-1

 ALLOCATE(X_IN(M,N))
 
 DO K = 1, N
  DO J = 1, M
   X_IN(J,K) = DCMPLX(A(I,J,K),0d0)
  END DO
 END DO

 STATUS = DftiCommitDescriptorDM(DESC)
 STATUS = MKL_CDFT_SCATTERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,LOCAL) 
 STATUS = DftiComputeForwardDM(DESC,LOCAL,WORK)
 STATUS = MKL_CDFT_GATHERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,WORK)

 DO K = 1, N
  DO J = 1, M
   T_F1(I,J,K) = X_IN(J,K)
  END DO
 END DO
 
 DEALLOCATE(X_IN)

END DO

DEALLOCATE(LOCAL, WORK)

~~~~~~~~~~~~~~~~~~~
<SOME CALCULATIONS>
~~~~~~~~~~~~~~~~~~~

STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_NX,NX_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_X_START,START_X_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_NX,NXX)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_X_START,START_X)
ALLOCATE(LOCAL(SIZE), WORK(SIZE), STAT=STATUS)
SCALE = 1.0_8/(N*M)
STATUS = DftiSetValueDM(DESC,DFTI_BACKWARD_SCALE,SCALE)

DO I = 1, Nx-1

 ALLOCATE(X_IN(M,N))
 
 DO K = 1, N
  DO J = 1, M
   X_IN(J,K) = A(I,J,K)
  END DO
 END DO
 
 STATUS = DftiCommitDescriptorDM(DESC)
 STATUS = MKL_CDFT_SCATTERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,WORK)
 STATUS = DftiComputeBackwardDM(DESC,WORK,LOCAL)
 STATUS = MKL_CDFT_GATHERDATA_D(COMM,ROOTRANK,ELEMENTSIZE,2,LENGTHS,X_IN,NXX,START_X,LOCAL)
 
 DO K = 1, N
  DO J = 1, M
   P(I,J,K) = REAL(X_IN(J,K))
  END DO
 END DO 
 
 DEALLOCATE(X_IN)

END DO

DEALLOCATE(LOCAL, WORK)

STATUS = DftiFreeDescriptorDM(DESC)
==============================================
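As a general MPI/MKL pattern (not a confirmed diagnosis of the deadlock reported above): DftiCommitDescriptorDM and the scatter/gather/compute calls are collective operations, so every rank must reach them the same number of times with matching arguments, and committing a descriptor is comparatively expensive, so it is usually done once before the loop rather than once per X-plane. A minimal restructuring sketch, reusing the names from the snippet above:

```fortran
! Sketch only: commit the descriptor once, then reuse it for every plane.
STATUS = DftiCreateDescriptorDM(MKL_COMM,DESC,DFTI_DOUBLE,DFTI_COMPLEX,2,LENGTHS)
STATUS = DftiSetValueDM(DESC,DFTI_PLACEMENT,DFTI_NOT_INPLACE)
STATUS = DftiCommitDescriptorDM(DESC)  ! collective and costly: keep it out of the loop

DO I = 1, Nx-1
   ! ... pack X_IN for plane I ...
   ! scatter / DftiComputeForwardDM / gather, executed identically by every rank
END DO

STATUS = DftiFreeDescriptorDM(DESC)
```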


I programmed this simulation based on the 'cdft_example_support' and 'dm_complex_2d_double_ex2' examples provided with Intel MKL.
After running with -check_mpi, I get the following errors during the first DO loop.


==============================================
[0] ERROR: no progress observed in any process for over 11:12 minutes, aborting application
[0] WARNING: starting premature shutdown

[0] ERROR: GLOBAL:DEADLOCK:HARD: fatal error
[0] ERROR:    Application aborted because no progress was observed for over 11:12 minutes,
[0] ERROR:    check for real deadlock (cycle of processes waiting for data) or
[0] ERROR:    potential deadlock (processes sending data to each other and getting blocked
[0] ERROR:    because the MPI might wait for the corresponding receive).
[0] ERROR:    [0] no progress observed for over 11:12 minutes, process is currently in MPI call:
[0] ERROR:       mpi_gather_(*sendbuf=0x762610, sendcount=2, sendtype=MPI_INTEGER, *recvbuf=0x2b9c9acc4b80, recvcount=2, recvtype=MPI_INTEGER, root=0, comm=MPI_COMM_WORLD, *ierr=0x7fffca56ca50)
[0] ERROR:       module_mpi_mp_mkl_cdft_scatterdata_d_ (/home/~)
[0] ERROR:       press_ffttdma_ (/home/~)
[0] ERROR:       rk3_uvwpc_ (/home/~)
[0] ERROR:       MAIN__ (/home/~)
[0] ERROR:       main (/home/~)
[0] ERROR:       __libc_start_main (/usr/lib64/libc-2.17.so)
[0] ERROR:       (/home/~)
.
.
.
[0] INFO: GLOBAL:DEADLOCK:HARD: found 1 time (1 error + 0 warnings), 0 reports were suppressed
[0] INFO: Found 1 problem (1 error + 0 warnings), 0 reports were suppressed.
==============================================

I have been trying to resolve this deadlock and the poor performance for several weeks, but I cannot fix it.
I would greatly appreciate any help or insight into these problems.

Best regards

YU,

3 Replies
Pamela_H_Intel
Moderator

Jihong,

I will look into this.

Pamela

YU__Jihong
Beginner

Dear Pamela,

Thank you; I desperately need your help.
I have now figured out that if Nx (in the loop DO I = 1, Nx-1) is less than 600, the deadlock does not happen.
But I need Nx to be greater than 900.
I have only just started learning MPI and using this library, and I cannot find a way to solve the problem.
I would greatly appreciate any suggestions you could give me.

Best regards

YU,

Pamela_H_Intel
Moderator

Yu,

I need you to scale down the code. For example, create only 2 MPI processes and see if the problem persists.

Also, I wonder how long it was running; I see "no progress observed for over 11:12 minutes". If you don't already, capture the start and end times. If you are using a version of Linux, you can simply wrap your launch command in `time` (for example: time ls; time sleep 2). This may help us discover whether there is a mistake in your code (it did nothing but wait) or whether things were working until a data value went bad.

I look forward to hearing what you find.

Pamela
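The timing suggestion above can be sketched as follows; `sleep 2` stands in for the real launch line, and `mpirun -np 2 ./solver` in the comment uses a hypothetical binary name:

```shell
# Time the full run; replace "sleep 2" with your actual launch command,
# e.g.  time mpirun -np 2 ./solver   (binary name is a placeholder).
time sleep 2
```

`time` reports elapsed wall-clock, user, and system time on stderr once the wrapped command exits, so comparing runs with small and large Nx shows whether the job is computing or merely waiting.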