Cluster FFT function: a beginner question

Raemon · ‎08-24-2009

Hi, sorry for asking a newbie question...

I don't know whether I have to scatter or gather the data before and after the transform.
I have tried to do so, but error messages always pop up.
I looked into the MKL user's guide, but it does not seem to explain the usage of MKL_CDFT_SCATTERDATA_D and MKL_CDFT_GATHERDATA_D clearly.
My code follows. Any advice would be helpful. Thanks a lot! (I am going to develop a much larger application. Can someone offer me a 1D example code in Fortran?)

[plain]INCLUDE 'cdft_example_support.f90'
PROGRAM clusterfft_test
USE MPI
USE MKL_CDFT
REAL*8 hfr,dx
INTEGER,PARAMETER::n=2**14
REAL*8,DIMENSION(n)::x
INTEGER step,i,tmp
REAL*8::time_begin,time_end
COMPLEX(8) qq
COMPLEX*16,DIMENSION(n)::trans,WORK
COMPLEX*16,DIMENSION(n)::LOCAL

TYPE(DFTI_DESCRIPTOR_DM),POINTER::fftjob
INTEGER::STATUS,NX,NX_OUT,START_X,START_X_OUT,SIZE,ROOTRANK,ELEMENTSIZE,LENGTHS(1)
INTEGER::MPI_ERR,MPI_NPROC,MPI_RANK

!1.INITIATE MPI
CALL MPI_Init(MPI_ERR)

CALL MPI_COMM_SIZE(MPI_COMM_WORLD, MPI_NPROC, MPI_ERR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, MPI_RANK, MPI_ERR)

!ELEMENTSIZE=1
!CALL MPI_BCAST(n,ELEMENTSIZE,MPI_INTEGER,0,MPI_COMM_WORLD,MPI_ERR)

hfr=50.d0
dx=(2.d0*hfr)/real(n-1)

if(MPI_RANK==0)then
write(*,*)dx,hfr,n
endif

!Create x axis
do i=1,n
x(i)=-hfr+dx*real(i-1)
enddo
!Input
do i=1,n
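!pi is never declared or assigned in this program, so it is computed inline as 4*atan(1)
trans(i)=(4.d0*datan(1.d0))**(-0.5d0)*dexp(-0.5d0*(x(i)**2))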
enddo

!Perform the transformation

LENGTHS(1)=n
!2.ALLOCATE MEMORY 
STATUS = DftiCreateDescriptorDM(MPI_COMM_WORLD,fftjob,DFTI_DOUBLE,DFTI_COMPLEX,1,n)
!3.GET VALUE
STATUS = DftiGetValueDM(fftjob,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(fftjob,CDFT_LOCAL_NX,NX)
STATUS = DftiGetValueDM(fftjob,CDFT_LOCAL_X_START,START_X)
STATUS = DftiGetValueDM(fftjob,CDFT_LOCAL_OUT_NX,NX_OUT)
STATUS = DftiGetValueDM(fftjob,CDFT_LOCAL_OUT_X_START,START_X_OUT)
!ALLOCATE(LOCAL(SIZE),WORK(SIZE))

!4.SET VALUE
STATUS = DftiSetValueDM(fftjob,CDFT_WORKSPACE,WORK)
!5.INITIALIZATION
STATUS = DftiCommitDescriptorDM(fftjob)
!6.SCATTERING DATA

ROOTRANK=0
ELEMENTSIZE=sizeof(qq)
!STATUS = MKL_CDFT_SCATTERDATA_D(MPI_COMM_WORLD,ROOTRANK,ELEMENTSIZE,1,LENGTHS,trans,NX,START_X,LOCAL)
!7.COMPUTE THE TRANSFORM
STATUS = DftiComputeBackwardDM(fftjob,trans)
!8.GATHERING DATA

!STATUS = MKL_CDFT_GATHERDATA_D(MPI_COMM_WORLD,ROOTRANK,ELEMENTSIZE,1,LENGTHS,trans,NX,START_X,LOCAL)
!9.RELEASE MEMORY
STATUS = DftiFreeDescriptorDM(fftjob)

!Output
do i=1,n
write(200,*)trans(i)
enddo

!DEALLOCATE(LOCAL,WORK)
!10.FINALIZE MPI
CALL MPI_Finalize(MPI_ERR)

END
[/plain]

Vladimir_Petrov__Int · ‎08-24-2009

Hi,

Whether you need to scatter the data or not depends on where your data come from. If they are already correctly scattered across the nodes (automagically), which can happen if your code is designed CFFT-friendly, then there is no need to do any additional data movement.

Otherwise you need to do the following:
0. Find out the size of the local buffer to be used for the computation (CDFT_LOCAL_SIZE) and allocate the buffer of this size to hold the local input data (see 1).
1. Find out what portion of the global input vector is to be used locally in the current process. That involves querying for the index of the first element (i.e. CDFT_LOCAL_X_START) of the local (to this process) portion of the global input vector and the number of subsequent elements (CDFT_LOCAL_NX).
2. Find out what portion of the global output vector will be returned by the computation routine in the current process. That involves querying for the index of the first element (i.e. CDFT_LOCAL_OUT_X_START) of the local (to this process) portion of the global output vector and the number of subsequent elements (CDFT_LOCAL_OUT_NX).
3. Put your data into the buffer from step 0. This step may involve inter-process data exchange.
4. Call the computation routine.
5. Use the computed vector according to your needs. Again, this step may involve inter-process data exchange.
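
The steps above can be sketched roughly as follows for a 1D in-place transform. This is a minimal sketch, not a tested reference program: it reuses the calls and constants from the code in the question, the names desc, lsize, local and work are chosen here for illustration, and it assumes that CDFT_LOCAL_X_START is a 1-based global index in the Fortran interface (please check that against the reference manual). The input is computed locally, so no explicit scatter is needed.

```fortran
! Sketch only: needs Intel MKL Cluster FFT and an MPI library to build and run.
PROGRAM cdft_1d_sketch
  USE MPI
  USE MKL_CDFT
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 2**14        ! global transform length
  REAL(8), PARAMETER :: hfr = 50.d0      ! half-width of the x axis
  TYPE(DFTI_DESCRIPTOR_DM), POINTER :: desc
  COMPLEX(8), ALLOCATABLE :: local(:), work(:)
  INTEGER :: status, lsize, nx, start_x, nx_out, start_x_out
  INTEGER :: ierr, rank, i
  REAL(8) :: pi, dx, x

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  pi = 4.d0*ATAN(1.d0)
  dx = 2.d0*hfr/DBLE(n-1)

  status = DftiCreateDescriptorDM(MPI_COMM_WORLD, desc, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)

  ! Step 0: size of the local buffer, then allocate data and workspace
  status = DftiGetValueDM(desc, CDFT_LOCAL_SIZE, lsize)
  ALLOCATE(local(lsize), work(lsize))

  ! Step 1: which part of the global input lives on this process
  status = DftiGetValueDM(desc, CDFT_LOCAL_NX, nx)
  status = DftiGetValueDM(desc, CDFT_LOCAL_X_START, start_x)

  ! Step 2: which part of the global output this process will hold
  status = DftiGetValueDM(desc, CDFT_LOCAL_OUT_NX, nx_out)
  status = DftiGetValueDM(desc, CDFT_LOCAL_OUT_X_START, start_x_out)

  status = DftiSetValueDM(desc, CDFT_WORKSPACE, work)
  status = DftiCommitDescriptorDM(desc)
  IF (status /= 0) WRITE(*,*) 'commit failed on rank', rank, ' status', status

  ! Step 3: fill the local buffer; element i is global element start_x+i-1
  ! (ASSUMPTION: start_x is 1-based in the Fortran interface)
  local = (0.d0, 0.d0)
  DO i = 1, nx
    x = -hfr + dx*DBLE(start_x + i - 2)
    local(i) = pi**(-0.5d0)*EXP(-0.5d0*x**2)
  END DO

  ! Step 4: in-place transform of the distributed vector
  status = DftiComputeBackwardDM(desc, local)

  ! Step 5: local(1:nx_out) now holds the output elements that start at
  ! global index start_x_out; here each process only reports its range
  WRITE(*,*) 'rank', rank, ' output start', start_x_out, ' count', nx_out

  status = DftiFreeDescriptorDM(desc)
  DEALLOCATE(local, work)
  CALL MPI_Finalize(ierr)
END PROGRAM cdft_1d_sketch
```

The program has to be built with the MPI compiler wrapper and linked against the MKL cluster FFT libraries; the exact link line depends on the installation.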

I hope this will help.

Best regards,
-Vladimir

Raemon · ‎08-24-2009

Hi Vladimir, thanks for your reply.

Your tips have given me a better idea of the concepts.

What I'd like to do now is:

1. Define a test 1D Gaussian function array on one node (call it node1)
2. Spread that array into working arrays on all nodes (the working arrays are on node1 and node2)
3. Perform the transformation
4. Gather the transformed data from all working arrays.

Is that right? I noticed that the MKL user's guide mentions some auxiliary functions called MKL_CDFT_ScatterData and MKL_CDFT_GatherData, which I think can help me scatter and gather the working arrays.

However, even after looking at the examples in the MKL directory, I cannot get these functions to work properly. Or should I write the scattering and gathering code myself?

Regards,
raemon

Vladimir_Petrov__Int · ‎08-24-2009

Hi,

I suggest that you just compute the values of the Gaussian function locally on each node (instead of steps 1 and 2). This way you can compute the FFT of a longer vector (in your case, twice as long as the one in the scheme you proposed).

Best regards,
-Vladimir

Raemon · ‎08-24-2009

Hello Vladimir,

I am a little confused. Did you mean that I should just put the same Gaussian on each node?

If I put the same Gaussian function on each node (node1 and node2), will each node pick out the right part of the array to transform?

By the way, this is simply a test. Eventually I have to extend my code to 3D and run a much larger job containing lots of transforms in the application. Thus, efficiency is also important.

Regards,
Raemon

Vladimir_Petrov__Int · ‎08-25-2009

Hello, Raemon,

What I meant was:
Let G[i], i=1,..,2*n be the values of the Gaussian vector you want to transform.
Initialize the local input vector on node1 with values G[1],..,G[n] and the local input vector on node2 with values G[n+1],..,G[2*n].
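
The index mapping can be illustrated without MPI or MKL. The sketch below is only an illustration under assumptions: it simulates the two nodes with a loop, uses a small hypothetical chunk length n = 8, and hard-codes the start index of each chunk. In a real cluster FFT run the start index and count would come from CDFT_LOCAL_X_START and CDFT_LOCAL_NX instead. It checks that the locally computed chunks agree with the corresponding slices of the full vector G.

```fortran
PROGRAM local_gaussian_chunks
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 8            ! chunk length per node (small, for illustration)
  INTEGER, PARAMETER :: nnodes = 2       ! node1 and node2, simulated by a loop
  REAL(8), PARAMETER :: hfr = 50.d0      ! half-width of the x axis
  REAL(8) :: g(nnodes*n), chunk(n), dx, pi, maxdiff
  INTEGER :: node, start, i

  pi = 4.d0*ATAN(1.d0)
  dx = 2.d0*hfr/DBLE(nnodes*n - 1)

  ! Reference: the full vector G[1..2*n], as a single process would build it
  DO i = 1, nnodes*n
    g(i) = gauss(i)
  END DO

  ! Each simulated node fills only its own chunk from the global index
  maxdiff = 0.d0
  DO node = 1, nnodes
    start = (node - 1)*n + 1             ! node1 starts at G[1], node2 at G[n+1]
    DO i = 1, n
      chunk(i) = gauss(start + i - 1)    ! local element i is global element start+i-1
    END DO
    maxdiff = MAX(maxdiff, MAXVAL(ABS(chunk - g(start:start+n-1))))
  END DO

  WRITE(*,*) 'max mismatch between local chunks and G:', maxdiff

CONTAINS

  ! Gaussian sampled at global (1-based) index k
  REAL(8) FUNCTION gauss(k)
    INTEGER, INTENT(IN) :: k
    REAL(8) :: x
    x = -hfr + dx*DBLE(k - 1)
    gauss = pi**(-0.5d0)*EXP(-0.5d0*x**2)
  END FUNCTION gauss

END PROGRAM local_gaussian_chunks
```

Because every node evaluates the function from the global index, no node ever has to hold the whole vector, which is why the total length can grow with the number of nodes.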

Best regards,
-Vladimir