Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Calling local sizes of 3D MPI FFT plans

GTA
Beginner
764 Views

Hello All,

I am moving a legacy code that uses FFTW 2.1.5 from Linux to Windows, and I have created and successfully linked to MKL's FFTW wrappers.  My question is about the wrappers' functionality for a 3-dimensional FFT, specifically the wrapper function fftwnd_mpi_local_sizes().  Shown below are the original FFTW output arguments and the MKL values the wrapper maps them to.

fftwnd_mpi_local_sizes(fftwnd_mpi_plan p,
int *local_nx -> int *CDFT_LOCAL_NX,
int *local_x_start -> int *CDFT_LOCAL_X_START,
int *local_ny_after_transpose -> int *CDFT_LOCAL_OUT_NX,
int *local_y_start_after_transpose -> int *CDFT_LOCAL_OUT_X_START,
int *total_local_size -> int *CDFT_LOCAL_SIZE)

local_ny_after_transpose and local_y_start_after_transpose are not being set to the values that the original FFTW implementation returns. Our layout and data allocation for the MPI processes rely heavily on the original output.  After looking over the MKL documentation, it appears that this is all MKL's FFT can give. Unfortunately, the Y values are critical.

For example, for a 36 by 16 by 14 (X,Y,Z) transform over 2 processors, the expected FFTW output is processor_1(plan,18,0,8,0,4032) processor_2(plan,18,18,8,8,4032), but MKL outputs processor_1(plan,18,0,18,0,4032) processor_2(plan,18,18,18,18,4032). This particular case may be predictable, but the sizes of X,Y,Z and the number of processors are arbitrary, so in general the values are not easy to predict.  Are there any solutions to this problem?
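Until the wrapper returns the Y values, one possible stopgap is to compute them by hand. The sketch below is only that: slab_count and slab_start are hypothetical helper names, not part of FFTW or MKL, and the splitting rule (contiguous blocks of ceil(n / n_procs) elements, with trailing ranks getting a shorter or empty block) is an assumption. It reproduces the 36 by 16 by 14 example on 2 processors, but it should be checked against real FFTW 2.1.5 output before being relied on for uneven sizes.

```c
/* Hypothetical helpers, not part of FFTW or MKL.
 * ASSUMPTION: a dimension of length n is split over n_procs ranks in
 * contiguous blocks of ceil(n / n_procs) elements; trailing ranks may
 * receive a shorter block or no elements at all. */

/* First index (0-based) of the block owned by `rank`. */
static int slab_start(int n, int n_procs, int rank)
{
    int block = (n + n_procs - 1) / n_procs;  /* ceil(n / n_procs) */
    int start = rank * block;
    return start < n ? start : n;             /* clamp for empty trailing blocks */
}

/* Number of elements in the block owned by `rank`. */
static int slab_count(int n, int n_procs, int rank)
{
    int block = (n + n_procs - 1) / n_procs;
    int start = slab_start(n, n_procs, rank);
    int end = start + block;
    if (end > n)
        end = n;
    return end - start;
}
```

For the example above, slab_count(16, 2, rank) and slab_start(16, 2, rank) would stand in for local_ny_after_transpose and local_y_start_after_transpose, and the same helpers applied to nx = 36 give local_nx and local_x_start.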

-Thank you all,

6 Replies
Chao_Y_Intel
Moderator
Hi, Attached are the wrapper files that fix this problem. We will also include the fix in a future release. Thanks, Chao
GTA
Beginner
Thank you for your help; however, the output still seems to be incorrect. I deleted the old wrapper library to make sure I was not linking to it and rebuilt it using your fix, but it still did not work. To test further, I wrote my own test using just the MKL functions without the wrappers. Here are the relevant parts of the code:

LENGTHS(1) = 25
LENGTHS(2) = 15
LENGTHS(3) = 5
PRINT*,"LENGTHS", LENGTHS
STATUS = DftiCreateDescriptorDM(FFT_COMM,DESC,DFTI_DOUBLE, DFTI_COMPLEX,3,LENGTHS)
STATUS = DftiCommitDescriptorDM(DESC)
!***RETRIEVE VALUES***
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_NX,NX)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_X_START,START_X)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_NX,NX_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_X_START,START_X_OUT)
!PRINT RETRIEVED VALUES
PRINT*, "DFTI values:START_X,NX,START_X_OUT,NX_OUT,SIZE"
PRINT *,START_X,NX,START_X_OUT,NX_OUT,SIZE
!***NOW MAKE TRANSPOSED AND REPEAT***
STATUS=DftiSetValueDM(DESC,DFTI_TRANSPOSE,DFTI_ALLOW)
STATUS=DftiCommitDescriptorDM(DESC)
!RETRIEVE VALUES
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_SIZE,SIZE)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_NX,NX)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_X_START,START_X)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_NX,NX_OUT)
STATUS = DftiGetValueDM(DESC,CDFT_LOCAL_OUT_X_START,START_X_OUT)
!PRINT RETRIEVED VALUES
PRINT*, "DFTI values after transpose:START_X,NX,START_X_OUT,NX_OUT,SIZE"
PRINT *,START_X,NX,START_X_OUT,NX_OUT,SIZE

The output I get is

DFTI values:START_X,NX,START_X_OUT,NX_OUT,SIZE
1 5 1 5 1875
DFTI values after transpose:START_X,NX,START_X_OUT,NX_OUT,SIZE
1 5 1 5 1875

As you can see from the output, the transpose is not being applied. Curiously, when the transform is computed I do get transformed data (tested on a square matrix). Is this a bug, or am I doing something wrong? I am using mkl 11.0.1.119, ifort.exe 13.0.1.119 Build 20121008, and icl.exe 13.0.1.119 Build 20121008, if that helps.

-Thanks
Chao_Y_Intel
Moderator
Hello, Thanks for your report. We have verified that this is a bug in the function, and we plan to fix it in a future release. Thanks, Chao
GTA
Beginner
Okay, thank you for confirming my suspicion. I look forward to the next release.
GTA
Beginner
One last comment related to this problem. When I set up an FFT like this:

call fftwnd_f77_create_plan(fft_fwdplan, 3, fft_size, FFTW_FORWARD, FFTW_ESTIMATE+FFTW_IN_PLACE)
call fftwnd_f77_mpi(frw_plan, n_fields, mat(:, k), work, 1, FFTW_TRANSPOSED_ORDER)

the output found in mat(:,k) is jumbled. I think this has to do with whether MKL writes the data in row-major (C/C++) or column-major (Fortran) order. I've attached example files for an FFT with fft_size = (18,14,12): FFTW_output.txt holds the output FFTW produces for a Fortran program (the "correct" order) and MKL_output.txt holds the MKL output. The jumble seems to be

FFTW( i ) = MKL( nx*z + x + y*nx*nz )

where x=0:nx-1, y=0:ny-1, z=0:nz-1, i=0:nx*ny*nz-1.

Please correct this output mix-up for Fortran programs when you address the issue with local_size, so that my program does not lose time "un-jumbling" the data.

Thank you,
Gabe
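For reference, the "un-jumbling" step can be sketched as an out-of-place copy. This is only an illustration of the mapping quoted above, written in C for brevity. The names mkl_index, fftw_index and unjumble are hypothetical, and the post does not say how i relates to (x, y, z); the sketch assumes the usual column-major rule i = x + nx*(y + ny*z), which would need to be confirmed against the attached output files.

```c
#include <complex.h>
#include <stddef.h>

/* Position of element (x, y, z) in the MKL output, as observed in the post. */
static size_t mkl_index(size_t x, size_t y, size_t z, size_t nx, size_t nz)
{
    return nx * z + x + y * nx * nz;
}

/* ASSUMPTION: position of element (x, y, z) in the FFTW output is plain
 * column-major order with x varying fastest. Not stated in the post. */
static size_t fftw_index(size_t x, size_t y, size_t z, size_t nx, size_t ny)
{
    return x + nx * (y + ny * z);
}

/* Copy `mkl` (MKL ordering) into `fftw` (assumed FFTW ordering).
 * Both buffers hold nx*ny*nz elements and must not overlap. */
static void unjumble(const double complex *mkl, double complex *fftw,
                     size_t nx, size_t ny, size_t nz)
{
    for (size_t z = 0; z < nz; ++z)
        for (size_t y = 0; y < ny; ++y)
            for (size_t x = 0; x < nx; ++x)
                fftw[fftw_index(x, y, z, nx, ny)] =
                    mkl[mkl_index(x, y, z, nx, nz)];
}
```

The copy needs a second buffer of the same size and one full pass over the data per transform, which is the time delay mentioned above.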
GTA
Beginner
The files