Intel® oneAPI Math Kernel Library

CDFT transposed data distribution

DavidBayer

Hi,

I am currently integrating Intel CDFT into my project and have run into some problems with the transposed data distribution. As the documentation says, CDFT uses a slab (1D) decomposition along the slowest axis. The distribution can be obtained by calling `DftiGetValueDM(...)` with the parameters `CDFT_LOCAL_NX` and `CDFT_LOCAL_X_START`. Example (forward transform):

src[loc_n0][n1][n2] -> dst[loc_n0][n1][n2],

where `loc_n0` is the value returned for `CDFT_LOCAL_NX`.
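In the non-transposed layout above, each process owns `loc_n0` consecutive rows of the slowest axis, starting at the value returned for `CDFT_LOCAL_X_START`. A minimal sketch of the resulting index arithmetic, assuming row-major C storage; `slab_offset` and `global_row` are hypothetical helpers, not oneMKL functions:

```c
#include <stddef.h>

/* Hypothetical helper (not part of oneMKL): row-major offset of element
 * [i][j][k] inside a local slab of shape [loc_n0][n1][n2]. */
static size_t slab_offset(size_t i, size_t j, size_t k, size_t n1, size_t n2)
{
    return (i * n1 + j) * n2 + k;
}

/* Hypothetical helper: global index along the slowest axis of local row i.
 * x_start is the value returned for CDFT_LOCAL_X_START on this process;
 * the result uses the same index base as x_start. */
static long global_row(long x_start, long i)
{
    return x_start + i;
}
```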


If I want to use the transposed data distribution, I set the `DFTI_TRANSPOSE` parameter to `DFTI_ALLOW` via the `DftiSetValueDM(...)` function. If I understand the documentation correctly, the output data should then be distributed along the second-slowest axis. Example (forward transform):

src[loc_n0][n1][n2] -> dst[loc_n1][n2][n0]

where `loc_n0` is the value returned for `CDFT_LOCAL_NX`. Do I understand this correctly? If so, how can I find out `loc_n1`?
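For the transposed layout described above, dst[loc_n1][n2][n0], the former slowest axis becomes the fastest one in the output. A minimal sketch of where an output element lands, assuming row-major C storage and that `out_start` holds the value returned for `CDFT_LOCAL_OUT_X_START` in the same index base as `k1`; `transposed_offset` is a hypothetical helper, not a oneMKL function:

```c
#include <stddef.h>

/* Hypothetical helper (not part of oneMKL): row-major offset, within the
 * local output buffer dst[loc_n1][n2][n0], of the output element whose
 * global indices along axes 0, 1, 2 are k0, k1, k2. The caller must ensure
 * that k1 lies in [out_start, out_start + loc_n1) on this process. */
static size_t transposed_offset(size_t k0, size_t k1, size_t k2,
                                size_t out_start, size_t n0, size_t n2)
{
    size_t local_k1 = k1 - out_start;      /* row index within this process's block */
    return (local_k1 * n2 + k2) * n0 + k0; /* axis 0 is now the fastest axis */
}
```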


Thank you for your response.

David

Ruqiu_C_Intel

Hello David,

Thank you for contacting us. We carefully reviewed the CDFT source code for this question.

Yes, you are right. If DFTI_TRANSPOSE=DFTI_ALLOW, then you can expect src[loc_n0][n1][n2] -> dst[loc_n1][n2][n0]. Although "CDFT_LOCAL_OUT_NX" does not currently work in this case, you can query "CDFT_LOCAL_OUT_X_START" to obtain the row offset in the output array, from which the `loc_n1` parameter can be determined. I hope this helps.


Regards,

Ruqiu


DavidBayer

Hello Ruqiu,

Thank you for your answer. So I cannot use "CDFT_LOCAL_OUT_NX" to query the size of the local part, but I can query "CDFT_LOCAL_OUT_X_START" to get the offset of the current process and then obtain the local size as the difference between the offsets of consecutive processes?
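The offset-based approach in the question above can be sketched as follows: once every rank's `CDFT_LOCAL_OUT_X_START` value has been collected in rank order (for example with `MPI_Allgather`, not shown), each rank's extent is the gap to the next rank's offset, and the last rank's block runs to the end of the axis. `sizes_from_starts` is a hypothetical helper, not a oneMKL function; it assumes rank 0 owns the first block and that offsets increase with rank, and it measures the last block from `starts[0]` so it does not depend on whether the offsets are 0- or 1-based.

```c
#include <stddef.h>

/* Hypothetical helper (not part of oneMKL).
 * starts[p] : CDFT_LOCAL_OUT_X_START reported by rank p, in rank order
 * nprocs    : number of ranks
 * n         : global length of the distributed output axis (n1 here)
 * sizes[p]  : receives the local extent (loc_n1) of rank p */
static void sizes_from_starts(const long *starts, size_t nprocs, long n,
                              long *sizes)
{
    for (size_t p = 0; p < nprocs; ++p) {
        /* The next rank's offset bounds this rank's block; the last block
         * ends at starts[0] + n, i.e. one past the final index in the same
         * index base as the offsets. */
        long next = (p + 1 < nprocs) ? starts[p + 1] : starts[0] + n;
        sizes[p] = next - starts[p];
    }
}
```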


I did some more research, went through the MKL CDFT FFTW3 wrappers, and found that they use this formula to compute the distribution along a given axis:

localN = (N + (P - 1)) / P for the first N % P processes,

localN = N / P for the rest of the processes,

where N is the axis size, P is the number of processes, and localN is the size of each process's local part. If I am correct, this is the same work distribution that multi-GPU cuFFT uses. Can I rely on it as the correct way to compute the data distribution?
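A minimal sketch of the quoted formula in C, assuming 0-based offsets; `block_partition` is a hypothetical helper rather than the FFTW3 wrapper's actual code, and the start offset it returns is simply the sum of the sizes of all lower ranks:

```c
/* Hypothetical helper mirroring the formula quoted above: the first n % p
 * ranks get ceil(n / p) elements, the remaining ranks get floor(n / p).
 * local_start is the 0-based index of the rank's first element. */
static void block_partition(long n, long p, long rank,
                            long *local_n, long *local_start)
{
    long base = n / p; /* floor(n / p) */
    long rem  = n % p; /* number of ranks that receive one extra element */

    if (rank < rem) {
        *local_n = base + 1; /* equals (n + (p - 1)) / p when rem > 0 */
        *local_start = rank * (base + 1);
    } else {
        *local_n = base;
        *local_start = rem * (base + 1) + (rank - rem) * base;
    }
}
```

For N = 10 and P = 4 this gives local sizes 3, 3, 2, 2 with starts 0, 3, 6, 8.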


Anyway, it would be great if the MKL documentation described how to obtain the transposed data distribution. I think it would also be really helpful to be able to obtain the distribution via a "CDFT_LOCAL_OUT_NX" parameter query.

Thank you very much!

Best regards

David

Ruqiu_C_Intel

Hi David,


The formula is correct. oneMKL users can use it to determine the data distribution instead of querying 'CDFT_LOCAL_OUT_X_START'.

We are also glad to let you know that we will improve the behavior of 'CDFT_LOCAL_OUT_NX' with DFTI_TRANSPOSE=DFTI_ALLOW in an upcoming product release.


Regards,

Ruqiu

