Hi,
I am currently integrating Intel CDFT into my project and ran into problems with the transposed data distribution. As the documentation says, CDFT uses slab (1D) decomposition along the slowest axis. The distribution can be obtained by calling `DftiGetValueDM(...)` with the parameters `CDFT_LOCAL_NX` and `CDFT_LOCAL_X_START`. Example (forward transform):
src[loc_n0][n1][n2] -> dst[loc_n0][n1][n2],
where `loc_n0` has the size of returned `CDFT_LOCAL_NX`.
If I want to use the transposed data distribution, I set the `DFTI_TRANSPOSE` parameter to `DFTI_ALLOW` via the `DftiSetValueDM(...)` function. If I understand the documentation correctly, the output data should then be distributed along the second slowest axis. Example (forward transform):
src[loc_n0][n1][n2] -> dst[loc_n1][n2][n0]
where `loc_n0` has the size of the returned `CDFT_LOCAL_NX`. Is that right? And if so, how can I obtain the `loc_n1` parameter?
Thank you for your response.
David
Hello David,
Thank you for contacting us. We carefully and thoroughly reviewed the CDFT source code to answer your question.
Yes, you are right. If DFTI_TRANSPOSE=DFTI_ALLOW, then you can expect src[loc_n0][n1][n2] -> dst[loc_n1][n2][n0]. Although "CDFT_LOCAL_OUT_NX" does not work in this case, you can query "CDFT_LOCAL_OUT_X_START" to obtain the row offset in the output array, from which the `loc_n1` parameter can be determined. I hope this helps.
Regards,
Ruqiu
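To make the two layouts from this reply concrete, here is a minimal sketch of how an element would be addressed in each local block. It assumes dense row-major storage with no padding, which the thread does not state explicitly; the function names `natural_index` and `transposed_index` are hypothetical helpers, not part of the CDFT API.

```c
#include <assert.h>
#include <stddef.h>

/* Linear index into a local input block laid out as src[loc_n0][n1][n2].
 * i0_local counts rows within this process's slab (assumption: dense,
 * row-major, no padding). */
size_t natural_index(size_t i0_local, size_t i1, size_t i2,
                     size_t n1, size_t n2)
{
    assert(i1 < n1 && i2 < n2);
    return (i0_local * n1 + i1) * n2 + i2;
}

/* Linear index into a local output block laid out as dst[loc_n1][n2][n0],
 * i.e. the transposed layout discussed above. i1_local counts rows within
 * this process's slab of the n1 axis (same storage assumption). */
size_t transposed_index(size_t i1_local, size_t i2, size_t i0,
                        size_t n2, size_t n0)
{
    assert(i2 < n2 && i0 < n0);
    return (i1_local * n2 + i2) * n0 + i0;
}
```

The global index along the distributed axis would be the local index plus the offset reported by `CDFT_LOCAL_X_START` (input) or `CDFT_LOCAL_OUT_X_START` (output).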
Hello Ruqiu,
Thank you for your answer. So I cannot use "CDFT_LOCAL_OUT_NX" to query the local size, but I can query "CDFT_LOCAL_OUT_X_START" to get the offset of the current process and then compute the local size as the difference between the offsets of neighbouring processes?
I did a bit more research, went through the MKL CDFT FFTW3 wrappers, and found that the wrapper uses this formula to compute the distribution along a given axis:
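The offset-difference idea from the paragraph above could be sketched roughly as follows. This assumes the "CDFT_LOCAL_OUT_X_START" values of all processes have already been collected into one array (for example with `MPI_Allgather`) and that they are non-decreasing in rank order; `local_size_from_starts` is a hypothetical helper, not an MKL function.

```c
#include <assert.h>
#include <stddef.h>

/* starts[i] holds the output start offset reported by rank i, gathered
 * beforehand (assumption). n is the full length of the distributed axis.
 * The local size is the distance to the next rank's offset, or to the
 * end of the axis for the last rank. */
size_t local_size_from_starts(const size_t *starts, size_t p,
                              size_t rank, size_t n)
{
    assert(starts != NULL && rank < p);
    size_t next = (rank + 1 < p) ? starts[rank + 1] : n;
    assert(next >= starts[rank]);
    return next - starts[rank];
}
```

How CDFT reports the offset for a process that owns no rows is not covered in this thread, so that case should be checked separately.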
localN = (N + (P - 1)) / P for first N % P processes,
localN = N / P for the rest of the processes.
where N is the axis size, P is the number of processes, localN is the local part for each process, and the divisions are integer divisions. If I am correct, this is the same distribution of work as used by multi-GPU cuFFT. Can I take it as the right way to compute the data distribution?
In any case, it would be great if the MKL documentation described how to obtain the transposed data distribution. I think it would also be really helpful to be able to obtain the distribution by querying the "CDFT_LOCAL_OUT_NX" parameter.
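For reference, the formula above can be written as a small C sketch using integer division. The names `local_count` and `local_start` are hypothetical; the start offset is simply the sum of the counts of all lower ranks, which is an assumption about how the blocks are ordered.

```c
#include <assert.h>
#include <stddef.h>

/* Elements of an axis of length n owned by `rank` out of p processes:
 * the first n % p ranks get (n + p - 1) / p, the rest get n / p. */
size_t local_count(size_t n, size_t p, size_t rank)
{
    assert(p > 0 && rank < p);
    return (rank < n % p) ? (n + p - 1) / p : n / p;
}

/* Offset of the first element owned by `rank`, assuming blocks are
 * assigned contiguously in rank order. */
size_t local_start(size_t n, size_t p, size_t rank)
{
    assert(p > 0 && rank < p);
    size_t r = n % p;
    size_t lower_big = (rank < r) ? rank : r; /* lower ranks with the larger share */
    return lower_big * ((n + p - 1) / p) + (rank - lower_big) * (n / p);
}
```

For N = 10 and P = 4 this gives local sizes 3, 3, 2, 2 and start offsets 0, 3, 6, 8, so the sizes add up to N.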
Thank you very much!
Best regards
David
Hi David,
The formula is correct. oneMKL users can use it to determine the data distribution rather than using 'CDFT_LOCAL_OUT_X_START'.
We are also glad to let you know that we will improve the behavior of 'CDFT_LOCAL_OUT_NX' for the case DFTI_TRANSPOSE=DFTI_ALLOW in the new product release.
Regards,
Ruqiu