Intel® oneAPI Math Kernel Library

Are DftiComputeForward and DftiComputeBackward thread safe?

unrue
Beginner
Dear MKL users,
I'm using MKL 10 and Intel 11.1 under Linux, in particular the FFT calls. My goal is to use OpenMP + MKL to compute FFTs. My situation is the second case of this guide:
Using DftiComputeForward and DftiComputeBackward in a parallel region gives me wrong results. If these functions are wrapped in a critical region, everything works well.
So my question is: are DftiComputeForward and DftiComputeBackward thread safe?
Thanks a lot.
Dmitry_B_Intel
Employee
Hi

The wrong results in your test may be caused by incorrect use of MKL. Sharing the test would help to reproduce and identify the problem. For instance, the second case in the link you've given implicitly assumes 4 threads in the parallel region. Have you checked this?

The compute functions are thread safe provided that no more than N threads use the same descriptor, where N is the value set by DftiSetValue(hand, DFTI_NUMBER_OF_USER_THREADS, N).

Thanks
Dima
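
The shared-descriptor rule in the reply above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the transform length `n`, the batch count `nbatch`, the in-place double-complex layout, and the availability of `mkl_dfti.f90` on the include path are all hypothetical choices. Behaviour of `DFTI_NUMBER_OF_USER_THREADS` may differ between MKL releases, so check the documentation for the version in use.

```fortran
! Sketch only: assumes MKL's mkl_dfti.f90 is on the include path and that
! the program is compiled with OpenMP enabled and linked against MKL.
include 'mkl_dfti.f90'

program shared_descriptor_sketch
  use MKL_DFTI
  use omp_lib
  implicit none

  integer, parameter :: n = 1024      ! hypothetical transform length
  integer, parameter :: nbatch = 64   ! hypothetical number of independent transforms
  complex(kind=8), allocatable :: x(:,:)
  type(DFTI_DESCRIPTOR), pointer :: hand
  integer :: status, ierr, i, nthreads

  allocate(x(n, nbatch))
  x = (1.0d0, 0.0d0)

  ! Upper bound on the number of threads that will use the descriptor.
  nthreads = omp_get_max_threads()

  status = DftiCreateDescriptor(hand, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
  if (status /= DFTI_NO_ERROR) stop 'DftiCreateDescriptor failed'

  ! Tell MKL how many user threads may share this descriptor (the N above).
  status = DftiSetValue(hand, DFTI_NUMBER_OF_USER_THREADS, nthreads)
  if (status /= DFTI_NO_ERROR) stop 'DftiSetValue failed'

  status = DftiCommitDescriptor(hand)
  if (status /= DFTI_NO_ERROR) stop 'DftiCommitDescriptor failed'

  ! Each iteration transforms its own column in place; only the
  ! committed descriptor is shared between threads.
  !$OMP PARALLEL DO PRIVATE(i, ierr) SHARED(hand, x)
  do i = 1, nbatch
     ierr = DftiComputeForward(hand, x(:, i))
     ierr = DftiComputeBackward(hand, x(:, i))
  end do
  !$OMP END PARALLEL DO

  ! With the default (unit) scale factors, a forward plus backward
  ! transform multiplies the data by n, so the deviation should be small.
  print *, 'max deviation from n:', maxval(abs(x - real(n, kind=8)))

  status = DftiFreeDescriptor(hand)
  deallocate(x)
end program shared_descriptor_sketch
```

The descriptor is created, configured, and committed once, outside the parallel region; only the compute calls run concurrently.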
unrue
Beginner
Hi Dmitry,
It works well now. The results are correct with one FFT descriptor per thread, but the performance is poor :(
My MPI test case has 32 nodes with 8 Nehalem cores each. The hybrid case has 32 nodes, one MPI process per node, and 8 threads per process. The hybrid case is twice as slow.
I tested all affinity settings with the KMP_AFFINITY variable, but the results are the same, so I think it is not an affinity problem.
Do you have any idea about this performance?
Thanks so much.
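
For readers following along, the one-descriptor-per-thread arrangement mentioned in this post might look roughly like the sketch below. It is not the poster's code: the transform length `n`, the batch count `nbatch`, the variable names, and the in-place double-complex layout are assumptions made for illustration, as is the presence of `mkl_dfti.f90` on the include path.

```fortran
! Sketch only: assumes MKL's mkl_dfti.f90 is on the include path and that
! the program is compiled with OpenMP enabled and linked against MKL.
include 'mkl_dfti.f90'

program per_thread_descriptor_sketch
  use MKL_DFTI
  implicit none

  integer, parameter :: n = 1024      ! hypothetical transform length
  integer, parameter :: nbatch = 64   ! hypothetical number of independent transforms
  complex(kind=8), allocatable :: x(:,:)
  type(DFTI_DESCRIPTOR), pointer :: my_hand
  integer :: my_status, i

  allocate(x(n, nbatch))
  x = (1.0d0, 0.0d0)

  ! The descriptor pointer is private, so every thread creates,
  ! commits, uses, and frees its own descriptor.
  !$OMP PARALLEL PRIVATE(my_hand, my_status, i) SHARED(x)
  my_status = DftiCreateDescriptor(my_hand, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
  my_status = DftiCommitDescriptor(my_hand)

  !$OMP DO
  do i = 1, nbatch
     my_status = DftiComputeForward(my_hand, x(:, i))
     my_status = DftiComputeBackward(my_hand, x(:, i))
  end do
  !$OMP END DO

  my_status = DftiFreeDescriptor(my_hand)
  !$OMP END PARALLEL

  ! With the default (unit) scale factors, a forward plus backward
  ! transform multiplies the data by n.
  print *, 'max deviation from n:', maxval(abs(x - real(n, kind=8)))
  deallocate(x)
end program per_thread_descriptor_sketch
```

Creating the descriptor once per thread, outside the work-shared loop, avoids repeating the setup cost on every iteration; the trade-off is that each thread holds its own descriptor storage.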
Dmitry_B_Intel
Employee
If the transforms are large, then with 8 transforms per node you may be short on RAM, because each descriptor may require some memory too. What are your transform sizes and RAM per node?
Thanks
Dima

unrue
Beginner
RAM per node is about 24 GB. What is the "transform size"? The size of the FFT loops?
If so, my FFT loop has about 196 iterations (in this case), so about 24 iterations per thread, which is the same amount of work as each MPI process does in the pure MPI code.
Something like this:
[fortran]!$OMP DO
do i = 1, 196
   ! ... some code ...
   ! FFTForward
   ! ... some code ...
   ! FFTBackward
end do
!$OMP END DO[/fortran]
My suspicion is that each iteration is too big for a single thread but, as mentioned, it is the same work that an MPI process does.
Vladimir_Petrov__Int
New Contributor III
Hi,

What particular MPI are you using?

Best regards,
-Vladimir
unrue
Beginner
Hi,
I'm using OpenMPI 1.3.3.
Vladimir_Petrov__Int
New Contributor III
Hi,

I did not encounter such problems with OpenMPI.

To localize the issue, I suggest that you comment out your own code in the loop and leave only the calls to MKL.
That would be a first step towards a solution: determining whether the slowness is caused by MKL or by your code.

Best regards,
-Vladimir
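
One way to carry out the suggestion above is to time a loop that contains nothing but the MKL calls and compare it with the timing of the full loop. The sketch below is a hypothetical harness, not the poster's program: the transform length `n` and the in-place double-complex layout are assumptions, the iteration count of 196 is taken from the earlier post, and `mkl_dfti.f90` is assumed to be on the include path.

```fortran
! Sketch only: assumes MKL's mkl_dfti.f90 is on the include path and that
! the program is compiled with OpenMP enabled and linked against MKL.
include 'mkl_dfti.f90'

program time_mkl_only_sketch
  use MKL_DFTI
  use omp_lib
  implicit none

  integer, parameter :: n = 1024      ! hypothetical transform length
  integer, parameter :: niter = 196   ! iteration count quoted in the thread
  complex(kind=8), allocatable :: x(:,:)
  type(DFTI_DESCRIPTOR), pointer :: my_hand
  integer :: my_status, i
  double precision :: t0, t1

  allocate(x(n, niter))
  x = (1.0d0, 0.0d0)

  !$OMP PARALLEL PRIVATE(my_hand, my_status, i) SHARED(x, t0, t1)
  ! One descriptor per thread; setup is kept outside the timed section.
  my_status = DftiCreateDescriptor(my_hand, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
  my_status = DftiCommitDescriptor(my_hand)

  ! Wait until every thread has finished setup, then start the clock.
  !$OMP BARRIER
  !$OMP MASTER
  t0 = omp_get_wtime()
  !$OMP END MASTER

  !$OMP DO
  do i = 1, niter
     ! ... application code commented out for this measurement ...
     my_status = DftiComputeForward(my_hand, x(:, i))
     ! ... application code commented out for this measurement ...
     my_status = DftiComputeBackward(my_hand, x(:, i))
  end do
  !$OMP END DO

  ! The implicit barrier at END DO means all iterations are complete here.
  !$OMP MASTER
  t1 = omp_get_wtime()
  !$OMP END MASTER

  my_status = DftiFreeDescriptor(my_hand)
  !$OMP END PARALLEL

  print *, 'MKL-only loop time (s):', t1 - t0
  deallocate(x)
end program time_mkl_only_sketch
```

If the MKL-only loop scales well across threads while the full loop does not, the slowdown more likely comes from the surrounding application code than from the FFT calls.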
