Are DftiComputeForward and DftiComputeBackward thread safe?

unrue · ‎11-22-2010

Dear MKL users,

I'm using MKL 10 and the Intel 11.1 compilers under Linux, in particular the FFT calls. My goal is to use OpenMP + MKL to compute FFTs. I'm in the second case of this guide:

http://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft/

Using DftiComputeForward and DftiComputeBackward in a parallel region gives me wrong results. If these functions are wrapped in a critical region, everything works well.

So my question is: are DftiComputeForward and DftiComputeBackward thread safe?

Thanks a lot.

Dmitry_B_Intel · ‎11-22-2010

Hi

The wrong results in your test may be caused by incorrect use of MKL. Sharing the test would help to reproduce and identify the problem. For instance, the second case in the link you've given implicitly assumes 4 threads in the parallel region. Have you checked this?

The compute functions are thread safe provided that no more than N threads use the same descriptor, where N is the value set by DftiSetValue(hand, DFTI_NUMBER_OF_USER_THREADS, N).
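
A minimal sketch of the shared-descriptor case described above, assuming a 1D double-precision complex in-place transform. The transform size n, the array x, and the program name are illustrative only and do not come from the original test. The value N is set before the descriptor is committed.

[fortran]
include 'mkl_dfti.f90'   ! builds the MKL_DFTI module

program shared_descriptor
  use MKL_DFTI
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1024            ! illustrative transform size
  integer :: nthreads, tid, status
  complex(dp), allocatable :: x(:,:)
  type(DFTI_DESCRIPTOR), pointer :: hand

  nthreads = omp_get_max_threads()
  allocate(x(n, nthreads))                  ! one column of data per thread
  x = (1.0_dp, 0.0_dp)

  status = DftiCreateDescriptor(hand, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
  ! Declare how many user threads will share this descriptor.
  status = DftiSetValue(hand, DFTI_NUMBER_OF_USER_THREADS, nthreads)
  status = DftiCommitDescriptor(hand)

  !$OMP PARALLEL PRIVATE(tid, status)
  tid = omp_get_thread_num() + 1
  ! Each thread transforms its own column through the shared descriptor.
  status = DftiComputeForward(hand, x(:, tid))
  status = DftiComputeBackward(hand, x(:, tid))   ! unscaled by default
  !$OMP END PARALLEL

  status = DftiFreeDescriptor(hand)
  deallocate(x)
end program shared_descriptor
[/fortran]

Real code should check every returned status against DFTI_NO_ERROR; the checks are left out here to keep the sketch short.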

Thanks
Dima

unrue · ‎12-04-2010

Hi Dmitry,

It works well now. The results are correct with one FFT descriptor per thread, but the performance is poor :(

My MPI test case has 32 nodes with 8 Nehalem cores each. The hybrid case has 32 nodes, one MPI process per node, and 8 threads per process. The hybrid case is two times slower.

I tested all affinity settings with the KMP_AFFINITY variable, but the results are the same. So I think it is not an affinity problem.

Do you have any idea what causes this poor performance?
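
For reference, a minimal sketch of the one-descriptor-per-thread arrangement, assuming a 1D double-precision complex in-place transform. The size n, the iteration count niter, and the array x are illustrative and not taken from the actual application. Each thread creates and commits its descriptor once, outside the work-shared loop, so the setup cost is not paid on every iteration.

[fortran]
include 'mkl_dfti.f90'   ! builds the MKL_DFTI module

program descriptor_per_thread
  use MKL_DFTI
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1024            ! illustrative transform size
  integer, parameter :: niter = 196         ! illustrative loop count
  integer :: i, status
  complex(dp), allocatable :: x(:)
  type(DFTI_DESCRIPTOR), pointer :: hand

  !$OMP PARALLEL PRIVATE(status, x, hand)
  ! Thread-private data and descriptor, set up once per thread.
  allocate(x(n))
  status = DftiCreateDescriptor(hand, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
  status = DftiCommitDescriptor(hand)

  !$OMP DO
  do i = 1, niter
     x = (1.0_dp, 0.0_dp)
     status = DftiComputeForward(hand, x)
     status = DftiComputeBackward(hand, x)   ! unscaled by default
  end do
  !$OMP END DO

  status = DftiFreeDescriptor(hand)
  deallocate(x)
  !$OMP END PARALLEL
end program descriptor_per_thread
[/fortran]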

Thanks so much.

Dmitry_B_Intel · ‎12-04-2010

If the transforms are large, then with 8 transforms per node you may be short on RAM, because each descriptor may require some memory too. What are your transform sizes and RAM per node?
Thanks
Dima

unrue · ‎12-04-2010

RAM per node is about 24 GB. What is the "transform size"? The size of the FFT loops?

If so, my FFT loop has about 196 iterations (in this case), so about 24 iterations per thread, which is the same amount of work as one MPI process does in the pure MPI code.

Something like this:

[fortran]!$OMP DO
do i = 1, 196
   ! ... some code ...
   ! FFTForward
   ! ... some code ...
   ! FFTBackward
end do
!$OMP END DO[/fortran]

My suspicion is that each iteration is too big for a single thread, but, as mentioned, it is the same work that an MPI process does.

Vladimir_Petrov__Int · ‎12-05-2010

Hi,

What particular MPI are you using?

Best regards,
-Vladimir

unrue · ‎12-05-2010

Hi,

I'm using OpenMPI 1.3.3.

Vladimir_Petrov__Int · ‎12-06-2010

Hi,

I have not encountered such problems with OpenMPI.

To localize the issue, I suggest that you comment out your own code in the loop and leave only the calls to MKL.
That would be a first step toward solving the problem: finding out whether the slowness is caused by MKL or by your code.

Best regards,
-Vladimir