
I'm using MKL 10 and Intel compiler 11.1 under Linux, in particular the FFT calls. My goal is to use OpenMP + MKL to compute FFTs. I'm in the second case of this guide:

Using DftiComputeForward and DftiComputeBackward in a parallel region gives me wrong results. If these functions are wrapped in a critical region, it works well.

So my question is: are DftiComputeForward and DftiComputeBackward thread safe?

Thanks a lot.


7 Replies


The wrong results from your test may be caused by incorrect use of MKL. Sharing the test would help to reproduce and identify the problem. For instance, the second case in the link you've given implicitly assumes 4 threads in the parallel region - have you checked this?

The compute functions are thread safe as long as no more than N threads use the same descriptor, where N is the value set by DftiSetValue(hand, DFTI_NUMBER_OF_USER_THREADS, N).
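For reference, the setting Dima describes looks roughly like this (a sketch only, not verbatim from the thread; `hand`, `n`, and `nthreads` are assumed names, and error checking is omitted):

[fortran]
use mkl_dfti
type(dfti_descriptor), pointer :: hand
integer :: status, nthreads

! Create one descriptor that will be shared by all threads ...
status = DftiCreateDescriptor(hand, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
! ... and declare how many user threads will call the compute
! functions on it, BEFORE committing the descriptor:
status = DftiSetValue(hand, DFTI_NUMBER_OF_USER_THREADS, nthreads)
status = DftiCommitDescriptor(hand)
! Up to nthreads threads may now call DftiComputeForward /
! DftiComputeBackward concurrently on hand, each on its own data.
[/fortran]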

Thanks

Dima


It works well now. The results are correct when setting one FFT descriptor per thread, but the performance is poor :(
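The one-descriptor-per-thread arrangement might look like this (a hedged sketch; `hand`, `x`, `n`, and the loop bound are assumptions, and error checking is omitted):

[fortran]
!$OMP PARALLEL PRIVATE(hand, status)
   ! Each thread creates, commits, and frees its own descriptor,
   ! so no sharing (and no DFTI_NUMBER_OF_USER_THREADS) is needed.
   status = DftiCreateDescriptor(hand, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
   status = DftiCommitDescriptor(hand)
!$OMP DO
   do i = 1, 196
      status = DftiComputeForward(hand, x(:, i))
   end do
!$OMP END DO
   status = DftiFreeDescriptor(hand)
!$OMP END PARALLEL
[/fortran]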

My pure MPI test case uses 32 nodes with 8 Nehalem cores each. The hybrid case uses 32 nodes, one MPI process per node and 8 threads per process. The hybrid case is two times slower.

I tested all affinity cases with the KMP_AFFINITY variable, but the results are the same, so I don't think it is an affinity problem.
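The affinity cases in question would be settings along these lines (common example values for the Intel OpenMP runtime's KMP_AFFINITY variable; the exact values tried are not stated in the thread):

```shell
# Pin threads to consecutive cores on one socket:
export KMP_AFFINITY=granularity=fine,compact,1,0
# Or spread threads across sockets instead:
# export KMP_AFFINITY=granularity=fine,scatter
echo "$KMP_AFFINITY"
```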

Do you have any idea about these performances?

Thanks so much.


Thanks

Dima


If so, my FFT loop has about 196 iterations (in this case), so about 24 iterations per thread, which is the same amount of work as per MPI process in the pure MPI code.

Something like this:

[fortran]
!$OMP DO
do i = 1, 196
   ! ... some code ...
   call FFTForward
   ! ... some code ...
   call FFTBackward
end do
!$OMP END DO
[/fortran]

My suspicion is that each iteration is too big for a single thread, but as mentioned, it is the same amount of work as an MPI process does in the pure MPI code.


Hi,

What particular MPI are you using?

Best regards,

-Vladimir



Hi,

I'm using OpenMPI 1.3.3.


I have not encountered such problems with OpenMPI.

To localize the issue, I suggest you comment out your code in the loop and leave only the calls to MKL.

That would be the first step towards a solution: determining whether the slowness is caused by MKL or by your code.
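The stripped-down timing run Vladimir suggests could be sketched like this (assumed names throughout; `omp_get_wtime` is the standard OpenMP timer):

[fortran]
t0 = omp_get_wtime()
!$OMP PARALLEL DO PRIVATE(status)
do i = 1, 196
   ! application code commented out; only the MKL calls remain
   status = DftiComputeForward(hand, x(:, i))
   status = DftiComputeBackward(hand, x(:, i))
end do
!$OMP END PARALLEL DO
print *, 'FFT-only time (s):', omp_get_wtime() - t0
[/fortran]

If this loop alone already shows the 2x slowdown versus the pure MPI run, the problem is on the MKL side; otherwise it is in the surrounding application code.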

Best regards,

-Vladimir
