MKL FFT library and thread assumptions

Richard_S_3 · 12-19-2012

Hi,

We are curious if we are using DFTI_NUMBER_OF_USER_THREADS correctly.

We use the MKL FFT library in our application: the application is thread rich, but we don't use OpenMP. We simply create all the POSIX (system level) threads ourselves. Among all these threads, we want to share the MKL DFTI descriptors. The descriptor, if we assume the model of FFTW or any other FFT library, typically computes a twiddle table based on the length of the FFT. This knowledge is "encapsulated" inside the descriptor.

Our hope is that by sharing descriptors among threads we will reduce memory size (i.e., share the twiddle tables). We had been using MKL successfully until recently. At one point, we do a very large FFT (16Meg) and everything fails. We believe (after some looking in the forums here) that setting DFTI_NUMBER_OF_USER_THREADS to some reasonable value like 16 (it was 1 before) is the right thing to do, but we're not sure. Setting it to 16 seems to fix the problem, but we wanted to verify: given the scenario described above (we create our own threads and want to share the descriptors among several unrelated threads), is this correct?

Now, our applications tend to be very FFT heavy: some threads on the front-end use an FFT, the main processing uses threads in a work-crew/map-reduce paradigm, and the back-end processing uses FFTs. In other words, all sorts of threads from all over the application can be sharing the descriptors, and there is no known a priori limit. We don't have any insight into how setting DFTI_NUMBER_OF_USER_THREADS to 16 allows multiple threads to reuse a descriptor (in FFTW, there's no notion of this). Does each thread "register" with the descriptor? Is there thread-local data with the descriptor? Once a thread has used the descriptor, can only that thread reuse it in that way? Or can I keep re-using the descriptor in multiple threads? (I.e., after setting DFTI_NUMBER_OF_USER_THREADS to 16, can some 16 threads use it, then another 16 threads, then a different 16 threads, or do the same threads have to reuse it?)

If anyone knows how DFTI_NUMBER_OF_USER_THREADS works with the descriptor, it would be very helpful. We think this fixes our problem, but we'd like to know if we have the right solution: once a thread has used a "sharing" slot, can no other thread use it?

Thanks in advance. I am happy to supply some code showing how we use it. I also want to thank the Intel Forums for helping us find the DFTI_NUMBER_OF_USER_THREADS in the first place!

Gooday,

Richie

Dmitry_B_Intel · 12-19-2012

Hi Richie,

DftiCompute functions need thread-local read-write memory. For performance reasons, that memory used to be associated with the DFTI descriptor. The DFTI_NUMBER_OF_USER_THREADS parameter was provided to duplicate the memory per calling thread, so that the descriptor could be shared by calling threads. This behavior will be fixed in the future, and one will not need to specify the number of calling threads.

I also wonder which version of MKL you use, and whether the status returned by DftiCompute functions is checked in your application. It may return an error if the (N+1)st thread uses a descriptor committed with the configuration parameter DFTI_NUMBER_OF_USER_THREADS set to N. Its default value is one.

Thanks,
Dima

Richard_S_3 · 12-20-2012

Hi Dima,

Thanks for the reply.

Re: checking error status: Before we had the 16M error (described above), we had error checking in most places (not all). I hadn't had error checking for the compute or mkl_malloc. I went back and made sure I checked the return status of ALL of my MKL calls: I never did see an error. It's possible I missed checking the status of a call, but I don't think so. I never did see MKL tell me I had too many threads connected to the descriptor.

Re: version: We are using the MKL that comes bundled with the Intel 12 compiler (the version, according to my build paths, is composer_xe_2011_sp1.8.273, so I think that means Intel 12.273? The 'icc --version' returns 12.1.2 20111128). I am sorry, I don't know how to separate the MKL version from the Intel tools/compiler suite bundle. All the MKL stuff is under the composer_xe_2011_sp1.8.273 dir above.

I figured that thread-local storage was used. Do you know if, after one thread has "finished" with its FFT, a different thread can go in and "reuse" the thread-local storage?

Thanks again for the quick reply. It's good to know we won't necessarily have to worry about this in a future release: is there a particular macro and/or version macro we can check for this?

Gooday,

Richie

barragan_villanueva_ · 12-23-2012

Hi,

As to the MKL version: please look at and run the example from $MKLROOT/examples/versionqueryc for C/C++ and $MKLROOT/examples/versionqueryf for Fortran.

As to a macro to check: please look at $MKLROOT/include/mkl.h for C/C++ and $MKLROOT/include/mkl.fi for Fortran, where the __INTEL_MKL_* macros are defined. E.g., for MKL 11.0.1:

    #define __INTEL_MKL_BUILD_DATE 20121009
    #define __INTEL_MKL__ 11
    #define __INTEL_MKL_MINOR__ 0
    #define __INTEL_MKL_UPDATE__ 1
    #define INTEL_MKL_VERSION (__INTEL_MKL__ * 10000 + \
        __INTEL_MKL_MINOR__ * 100 + __INTEL_MKL_UPDATE__)
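To show how the version macros above might be used, here is a small sketch in plain C that packs major/minor/update the same way as the INTEL_MKL_VERSION formula and compares the result against a cutoff. The function names are made up for illustration. The thread does not say which MKL release drops the DFTI_NUMBER_OF_USER_THREADS requirement, so the cutoff is left as a parameter the caller has to supply.

```c
#include <stdio.h>

/* Packs a version the same way as the INTEL_MKL_VERSION formula quoted
   above: major * 10000 + minor * 100 + update. */
int mkl_version_number(int major, int minor, int update) {
    return major * 10000 + minor * 100 + update;
}

/* Hypothetical helper: returns 1 if `version` is older than
   `first_fixed_version`, i.e. the user-thread count would still have to
   be set. `first_fixed_version` is an assumption supplied by the caller;
   the real value is not given anywhere in this thread. */
int needs_user_threads_setting(int version, int first_fixed_version) {
    return version < first_fixed_version;
}

/* Prints a packed version number in major.minor.update form. */
void print_mkl_version(int version) {
    printf("MKL %d.%d.%d\n",
           version / 10000, (version / 100) % 100, version % 100);
}
```

In real code the same comparison could be done at compile time with #if INTEL_MKL_VERSION < ... after including mkl.h, once the actual cutoff release is known.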