Intel® oneAPI Math Kernel Library forum

How do you spread FFTs across CPUs?

bonniegb — Wed, 08 Apr 2009 01:02:42 GMT

I have been unable to get the MKL FFT routines to spread across CPUs. I have tried not-in-place, single-precision and double-precision FFTs, and I have set every environment variable I can find (OMP_NUM_THREADS, MKL_NUM_THREADS and MKL_DOMAIN_NUM_THREADS, all set to 16, with the FFT portion of MKL_DOMAIN_NUM_THREADS set to 4). I even tried your example code shipped with MKL. Even though the software queried the environment variables and found 16 CPUs, it used only 1 of them. I am using MKL 10.0.1.014. What am I missing?

I did succeed in adding my own threads with Pthreads and calling 3 simultaneous FFTs from within the 15 threads, but the 3 FFTs are now the bottleneck, and I need them to split across CPUs too. As it stands, the speedup is limited to 3x, whereas my goal with fully threaded FFTs is 10x.
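The Pthreads approach described above, one thread per independent transform, can be sketched as follows. This is a minimal sketch under stated assumptions: the transforms are independent and each thread owns its own buffer. `fwht` (an in-place Walsh–Hadamard transform) is a hypothetical stand-in for the MKL FFT call so the sketch compiles without MKL, and `fft_job` / `run_transforms_pthreads` are illustrative names, not MKL API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an MKL FFT call: an in-place Walsh-Hadamard
 * transform (same butterfly structure as a radix-2 FFT, no MKL needed). */
static void fwht(double *x, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);   /* n must be a power of two */
    for (size_t len = 1; len < n; len <<= 1)
        for (size_t i = 0; i < n; i += 2 * len)
            for (size_t j = i; j < i + len; ++j) {
                double a = x[j], b = x[j + len];
                x[j] = a + b;
                x[j + len] = a - b;
            }
}

/* One job per independent transform; each thread owns its own buffer. */
struct fft_job {
    double *data;
    size_t n;
};

static void *fft_worker(void *arg)
{
    struct fft_job *job = arg;
    fwht(job->data, job->n);   /* replace with the real MKL FFT call */
    return NULL;
}

/* Runs `count` transforms concurrently, one pthread each (at most 64 here).
 * Returns 0 on success, -1 if a thread could not be created. */
static int run_transforms_pthreads(struct fft_job *jobs, int count)
{
    pthread_t tid[64];
    int started = 0, rc = 0;
    assert(count >= 0 && count <= 64);
    for (; started < count; ++started)
        if (pthread_create(&tid[started], NULL, fft_worker, &jobs[started]) != 0) {
            rc = -1;
            break;
        }
    for (int i = 0; i < started; ++i)   /* wait for every thread we started */
        pthread_join(tid[i], NULL);
    return rc;
}
```

Because each transform runs on exactly one thread, three jobs can occupy at most three CPUs this way, which matches the 3x ceiling reported above.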

Thanks,
Bonnie

Re: How do you spread FFTs across CPUs?

TimP — Wed, 08 Apr 2009 01:43:30 GMT

There must be many ways to go about this. I guess you are making a separate MKL call for each FFT, so you want each call assigned to a different core (not simply to a different logical processor, if you are running on a Hyper-Threading platform). For example, if you used OpenMP:

#pragma omp parallel for
for (idfft = 0; idfft < 3; ++idfft)
    mkl_fft_call(data_set[idfft]);   /* hypothetical name: your MKL FFT call on one data set */
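The loop above can be made concrete as in the sketch below, assuming the transforms are independent and share one length `n`. `fwht` (a Walsh–Hadamard transform) is again a hypothetical stand-in for the MKL FFT call, and `run_transforms_omp` / `omp_threads_available` are illustrative names. Compile with `-fopenmp` (GCC) or the Intel compiler's OpenMP option to get parallel iterations; the thread count then comes from `OMP_NUM_THREADS`, and placement from `GOMP_CPU_AFFINITY` or `KMP_AFFINITY`.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for an MKL FFT call (in-place Walsh-Hadamard). */
static void fwht(double *x, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);   /* n must be a power of two */
    for (size_t len = 1; len < n; len <<= 1)
        for (size_t i = 0; i < n; i += 2 * len)
            for (size_t j = i; j < i + len; ++j) {
                double a = x[j], b = x[j + len];
                x[j] = a + b;
                x[j + len] = a - b;
            }
}

/* Transforms `count` independent buffers of length n. With OpenMP enabled
 * the iterations are shared among threads; without it the pragma is
 * ignored and the loop runs serially with the same result. */
static void run_transforms_omp(double **data_set, int count, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (int idfft = 0; idfft < count; ++idfft)
        fwht(data_set[idfft], n);
}

/* Threads a parallel region would use (1 when built without OpenMP). */
static int omp_threads_available(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Without `-fopenmp` the same file still builds and produces identical results serially, which makes it easy to compare timings.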

With Hyper-Threading enabled on a Xeon 5500, you would need something like
export GOMP_CPU_AFFINITY=1,3,9,11,5,7,13,15,.....
so as to assign 8 threads to different cores and spread them across the CPUs.
With HT disabled, you would not be so dependent on affinity control.
You could do something similar with Pthreads and their affinity calls, if it is safe to assume the machine is dedicated to your job.
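A minimal Linux-only sketch of the Pthreads affinity idea: each worker pins itself to one CPU with the GNU extension `pthread_setaffinity_np` before running its transform. Assumptions: glibc on Linux, `fwht` is a hypothetical stand-in for the MKL FFT call, and `pinned_job` / `kth_allowed_cpu` / `run_pinned` are illustrative names. Taking the allowed CPUs in order does not guarantee distinct physical cores when Hyper-Threading is on; for that you would substitute an explicit, machine-specific list like the one in the `GOMP_CPU_AFFINITY` example.

```c
#define _GNU_SOURCE            /* for cpu_set_t and pthread_setaffinity_np */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for an MKL FFT call (in-place Walsh-Hadamard). */
static void fwht(double *x, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);   /* n must be a power of two */
    for (size_t len = 1; len < n; len <<= 1)
        for (size_t i = 0; i < n; i += 2 * len)
            for (size_t j = i; j < i + len; ++j) {
                double a = x[j], b = x[j + len];
                x[j] = a + b;
                x[j + len] = a - b;
            }
}

struct pinned_job {
    double *data;
    size_t n;
    int cpu;          /* CPU this worker should run on */
    int pin_status;   /* 0 if pthread_setaffinity_np succeeded */
};

static void *pinned_worker(void *arg)
{
    struct pinned_job *job = arg;
    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(job->cpu, &one);
    /* Pin the calling thread before doing any work (Linux-specific). */
    job->pin_status = pthread_setaffinity_np(pthread_self(), sizeof one, &one);
    fwht(job->data, job->n);   /* replace with the real MKL FFT call */
    return NULL;
}

/* Returns the k-th CPU (wrapping around) this thread is allowed to use. */
static int kth_allowed_cpu(int k)
{
    cpu_set_t allowed;
    int count, seen = 0;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return 0;
    count = CPU_COUNT(&allowed);
    if (count == 0)
        return 0;
    k %= count;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &allowed) && seen++ == k)
            return cpu;
    return 0;
}

/* Starts one pinned thread per job (at most 64) and waits for all of them.
 * Returns 0 if every thread was created, -1 otherwise. */
static int run_pinned(struct pinned_job *jobs, int count)
{
    pthread_t tid[64];
    int started = 0;
    assert(count >= 0 && count <= 64);
    for (; started < count; ++started) {
        jobs[started].cpu = kth_allowed_cpu(started);
        if (pthread_create(&tid[started], NULL, pinned_worker, &jobs[started]) != 0)
            break;
    }
    for (int i = 0; i < started; ++i)
        pthread_join(tid[i], NULL);
    return started == count ? 0 : -1;
}
```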

Intel compilers have implemented OpenMP 3.0 tasking, in case you prefer that, but the older WORKSHARE construct is not yet multi-threaded.

If the parallelization is done inside MKL, MKL's threads also observe KMP_AFFINITY or GOMP_CPU_AFFINITY.
I may have completely misread your intentions.

Re: How do you spread FFTs across CPUs?

bonniegb — Thu, 09 Apr 2009 00:53:56 GMT

I don't quite understand your answer. In the simple case of plain C code without threads calling 1 FFT routine, how can I get this MKL FFT library routine to spread across CPUs? Do I put the #pragma in the code to trigger MKL's internal threads? Do I put the "export GOMP_CPU_AFFINITY=5" in the code to trigger the MKL internal FFT threads to spawn? If the simple case can work, then I'm all set.

Thanks,
Bonnie