Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

How do you spread FFTs across CPUs?

bonniegb
Beginner
335 Views

I have been unsuccessful getting the MKL FFT routines to spread across CPUs. I have tried not-in-place, single and double precision FFTs, and set every environment variable I can find (like OMP_NUM_THREADS, MKL_NUM_THREADS, MKL_DOMAIN_NUM_THREADS with the FFT portion set to 4, all set to 16). I even tried your example code shipped with MKL. Even though the software queried and found 16 CPUs based on the environment variables, it used only 1 of them. I am using MKL 10.0.1.014. What am I missing?

I successfully inserted my own threads using PTHREADS and called 3 simultaneous FFTs from within the 15 threads, but I am being throttled by the 3 FFTs and I need them to split across CPUs too (this limits me to a 3x speedup instead of the 10x speedup that is my goal with fully functioning FFTs).

Thanks,
Bonnie

2 Replies
TimP
Honored Contributor III
There must be many ways to go about this. I guess you are making a separate MKL call for each FFT, so you want each to be assigned to a different core (not simply to a different logical processor, if you are running a HyperThread platform). If, for example, you used OpenMP:

#pragma omp parallel for
for (idfft = 0; idfft < 3; ++idfft)
    some_mkl_fft(data_set[idfft]);   /* placeholder for the actual MKL FFT call */

with HyperThreading on Xeon 5500, you would need something like
export GOMP_CPU_AFFINITY=1,3,9,11,5,7,13,15,.....
so as to assign 8 threads to different cores and try to spread across CPUs.
With HT disabled, you would not be so dependent on affinity control.
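A minimal sketch of the OpenMP loop described above, in case it helps to see it in compilable form. The names run_batch, transform_fn and reverse_in_place are hypothetical and not part of MKL; reverse_in_place is only a stand-in workload so the sketch builds without MKL, and in a real program the callback would wrap one complete MKL FFT call on its own data set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: one complete, independent transform of one
   data set. In the scenario discussed here it would wrap a single MKL FFT
   call; that wrapper is an assumption and is not shown. */
typedef void (*transform_fn)(double *data, size_t len);

/* Run one transform per data set. When compiled with OpenMP enabled, the
   iterations are divided among the threads of the team; without OpenMP the
   pragma is ignored and the loop runs serially with the same result. */
void run_batch(transform_fn fn, double **sets, int n_sets, size_t len)
{
    int i;

    assert(fn != NULL);

    #pragma omp parallel for schedule(static)
    for (i = 0; i < n_sets; ++i) {
        fn(sets[i], len);   /* each iteration touches only its own buffer */
    }
}

/* Stand-in workload (NOT an FFT): reverses the buffer in place. It exists
   only so the driver above can be exercised without linking MKL. */
void reverse_in_place(double *data, size_t len)
{
    size_t lo, hi;

    if (len < 2) {
        return;
    }
    for (lo = 0, hi = len - 1; lo < hi; ++lo, --hi) {
        double tmp = data[lo];
        data[lo] = data[hi];
        data[hi] = tmp;
    }
}
```

With three data sets this gives at most three busy threads, so it only helps when there are at least as many independent FFTs as cores you want to use. Where those threads land is still governed by the affinity setting mentioned above, and how MKL behaves when called from inside a parallel region is worth checking in the documentation for your MKL version.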
You could do something with pthreads and their affinity calls, if it is safe to assume a machine dedicated to your job.

Intel compilers have implemented OpenMP 3.0 tasking, in case you prefer that, but the older workshare is not yet multi-threaded.

If the parallelization is done by MKL, that also observes the KMP_AFFINITY or GOMP_CPU_AFFINITY.
I may have totally mis-guessed your intentions.
bonniegb
Beginner
I don't quite understand your answer. In the simple case of plain C code without threads calling 1 FFT routine, how can I get this MKL FFT library routine to spread across CPUs? Do I put the #pragma in the code to trigger MKL's internal threads? Do I put the "export GOMP_CPU_AFFINITY=5" in the code to trigger the MKL internal FFT threads to spawn? If the simple case can work, then I'm all set.

Thanks,
Bonnie