I am evaluating the MKL library for a new project. When computing FFTs, I see that performance (in GFLOP/s) drops when the threads are placed on different sockets (using thread affinity). The test was run on an Intel(R) Xeon(R) CPU E5-2650 processor, and the compiler is gcc. What is the reason for this?
My first thought is that this is a threading and memory locality problem. On current CPU and memory architectures, each thread should have its data as close to it as possible. This probably applies not only to the MKL FFT but to any multithreaded computation. How do you store the FFT data, and how do you bind the threads to the different sockets?
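To illustrate the memory locality point: on Linux the default policy places a page on the NUMA node of the thread that first writes to it ("first touch"). If one thread initializes the whole input buffer, all of it ends up on that thread's socket, and threads on the other socket have to fetch it across the inter-socket link. Below is a minimal sketch of first-touch initialization in C. The helper name `alloc_first_touch` is hypothetical, plain `malloc` stands in for whatever allocator you actually use, and it assumes the later computation splits the index range across threads the same way (static schedule, threads pinned). MKL's internal FFT threading may partition the data differently, so this maps most directly to the case of several independent FFTs, one per thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n complex values stored as interleaved
 * re/im doubles, and zero them in parallel so that each page is first
 * written by the thread that owns that part of the index range. */
static double *alloc_first_touch(size_t n)
{
    assert(n > 0);

    double *buf = malloc(2 * n * sizeof *buf);
    if (buf == NULL)
        return NULL;

    /* Build with -fopenmp to run this loop in parallel. Without it the
     * pragma is ignored and the loop runs serially, which is still
     * correct but places all pages near the calling thread. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i) {
        buf[2 * i]     = 0.0;  /* real part */
        buf[2 * i + 1] = 0.0;  /* imaginary part */
    }
    return buf;
}
```

You would call something like `double *x = alloc_first_touch(n);` before filling in the real input, and release the buffer with `free(x)`.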
For your reference, the MKL developer guide https://software.intel.com/en-us/mkl-linux-developer-guide-managing-multi-core-performance also discusses how to get the best performance through thread affinity:
You can obtain best performance on systems with multi-core processors by requiring that threads do not migrate from core to core.
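As a rough sketch of what keeping a thread on one core can look like on Linux, the code below pins the calling thread to a single logical CPU with `sched_setaffinity`. The helper names `pin_to_cpu` and `first_allowed_cpu` are made up for illustration, and which CPU numbers belong to which socket depends on the machine, so check the topology first. With gcc and GNU OpenMP, the standard `OMP_PROC_BIND` and `OMP_PLACES` environment variables are an alternative that needs no code changes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: bind the calling thread to one logical CPU.
 * Returns 0 on success, -1 on failure. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    assert(CPU_COUNT(&mask) == 1);  /* exactly one CPU selected */

    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof mask, &mask);
}

/* Hypothetical helper: return the lowest-numbered logical CPU the
 * calling thread is currently allowed to run on, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof mask, &mask) != 0)
        return -1;

    for (int c = 0; c < CPU_SETSIZE; ++c)
        if (CPU_ISSET(c, &mask))
            return c;
    return -1;
}
```

In a threaded program each worker would call `pin_to_cpu` once at startup with its own CPU number, before it allocates or touches its data.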