I am running FFTs with MKL on an Intel CPU that has 36 physical cores and 72 hardware threads, as shown below.
I am not using OpenMP; instead I use my own thread pool to run the MKL FFTs.
The problem is that the thread pool gives the best performance when the number of threads is set to 36, not 72. Below 36, adding more threads always improves performance. Above 36, adding more threads gives no further improvement.
I noticed this in the documentation: "To achieve higher performance, set the number of threads to the number of processors or physical cores" (https://software.intel.com/en-us/mkl-linux-developer-guide-improving-performance-with-threading). That page is about OpenMP, but I see the same thing with my thread pool: the best performance comes from setting the number of threads equal to the number of physical cores, not the number of hardware threads.
Why does it behave like this? Is it because the computational complexity of the FFT is too high?
And if that is the case, what do the other 36 hardware threads do? In what situation would all 72 threads be fully used?
Sorry for so many questions!
Any hint will be appreciated!
I am a bit confused: how did you implement the thread pool? With OpenMP, pthreads, or some other tool? Many Intel processors support HT (Hyper-Threading), which lets the OS address 2 or 4 virtual cores (logical processors) per physical core. That is why the OS reports double or quadruple (KNL) the number of physical cores. The logical processors of one core share that core's execution units, so a compute-bound routine such as an FFT gains little from running on the second one. For that reason MKL routines run with only one thread per physical core in order to get maximum performance on Intel processors. In other words, MKL does not use HT: whichever threading tool you use (OpenMP, pthreads, TBB), the maximum number of threads you can set for an MKL routine is the number of physical cores.
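If it helps, here is a rough sketch of how you could size a pthread pool to the number of physical cores instead of the number of logical processors. It is Linux only and assumes the usual sysfs topology files are readable. The helper names (`logical_cpu_count`, `physical_core_count`, `suggested_pool_size`) are my own and are not part of MKL or pthreads.

```c
#include <stdio.h>
#include <unistd.h>

#define MAX_CPUS 4096  /* arbitrary upper bound chosen for this sketch */

/* Read a single integer from a sysfs file; 0 on success, -1 on failure. */
static int read_int(const char *path, int *out) {
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    int ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Logical processors currently online (this count includes HT siblings). */
long logical_cpu_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Physical cores, counted as distinct (package id, core id) pairs in sysfs.
   Returns 0 if the topology files cannot be read. */
long physical_core_count(void) {
    static int pkg[MAX_CPUS], core[MAX_CPUS];
    long conf = sysconf(_SC_NPROCESSORS_CONF);
    long seen = 0;
    if (conf < 1) return 0;
    if (conf > MAX_CPUS) conf = MAX_CPUS;

    for (long cpu = 0; cpu < conf; cpu++) {
        char path[128];
        int p, c, dup = 0;

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%ld/topology/physical_package_id", cpu);
        if (read_int(path, &p) != 0) continue;  /* CPU offline or file missing */

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%ld/topology/core_id", cpu);
        if (read_int(path, &c) != 0) continue;

        /* HT siblings report the same (package, core) pair; count it once. */
        for (long i = 0; i < seen; i++) {
            if (pkg[i] == p && core[i] == c) { dup = 1; break; }
        }
        if (!dup) { pkg[seen] = p; core[seen] = c; seen++; }
    }
    return seen;
}

/* One worker per physical core; fall back to the logical count if unknown. */
long suggested_pool_size(void) {
    long phys = physical_core_count();
    return phys > 0 ? phys : logical_cpu_count();
}
```

You would call `suggested_pool_size()` once at startup and create that many pthreads. On a machine like yours with HT enabled I would expect it to return 36 while `logical_cpu_count()` returns 72, though I have not run it on that hardware.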
Thanks very much!
I am using pthreads.
So does that mean that although HT makes more threads available (72), the performance will actually not be as good as without HT (36 threads = physical cores)?