Hi:
I use mkl_dss to solve a problem. I already call mkl_set_num_threads(8) to set the maximum number of threads to match my computer. But when I run the program and watch it with the top command, only 4 CPUs are running at 100%; the other 4 sit at about 1%. My CPU is an Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz, with 4 cores and 8 threads.
However, when I run the same program on another machine, an Intel(R) Xeon(R) CPU X5650 @ 2.67GHz with 6 cores and 12 threads, and call mkl_set_num_threads(12) to set the maximum number of threads, the program takes full advantage of all 12 CPUs.
I compile my program with g++ -std=c++14 -O2 -march=native source.cpp -lmkl_rt.
Are you heeding the documentation for MKL_DYNAMIC? https://software.intel.com/en-us/node/528547
Did you compare the performance of your task at various num_threads settings? On a 6-core Westmere it would be particularly difficult to avoid performance degradation when using all hyperthreads, but an X5650 system normally has 2 6-core CPUs, so you would need 24 threads to take full disadvantage of hyperthreading. The 2 of the 6 cores that have their own path to the last-level cache are better able to support hyperthreading than the other 4, so Westmere has some unique characteristics not shared by other models.
Well, that is not the case. I think it is a problem with MKL 11.2.3, because the Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz can use all 8 threads with an older version of MKL.
OK, I figured out the problem. Thanks to Tim Prince, he is right. I did not pay attention to MKL_DYNAMIC. After I added the statement mkl_set_dynamic(false), the Intel(R) Core(TM) i7-4700HQ CPU @ 2.40GHz can use all 8 threads. Now I just need to compare the performance of my program at various num_threads settings.
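For anyone who wants to do that comparison, here is a minimal sketch of a thread-count sweep. It is not from the posts above: the names sweep_thread_counts, best_thread_count and Timing are hypothetical, and the thread setter and the workload are passed in as callables so the harness itself does not depend on MKL.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// One measurement: the thread count requested and the wall-clock time taken.
// (Hypothetical helper type, not part of MKL.)
struct Timing {
    int threads;
    double seconds;
};

// Runs `workload` once for each thread count from 1 to max_threads,
// calling `set_threads(n)` before each run, and records the elapsed time.
std::vector<Timing> sweep_thread_counts(
    int max_threads,
    const std::function<void(int)>& set_threads,
    const std::function<void()>& workload)
{
    assert(max_threads >= 1);
    std::vector<Timing> results;
    for (int n = 1; n <= max_threads; ++n) {
        set_threads(n);
        const auto start = std::chrono::steady_clock::now();
        workload();
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        results.push_back({n, elapsed.count()});
    }
    return results;
}

// Returns the thread count with the smallest measured time,
// or 0 if there are no measurements.
int best_thread_count(const std::vector<Timing>& results)
{
    int best = 0;
    double best_seconds = 0.0;
    for (const Timing& t : results) {
        if (best == 0 || t.seconds < best_seconds) {
            best = t.threads;
            best_seconds = t.seconds;
        }
    }
    return best;
}
```

With MKL, the set_threads callable could be a lambda that calls mkl_set_dynamic(0) followed by mkl_set_num_threads(n), and the workload would be the DSS factorization and solve. A single run per setting is noisy, so repeating each measurement a few times is probably worthwhile, and the fastest setting may well turn out to be the number of physical cores rather than the number of hyperthreads.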