topic I don't understand. in Intel® oneAPI Math Kernel Library

MKL spurns only minimal number of threads for SGEMM

Dave_O_ — Wed, 30 Apr 2014 20:24:47 GMT

I have a Xeon E5-2620 processor and am benchmarking SGEMM on it. Why does MKL spurn only 6 threads (hardware threads) instead of the expected 12 threads (hardware plus software threads)?

The same code on Xeon Phi spurns all 240 threads (hardware and software).

TimP — Thu, 01 May 2014 01:12:21 GMT

Most of the basic explanation is in the section about MKL_DYNAMIC in the MKL user guide. The default is intended to maximize performance rather than the number of hyperthreads in use or power consumption. Of course, a single 6-core CPU may not have been tested to the extent that 8- and 10-core dual-CPU platforms have, so you are welcome to change the setting to see if it helps your case, or even if you simply like to peg your displays. While FPU performance on Xeon peaks with 1 thread per core, on Intel® Xeon Phi™, 3 threads per core are needed to reach full VPU performance, and MKL can use all 4 threads per core effectively (on large enough problems) when considering data shuffling optimizations.

Dave_O_ — Thu, 01 May 2014 06:49:03 GMT

I don't understand.

TimP — Thu, 01 May 2014 11:23:41 GMT

Do you have specific questions; did you try setting MKL_DYNAMIC=false and setting specific numbers of threads and affinity schemes? I don't think it makes sense to repeat the entire story about MKL_DYNAMIC here when you can easily look it up, or to guess which aspect of hyperthreading you may want explained. Are you "spurning" the documentation as well as the dictionary definition of the word?
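For anyone who wants to try this, here is a minimal sketch of such an experiment in Python. It only builds the environment and launches a benchmark once per thread count. MKL_DYNAMIC and MKL_NUM_THREADS are the documented MKL environment variables, and KMP_AFFINITY is read by the Intel OpenMP runtime. The helper names `mkl_env` and `run_sweep` and the benchmark binary `./sgemm_bench` are assumptions made up for illustration, not part of MKL.

```python
import os
import subprocess


def mkl_env(num_threads, dynamic=False, affinity=None, base=None):
    """Return a copy of the environment with MKL threading variables set.

    `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    # MKL_DYNAMIC=false asks MKL not to reduce the requested thread count.
    env["MKL_DYNAMIC"] = "true" if dynamic else "false"
    env["MKL_NUM_THREADS"] = str(num_threads)
    if affinity is not None:
        # Read by the Intel OpenMP runtime, e.g. "compact" or "scatter".
        env["KMP_AFFINITY"] = affinity
    return env


def run_sweep(command, thread_counts, affinity=None):
    """Run `command` once per thread count; return {thread_count: exit_code}."""
    results = {}
    for n in thread_counts:
        proc = subprocess.run(command, env=mkl_env(n, affinity=affinity))
        results[n] = proc.returncode
    return results


# Hypothetical usage, assuming ./sgemm_bench is your own MKL-linked benchmark:
#   run_sweep(["./sgemm_bench"], [6, 12], affinity="compact")
```

Comparing the timings your benchmark reports at 6 and 12 threads should show whether the extra hyperthreads help SGEMM on your machine.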

Dave_O_ — Thu, 01 May 2014 17:10:42 GMT

Man, what are you talking about?

Ying_H_Intel — Sun, 04 May 2014 08:54:57 GMT

Hi Dave,

Sorry for the taketive:) Roughly speaking, yes: to get better performance, MKL spawns threads based on the hardware resources and on experience.

- On a Xeon machine, MKL uses the number of physical cores by default. On your Xeon machine that is 6. (The 12 "software threads" are what we call Hyper-Threading threads on Xeon processors.)
- On a Xeon Phi machine, MKL uses hardware plus software threads. That is 60 x 4 = 240 (61 cores in total, with 1 core reserved).
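The arithmetic behind those two defaults can be written out as a small Python sketch. The function name and its parameters are made up for illustration; it only mirrors the rule described in this post and does not query MKL or the hardware.

```python
def default_threads_as_described(cores, threads_per_core, use_all_hw_threads,
                                 reserved_cores=0):
    """Thread count MKL picks by default, per the rule described in this post.

    Illustrative only: the inputs are supplied by hand, nothing is detected.
    """
    usable_cores = cores - reserved_cores
    if use_all_hw_threads:
        return usable_cores * threads_per_core
    return usable_cores


# Xeon E5-2620: 6 cores, 2 Hyper-Threading threads each, one thread per core used.
xeon = default_threads_as_described(6, 2, use_all_hw_threads=False)

# Xeon Phi: 61 cores, 1 reserved, 4 threads on each remaining core.
phi = default_threads_as_described(61, 4, use_all_hw_threads=True,
                                   reserved_cores=1)

print(xeon, phi)  # prints "6 240"
```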

The reason on Xeon is that Hyper-Threading Technology (HT Technology) is only effective when each thread performs different types of operations and there are under-utilized resources on the processor. The threads in Intel MKL all perform exactly the same operation, so they cannot benefit from HT threads. As a result, MKL forks 6 threads instead of 12.

The same reasoning applies to Xeon Phi, but hardware threading there is implemented differently than on Xeon: it requires at least 3 or 4 threads per core to feed all the computing resources. So, to get better performance on Xeon Phi, MKL forks 240 threads by default.

Please see https://software.intel.com/en-us/forums/topic/294954 for more about Xeon. You can change the number of threads with the MKL_DYNAMIC and MKL_NUM_THREADS environment variables, or by calling mkl_set_dynamic and mkl_set_num_threads.

Best Regards,

Ying