MKL spurns only a minimal number of threads for SGEMM

Dave_O_ · ‎04-30-2014

Hi

I have a Xeon E5-2620 processor and am benchmarking with SGEMM. Why does MKL spurn only 6 threads (hardware threads) instead of the expected 12 threads (hardware plus software threads)?

The same code on Xeon Phi spurns all 240 threads (hardware and software).

TimP · ‎04-30-2014

Most of the basic explanation is in the section about MKL_DYNAMIC in the MKL user guide. The default is intended to maximize performance rather than number of hyperthreads in use or power consumption. Of course, a single 6-core CPU may not have been tested to the extent that 8 and 10-core dual CPU platforms have, so you are welcome to change the setting to see if it helps your case, or even if you simply like to peg your displays. While FPU performance on Xeon peaks with 1 thread per core, on Intel(r) Xeon Phi(tm), 3 threads per core are needed to reach full VPU performance, and the MKL can use all 4 threads per core effectively (on large enough problems) when considering data shuffling optimizations.

Dave_O_ · ‎04-30-2014

I don't understand.

TimP · ‎05-01-2014

Do you have specific questions; did you try setting MKL_DYNAMIC=false and setting specific numbers of threads and affinity schemes? I don't think it makes sense to repeat the entire story about MKL_DYNAMIC here when you can easily look it up, or to guess which aspect of hyperthreading you may want explained. Are you "spurning" the documentation as well as the dictionary definition of the word?

Dave_O_ · ‎05-01-2014

men, what are you talking about.

Ying_H_Intel · ‎05-04-2014

Hi Dave,

Sorry for the taketive:) Roughly speaking, yes: in order to get better performance, MKL spawns threads based on hardware resources and experience.

- On a Xeon machine, MKL uses the number of hardware cores by default. On your Xeon machine, that is 6. (The 12 "software" threads are what we call Hyper-Threading threads on Xeon processors.)
- On a Xeon Phi machine, MKL uses hardware plus software threads: 60 x 4 = 240 (61 cores in total, with 1 core reserved).
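
As a rough illustration of the arithmetic in the two points above, here is a minimal Python sketch. It is not MKL's actual heuristic; the function name `default_mkl_threads` and its parameters are made up for this example, and the core counts are the ones quoted in this thread.

```python
def default_mkl_threads(physical_cores, hw_threads_per_core,
                        use_all_hw_threads, reserved_cores=0):
    """Hypothetical helper: default thread count as described in this thread.

    This only reproduces the arithmetic from the post; it does not query
    MKL or the hardware.
    """
    usable_cores = physical_cores - reserved_cores
    # One thread per core when the extra hardware threads are not used.
    threads_per_core = hw_threads_per_core if use_all_hw_threads else 1
    return usable_cores * threads_per_core


# Xeon E5-2620: 6 cores, 2 hardware threads each, 1 thread per core used.
xeon_threads = default_mkl_threads(6, 2, use_all_hw_threads=False)

# Xeon Phi: 61 cores, 1 reserved, all 4 hardware threads per core used.
phi_threads = default_mkl_threads(61, 4, use_all_hw_threads=True,
                                  reserved_cores=1)

print(xeon_threads, phi_threads)  # prints: 6 240
```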

The reason on Xeon is that Hyper-Threading Technology (HT Technology) is only effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. The threads in Intel MKL all do exactly the same operation, so they can't benefit from HT threads. As a result, MKL forks 6 threads instead of 12.

The same reasoning applies on Xeon Phi, but the HT technology there is implemented differently than on Xeon: it requires at least 3 or 4 threads per core to feed all the computing resources. So, in order to get better performance on Xeon Phi, MKL forks 240 threads by default.

Please see more at https://software.intel.com/en-us/forums/topic/294954 for Xeon. You can change the thread count with the MKL_DYNAMIC and MKL_NUM_THREADS environment variables (or the mkl_set_dynamic and mkl_set_num_threads functions).
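
For example, one way to try 12 threads is to set the environment-variable form of these settings before starting the benchmark. The Python sketch below only builds the environment. The helper name `mkl_env` and the binary name `./sgemm_bench` are hypothetical, and the optional KMP_AFFINITY setting assumes the program is linked against the Intel OpenMP runtime. Even with MKL_DYNAMIC=false, the thread count is a request that MKL tries to honor rather than a guarantee.

```python
import os


def mkl_env(num_threads, dynamic=False, affinity=None, base=None):
    """Hypothetical helper: return a copy of the environment with MKL
    threading variables set. Does not modify os.environ itself."""
    env = dict(os.environ if base is None else base)
    # MKL_DYNAMIC=false asks MKL not to reduce the requested thread count.
    env["MKL_DYNAMIC"] = "true" if dynamic else "false"
    env["MKL_NUM_THREADS"] = str(num_threads)
    if affinity is not None:
        # Assumption: Intel OpenMP runtime, e.g. "compact" or "scatter".
        env["KMP_AFFINITY"] = affinity
    return env


env = mkl_env(12, dynamic=False, affinity="compact", base={})
print(env)

# To launch a benchmark with these settings (binary name is hypothetical):
#   import subprocess
#   subprocess.run(["./sgemm_bench"], env=mkl_env(12))
```

Comparing SGEMM timings at 6 and 12 threads this way should show whether the extra Hyper-Threading threads help on your machine.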

Best Regards,

Ying