Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

MKL spurns only minimal number of threads for SGEMM

Dave_O_
Beginner
716 Views

Hi

I have a Xeon E5-2620 processor and am benchmarking SGEMM on it. Why does MKL spurn only 6 threads (hardware threads) instead of the expected 12 threads (hardware plus software threads)?

The same code on Xeon Phi spurns all 240 threads (hardware and software).

 

5 Replies
TimP
Honored Contributor III
Most of the basic explanation is in the section on MKL_DYNAMIC in the MKL user guide. The default is intended to maximize performance, not the number of hyperthreads in use or power consumption. Of course, a single 6-core CPU may not have been tested as thoroughly as 8- and 10-core dual-CPU platforms, so you are welcome to change the setting to see if it helps your case, or even if you simply like to peg your displays. While FPU performance on Xeon peaks with 1 thread per core, on Intel® Xeon Phi™ 3 threads per core are needed to reach full VPU performance, and MKL can use all 4 threads per core effectively (on large enough problems) when data shuffling optimizations are taken into account.
Dave_O_
Beginner

I don't understand.

TimP
Honored Contributor III
Do you have specific questions? Did you try setting MKL_DYNAMIC=false and then trying specific thread counts and affinity schemes? I don't think it makes sense to repeat the entire story about MKL_DYNAMIC here when you can easily look it up, or to guess which aspect of hyperthreading you may want explained. Are you "spurning" the documentation as well as the dictionary definition of the word?
Dave_O_
Beginner

Man, what are you talking about?

Ying_H_Intel
Employee

Hi Dave, 

Sorry for the confusion :) Roughly speaking, yes: to get better performance, MKL spawns threads based on the hardware resources and on experience.

    • On a Xeon machine, MKL uses the number of physical cores by default. On your Xeon machine that is 6. (The 12 "software threads" you mention are what we call Hyper-Threading threads on Xeon processors.)
    • On a Xeon Phi machine, MKL uses hardware plus software threads: 60 x 4 = 240 (61 cores in total, with 1 core reserved).
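The two defaults above come down to simple arithmetic. Here is a minimal sketch of it in Python; `default_mkl_threads` is a hypothetical helper written for illustration, not an MKL function, and the core counts are the ones quoted in this thread.

```python
def default_mkl_threads(total_cores, threads_used_per_core, reserved_cores=0):
    """Thread count as described in this thread: usable cores times threads per core."""
    usable_cores = total_cores - reserved_cores
    return usable_cores * threads_used_per_core


# Xeon E5-2620: 6 physical cores, one thread per core (Hyper-Threading threads unused).
xeon_threads = default_mkl_threads(total_cores=6, threads_used_per_core=1)

# Xeon Phi: 61 cores with 1 reserved, all 4 hardware threads per core used.
phi_threads = default_mkl_threads(total_cores=61, threads_used_per_core=4, reserved_cores=1)

print(xeon_threads, phi_threads)  # 6 240
```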

    The reason on Xeon is that Hyper-Threading Technology (HT Technology) is only effective when the threads sharing a core perform different types of operations and when there are under-utilized resources on the processor. The threads in Intel MKL all do exactly the same operation, so MKL cannot benefit from the HT threads. As a result, MKL forks 6 threads instead of 12.

    The same reasoning applies on Xeon Phi, but hardware threading there is implemented differently than on Xeon: it takes at least 3 or 4 threads per core to keep all the compute resources fed. So, to get better performance on Xeon Phi, MKL forks 240 threads by default.

    For more on Xeon, see https://software.intel.com/en-us/forums/topic/294954. You can change the thread count with the MKL_DYNAMIC setting and MKL_SET_NUM_THREADS (mkl_set_num_threads in C; the corresponding environment variable is MKL_NUM_THREADS).
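As a rough sketch of how one might try this from a script, the Python below builds an environment with the MKL_DYNAMIC and MKL_NUM_THREADS environment variables set and passes it to a benchmark process. The helper names `mkl_thread_env` and `run_benchmark` and the binary name `./sgemm_bench` are placeholders invented for illustration; substitute your own benchmark.

```python
import os
import subprocess


def mkl_thread_env(num_threads, dynamic=False, base_env=None):
    """Return a copy of the environment with MKL threading variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # MKL_DYNAMIC=FALSE asks MKL not to adjust the thread count on its own.
    env["MKL_DYNAMIC"] = "TRUE" if dynamic else "FALSE"
    # MKL_NUM_THREADS is the number of threads MKL should try to use.
    env["MKL_NUM_THREADS"] = str(num_threads)
    return env


def run_benchmark(command, num_threads):
    """Run a benchmark command (hypothetical) with the chosen MKL thread count."""
    return subprocess.run(command, env=mkl_thread_env(num_threads), check=True)


# Example: compare 6 and 12 threads on the 6-core Xeon from this thread.
# "./sgemm_bench" is a placeholder name for your own benchmark binary.
# for n in (6, 12):
#     run_benchmark(["./sgemm_bench"], n)
```

Whether 12 threads actually beats 6 for SGEMM is something to measure on your own machine and problem sizes rather than assume.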

    Best Regards,

    Ying 

     
