Intel® oneAPI Math Kernel Library

Intel MKL (CBLAS) doesn't support more than 8 processors. Is it true?

yuryserdyuk
Beginner
Hi!

I have a machine with 2 Intel Xeon CPU X5570 processors, so the number of logical cores is 16.
Now I am trying to run

[cpp]mkl_set_num_threads(P);

cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, N, N, N, 1.0, A, N, B, N, 0.0, C, N);[/cpp]

For odd P with 1 < P <= 8, the program runs on P - 1 processors.
For P > 8, the program always runs on 8 processors.

How can I force the program to use more than 8 processors?

The MKL version used is 10.2.4.032.

Thanks.
TimP
Honored Contributor III
Have you looked at the previous discussions about how MKL uses 1 thread per core by default, unless you override that setting, in order to avoid an accidental loss of performance?
Gennady_F_Intel
Moderator
Yury,
please try changing the MKL_DYNAMIC setting: mkl_set_dynamic( FALSE ). See the User's Guide for more details. Please note that in this case you may see performance degradation.
--Gennady
yuryserdyuk
Beginner
Yes, you are right: mkl_set_dynamic helps, but the results degrade considerably:

N        cblas_sgemm (8 proc)   cblas_sgemm (16 proc)   cuBLAS (Tesla 1060 GPU)
8192      6.06                   7.26                    2.71
10240    11.72                  13.90                    5.26
12288    20.23                  24.32                    9.07
14336    32.16                  38.06                   14.37
16384    48.46                  58.80                   21.42
18432    68.59                  82.60                   30.46

N is the matrix size, and times are given in seconds.

So, evidently, Intel MKL doesn't scale beyond 8 processors on CPUs with Hyper-Threading ...

The same picture is observed for the cblas_dgemm function ...

Gennady_F_Intel
Moderator
This is the expected behavior of Intel MKL. We don't recommend enabling HT in this case.
Please read more about this in the User's Guide section "The use of Hyper-Threading Technology".
--Gennady
TimP
Honored Contributor III
That section is on page 6-16 of the user guide, which is in the Documentation/en_us/mkl/ directory of the compiler installation. It can't be found with the search function in Adobe.
In short, MKL already keeps the floating-point adder and multiplier fully busy when running 1 thread per core, and the hyper-threads share the paths to higher-level cache and memory, so the interference from additional threads should not be a surprise.