Hi!
I have a machine with 2 Intel Xeon CPU X5570 processors, so the number of logical cores is 16.
Now I am trying to perform
[cpp]
mkl_set_num_threads(P);
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
            N, N, N, 1.0, A, N, B, N, 0.0, C, N);
[/cpp]
Then for P > 1 and P <= 8 with P odd, the program runs on P - 1 processors.
For P > 8, the program always runs on 8 processors.
How can I force the program to use more than 8 processors?
The MKL version used is 10.2.4.032.
Thanks.
5 Replies
Did you see the previous discussions about how MKL uses 1 thread per core unless you override the default, in order to avoid accidental performance reduction?
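The cap described here can be pictured with a small sketch. `capped_thread_count` is a hypothetical helper, not an MKL function; it only models the "1 thread per core" default and assumes the number of hardware threads per core is known (2 with Hyper-Threading on). It does not model the odd-P case from the question.

```cpp
#include <algorithm>

// Hypothetical helper (not part of MKL): the thread count that results when
// a request is capped at one thread per physical core.
int capped_thread_count(int requested, int logical_cores, int threads_per_core) {
    const int physical_cores = logical_cores / threads_per_core;
    return std::min(requested, physical_cores);
}
```

For the machine in the question (16 logical cores, 2 hardware threads per core), a request for 16 threads would be capped at 8 under this model.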
Yury,
please try changing the MKL_DYNAMIC setting: mkl_set_dynamic( FALSE ). See the User's Guide for more details. Please note that in this case you may see performance degradation.
--Gennady
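A minimal sketch of that suggestion, assuming `mkl.h` is available and the program is linked against a threaded MKL. `sgemm_with_threads` is a hypothetical wrapper name, and the plain-loop branch is my own fallback so the sketch still compiles where MKL is not installed; it is not what MKL does internally.

```cpp
#include <cstddef>
#include <vector>

#if defined(__has_include)
#  if __has_include(<mkl.h>)
#    include <mkl.h>
#    define SKETCH_HAVE_MKL 1
#  endif
#endif

// Hypothetical wrapper (not part of MKL): multiply two n x n row-major
// matrices with an explicit thread request and return C = A * B.
std::vector<float> sgemm_with_threads(int threads, int n,
                                      const std::vector<float>& A,
                                      const std::vector<float>& B) {
    const std::size_t dim = static_cast<std::size_t>(n);
    std::vector<float> C(dim * dim, 0.0f);
#ifdef SKETCH_HAVE_MKL
    mkl_set_dynamic(0);            // do not let MKL lower the thread count
    mkl_set_num_threads(threads);  // e.g. 16 to request every logical core
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0f, A.data(), n, B.data(), n, 0.0f, C.data(), n);
#else
    // Fallback for builds without MKL: naive single-threaded triple loop.
    (void)threads;
    for (std::size_t i = 0; i < dim; ++i)
        for (std::size_t k = 0; k < dim; ++k)
            for (std::size_t j = 0; j < dim; ++j)
                C[i * dim + j] += A[i * dim + k] * B[k * dim + j];
#endif
    return C;
}
```

Calling `mkl_set_dynamic(0)` before `mkl_set_num_threads` matters here: with dynamic adjustment left on, MKL may use fewer threads than requested.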
Yes, you are right: mkl_set_dynamic helps, but the results degrade considerably:
| N     | cblas_sgemm (8 proc) | cblas_sgemm (16 proc) | cuBLAS (Tesla 1060 GPU) |
|-------|----------------------|-----------------------|-------------------------|
| 8192  | 6.06                 | 7.26                  | 2.71                    |
| 10240 | 11.72                | 13.90                 | 5.26                    |
| 12288 | 20.23                | 24.32                 | 9.07                    |
| 14336 | 32.16                | 38.06                 | 14.37                   |
| 16384 | 48.46                | 58.80                 | 21.42                   |
| 18432 | 68.59                | 82.60                 | 30.46                   |
N is the matrix size, and the times are given in seconds.
So, obviously, Intel MKL doesn't scale beyond 8 threads on processors with Hyper-Threading ...
The same picture is observed for the cblas_dgemm function ...
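To put the timings above on a common scale, here is a small sketch that converts a run time into an achieved rate and compares the 16-thread and 8-thread runs. It assumes the usual 2·N³ operation count for an N x N matrix multiply; `gemm_gflops` and `slowdown` are hypothetical helper names, not library functions.

```cpp
// Achieved rate for an N x N by N x N multiply, using the usual 2*N^3
// floating-point operation count (one multiply and one add per inner step).
double gemm_gflops(double n, double seconds) {
    return 2.0 * n * n * n / seconds / 1e9;
}

// A ratio above 1 means the 16-thread run was slower than the 8-thread run.
double slowdown(double seconds_16, double seconds_8) {
    return seconds_16 / seconds_8;
}
```

For the N = 8192 row, `gemm_gflops(8192, 6.06)` comes to roughly 181 GFLOP/s, and `slowdown(7.26, 6.06)` to roughly 1.2, i.e. the 16-thread run takes about 20% longer.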
This is the expected behavior of Intel MKL. We don't recommend enabling HT in this case.
Please read more in the User's Guide section "The use of Hyper-Threading Technology".
--Gennady
That section is in the user guide, found in the Documentation/en_us/mkl/ directory of the compiler installation, page 6-16. It can't be found by the search function in Adobe.
In short, as MKL schedules the floating point adder and multiplier to full effectiveness when running 1 thread per core, and the hyper-threads share the paths to higher level cache and memory, the interference effect of additional threads should not be a surprise.