Hi,
Our workstation has an Intel Xeon X5550 processor (2.66 GHz, 8 MB cache). It has 4 CPUs and each CPU has 2 cores. In my code, I have set OMP_NUM_THREADS=8 and MKL_NUM_THREADS=8 by calling omp_set_num_threads(8) and mkl_set_num_threads(8). However, the MKL part, where DSS and LAPACK are used to factorize some sparse and full matrices, only uses 4 threads, while the rest of the C++ code runs with 8 threads. How can I make the MKL part use 8 threads? Thanks so much!
Best regards,
Shiquan
Please take a look at the description of the mkl_set_num_threads() function in the Intel MKL Manual. It contains the following:
"
This function allows you to request independently of OpenMP* how many threads MKL should
use. This is just a hint, and it is not guaranteed that this number of threads will be used. Enter
a positive integer.
"
Best regards,
Artem
Dear Artem,
Thank you for your kind reply. I had noticed that description before. However, all 8 threads of my workstation are available, meaning that no other program is running on the computer at the same time. Even so, the MKL part still runs with only 4 threads. How can I make it run with all 8 threads? Do I need to set something, or can MKL only recognize the 4 CPUs and ignore that there are 8 cores? The other C++ parts of this program parallelize well with 8 threads.
Best regards,
Shiquan
To see how many threads are used during MKL parallelization with the libiomp5 library, please set the following environment variable:
KMP_AFFINITY=verbose
Also, which OS do you use, Linux or Windows?
I encounter the same problem on both Linux and Windows systems.
Thanks,
Shiquan
export KMP_AFFINITY=verbose
or
export KMP_AFFINITY=verbose,$KMP_AFFINITY
if you already use some value.
Then send us the output, which should look like the following for 8 threads on my machine:
OMP: Info #204: KMP_AFFINITY: decoding cpuid leaf 11 APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 4 cores/pkg x 1 threads/core (8 total cores)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {0,1,2,3,4,5,6,7}
Hi, Victor,
I have set the env variable and the output is:
OMP: Info #204: KMP_AFFINITY: decoding cpuid leaf 11 APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {0,1,2,3,4,5,6,7}
It looks the same as yours. However, the MKL part still runs with only 4 threads, while the other C++ parts of this program parallelize well with 8 threads.
The Intel compiler version on our computer is: /opt/intel/Compiler/11.1/064
Thanks,
Shiquan
Your output reports 1 packages x 4 cores/pkg x 2 threads/core, that is, 4 physical cores with Hyper-Threading rather than 8 cores.
MKL uses just 1 thread per core, so 4 threads in total.
See "Intel MKL threading behavior on Hyper-Threading systems" for more details.
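To make the arithmetic in this reply concrete, here is a minimal shell sketch that reads the three counts out of the topology line posted earlier in the thread and multiplies them. The variable names are only illustrative, and the `sed` patterns assume the line has exactly the format shown in this thread.

```shell
# Topology line copied from the KMP_AFFINITY=verbose output in this thread.
line='OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)'

# Pull out the three counts with sed (POSIX basic regular expressions).
packages=$(printf '%s\n' "$line" | sed 's/.*: \([0-9][0-9]*\) packages.*/\1/')
cores_per_pkg=$(printf '%s\n' "$line" | sed 's/.* x \([0-9][0-9]*\) cores\/pkg.*/\1/')
threads_per_core=$(printf '%s\n' "$line" | sed 's/.* x \([0-9][0-9]*\) threads\/core.*/\1/')

# Physical cores are what MKL counts by default; logical processors are what the OS reports.
physical_cores=$((packages * cores_per_pkg))
logical_procs=$((physical_cores * threads_per_core))

echo "physical cores: $physical_cores"        # prints "physical cores: 4"
echo "logical processors: $logical_procs"     # prints "logical processors: 8"
```

So the "8 available OS procs" are 4 cores times 2 hardware threads, and the 4 threads seen in the MKL part match the physical core count.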
Dear Victor,
Thanks for your help.
Following your advice, I can now use all 8 threads in my MKL code. However, the code keeps running inside the MKL function and never returns. I encountered a similar problem before on my laptop, which has 2 cores and runs Windows. If I select the Parallel option in MKL, the code either keeps running and never finishes, or it takes more time than the Sequential version and returns wrong results. The Sequential version, however, finishes quickly.
How can I solve this problem? Thanks.
I have set:
MKL_DYNAMIC=FALSE
MKL_NUM_THREADS=8
KMP_AFFINITY=granularity=fine,compact,1,0
Best regards,
Shiquan
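For anyone reproducing this on Linux, the three settings listed in the post above could be applied to a single shell session roughly as follows. This is only a sketch: the values are copied from the post, and `./my_solver` is a hypothetical program name, not something from this thread.

```shell
# Settings quoted in the post above, written as Linux shell exports.
export MKL_DYNAMIC=FALSE      # ask MKL not to lower the thread count on its own
export MKL_NUM_THREADS=8      # request 8 MKL threads (one per logical processor here)
export KMP_AFFINITY=granularity=fine,compact,1,0   # thread pinning; value taken from the post

# Show what the program will inherit before launching it.
env | grep -E '^(MKL_|KMP_)' | sort

# ./my_solver   # hypothetical program name; replace with your own executable
```

Exports like these only affect programs started from the same shell afterwards, so they are easy to undo by opening a new session.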
Hyper-Threading Technology (HT Technology) is especially effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. Intel MKL fits neither of these criteria as the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance when using Intel MKL without HT Technology enabled.
So maybe I should not use the other 4 threads in MKL after all.
Thank you all for your kind help!
Best regards,
Shiquan
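The conclusion above, one MKL thread per physical core, could be expressed as environment settings along these lines. This is a sketch under assumptions: the core count of 4 is taken from the KMP_AFFINITY output in this thread, and a mkl_set_num_threads(8) call left in the code would take precedence over MKL_NUM_THREADS, so it would need to be removed or changed for the variable to have any effect.

```shell
# Assumption: 4 physical cores, as reported in the KMP_AFFINITY output in this thread.
physical_cores=4

unset MKL_DYNAMIC                          # return to MKL's default (dynamic adjustment on)
export MKL_NUM_THREADS="$physical_cores"   # one MKL thread per physical core
export OMP_NUM_THREADS=8                   # the program's own OpenMP regions keep 8 threads

echo "MKL_NUM_THREADS=$MKL_NUM_THREADS OMP_NUM_THREADS=$OMP_NUM_THREADS"
```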