Intel® oneAPI Math Kernel Library

Why can MKL use only 4 threads?

shiquanhe1984gmail_c

Hi,

The processor in our workstation is an Intel Xeon X5550 (2.66 GHz, 8 MB cache). It has 4 CPUs and each CPU has 2 cores. In my code, I have set OMP_NUM_THREADS=8 and MKL_NUM_THREADS=8 through the calls omp_set_num_threads(8) and mkl_set_num_threads(8). However, the MKL part, where DSS and LAPACK are used to factorize some sparse and full matrices, uses only 4 threads, while the rest of the C++ code runs with 8 threads. How can I use 8 threads in the MKL part? Thanks so much!

Best regards,
Shiquan

10 Replies
Artem_V_Intel
Employee
Hello Shiquan,

Please take a look at the description of the mkl_set_num_threads() function in the Intel MKL Manual. It contains the following passage:

"This function allows you to request independently of OpenMP* how many threads MKL should use. This is just a hint, and it is not guaranteed that this number of threads will be used. Enter a positive integer."

Best regards,
Artem

shiquanhe1984gmail_c

Dear Artem,

Thanks for your kind reply. I had seen that description before. But all 8 threads of my workstation are available, meaning no other program is running on the computer at the same time. Still, the MKL part runs with only 4 threads. How can I make it run with all 8 threads? Do I need to set something, or does MKL only recognize the 4 CPUs and ignore that there are 8 cores? The other C++ parts of this program parallelize well with 8 threads.

Best regards,

Shiquan

barragan_villanueva_
Valued Contributor I
Shiquan,

To see how many threads are used during MKL parallelization with the libiomp5 library, please set the following environment variable:

KMP_AFFINITY=verbose

Also, which OS do you use: Linux or Windows?

shiquanhe1984gmail_c
Dear Victor,

I encounter the same problem on both Linux and Windows systems.

Thanks,
Shiquan

barragan_villanueva_
Valued Contributor I
So, on Linux please use

export KMP_AFFINITY=verbose

or

export KMP_AFFINITY=verbose,$KMP_AFFINITY

if you already have a value set.

Then send us the output, which should look like the following for 8 threads on my machine:

OMP: Info #204: KMP_AFFINITY: decoding cpuid leaf 11 APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 4 cores/pkg x 1 threads/core (8 total cores)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {0,1,2,3,4,5,6,7}
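
The two `export` variants above can be folded into one helper when the program is launched from a script. This is only a minimal sketch: the names `with_verbose` and `child_env` are made up for illustration and are not part of MKL or the OpenMP runtime. It prepends `verbose` to whatever KMP_AFFINITY value already exists and avoids the trailing comma that `verbose,$KMP_AFFINITY` would leave when the variable is empty.

```python
import os


def with_verbose(existing):
    """Return a KMP_AFFINITY value with 'verbose' prepended to `existing`.

    `existing` may be None or empty when the variable is not set.
    """
    value = (existing or "").strip().strip(",")
    if not value:
        # Nothing set yet: same as `export KMP_AFFINITY=verbose`.
        return "verbose"
    if "verbose" in value.split(","):
        # Already verbose: leave the user's setting untouched.
        return value
    # Same as `export KMP_AFFINITY=verbose,$KMP_AFFINITY`.
    return "verbose," + value


def child_env(base=None):
    """Copy an environment mapping and make its KMP_AFFINITY verbose."""
    env = dict(os.environ if base is None else base)
    env["KMP_AFFINITY"] = with_verbose(env.get("KMP_AFFINITY"))
    return env
```

The resulting mapping could then be passed as `env=child_env()` to `subprocess.run` when starting the MKL program.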

shiquanhe1984gmail_c

Hi, Victor,

I have set the env variable and the output is:

OMP: Info #204: KMP_AFFINITY: decoding cpuid leaf 11 APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {0,1,2,3,4,5,6,7}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {0,1,2,3,4,5,6,7}

It looks the same as yours. But the MKL part still runs with only 4 threads, while the other C++ parts of this program parallelize well with 8 threads.

The Intel compiler installed on our computer is: /opt/intel/Compiler/11.1/064

Thanks,

Shiquan

barragan_villanueva_
Valued Contributor I
You have
1 packages x 4 cores/pkg x 2 threads/core
but MKL uses just 1 thread per core => 4 in total

See "Intel MKL threading behavior on Hyper-Threading systems" for more details.
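
The arithmetic behind this answer can be checked directly from the `Info #179` line of the verbose output. The following is a rough sketch, not an Intel tool: `parse_topology` is a hypothetical helper that multiplies out the topology summary to get the number of physical cores (which, per the reply above, is what MKL uses by default) and the number of logical processors the OS reports.

```python
import re

# Matches the topology summary printed by KMP_AFFINITY=verbose, e.g.
# "1 packages x 4 cores/pkg x 2 threads/core (4 total cores)".
TOPOLOGY = re.compile(r"(\d+) packages? x (\d+) cores/pkg x (\d+) threads/core")


def parse_topology(line):
    """Return (physical_cores, logical_processors) from an Info #179 line."""
    match = TOPOLOGY.search(line)
    if match is None:
        raise ValueError("no topology summary found in: " + line)
    packages, cores_per_pkg, threads_per_core = map(int, match.groups())
    physical = packages * cores_per_pkg
    logical = physical * threads_per_core
    return physical, logical


# Shiquan's workstation: 4 physical cores, 8 logical processors.
shiquan = "OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)"
print(parse_topology(shiquan))  # (4, 8)

# Victor's machine: 8 physical cores, no Hyper-Threading.
victor = "OMP: Info #179: KMP_AFFINITY: 2 packages x 4 cores/pkg x 1 threads/core (8 total cores)"
print(parse_topology(victor))  # (8, 8)
```

Both machines show 8 OS procs, but only the second one has 8 physical cores, which is why the same request for 8 threads gives different results.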

shiquanhe1984gmail_c

Dear Victor,

Thanks for your help.

Following your advice, I can now use all 8 threads in my MKL code. But the code now keeps running inside the MKL function and never returns a result. I ran into a similar problem before on my laptop, which has 2 cores and runs Windows: if I select the Parallel option in MKL, the code keeps running and never finishes, or it takes even more time than the Sequential version and returns wrong results. The Sequential version, however, finishes quickly.

How can I solve this problem? Thanks.

I have set:

MKL_DYNAMIC=FALSE
MKL_NUM_THREADS=8
KMP_AFFINITY=granularity=fine,compact,1,0

Best regards,

Shiquan
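
How these settings interact with the earlier 4-thread behaviour can be pictured with a toy model. It rests on a stated assumption: with MKL_DYNAMIC at its default of TRUE, MKL scales a request down to the number of physical cores, and with MKL_DYNAMIC=FALSE it tries to honour the request. The function `expected_mkl_threads` is hypothetical and is not MKL's actual decision logic, which may pick fewer threads for other reasons such as small problem sizes.

```python
def expected_mkl_threads(requested, physical_cores, mkl_dynamic=True):
    """Toy estimate of the thread count MKL would use (an assumption, not MKL code).

    requested       -- value passed via MKL_NUM_THREADS / mkl_set_num_threads
    physical_cores  -- packages x cores/pkg from the KMP_AFFINITY verbose output
    mkl_dynamic     -- value of MKL_DYNAMIC (TRUE by default)
    """
    if requested < 1 or physical_cores < 1:
        raise ValueError("thread and core counts must be positive")
    if mkl_dynamic:
        # The request is only a hint: cap it at one thread per physical core.
        return min(requested, physical_cores)
    # MKL_DYNAMIC=FALSE: the request is taken at face value, so
    # Hyper-Threading siblings end up sharing a core.
    return requested


print(expected_mkl_threads(8, 4))                     # 4
print(expected_mkl_threads(8, 4, mkl_dynamic=False))  # 8
```

Under this model, the settings listed above put two MKL threads on each of the 4 physical cores.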

shiquanhe1984gmail_c
I have noticed that:

Hyper-Threading Technology (HT Technology) is especially effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. Intel MKL fits neither of these criteria as the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance when using Intel MKL without HT Technology enabled.

So maybe I should not use the other 4 threads in MKL after all.

Thank you all for your kind help!

Best regards,
Shiquan

Gennady_F_Intel
Moderator
Yes, Shiquan, you are right.
You can find similar interesting discussions on how HT affects MKL performance here.
--Gennady