Beginner

Problem with setting number of threads for MKL

Dear all,


I simply need to set the number of threads for my tests in order to benchmark the code. I use this code to set the number of threads:

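[bash]procs = 4;

omp_set_num_threads(procs);  /* thread count for OpenMP regions */
mkl_set_num_threads(procs);  /* thread count for MKL routines */
mkl_set_dynamic(0);          /* do not let MKL adjust the thread count dynamically */
[/bash]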

With this code I can successfully use 1 or 2 processors. But when I try to use 4 or 8 processors, the MKL routines always use only 2. For example, the code above should allocate 4 processors, but it does not. When I try to use all available processors (16), it seems to work correctly as well.

I am using a compute node with two quad-core CPUs, which gives 8 physical processors. Hyper-Threading is also enabled on this system, but as far as I know MKL only uses physical processors, so that shouldn't be a problem. I would also prefer not to disable Hyper-Threading, because the server is located elsewhere and it would be hard to restart it manually. Is there any solution for this?

Regards,

D.
4 Replies
Valued Contributor I

Hi,

An MKL function may use fewer threads than requested if the volume of data is not large enough.
To see how many threads are working, please set the following in the environment (the 'compact' part below may be omitted):
KMP_AFFINITY=verbose,compact

Also, it would be helpful if you could provide a small reproducer of the problem so that we can analyze it on our side.
Beginner

Using the KMP_AFFINITY environment variable, it seems that the correct number of processors is allocated:

OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,8}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,8}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,10}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,10}

But when I run the code, I still see that only 2 processors are busy! Why does MKL still use only 2 processors even when the output explicitly shows that 4 will be used? I call "mkl_set_dynamic(0)" to prevent MKL from managing the number of processors dynamically, so why would MKL still decide on the number of processors? My problems are not that small, but I would also like to deliberately run small problems with different numbers of processors to benchmark the code. Is there any solution for that?

Regards,

D.
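
A quick way to read the verbose output above is to count how many different OS proc sets the threads are bound to. The following is a minimal sketch in plain C, not part of MKL or the OpenMP runtime; the function names `extract_proc_set` and `count_distinct_proc_sets` and the array `sample_log` are hypothetical helpers introduced here for illustration, and the log lines are copied from the post above.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SETS 64  /* assumed upper bound on distinct proc sets */
#define SET_LEN  64  /* assumed upper bound on the length of one set string */

/* The four lines reported by KMP_AFFINITY=verbose in the post above. */
const char *const sample_log[] = {
    "OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,8}",
    "OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,8}",
    "OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2,10}",
    "OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,10}",
};

/* Copy the text between '{' and '}' on one log line into out.
   Returns 1 on success, 0 if the line has no proc set or it does not fit. */
int extract_proc_set(const char *line, char *out, size_t out_size)
{
    const char *open = strchr(line, '{');
    const char *close = open ? strchr(open, '}') : NULL;
    if (!open || !close)
        return 0;
    size_t len = (size_t)(close - open - 1);
    if (len >= out_size)
        return 0;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 1;
}

/* Count how many different proc sets appear in an array of log lines. */
int count_distinct_proc_sets(const char *const lines[], int n_lines)
{
    char seen[MAX_SETS][SET_LEN];
    int n_seen = 0;

    for (int i = 0; i < n_lines; i++) {
        char set[SET_LEN];
        if (!extract_proc_set(lines[i], set, sizeof set))
            continue;  /* not an affinity binding line */

        int known = 0;
        for (int j = 0; j < n_seen; j++) {
            if (strcmp(seen[j], set) == 0) {
                known = 1;
                break;
            }
        }
        if (!known && n_seen < MAX_SETS)
            strcpy(seen[n_seen++], set);
    }
    return n_seen;
}
```

Calling `count_distinct_proc_sets(sample_log, 4)` returns 2: the four threads are bound to only two distinct proc sets, {0,8} and {2,10}. That would be consistent with only two processors appearing busy, although the log alone does not say why the bindings ended up that way.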
Beginner

Hi again,

It was my fault! I had added some experimental code to set affinity flags, and it did not work correctly. I removed that part, and now the number of threads is set correctly.

I have another question. Using environment variables (which I think is the simplest way), how can I bind each thread to the processor it is originally assigned to when I run my code? What is the simplest way to avoid context switches and keep each thread on its processor until the program ends? Affinity is not a big issue when I have two processors, but with four and eight processors I think this would result in a significant improvement.

Thanks,

D.
Black Belt

If I understand your question, I would think KMP_AFFINITY is the simplest way (although some of the options may be difficult to get right).
It makes the most difference when you have more than one last-level cache, as you certainly do when you have multiple physical CPUs. The default (same as KMP_AFFINITY=none) leaves the affinity entirely to the OS scheduler, which generally is far from optimal on dual- or quad-CPU systems running a single job.
"Processors" is an ambiguous term; for some people it may mean logical processors (e.g. hyperthreads), but the MKL threading library is set up to use only 1 thread per core unless you override that default.
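
For reference, this is roughly what binding a thread means at the OS level. The sketch below assumes Linux with glibc and uses `sched_getaffinity` and `sched_setaffinity` directly; it is not how MKL or the OpenMP runtime implements KMP_AFFINITY, and the function names `first_allowed_cpu` and `pin_calling_thread` are hypothetical helpers made up for this illustration. With KMP_AFFINITY the runtime takes care of its own worker threads, so hand-written code like this is only needed for threads you manage yourself.

```c
#define _GNU_SOURCE  /* needed for cpu_set_t and the CPU_* macros in glibc */
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    }
    return -1;
}

/* Restrict the calling thread to a single CPU (pid 0 means "the caller").
   Returns 0 on success, -1 on failure. */
int pin_calling_thread(int cpu)
{
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask);
}
```

Each thread would have to call `pin_calling_thread` itself, with a CPU number chosen so that no two threads share a core. Choosing those numbers correctly on a Hyper-Threading system is exactly the part that is easy to get wrong, which is one reason to prefer KMP_AFFINITY.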