Intel® oneAPI Math Kernel Library

Concurrency Problem with Intel MKL BLAS

nunoxic
Beginner
578 Views
http://software.intel.com/en-us/forums/showthread.php?t=86020
I didn't intend to multi-post, but there seems to be no choice since I need input from both MKL experts and VTune experts.
Thanks
6 Replies
barragan_villanueva_
Valued Contributor I
Please use the following environment settings when running with libiomp5:
1) set KMP_VERSION=1
to see which OpenMP run-time library version you are using
2) set KMP_AFFINITY=verbose,$KMP_AFFINITY
to see the affinity that is actually applied

3) try KMP_AFFINITY=granularity=fine,compact,1,0
this is the affinity recommended in the MKL documentation when SMT (HT) is enabled

4) play with KMP_BLOCKTIME
It sets the time, in milliseconds, that a thread waits after completing a parallel region before going to sleep (the default is 200 milliseconds)
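As a rough illustration of the settings above, here is a minimal Python sketch that builds such an environment for a child process. The helper name `openmp_env` and the binary `./my_blas_app` are hypothetical, introduced only for this example; the variable names and values are the ones listed in this post.

```python
import os

def openmp_env(base=None, blocktime_ms=200):
    """Return a copy of `base` (default: os.environ) with the KMP_* settings
    suggested above. `openmp_env` is a hypothetical helper, not an Intel API."""
    env = dict(os.environ if base is None else base)
    # Ask the Intel OpenMP runtime to print its version at startup.
    env["KMP_VERSION"] = "1"
    # "verbose" reports the binding; the rest is the affinity suggested
    # above for machines with SMT (HT) enabled.
    env["KMP_AFFINITY"] = "verbose,granularity=fine,compact,1,0"
    # Milliseconds a thread waits after a parallel region before sleeping.
    env["KMP_BLOCKTIME"] = str(blocktime_ms)
    return env

# Usage (assumes a hypothetical executable ./my_blas_app linked with libiomp5):
#   import subprocess
#   subprocess.run(["./my_blas_app"], env=openmp_env(blocktime_ms=0))
```

Building the environment per run makes it easy to compare several KMP_BLOCKTIME values on the same input without changing the shell session.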
nunoxic
Beginner
Thanks for your input, but none of the above made any difference to the code.
I played with KMP_BLOCKTIME for an hour or more. I set it to 0, 200, inf and so on, but it led nowhere. Sometimes it sped up the execution for a given input data set, but when the data was changed the gain was lost.

What is the difference between linking using -libomp5 and -openmp?
From my experiments, I found -libomp5 to be much much faster than -openmp.

I tried to read up about KMP here:
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2009/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm
but it is going over my head. Are KMP and OMP different things, or are they the same thing?
barragan_villanueva_
Valued Contributor I
Quoting nunoxic
What is the difference between linking using -libomp5 and -openmp?
From my experiments, I found -libomp5 to be much much faster than -openmp.


That's strange :( With the Intel compiler and the mkl_intel_thread library there should be no difference.
So, what link command are you using?

TimP
Honored Contributor III
-libomp5 shouldn't work; did you mean -liomp5? The latter is set by ifort -openmp, but you would need to specify the library explicitly if you were using some other command for linking.
The KMP environment variables are specific to Intel OpenMP, while the OMP ones are defined by the OpenMP standard.
A purpose of increasing KMP_BLOCKTIME would be to maintain KMP_AFFINITY settings across a gap of more than 0.2 second between OpenMP parallel regions. It's entirely possible that KMP_BLOCKTIME has little effect in normal circumstances.
nunoxic
Beginner
Yes! My bad, I meant -liomp5.
So as I see it:
1. Threading might help, but not much
2. There is no point in adding threads to BLAS Level 2 operations
Is there any way at all to speed up this code?
(Unless I move on to GPU computing?)
TimP
Honored Contributor III
If your application doesn't have enough inherent parallelism to benefit from threading, GPU is not a likely solution. It's true that BLAS level 2 operations, which normally would be vectorized, would need to operate on extremely large data sets to benefit from threaded parallelism internal to those operations. Thus it is normal to apply parallelism at a higher level (each thread performing independent entire level 2 operations).
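The idea of applying parallelism at a higher level can be sketched as follows. This is only a structural illustration in Python, under stated assumptions: `gemv` is a plain-Python stand-in for one sequential level 2 call (such as dgemv), and `batched_gemv` is a hypothetical helper name. In the setting of this thread the outer loop would more likely be an OpenMP parallel loop around sequential MKL calls, and in standard CPython pure-Python threads do not actually run this arithmetic in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def gemv(a, x):
    """Plain-Python y = A*x; stands in for one sequential BLAS level 2 call."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def batched_gemv(problems, workers=4):
    """Run many independent (A, x) products, one whole product per task.

    Parallelism sits outside the level 2 operation: each task performs a
    complete matrix-vector product on its own data, sharing nothing.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the problems.
        return list(pool.map(lambda p: gemv(*p), problems))

problems = [
    ([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]),
    ([[0.0, 1.0], [1.0, 0.0]], [5.0, 7.0]),
]
results = batched_gemv(problems)
print(results)  # [[3.0, 7.0], [7.0, 5.0]]
```

The point of the design is that each worker owns an entire operation, so no synchronization is needed inside the matrix-vector product itself.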