Intel® oneAPI Math Kernel Library

Profiler shows code spends a lot of time in thread library


We are trying to profile an in-house code, and at the top of the profile, taking 20% of the run time, is a function called kmp_hyper_barrier_release. We do not compile our own code with OpenMP, but we call Pardiso from MKL, which I believe uses OpenMP internally. The puzzling part is that the code appears to spend only a small fraction of its time inside MKL (~15%), so it is strange that this kmp function accounts for so much. Worse, two more kmp functions appear near the top: kmp_x86_pause (11%) and kmp_execute_tasks (6%). Could anybody explain what these functions do and why they affect the performance of our code so dramatically?


Running against libiompprof5 should at least tell you whether these OpenMP function times are associated with thread work imbalance or similar problems. It should also help you find out whether setting appropriate KMP_AFFINITY values has a favorable effect. Work imbalance, for example, can arise when pairs of threads on different CPUs work on the same or adjacent cache lines (false sharing): those threads take longer, while the other threads finish quickly and then wait at the barrier.
Note that even bare-bones use of libiompprof5 writes a useful text summary to guide.gvs in your working directory.