What does __kmp_hierarchical_barrier_release imply in VTune?

YW · ‎03-08-2015

Hi,

I profiled my program running on Xeon Phi in native mode with VTune and found that a lot of time goes to __kmp_hierarchical_barrier_release. What does this normally imply? I know it must be some OpenMP issue, but I have no idea how to solve it.

BTW, when the same piece of code runs on Xeon, VTune reports that a significant portion of time (much less than __kmp_hierarchical_barrier_release on Phi, though) goes to __kmp_launch_threads.

Thanks in advance!

Dmitry_P_Intel1 · ‎03-08-2015

Hello,

The likely cause is load imbalance at an OpenMP barrier: threads that finish their work early spin in a busy wait inside the OpenMP runtime, burning CPU time, and those runtime functions then show up as VTune hotspots.
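As a rough illustration of the kind of imbalance meant here (not taken from the original program, which was not posted), the sketch below uses a hypothetical loop in which iteration i costs i + 1 units of work. The names work_units, sum_static and sum_dynamic are made up for this example. With schedule(static), the threads that receive the low-numbered iterations finish early and wait at the implicit barrier at the end of the loop; schedule(dynamic) hands out small chunks on demand, which usually evens this out at the cost of some scheduling overhead.

```c
#include <assert.h>

/* Hypothetical workload: iteration i performs i + 1 units of work,
   so later iterations are more expensive than earlier ones. */
static double work_units(int i)
{
    double acc = 0.0;
    for (int k = 0; k <= i; ++k)
        acc += 1.0;
    return acc;
}

/* Static schedule: each thread gets one contiguous block of iterations.
   Threads holding the cheap, low-numbered blocks finish first and then
   wait at the implicit barrier that ends the parallel loop. */
double sum_static(int n)
{
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += work_units(i);
    return total;
}

/* Dynamic schedule: idle threads fetch the next chunk of 16 iterations,
   so the expensive iterations are spread across all threads. */
double sum_dynamic(int n)
{
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += work_units(i);
    return total;
}
```

Both functions return the same value; only the distribution of work across threads differs. Build with your compiler's OpenMP option enabled, since otherwise the pragmas are ignored and the loops run serially. Whether a dynamic or guided schedule actually helps depends on where the imbalance in the real code comes from, so it is worth confirming with the profiler first.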

I recommend using the /OpenMP Region/.. groupings to see the CPU time in each region broken down by classification, which should help you understand what the CPU time spent in OpenMP means (it is better to use VTune Amplifier XE 2015 Update 2 for this).

You can also look at the help article at https://software.intel.com/en-us/node/529832 for the OpenMP tuning methodology we offer in VTune.

Thanks & Regards, Dmitry