Hi,
I've been using VTune to profile my OpenMP application, and it tells me that a lot of the CPU time is spent in Overhead Time, under the "Other" category. Could you let me know what this category covers, and how I can root-cause it further?
Thanks,
srini009
Hello,
Could you please choose the /Function/Call Stack grouping, sort by the CPU Time/Overhead Time/Other column, and provide the top 3-5 function names (I expect they will be from the OpenMP runtime)?
Also, please provide the versions of the Intel compiler and VTune that you use.
Thanks & Regards, Dmitry
Hi Dmitry,
Intel compiler version: ifort version 15.0.2
Intel VTune Amplifier version: vtune_amplifier_xe_2015.2.0.393444
Please find below a screenshot after sorting the "others" column of the overhead time.
Thanks,
Srinivasan Ramesh
One more question: do you use a collection with stacks, such as basic hotspots or advanced hotspots with stacks?
If so, please change "user functions + 1" to "user/system functions" and redo the screenshot.
BTW, with a newer compiler (version 15.0.3 or higher) and the new VTune Amplifier XE 2016 Gold, I expect you would get a much more comprehensive OpenMP analysis.
Thanks & Regards, Dmitry
Hi Dmitry,
Please see the screenshot below with user/system functions option set.
Yes, I am indeed using basic hotspot analysis. Also, thanks for the heads up about the new version, I'll consider getting it.
Regards,
Srinivasan Ramesh
Hello,
From the screenshot we can see that the overhead is related to the OpenMP stack stitching mechanism that VTune uses to give a more structured stack representation for OpenMP worker threads.
You can try advanced hotspots with stacks, like this:
amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-sampling ./my_app
or choose the stack knob in the GUI configuration for advanced hotspots analysis:
This collector does not use stack stitching, so you will be able to avoid the overhead. We are working on reducing the overhead.
Thanks & Regards, Dmitry
Hi Dmitry,
I tried running advanced hotspot analysis for this application, and it failed, saying "amplxe: Error: Cannot enable Hardware Event-based Sampling: problem with the driver (sep*/sepdrv*)."
So I followed the instructions on the "Building and Managing the Sampling Driver" help page in VTune. Please see below for the output of ./insmod-sep3 -q:
Output:
pax driver is loaded and owned by group "srinivasan" with file permissions "775".
sep3_15 driver is loaded and owned by group "srinivasan" with file permissions "775".
vtsspp driver is not loaded.
I tried building the vtsspp driver by changing to /opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp and running ./build-driver as root.
Compilation fails with the following output:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘rp_sched_process_exec_compat_enter’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:232: error: ‘struct pt_regs’ has no member named ‘rdi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘rp_sched_process_exec_enter’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:326: error: ‘struct pt_regs’ has no member named ‘rdi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:327: error: ‘struct pt_regs’ has no member named ‘rdx’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘rp_mmap_region_enter’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:451: error: ‘struct pt_regs’ has no member named ‘rdi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:452: error: ‘struct pt_regs’ has no member named ‘rsi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:453: error: ‘struct pt_regs’ has no member named ‘rdx’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:455: error: ‘struct pt_regs’ has no member named ‘rr8’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:456: error: ‘struct pt_regs’ has no member named ‘rr9’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘probe_sched_process_exec_compat’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:771: error: ‘struct kretprobe’ has no member named ‘addr’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:772: error: ‘struct kretprobe’ has no member named ‘addr’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:774: error: ‘struct kretprobe’ has no member named ‘addr’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:776: error: ‘struct kretprobe’ has no member named ‘addr’
make[2]: *** [/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.o] Error 1
make[1]: *** [_module_/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp] Error 2
Any idea what is happening? Appreciate your help.
Regards,
Srinivasan Ramesh
Hello,
I expect that, since your VTune build is fairly old, the vtss (stack sampling) driver had not yet been adapted to the relatively new OS that you are probably running.
I recommend upgrading to VTune Amplifier XE 2016.
BTW: I found one switch that can help your basic hotspots collection avoid OpenMP stack stitching.
It definitely works for recent builds. Please try it on your build:
amplxe-cl -collect hotspots -run-pass-thru=--no-stack-stitching ./my_app
Thanks & Regards, Dmitry
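The choice between the two command lines quoted in this thread could be scripted roughly as follows. This is only a sketch: choose_collection and print_command are hypothetical helper names (not part of VTune), and the "vtsspp driver is loaded" wording is an assumption made by analogy with the pax and sep3_15 lines in the ./insmod-sep3 -q output above. A loaded driver does not guarantee that the collection will actually start.

```shell
#!/bin/sh
# Sketch: pick a collection from the text printed by "./insmod-sep3 -q".
# Helper names are hypothetical; they are not VTune commands.

# Prints "advanced-hotspots" if the vtsspp line reports the driver as loaded,
# otherwise "hotspots".
# ASSUMPTION: a loaded driver is reported as "vtsspp driver is loaded ...",
# mirroring the pax and sep3_15 lines shown in this thread.
choose_collection() {
    status=$1
    if printf '%s\n' "$status" | grep -q 'vtsspp driver is loaded'; then
        echo "advanced-hotspots"
    else
        echo "hotspots"
    fi
}

# Prints (does not run) the command line quoted in this thread for the
# chosen collection. $1 is the application, $2 the driver status text.
print_command() {
    app=$1
    case $(choose_collection "$2") in
        advanced-hotspots)
            echo "amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-sampling $app" ;;
        *)
            echo "amplxe-cl -collect hotspots -run-pass-thru=--no-stack-stitching $app" ;;
    esac
}
```

For example, print_command ./my_app "$(./insmod-sep3 -q)" would print the basic hotspots command line for the driver status reported above, since vtsspp is not loaded there.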
First screenshot: _kmp_join_call() spent a lot of time under "imbalance", which means that some threads finished quickly while others ran longer. You need to review the timeline of the threads' lifetimes and possibly adjust the algorithm to reduce the wait time.
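One rough way to see whether loop scheduling affects the imbalance is to rerun the application under different OpenMP schedule kinds. The sketch below assumes the hot loop is compiled with schedule(runtime), because the OMP_SCHEDULE environment variable has no effect on loops with any other schedule clause. compare_schedules is a hypothetical helper name, and date +%s only gives one-second resolution, so this is useful only for runs that take at least several seconds.

```shell
#!/bin/sh
# Sketch: run a command once per OpenMP schedule kind and print
# "<kind> <elapsed seconds>" for each run.
# ASSUMPTION: the loop of interest uses schedule(runtime); otherwise
# OMP_SCHEDULE is ignored and all three runs behave the same.
compare_schedules() {
    for kind in static dynamic guided; do
        start=$(date +%s)
        # Set OMP_SCHEDULE for this run only; discard the program's stdout.
        OMP_SCHEDULE=$kind "$@" >/dev/null
        end=$(date +%s)
        echo "$kind $((end - start))"
    done
}
```

For example, compare_schedules ./my_app runs the application three times. If dynamic or guided is clearly faster than static, the work per iteration is probably uneven, and the thread timeline in VTune should show shorter waits at the join.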
Hi Dmitry,
Your suggestion worked! Although I couldn't get advanced hotspots to run, I tried the run-pass-thru command line option, and the Overhead time in "other" category has come down considerably. Please see below:
1. Without the run-pass-thru option:
2. With the run-pass-thru option:
@Peter: Thanks for pointing it out. I am looking into load balancing for the loop that is contributing to the imbalance.
Kindly consider this thread as closed, and thanks for your help Dmitry.
Thanks,
Srinivasan Ramesh
