Srinivasan_R_1
Beginner
102 Views

Details on "other" category in Overhead time

Hi,

I've been using VTune to profile my OpenMP application, and it reports that a lot of the CPU time is spent in Overhead Time, under the "Other" category. Could you let me know what this category covers, and how I can investigate the root cause further?

Thanks,

srini009

9 Replies
Dmitry_P_Intel1
Employee

Hello,

Could you please choose the Function/Call Stack grouping:

Grouping_f_c.png

Sort by the CPU Time/Overhead Time/Other column and provide the top 3-5 function names (I expect they will be from the OpenMP runtime).

Also, please provide the Intel compiler version and the VTune version you use.

Thanks & Regards, Dmitry

Srinivasan_R_1
Beginner

Hi Dmitry,

Intel compiler version: ifort version 15.0.2

Intel VTune Amplifier version: vtune_amplifier_xe_2015.2.0.393444

Please find below a screenshot taken after sorting by the "Other" column of the Overhead Time.

Overhead-Others.png

Thanks,

Srinivasan Ramesh

Dmitry_P_Intel1
Employee

One more question: do you use a collection with stacks, such as basic hotspots or advanced hotspots with stacks?

If so, please change the call stack mode from "user functions + 1" to "user/system functions" and redo the screenshot.

User_system.png

BTW, with compiler version 15.0.3 or higher and the new VTune Amplifier XE 2016 Gold, I expect you will get a much more comprehensive OpenMP analysis.

Thanks & Regards, Dmitry

Srinivasan_R_1
Beginner

Hi Dmitry,

Please see the screenshot below, taken with the user/system functions option set.

Overhead-others-user-sys.png

Yes, I am indeed using basic hotspots analysis. Also, thanks for the heads-up about the new version; I'll consider getting it.

Regards,

Srinivasan Ramesh

Dmitry_P_Intel1
Employee

Hello,

From the screenshot we can see that the overhead comes from the OpenMP stack stitching mechanism that VTune uses to give a more structured stack representation for OpenMP worker threads.

You can try advanced hotspots with stacks, for example:

amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-sampling ./my_app

or choose the stack knob in the GUI configuration for advanced hotspots analysis:

stacks.png

This collector does not use stack stitching, so you will be able to avoid the overhead. We are working on reducing this overhead.

Thanks & Regards, Dmitry

Srinivasan_R_1
Beginner

Hi Dmitry,

I tried running advanced hotspots analysis for this application, and it failed with "amplxe: Error: Cannot enable Hardware Event-based Sampling: problem with the driver (sep*/sepdrv*)."

So I followed the instructions on the VTune help page "Building and Managing the Sampling Driver". Please see below for the output of ./insmod-sep3 -q:

Output:

pax driver is loaded and owned by group "srinivasan" with file permissions "775".
sep3_15 driver is loaded and owned by group "srinivasan" with file permissions "775".
vtsspp driver is not loaded.

I tried building the vtsspp driver by cd'ing to /opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp and running ./build-driver as root.

Compilation fails with the following output:

/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘rp_sched_process_exec_compat_enter’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:232: error: ‘struct pt_regs’ has no member named ‘rdi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘rp_sched_process_exec_enter’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:326: error: ‘struct pt_regs’ has no member named ‘rdi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:327: error: ‘struct pt_regs’ has no member named ‘rdx’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘rp_mmap_region_enter’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:451: error: ‘struct pt_regs’ has no member named ‘rdi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:452: error: ‘struct pt_regs’ has no member named ‘rsi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:453: error: ‘struct pt_regs’ has no member named ‘rdx’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:455: error: ‘struct pt_regs’ has no member named ‘rr8’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:456: error: ‘struct pt_regs’ has no member named ‘rr9’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘probe_sched_process_exec_compat’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:771: error: ‘struct kretprobe’ has no member named ‘addr’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:772: error: ‘struct kretprobe’ has no member named ‘addr’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:774: error: ‘struct kretprobe’ has no member named ‘addr’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:776: error: ‘struct kretprobe’ has no member named ‘addr’
make[2]: *** [/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.o] Error 1
make[1]: *** [_module_/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp] Error 2

Any idea what is happening? I appreciate your help.

Regards,

Srinivasan Ramesh

Dmitry_P_Intel1
Employee

Hello,

I expect that, since your VTune build is pretty old, the vtss (stack sampling) driver had not yet been adapted to the relatively new OS that you may be running.

I would recommend upgrading to VTune Amplifier XE 2016.

BTW: I found one switch that may help your basic hotspots collection avoid OpenMP stack stitching.

It definitely works in recent builds. Please try it on your build:

amplxe-cl -collect hotspots -run-pass-thru=--no-stack-stitching ./my_app

Thanks & Regards, Dmitry

Peter_W_Intel
Employee

First screenshot: _kmp_join_call() spent a lot of time under "imbalanced". This means that some threads finished quickly while others ran longer. You should review the timeline to see how long each thread was busy, and possibly adjust the algorithm to reduce the wait time.
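
As a rough illustration of the kind of adjustment meant here, the sketch below contrasts a static and a dynamic OpenMP loop schedule. It is written in C only for illustration (the application in this thread is Fortran, where the same `schedule` clause exists), and the names `work`, `sum_static` and `sum_dynamic`, the workload, and the chunk size of 16 are all hypothetical, not taken from the application discussed above.

```c
#include <stddef.h>

/* Hypothetical workload: the cost of iteration i grows with i, so an even
   split of the index range gives the last thread far more work than the
   first one. Returns 0 + 1 + ... + i. */
static double work(size_t i)
{
    double acc = 0.0;
    for (size_t k = 0; k <= i; ++k)
        acc += (double)k;
    return acc;
}

/* Static schedule: iterations are divided into equal contiguous blocks.
   With the workload above, threads holding low indices finish early and
   then wait at the join. */
double sum_static(size_t n)
{
    double total = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (long i = 0; i < (long)n; ++i)
        total += work((size_t)i);
    return total;
}

/* Dynamic schedule: threads fetch chunks of 16 iterations as they become
   free, which usually evens out the finish times at the price of some
   scheduling overhead. The chunk size is an arbitrary choice. */
double sum_dynamic(size_t n)
{
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (long i = 0; i < (long)n; ++i)
        total += work((size_t)i);
    return total;
}
```

Both functions compute the same result; only the distribution of iterations across threads differs. The pragmas take effect only when OpenMP is enabled at compile time (for example `-fopenmp` with GCC). Whether a dynamic or guided schedule actually helps depends on the loop, so it is worth comparing the thread timelines before and after the change.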

Srinivasan_R_1
Beginner

Hi Dmitry,

Your suggestion worked! Although I couldn't get advanced hotspots to run, I tried the run-pass-thru command-line option, and the Overhead Time in the "Other" category has come down considerably. Please see below:

1. Without the run-pass-thru option:

DefaultOverhead.png

2. With the run-pass-thru option:

WithoutStackStitching.png

@Peter: Thanks for pointing it out. I am looking into load balancing for the loop that is contributing to the imbalance.

Kindly consider this thread closed, and thanks for your help, Dmitry.

Thanks,

Srinivasan Ramesh
