
Details on "other" category in Overhead time

Srinivasan_R_1
Beginner

Hi,

I've been using VTune to profile my OpenMP application, and it reports that a large share of the CPU time is Overhead time in the "Other" category. Could you tell me what this category covers and how I can track down the root cause?

Thanks,

srini009

Dmitry_P_Intel1
Employee

Hello,

Could you please choose the Function/Call Stack grouping:

Grouping_f_c.png

Sort by the CPU Time/Overhead Time/Other column and post the top 3-5 function names (I expect they will be from the OpenMP runtime).

Also, please tell us which Intel compiler version and VTune version you use.

Thanks & Regards, Dmitry

Srinivasan_R_1
Beginner

Hi Dmitry,

Intel compiler version: ifort version 15.0.2

Intel VTune Amplifier version: vtune_amplifier_xe_2015.2.0.393444

Please find below a screenshot taken after sorting by the "Other" column of the Overhead time.

Overhead-Others.png

Thanks,

Srinivasan Ramesh

Dmitry_P_Intel1
Employee

One more question: do you use a collection with stacks, such as basic hotspots or advanced hotspots with stacks?

If so, please change the call stack mode from "User functions + 1" to "User/system functions" and redo the screenshot.

User_system.png

BTW, with compiler version 15.0.3 or higher and the new VTune Amplifier XE 2016 Gold, I expect you would get a much more comprehensive OpenMP analysis.

Thanks & Regards, Dmitry

 

Srinivasan_R_1
Beginner

Hi Dmitry,

Please see the screenshot below with user/system functions option set.

Overhead-others-user-sys.png

Yes, I am indeed using basic hotspot analysis. Also, thanks for the heads up about the new version, I'll consider getting it.

Regards,

Srinivasan Ramesh

Dmitry_P_Intel1
Employee

Hello,

From the screenshot we can see that the overhead comes from the OpenMP stack stitching mechanism, which VTune uses to give a more structured stack representation for OpenMP worker threads.

You can try advanced hotspots with stacks, like this:

amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-sampling ./my_app

or choose the stack knob in GUI configuration for advanced hotspot analysis:

stacks.png

This collector does not do stack stitching, so you will be able to avoid the overhead. We are working on reducing the overhead.

Thanks & Regards, Dmitry

Srinivasan_R_1
Beginner

Hi Dmitry,

I tried running advanced hotspot analysis for this application, and it failed, saying "amplxe: Error: Cannot enable Hardware Event-based Sampling: problem with the driver (sep*/sepdrv*)."

So I followed the instructions on the VTune help page "Building and Managing the Sampling Driver". Please see below for the output of ./insmod-sep3 -q:

Output:

pax driver is loaded and owned by group "srinivasan" with file permissions "775".
sep3_15 driver is loaded and owned by group "srinivasan" with file permissions "775".
vtsspp driver is not loaded.
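
To pick the missing driver out of that status text automatically, something like the following sketch could be used. It assumes the output format shown above ("<name> driver is not loaded."); missing_drivers is a hypothetical helper name, not part of VTune.

```shell
# Hypothetical helper: read `insmod-sep3 -q` status text on stdin and
# print the name of every driver reported as "not loaded".
# Assumes each status line starts with the driver name, as in the output above.
missing_drivers() {
    awk '/ driver is not loaded/ { print $1 }'
}
```

It would be used as `./insmod-sep3 -q | missing_drivers`, which for the output above prints only vtsspp.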

I tried building the vtsspp driver by changing to /opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp and running ./build-driver as root.

Compilation fails with the following output:

/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘rp_sched_process_exec_compat_enter’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:232: error: ‘struct pt_regs’ has no member named ‘rdi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘rp_sched_process_exec_enter’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:326: error: ‘struct pt_regs’ has no member named ‘rdi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:327: error: ‘struct pt_regs’ has no member named ‘rdx’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘rp_mmap_region_enter’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:451: error: ‘struct pt_regs’ has no member named ‘rdi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:452: error: ‘struct pt_regs’ has no member named ‘rsi’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:453: error: ‘struct pt_regs’ has no member named ‘rdx’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:455: error: ‘struct pt_regs’ has no member named ‘rr8’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:456: error: ‘struct pt_regs’ has no member named ‘rr9’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c: In function ‘probe_sched_process_exec_compat’:
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:771: error: ‘struct kretprobe’ has no member named ‘addr’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:772: error: ‘struct kretprobe’ has no member named ‘addr’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:774: error: ‘struct kretprobe’ has no member named ‘addr’
/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.c:776: error: ‘struct kretprobe’ has no member named ‘addr’
make[2]: *** [/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp/module.o] Error 1
make[1]: *** [_module_/opt/intel/vtune_amplifier_xe_2015.2.0.393444/sepdk/src/vtsspp] Error 2

Any idea what is happening? Appreciate your help.

Regards,

Srinivasan Ramesh

 

Dmitry_P_Intel1
Employee

Hello,

Since your VTune build is fairly old, I expect that the vtss (stack sampling) driver had not yet been adapted to the relatively new OS that you probably have.

I would recommend upgrading to VTune Amplifier XE 2016.

BTW: I found a switch that can help your basic hotspots collection avoid OpenMP stack stitching.

It definitely works for recent builds. Please try it on your build:

amplxe-cl -collect hotspots -run-pass-thru=--no-stack-stitching ./my_app
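
To compare the overhead with and without stack stitching, a minimal sketch along these lines might help. It wraps the command above in a hypothetical helper, collect_pair, and writes each run to its own result directory. The directory names r_stitching and r_no_stitching and the AMPLXE variable are assumptions for illustration, not anything VTune requires.

```shell
# Path to the VTune command-line tool; override AMPLXE if it is not on PATH.
AMPLXE="${AMPLXE:-amplxe-cl}"

# Hypothetical helper: collect basic hotspots twice for the same target,
# once with the default OpenMP stack stitching and once with it disabled.
collect_pair() {
    app="$1"
    # Default collection (stack stitching enabled).
    "$AMPLXE" -collect hotspots -result-dir r_stitching -- "$app"
    # Same collection with the pass-through switch from this thread.
    "$AMPLXE" -collect hotspots -run-pass-thru=--no-stack-stitching \
        -result-dir r_no_stitching -- "$app"
}
```

Calling `collect_pair ./my_app` should leave two result directories that can be opened side by side in the GUI.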

Thanks & Regards, Dmitry

Peter_W_Intel
Employee

First screenshot: a lot of the time in _kmp_join_call() is reported as "imbalanced", which means some threads finished quickly while others ran longer. You need to review the timeline of the threads' lifetimes and, if possible, adjust the algorithm to reduce the wait time.
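
One low-cost way to test whether the loop schedule contributes to the imbalance is sketched below. It assumes the hot loop is declared with schedule(runtime), because only then does the standard OMP_SCHEDULE environment variable have any effect; try_schedules is a hypothetical helper name, and the one-second timing resolution is coarse.

```shell
# Hypothetical helper: run the given command once per OpenMP schedule kind
# and print the elapsed wall-clock time in whole seconds.
# OMP_SCHEDULE only affects loops compiled with schedule(runtime).
try_schedules() {
    for sched in static dynamic guided; do
        start=$(date +%s)
        OMP_SCHEDULE="$sched" "$@"
        end=$(date +%s)
        echo "$sched: $((end - start)) s"
    done
}
```

For example, `try_schedules ./my_app` runs the application three times. If dynamic or guided is clearly faster than static, the iterations probably have uneven cost.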

Srinivasan_R_1
Beginner

Hi Dmitry,

Your suggestion worked! Although I couldn't get advanced hotspots to run, I tried the run-pass-thru command line option, and the Overhead time in the "Other" category has come down considerably. Please see below:

1. Without the run-pass-thru option:

DefaultOverhead.png

 

2. With the run-pass-thru option:

WithoutStackStitching.png

@Peter: Thanks for pointing it out. I am looking into load balancing for the loop that is contributing to the imbalance.

Kindly consider this thread as closed, and thanks for your help Dmitry.

Thanks,

Srinivasan Ramesh
