Time spent in functions ? - Page 2

dsi-intel_d_ · ‎12-11-2014

Hi,

I have an application built with the Intel compilers (version 15) using -g -O2 (optimization + debug info).

The application was run through amplxe-cl -collect hotspots.

When launching VTune on the collected data, I expected to find (as explained in a tutorial video) the time spent in the most time-consuming functions. Instead, I get:

Which is not really helpful.

Also, VTune seems a little bit confused about the concept of function vs. library (see image below).

Is there an alternative to VTune to profile code compiled with the Intel compilers? I just need the usual information: time spent in functions, loops, cache misses, etc.

Regards

Dmitry_P_Intel1 · ‎12-15-2014

Hello Alain,

So have you finally been able to get useful information on performance hotspots after the latest manipulations?

Regards, Dmitry

dsi-intel_d_ · ‎12-16-2014

Well, it's getting better, I see functions now :). I wouldn't call it useful information so far though, in the sense that I cannot do much with it.
I'm probably missing something. From the info I get for the master process (I did not look at the others, but the master is probably the first one to look into in this case), we are spending time in _kmp_fork_barrier, but I cannot trace that back to the place in our code that eventually triggers those calls (probably just an oversight, but I couldn't find the number of calls either).
But I probably just need to find the right place in the doc.

dsi-intel_d_ · ‎12-16-2014

Also, most of the time is spent outside of main. I suppose this is because, at some point, main ends up calling a multithreaded (through OpenMP) solver, and the time spent in the 7 other threads is not attributed to the function spawning the threads?
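
To illustrate the supposition above, here is a minimal sketch in Python (an analogy only, not OpenMP or Fortran, and not how VTune itself works internally). The names main, solver and solver_kernel are hypothetical. It records the call stack seen from the calling thread and from worker threads, showing that a worker's stack does not contain the function that spawned it. A profiler that attributes samples by call stack would therefore be expected to show worker time outside of main.

```python
import threading
import traceback


def solver_kernel(stacks, key):
    # Record the names of the functions on the current thread's call stack.
    frames = traceback.extract_stack()
    stacks[key] = [frame.name for frame in frames]


def solver(n_workers=3):
    stacks = {}
    # The calling ("master") thread runs the kernel directly.
    solver_kernel(stacks, "master")
    # Worker threads run the same kernel, but start from their own entry point.
    workers = [
        threading.Thread(target=solver_kernel, args=(stacks, i))
        for i in range(n_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return stacks


def main():
    return solver()


stacks = main()
for key, names in stacks.items():
    print(key, "->", " > ".join(names))
```

On the master thread the stack passes through main and solver. On the workers it is rooted at the threading module's own start-up functions, so main never appears there.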

Dmitry_P_Intel1 · ‎12-17-2014

Hello Alain,

I would recommend looking at an OpenMP application with the help of the "/OpenMP Regions/..." grouping in the Bottom-up pane grid. If you use VTune Amplifier XE 2015 Update 1, it will help you look at the OpenMP inefficiency classification and the potential gain (wall time) that you could get by investing in fixing these inefficiencies. You can find details at: https://software.intel.com/en-us/node/529832.

BTW - what Intel compiler version do you use?

Thanks & Regards, Dmitry

dsi-intel_d_ · ‎12-17-2014

The OpenMP Region grouping is not more helpful. Time is spent in launching threads, but I cannot find a way to connect that to our code.

I am using ifort version 15.0.0

Thanks