Is it possible to see aggregated results for a function?
Let's suppose I am measuring clockticks, or whatever, and I've got a very simple application with a main() function and a func1() function that is called by main(). If I set the sample-after values low enough, func1() will show up among the results. But what if I would like to get the aggregated result for main() including all the called subroutines and functions, in this case including the clockticks of func1() and everything func1() called?
So basically, how could I measure, or get the aggregated results by function? The hotspot view (in GUI mode) does not include the called functions. I could manually add up the results in this case, but what if it gets complicated?
(If I check the result of a measured event in module view, I can get the exact same number by adding up all the results in the hotspot view of that module. This proves that a function's count does not include samples recorded on lines that belong to a separate function it calls.)
The only solution that I see now is to split all the functions I want to measure into separate applications (exe), because then I can measure them separately.
Could someone help me out on this one, please? Thanks.
The generalized solution (aggregated result for main()) is quite different from the one for a specific case such as your example. While I cannot say whether VTune has a feature to do this, I can describe what would be required, and then you can dig around in the VTune documentation as well as other profiler documentation.
What you need is an option that, at sample time or sample event, looks at the call tree (this requires preservation of the stack frame) and then bills every subroutine in the call tree, at the determined source line, one tick or whatever the billing value is. Main would always accumulate 100%, but other paths out of main would show different accumulations.
This would be one step closer to what you want.
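To make the billing idea concrete, here is a minimal sketch in Python. It is not how VTune or any particular profiler works; it assumes the sampler has already captured the call stack for each sample as a tuple of function names (outermost caller first), and the function name attribute_samples and the sample data are made up for illustration.

```python
from collections import Counter


def attribute_samples(samples):
    """Return (self_counts, inclusive_counts) for a list of call stacks.

    Each sample is a tuple of function names, outermost caller first;
    the last entry is the function that was executing when the sample fired.
    """
    self_counts = Counter()
    inclusive_counts = Counter()
    for stack in samples:
        if not stack:
            continue
        # "Self" billing: only the function that was actually running.
        self_counts[stack[-1]] += 1
        # "Inclusive" billing: every function on the stack gets one tick.
        # set() ensures a recursive function is billed once per sample.
        for func in set(stack):
            inclusive_counts[func] += 1
    return self_counts, inclusive_counts


# Hypothetical samples, invented for this example.
samples = [
    ("main",),
    ("main", "func1"),
    ("main", "func1"),
    ("main", "func1", "helper"),
]
self_counts, inclusive_counts = attribute_samples(samples)
print("self:", dict(self_counts))
print("inclusive:", dict(inclusive_counts))
```

With these four samples, main is on every stack and so collects all four inclusive ticks, while its self count is only one. The self counts add up to the total number of samples, which matches the observation that the hotspot view sums to the module total.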
A shortcoming of the above is that it doesn't disclose the call tree overhead by branch. To do this, the sampler code would, at each event, examine the call tree as before and, in addition to ticking up the call tree, construct a symbolic name based on the call tree (e.g. Main:Fee:Fi:Fo:Fum), look up the counter(s) for that name (creating them if necessary), and then tick them appropriately. With this information you can find the branch paths that exhibit the types of overhead you are interested in.
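The per-path counters can be sketched the same way. Again this is only an illustration under the assumption that each sample arrives as a tuple of function names, outermost first; count_call_paths and the sample stacks are hypothetical, and the ":" separator simply follows the Main:Fee:Fi:Fo:Fum naming above.

```python
from collections import Counter


def count_call_paths(samples, sep=":"):
    """Tick one counter per distinct call path.

    Each sample is a tuple of function names, outermost caller first.
    The counter key is the symbolic path name, e.g. "Main:Fee:Fi".
    """
    path_counts = Counter()
    for stack in samples:
        if not stack:
            continue
        # Counter creates the entry on first use ("create if necessary").
        path_counts[sep.join(stack)] += 1
    return path_counts


# Hypothetical samples: Fi is reached through two different branches.
samples = [
    ("Main", "Fee", "Fi"),
    ("Main", "Fee", "Fi"),
    ("Main", "Fo", "Fi"),
    ("Main", "Fee"),
]
path_counts = count_call_paths(samples)
for path, ticks in sorted(path_counts.items()):
    print(path, ticks)
```

Here Fi shows up under two separate keys, so the ticks it costs when called via Fee can be told apart from the ticks it costs when called via Fo, which a per-function total would hide.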
It should be noted that running in this mode would likely produce large interference with cache analysis. But it might provide better insight as to where to address your optimization efforts.