gprof (option -pg) does actually insert a timer function call which times from the beginning of a function to the return, but part of the function entry time either gets allocated to the calling function or is not allocated at all. The VTune call graph option does something similar, but adds significant overhead.
Thanks once again!
One part is still not clear to me. As you mentioned, function calls are monitored by intercepting timer interrupts that occur during the function call. Are these interrupts raised right when the call is made, or only after the callee's stack frame has been set up (with the pass-by-value arguments already copied into the called function's parameter variables)?
I just hope my post here is not confusing...
Thanks for all the replies so far; a comment on this would be highly appreciated.
I think it is confusing because the VTune analyzer displays something close to what you expected.
Although data is displayed for some source lines, that does not mean we collected detailed data for each of those lines. Rather, the VTune analyzer performs statistical sampling: it periodically interrupts the processor and records the EIP (instruction pointer), process ID, and thread ID. This gives you a representative view of what your code is doing, i.e., where it is spending significant time. It does not count cycles for each instruction. When you view the source, the samples collected on instructions within a source line are aggregated and displayed for that source line. What that tells you is that, in general, this source line accounts for x% of all the time spent executing your application. You should focus your optimization efforts on the lines with a significant share of the samples, as opposed to optimizing code that had few or no samples collected on it.
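To make the aggregation step concrete, here is a minimal Python sketch of how samples might be rolled up per source line. It is only an illustration, not the analyzer's actual implementation: the address-to-line table, the function names, and the sample values are all made up for the example (a real profiler would get the mapping from debug information).

```python
from collections import Counter

# Hypothetical mapping from instruction address ranges to source lines.
# A real profiler would read this from the binary's debug information.
LINE_TABLE = [
    (0x1000, 0x1010, "main.c:10"),
    (0x1010, 0x1040, "main.c:11"),
    (0x1040, 0x1048, "main.c:12"),
]

def address_to_line(eip):
    """Return the source line whose address range contains eip."""
    for start, end, line in LINE_TABLE:
        if start <= eip < end:
            return line
    return "<unknown>"

def aggregate(samples):
    """Return {source_line: percent of all samples} for (eip, pid, tid) samples."""
    counts = Counter(address_to_line(eip) for eip, _pid, _tid in samples)
    total = sum(counts.values())
    return {line: 100.0 * n / total for line, n in counts.items()}

# Made-up samples: (EIP, process ID, thread ID), one per sampling interrupt.
SAMPLES = [
    (0x1004, 42, 1),
    (0x1014, 42, 1),
    (0x1020, 42, 1),
    (0x1038, 42, 2),
    (0x1018, 42, 1),
    (0x1044, 42, 1),
    (0x1024, 42, 2),
    (0x102C, 42, 1),
]

report = aggregate(SAMPLES)
for line, pct in sorted(report.items(), key=lambda kv: -kv[1]):
    print(f"{line}: {pct:.1f}% of samples")
```

With these invented samples, six of the eight land in the address range of main.c:11, so that line shows 75% even though no individual instruction was ever timed.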
The VTune analyzer's crowning feature is that sampling does not have to be driven by time alone: you can have it take samples based on the processor's performance monitoring events. So, for example, you could collect samples based on 2nd-level Cache Read Misses to highlight code that may be suffering performance problems due to cache issues.
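As a rough illustration of event-based sampling, the Python sketch below simulates a counter that triggers a sample after every N occurrences of an event. Everything here is an assumption made for the example: the function name, the `sample_after` parameter, and the trace are invented, and real hardware behaves less tidily (for instance, the recorded address can land a few instructions past the one that caused the event).

```python
from collections import Counter

def event_based_samples(trace, sample_after):
    """Simulate event-based sampling over a made-up execution trace.

    trace: iterable of (eip, events) pairs, where events is how many
    occurrences of the chosen event (e.g. cache read misses) the
    instruction at eip caused.
    sample_after: number of events between samples (hypothetical name).
    """
    samples = Counter()
    pending = 0
    for eip, events in trace:
        pending += events
        # Each time the running count reaches the threshold, record
        # the current instruction address as one sample.
        while pending >= sample_after:
            samples[eip] += 1
            pending -= sample_after
    return samples

# Invented loop body executed 1000 times; only the instruction at
# 0x1020 causes the event, once per iteration.
TRACE = [(0x1010, 0), (0x1020, 1), (0x1030, 0)] * 1000

samples = event_based_samples(TRACE, sample_after=100)
for eip, n in samples.items():
    print(f"{eip:#x}: {n} samples")
```

In this toy trace all samples fall on the one instruction that causes the event, which is the point: the sample distribution follows the chosen event rather than elapsed time.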
Hope that helps.