I have just bought VTune. The first thing I did was run a "sampling" session to see where the time is spent. I then ran a "call graph" session to see what is calling what and where the inclusive and exclusive time is spent. But the two give me different timings: sampling reports an overall run time for the exe of 175 seconds (which agrees well with a stopwatch timing), whereas call graph reports an execution time of 244 seconds. What am I doing wrong? All of my settings are the defaults (as set when the configuration wizard runs).
Call graph inserts additional code into your application (instrumentation), so a run-time increase like the one you quote is not unusual. That overhead is the price you pay for the additional features VTune's call graph offers compared with other call graph tools. Sampling sessions, on the other hand, have little effect on the execution of your program at normal or default sampling rates, so they can be fairly accurate for performance measurement.
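To see why instrumentation costs time, here is a toy sketch in Python. It is not how VTune works internally (VTune instruments the binary itself); the `instrument` decorator and the `leaf`/`parent` functions are hypothetical, and only illustrate the per-call bookkeeping a call graph collector has to do to track callers plus inclusive and exclusive time.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical toy call graph collector (not VTune's implementation).
call_counts = defaultdict(int)   # (caller, callee) -> number of calls
inclusive = defaultdict(float)   # function -> time including instrumented children
exclusive = defaultdict(float)   # function -> time excluding instrumented children
_stack = []                      # active frames: [name, start_time, child_time]

def instrument(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        # Record the calling relationship: extra work on every single call.
        caller = _stack[-1][0] if _stack else "<root>"
        call_counts[(caller, func.__name__)] += 1
        frame = [func.__name__, time.perf_counter(), 0.0]
        _stack.append(frame)
        try:
            return func(*args, **kwargs)
        finally:
            # Second clock read plus bookkeeping on the way out.
            elapsed = time.perf_counter() - frame[1]
            _stack.pop()
            inclusive[func.__name__] += elapsed
            exclusive[func.__name__] += elapsed - frame[2]
            if _stack:
                _stack[-1][2] += elapsed  # charge this call to the parent's child time

    return wrapper

@instrument
def leaf(n):
    return sum(range(n))

@instrument
def parent():
    return [leaf(1000) for _ in range(5)]

parent()
```

Every call to `leaf` now pays for two `perf_counter()` reads, a stack push and pop, and several dictionary updates on top of its real work. For small, frequently called functions that bookkeeping can be a large fraction of each call's cost, which is why an instrumented run takes longer than an uninstrumented one.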
Are there any options I can change to reduce the difference? Does the sampling interval have any effect? In other words, how can I get the execution speed under call graph close to the execution speed under sampling?
No. Sampling does not keep track of calling relationships and has no way to do so. It simply samples the execution context (EIP, process ID, thread ID) and returns, which is what keeps its overhead very low.
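The difference can be illustrated with a toy, deterministic simulation. This is not how VTune's sampling collector is implemented; `timeline`, `sample` and the function names in the stacks are hypothetical. Each sample keeps only the currently executing function (standing in for the EIP) plus process and thread IDs, and throws the rest of the call stack away.

```python
from collections import Counter

# Hypothetical timeline: one entry per clock tick, holding the full call
# stack (outermost first) that was executing during that tick.
timeline = (
    [("main", "parse")] * 30
    + [("main", "solve", "dot")] * 60
    + [("main", "report", "dot")] * 10
)

def sample(timeline, interval, pid=1, tid=1):
    """Take one sample every `interval` ticks, recording only the innermost
    function plus process and thread IDs. No caller information is kept."""
    hist = Counter()
    for tick in range(0, len(timeline), interval):
        current = timeline[tick][-1]  # only the executing location
        hist[(current, pid, tid)] += 1
    return hist

samples = sample(timeline, interval=5)
```

With these made-up numbers, `samples` holds 6 hits for `parse` and 14 for `dot`. Nothing in the histogram records that 12 of the `dot` hits came from `solve` and 2 from `report`; recovering that is exactly the job of call graph.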
To reduce the impact of call graph on your application's execution time, you can configure the activity not to instrument DLLs you don't care about (set the instrumentation level to "Minimal"). However, this means you will not see any calls within those DLLs.
The other point about call graph is that you should not try to compare its timings with those from sampling. Instead, use the call graph timings to compare functions relative to one another. That is, you can see what percentage of the calling function's time each child function contributed (see the Call List tab under the graph).
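Reading the timings this way boils down to simple percentages of the parent's inclusive time. The helper `child_shares` and the timings below are hypothetical, chosen only to show the arithmetic; they do not come from the run described in the question.

```python
def child_shares(parent_inclusive, child_times):
    """Return each child's share of the parent's inclusive time in percent,
    plus the share spent in the parent itself ("<self>")."""
    shares = {name: 100.0 * t / parent_inclusive for name, t in child_times.items()}
    self_time = parent_inclusive - sum(child_times.values())
    shares["<self>"] = 100.0 * self_time / parent_inclusive
    return shares

# Hypothetical call graph timings (seconds) for one calling function.
shares = child_shares(200.0, {"solve": 120.0, "report": 50.0})
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:>8}: {pct:5.1f}%")
```

Here `solve` accounts for 60% of the caller's time, `report` for 25%, and the caller's own code for the remaining 15%. Ratios like these, rather than the absolute seconds, are the reading the answer above recommends.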
One way to use the VTune analyzer is to use sampling to identify a hotspot function, and then use call graph to identify which functions called that hotspot function.