In VTune Performance Analyzer, there was a column "Calls" in the analysis results that showed the number of calls made to a function.
In VTune Amplifier XE 2011, I can no longer find that column in the Top-Down view.
Is the "Calls" functionality no longer supported?
Sorry! The quick answer is that call counts are no longer supported.
The reason is that the technology changed. VTune Amplifier XE 2011 uses a statistical call graph method: it captures the sample first and then determines the caller. This new technology never records a function's entry and exit, so there is no call count available.
The sampling technique is not very useful for analyzing our applications, because most of the time is spent somewhere deep in third-party libraries. Therefore it is essential that we have a correct call graph.
Does the roadmap for VTune Amplifier XE include adding the exact tracing feature that was already part of the original VTune version?
This is a new feature, a sampling-based statistical call graph, not a real call graph as in VTune Performance Analyzer. The advantage is that the overhead is lower than that of the call graph in VTune Performance Analyzer!
Thank you for your comments. There are many reasons we moved away from a full call graph. My question to you is, what exactly do you want to use the call graph data for?
In other words, was it used mostly for exploring the application architecture? Was it used to analyze call counts and, therefore, determine which functions to inline? What other applications of this feature did you use regularly?
The performance values reported by the call graph were only an approximation, and the overhead of collecting the data usually resulted in an 8x performance degradation of the application. With the new statistical call stack sampling, you have a low-overhead method of finding the hottest functions and how they were called.
I have this puzzle too. My own purpose is to know how many times the bottleneck function is called. Currently Amplifier reports a stack/call chain as a bottleneck, but when you look into the function, the CPU time on the hot spot is far less. So naturally, if I could know how many times a function is called, it would be easier to tell whether it's an issue in the function or in the design/call chain.
There are at least two reasons why I would prefer an exact call graph:
First, just by looking at the number of calls, it can be decided very easily and quickly whether or not a function is called more often than the developer intended. This can happen, for example, if a time-consuming function is executed inside a loop even though its result is loop-invariant.
Second, sometimes I want to focus my analysis efforts on a particular function which has almost zero self-time, but whose children consume most of the time. This function will not appear in the bottom-up view; therefore I have no chance to locate that function (see my post "How to find a function by name"). With an exact call graph, Amplifier XE could provide a list of all functions of a class and sort them by name.
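To illustrate the first reason above (a loop-invariant call that runs on every iteration), here is a minimal Python sketch of exact call counting. It is not how VTune works; the `count_calls` decorator and the `expensive_lookup`, `scale_all_slow` and `scale_all_fast` functions are hypothetical names made up for this example.

```python
import functools


def count_calls(func):
    """Wrap func so that every entry increments an exact call counter."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def expensive_lookup(key):
    # Stand-in for a time-consuming function whose result depends only on key.
    return sum(ord(c) for c in key)


def scale_all_slow(values, key):
    # Bug: the loop-invariant lookup is evaluated once per element.
    return [v * expensive_lookup(key) for v in values]


def scale_all_fast(values, key):
    # Fix: hoist the loop-invariant lookup out of the loop.
    factor = expensive_lookup(key)
    return [v * factor for v in values]


if __name__ == "__main__":
    data = list(range(1000))
    scale_all_slow(data, "config")
    print("slow version calls:", expensive_lookup.calls)  # 1000
    expensive_lookup.calls = 0
    scale_all_fast(data, "config")
    print("fast version calls:", expensive_lookup.calls)  # 1
```

The two versions return the same result, and a sampling profiler would attribute time to `expensive_lookup` in both; the call count (1000 versus 1) is what makes the mistake obvious at a glance.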
I would point out that using the Top-down Tree, you can view the callers of a function and see the total time taken by the caller and all its callees. There are some usability issues that we are aware of and are in the process of trying to address, but the information is there.
Thank you for your request. I have no announcement to make about any new functionality.
I can tell you that if 1 ms is not a high enough sampling frequency, you can modify the Sample After value for the CPU_CLK_UNHALTED events and increase the sampling rate. A warning: don't make it too low or you will hang your system. Decrease the Sample After value incrementally. For example, if the default is 2000000 for your 2 GHz processor, try 1000000, which will double the amount of data collected and shorten the sampling interval to 0.5 ms.
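The arithmetic behind that example can be written out as a small sketch. It assumes the event counts core cycles at the nominal clock rate while the core is not halted, so the real wall-clock interval may differ with frequency scaling or idle time. The function name `sampling_interval_ms` is made up for illustration.

```python
def sampling_interval_ms(sample_after_value, clock_hz):
    """Time between samples in milliseconds, assuming one event per clock cycle."""
    return sample_after_value * 1000.0 / clock_hz


if __name__ == "__main__":
    clock_hz = 2_000_000_000  # 2 GHz processor from the example above
    print(sampling_interval_ms(2_000_000, clock_hz))  # 1.0 ms (default)
    print(sampling_interval_ms(1_000_000, clock_hz))  # 0.5 ms (halved value)
```

Halving the Sample After value halves the interval and therefore doubles the number of samples collected over the same run.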
We have found that the 2011 version provides inconsistent results from one run to another, which is clearly attributable to the sampling technique it now uses.
We often do very targeted profiling, looking for regressions. It is critical to know how many times a method has been called in order to find out whether or not there is any increase in CPU time or latency. Sampling does not give this: I have no idea whether the CPU time refers to 1 sample or 1000 samples.
VTUNE 2011 has a nice comparison tool, but it is only useful if you are confident that you are comparing equivalent profiles.
It is nice that VTUNE 2011 works quicker, but we have found this is not actually an issue when doing detailed profiling, as we often do. It doesn't matter much what the message rates through the application are when profiling at this level.
We still rely heavily on VTUNE 9.1.
Another point on the number of calls: this information on its own can be very useful. Knowing that method B is called 10 times for every invocation of method A is itself very useful in identifying issues in the code.
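As a rough illustration of the "B is called 10 times per A" kind of information, the sketch below counts exact caller/callee edges by hooking function entry with Python's `sys.setprofile`. It only demonstrates the entry/exit tracing idea on Python code and says nothing about how VTune 9.1 implemented its call graph. The names `trace_call_edges`, `method_a`, `method_b` and `main` are hypothetical.

```python
import sys
from collections import Counter


def trace_call_edges(entry_point):
    """Run entry_point and count exact (caller, callee) edges for Python calls."""
    edges = Counter()

    def profiler(frame, event, arg):
        # "call" fires on entry to every Python-level function.
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[(caller, callee)] += 1

    sys.setprofile(profiler)
    try:
        entry_point()
    finally:
        sys.setprofile(None)  # always remove the hook again
    return edges


def method_b():
    return 1


def method_a():
    total = 0
    for _ in range(10):
        total += method_b()
    return total


def main():
    for _ in range(3):
        method_a()


if __name__ == "__main__":
    edges = trace_call_edges(main)
    calls_a = edges[("main", "method_a")]
    calls_b = edges[("method_a", "method_b")]
    print(calls_a, calls_b, calls_b / calls_a)  # 3 30 10.0
```

The hook runs on every function entry, which is also why this style of tracing costs far more than sampling.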
In my experience, functions with a very high number of calls but a tiny self time are the #1 candidates for inlining. Applications with a large code base and many developers tend to hide some of these pearls.
I wonder if a function that is called a zillion times and has a self time of a couple dozen cycles would show up in the current VTune.
Without a full call stack I cannot figure out where that function is being called from, or how many times.
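A back-of-the-envelope model may help with that question. It assumes, as a simplification, that samples are spread uniformly over executed cycles and that the function is not inlined; the numbers and the name `expected_samples` are invented for illustration, not measured.

```python
def expected_samples(calls, cycles_per_call, sample_after_value):
    """Expected number of samples landing in a function (idealized model)."""
    return calls * cycles_per_call / sample_after_value


if __name__ == "__main__":
    sav = 2_000_000  # Sample After value: one sample every 2,000,000 cycles

    # A tiny function (24 cycles of self time) called a million times ...
    many_tiny_calls = expected_samples(1_000_000, 24, sav)
    # ... versus a heavy function (2,000,000 cycles) called only 12 times.
    few_heavy_calls = expected_samples(12, 2_000_000, sav)

    print(many_tiny_calls, few_heavy_calls)  # 12.0 12.0
```

Under these assumptions the tiny function does show up once it is called often enough, because its self time accumulates, but both workloads produce the same expected sample count, so the samples alone cannot tell a million cheap calls from a dozen expensive ones.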