"Calls" no longer available in Amplifier XE?

fmunkert · ‎03-10-2011

In VTune Performance Analyzer, there was a column "Calls" in the analysis results that showed the number of calls made to a function.

In VTune Amplifier XE 2011, I can no longer find that column in the Top-Down view.

Is the "Calls" functionality no longer supported?

Regards

- Frank

Peter_W_Intel · ‎03-11-2011

Hi Frank,

Sorry! The quick answer is "no".

The reason is that the technology changed. VTune Amplifier XE 2011 uses a statistical call graph method: it captures the sample first and then identifies the caller. This new technology never records a function's entry and exit, so no call count is available.

Regards, Peter

fmunkert · ‎03-13-2011

If the new technology does not record entry and exit of a function, how can you ever construct an exact call graph? If you do this by sampling call stacks, you only get a statistical approximation of the call graph, right?

The sampling technique is not very useful for analyzing our applications, because most of the time is spent somewhere deep in third-party libraries. Therefore it is essential that we have a correct call graph.

Does the roadmap for VTune Amplifier XE include adding the exact tracing feature that was already part of the original VTune version?

Regards
- Frank

Peter_W_Intel · ‎03-13-2011

You are right!

This is a new feature, a statistical call graph built from sampling, not a real call graph like the one in VTune Performance Analyzer. The advantage: its overhead is lower than that of the call graph in VTune Performance Analyzer!
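The difference between the two approaches can be illustrated with a toy model. This is only a sketch, not how either VTune version is implemented: the timeline, the function names (`parse`, `compute`, `log`) and the 1000-microsecond sampling interval are all made-up assumptions.

```python
from collections import Counter

# Hypothetical timeline: (function name, duration in microseconds), repeated 4 times.
TIMELINE = [("parse", 300), ("compute", 2500), ("log", 50)] * 4


def exact_call_counts(timeline):
    """Instrumentation view: every function entry is recorded, so counts are exact."""
    return Counter(name for name, _ in timeline)


def sampled_hits(timeline, interval_us):
    """Sampling view: record only the function that is running at each sample tick."""
    hits = Counter()
    next_sample = 0  # time of the next sample tick
    start = 0        # start time of the current function
    for name, duration in timeline:
        end = start + duration
        # Every tick that falls inside [start, end) is attributed to this function.
        while next_sample < end:
            hits[name] += 1
            next_sample += interval_us
        start = end
    return hits


exact = exact_call_counts(TIMELINE)
sampled = sampled_hits(TIMELINE, interval_us=1000)
print("exact call counts:", dict(exact))
print("sample hits:      ", dict(sampled))
```

In this toy run the short `log` function is entered four times but never happens to be running at a sample tick, so the sampling view shows where time goes without saying anything about how often each function was called.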

David_A_Intel1 · ‎03-14-2011

Hi Frank:

Thank you for your comments. There are many reasons we moved away from a full call graph. My question to you is, what exactly do you want to use the call graph data for?

In other words, was it used mostly for exploring the application architecture? Was it used to analyze call counts and, therefore, determine which functions to inline? What other uses of this feature did you rely on regularly?

The performance values reported by the call graph were only an approximation, and the overhead of collecting the data usually resulted in an 8x performance degradation of the application. With the new statistical call stack sampling, you have a low-overhead method of finding the hottest functions and how they were called.

Thanks, again.

nicolas_wang · ‎03-14-2011

Hi Anderson and others,
I have this puzzle too. My own purpose is to know how many times the bottleneck function is called. Currently Amplifier reports a stack/call chain as a bottleneck, but when you look into the function, the CPU time on the hot spot is far less. So naturally, if I knew how many times a function is called, it would be easier to tell whether the issue lies in the function itself or in the design/call chain.

Thanks,
Nicolas

fmunkert · ‎03-14-2011

Hi MrAnderson,

There are at least two reasons why I would prefer an exact call graph:

First, just by looking at the number of calls, it can be decided very easily and quickly whether or not a function is called more often than the developer intended. This can happen, for example, if a time-consuming function is executed inside a loop even though its result is loop-invariant.

Second, sometimes I want to focus my analysis efforts on a particular function which has almost zero self-time, but most time is consumed by its children. This function will not appear in the bottom-up view; therefore I have no chance to locate that function (see my post "How to find a function by name"). With an exact call graph, Amplifier XE could provide a list of all functions of a class and sort them by name.

Regards
- Frank

David_A_Intel1 · ‎03-16-2011

Thank you, Frank. I understand and this is useful feedback to us.

I would point out that using the Top-down Tree, you can view the callers of a function and see the total time taken by the caller and all its callees. There are some usability issues that we are aware of and are in the process of trying to address, but the information is there.

GrCorrea · ‎04-14-2011

MrAnderson,

Is it possible to get the Top-down Tree through command-line execution?

Thanks,

GrCorrea.

David_A_Intel1 · ‎04-14-2011

No, I'm sorry. At present, there is no way to display call stack information from the command line. We've heard this request from several users and are investigating the feasibility of adding that feature.

eric_openshaw · ‎06-13-2011

Hi MrAnderson - I just found this thread with exactly the same complaint. Is there any news on whether this will be reintroduced any time soon?

I appreciate that the latest user-mode sampling is far less invasive and in some cases does provide quicker targeting of problem areas; it is a useful addition to the product. However, in previous versions of VTune it was immensely helpful to see every call (count, time and self time) in my application. My particular application is heavily optimised for speed, and I get the feeling that the statistical approach of the hotspot analysis is not invasive enough! I often see very different results from multiple runs with the application doing exactly the same workload.

I'm certainly not running this on a live system, but high-frequency sampling is actually very helpful sometimes and I can't see a way to do this anymore. The highest resolution for user-mode sampling is 1 ms, correct? Sorry to say, but this is a step backwards in my opinion.

Regards,

Eric

David_A_Intel1 · ‎06-13-2011

Hi Eric:

Thank you for your request. I have no announcement to make about any new functionality.

I can tell you that if 1 ms is not a high enough sampling frequency, you can lower the sample-after value for the CPU_CLK_UNHALTED events to increase the sampling rate. A warning: don't make it too low or you will hang your system, so decrease the sample-after value incrementally. For example, if the default is 2000000 for your 2 GHz processor, try 1000000, which will double the amount of data collected and shorten the sampling interval to 0.5 ms.
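The arithmetic behind that example can be written out as a small sketch. It assumes the event counts one tick per core clock cycle while the core is running, so the interval is simply the sample-after value divided by the clock frequency; the helper name `sampling_interval_ms` is made up for illustration and is not part of any VTune interface.

```python
def sampling_interval_ms(sample_after_value: int, clock_hz: float) -> float:
    """Approximate time between samples for a clock-cycle event, in milliseconds.

    Assumes one event per clock cycle on a core that is never halted.
    """
    return sample_after_value * 1000.0 / clock_hz


CLOCK_HZ = 2.0e9  # the 2 GHz processor from the example above

for sav in (2_000_000, 1_000_000):
    interval = sampling_interval_ms(sav, CLOCK_HZ)
    print(f"sample-after value {sav}: about {interval} ms between samples")
```

Halving the sample-after value halves the interval, which is why the amount of collected data doubles.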

tschenck · ‎09-16-2011

Exact call graphs are the single biggest feature I need from this tool. Without this functionality, all performance analysis is just guesswork based on random sampling. There is overhead with this, but the overhead is knowable and can be accounted for.

As was stated before, the very first thing a high-performance programmer is looking for is how well the predicted call graph matches what really happens in the application.

The closest you can get with the new program is to sample on each branch. It's approximately the same, but not quite, because the overhead is not known.

George_Geczy · ‎10-18-2011

Please add my two cents as wanting the call graph feature returned in some form. For our own development over the years, the call count has been the #1 data we've obtained from our profiling. It's allowed us to focus on strategic inlining of code (significant performance improvements), and also helped us to attack situations where code was being called much more often than expected (either because of poor loop design, or poor algorithm design).

Even with high performance timing, the sampling just seems to get us a "somewhere in the area of" type of result most of the time, and makes it harder to isolate the exact causes.

eric_openshaw · ‎10-18-2011

fwiw - we're now using GlowCode - best profiler I've ever used. It has all the stuff the old VTune did (that I was interested in) but is much easier to configure AND shows live data as the process is running AND shows memory allocation for each function.

dgorman · ‎10-21-2011

This thread clearly explains exactly the same issues we are having with VTUNE Amplifier XE 2011 vs the old VTUNE 9.1.

We have found that the 2011 version provides inconsistent results from one run to another, which is clearly attributed to the sampling technique it now uses.

We often do very targeted profiling looking for regressions. It is critical to know how many times a method has been called in order to find out whether or not there is any increase in CPU time or latency. Sampling does not give this; I have no idea if the CPU time refers to 1 sample or 1000 samples.

VTUNE 2011 has a nice comparison tool, but it is only useful if you are confident that you are comparing equivalent profiles.

It is nice that VTUNE 2011 works quicker, but we have found this is not actually an issue when doing detailed profiling as we often do. It doesn't matter much what the msg rates are through the application when profiling at this level.

We still rely heavily on VTUNE 9.1.

Another point on the number of calls: this information on its own can be very useful. Knowing that method B is called 10 times for every invocation of method A is itself very useful in identifying issues in the code.
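As a stopgap when a profiler does not report call counts, the calls-per-invocation ratio described above can be measured with manual instrumentation. The following is a minimal sketch in Python rather than anything VTune provides; `counted`, `method_a` and `method_b` are hypothetical names chosen to mirror the example.

```python
import functools
from collections import Counter

call_counts = Counter()


def counted(func):
    """Wrap func so that every call increments an exact per-function counter."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@counted
def method_b(x):
    return x * x


@counted
def method_a(values):
    # Calls method_b once per element.
    return sum(method_b(v) for v in values)


for _ in range(3):
    method_a(range(10))

ratio = call_counts["method_b"] / call_counts["method_a"]
print(dict(call_counts), "calls of B per call of A:", ratio)
```

The counters add overhead to every call, which is the same trade-off discussed earlier in this thread, but the counts themselves are exact and repeatable from run to run.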

darietti7 · ‎01-18-2012

+1 for a more intrusive but exact call-graph

In my experience, functions with a very high number of calls but a tiny self time are #1 candidates for inlining. Applications with a large code base and many developers tend to hide some of these pearls in them.

I wonder if a function that is called a zillion times and has a self time of a couple dozen cycles would show up in the current VTune.
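A rough way to reason about this question is to estimate how many samples such a function would receive. The sketch below assumes a fixed sampling interval and samples that land uniformly in time; the call counts, the 24 cycles per call, the 2 GHz clock and the 1 ms interval are illustrative assumptions, not measurements from any tool.

```python
def expected_samples(calls, cycles_per_call, clock_hz, interval_s):
    """Estimate the number of samples landing in a function.

    Uses total self time divided by the sampling interval.
    """
    self_time_s = calls * cycles_per_call / clock_hz
    return self_time_s / interval_s


CLOCK_HZ = 2.0e9     # assumed 2 GHz clock
INTERVAL_S = 0.001   # assumed 1 ms sampling interval

for calls in (1_000_000, 1_000_000_000):
    n = expected_samples(calls, 24, CLOCK_HZ, INTERVAL_S)
    print(f"{calls} calls of 24 cycles each: roughly {n:.0f} expected samples")
```

Under these assumptions the function becomes visible only once its aggregate self time is large compared with the sampling interval, and even then the sample count reflects total time spent, not the number of calls.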

darietti7 · ‎01-23-2012

For instance, right now I'm profiling an application and lightweight hotspots shows that modf from msvcr90.dll is at the top, but in the hotspots analysis it doesn't even show up?

Without a full call stack I cannot figure out where that function is being called from, or how many times.

Mark_D_Intel · ‎01-23-2012

By default in the hotspots analysis, time is assigned to the last user function before system functions are called. You can adjust this in the GUI with the "Call Stack Mode" dropdown box, and on the command line with the -call-stack-mode option. This may account for the discrepancy you see between the hotspots and lightweight hotspots results.