Measure exactly one function

mestersiraly · 10-22-2007

Hello!
Part 1:
I want to measure the clockticks for a single function in a module. My problem is that I can't work out how to tell vtl to do this.
The closest I have come is with this command:

vtl view -hf -mn MyModule.exe

I run this on a 1700 MHz Celeron under Windows XP with VTune 9, after collecting the data with:

vtl activity -c sampling -o "-ec en='Clockticks':sa=1700000" -app MyModule.exe run
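As a rough sanity check on the sample-after value above: with event-based sampling, one sample is recorded every sa events, so the sampling rate is roughly the event rate divided by sa. The sketch below is only arithmetic, not VTune code. The function names are made up for illustration, and it assumes the clock runs at a steady 1700 MHz.

```python
def samples_per_second(events_per_second, sample_after):
    """Approximate sampling rate: one sample per `sample_after` events."""
    return events_per_second / sample_after


def estimated_events(samples, sample_after):
    """Estimate the event count represented by a number of collected samples."""
    return samples * sample_after


clock_hz = 1_700_000_000  # 1700 MHz Celeron (assumed constant clock rate)
sav = 1_700_000           # the sa= value used in the command above

# Clockticks occur once per cycle, so this gives about 1000 samples per second.
print(samples_per_second(clock_hz, sav))
```

The same relation works in the other direction: a function that received N samples accounts for roughly N times sa events.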

What should I do if I just want to see a function (its name) and only the Clockticks for it in a table (that is, without the module name, segment, module path and similar information)?
--------------------------------------------------------------------------------
Part 2:
What I actually want is to measure the 2nd level cache read misses for a specific function, but with the command

vtl activity -c sampling -o "-ec en='2nd Level Cache Read Misses':sa=10000" -app MyModule.exe run

I got the message "Currently the activity is set to have a duration of zero..."
... and with the -d 50 flag set:

vtl activity -c sampling -o "-ec en='2nd Level Cache Read Misses':sa=10000" -d 50 -app MyModule.exe run

I got the message "The Sampling Collector failed to collect data because the SAV is too low."

So my question is: how can I measure the number of 2nd level cache read misses from the command line for exactly one function, and have just the function name and the count printed out?
I would be very grateful if someone could answer these questions.

David_A_Intel1 · 12-06-2007

The sampling collector does not do any filtering during data collection, because that would significantly impact performance. Instead, you filter the data during analysis to look at one function.

So, use this command line to collect the data:

vtl activity -c sampling -o "-ec en='2nd Level Cache Read Misses'" -d 50 -app MyModule.exe run

Then, tell vtl to output the data by function for your module, using the -hf switch:

vtl view -hf -mn MyModule.exe
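The collect-everything-then-filter idea can be sketched in a few lines. This is only an illustration of the principle, not of how vtl stores or prints its data: the sample records, the `other.dll` module and the `helper` function are hypothetical.

```python
from collections import Counter

# Hypothetical sample records: one (module, function) pair per collected sample.
samples = [
    ("MyModule.exe", "main"),
    ("MyModule.exe", "func1"),
    ("MyModule.exe", "func1"),
    ("other.dll", "helper"),
]


def samples_by_function(records, module):
    """Count samples per function, keeping only the given module."""
    return Counter(func for mod, func in records if mod == module)


def estimated_events_for(records, module, function, sample_after):
    """Estimate events for one function: its sample count times the sample-after value."""
    return samples_by_function(records, module)[function] * sample_after


# Nothing is filtered while collecting; the selection happens afterwards.
print(samples_by_function(samples, "MyModule.exe"))
print(estimated_events_for(samples, "MyModule.exe", "func1", 10000))
```

Because the filtering happens after collection, the same data can be narrowed to any other function later without running the application again.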

Intel_C_Intel · 12-29-2007

Is it possible to see aggregated results for a function?

Let's suppose I am measuring clockticks, or whatever, and I've got a very simple application with a main() function and a func1() function that is called by main(). If I set the sample-after values low enough, func1() will show up among the results. But what if I would like to get the aggregated result for main() including all the called subroutines and functions, in this case including the clockticks of func1() and everything func1() called?

So basically, how could I measure, or get the aggregated results by function? The hotspot view (in GUI mode) does not include the called functions. I could manually add up the results in this case, but what if it gets complicated?

(If I check the result of a measured event in module view, I can get exactly the same number by adding up all the results in the hotspot view of that module. This shows that a function's count does not include event samples recorded in lines that belong to a different function.)

The only solution I see now is to split all the functions I want to measure into separate applications (exe), because then I can measure them separately.

Could someone help me out on this one, please? Thanks.

jimdempseyatthecove · 01-02-2008

The generalized solution (aggregated result for main()) is quite different from the one for a specific case such as your example. While I cannot attest to whether VTune has a feature to perform this feat, I can describe what would be required, and then you can dig around in the VTune documentation as well as other profiler documentation.

What you need is an option that, at sample time or sample event, looks at the call tree (this requires preservation of the stack frame) and then bills every subroutine in the call tree, at the determined source line, one tick or whatever the billing value is. Main would always accumulate 100%, but other paths out of main would show different accumulations.

This would be one step closer to what you want.
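A minimal sketch of that billing scheme, assuming the sampler can hand over the call stack for each sample as a list of function names, outermost caller first. It bills per function rather than per source line to stay short, and the stacks (including the `helper` function) are made up for illustration. This describes the idea only, not a VTune feature.

```python
from collections import Counter


def bill_inclusive(stack_samples):
    """Return (self_only, inclusive) tick counters for a list of call stacks."""
    self_only = Counter()
    inclusive = Counter()
    for stack in stack_samples:
        # The innermost frame is where the sample actually landed.
        self_only[stack[-1]] += 1
        # Bill every function on the stack; set() counts a recursive
        # function only once per sample.
        for func in set(stack):
            inclusive[func] += 1
    return self_only, inclusive


# Hypothetical samples: each entry is the call stack at one sample event.
stacks = [
    ["main"],
    ["main", "func1"],
    ["main", "func1", "helper"],
    ["main", "func1"],
]

self_only, inclusive = bill_inclusive(stacks)
print(self_only)   # ticks by the function that was executing
print(inclusive)   # ticks including everything called from the function
```

Here main is billed for all four samples, while func1 is billed for the three samples taken in func1 or in anything it called.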

A shortcoming of the above is that it doesn't disclose the call tree overhead by branch. To perform this feat, the sampler code would, at each event, examine the call tree as before and, in addition to ticking up the call tree, construct a symbolic name based on the call tree (e.g. Main:Fee:Fi:Fo:Fum), look it up (creating it if necessary), and then tick the counter(s) appropriately. With this information you can find the branch paths that exhibit the types of overhead you are interested in.

It should be noted that running in this mode would likely produce large interference with cache analysis. But it might provide better insight as to where to address your optimization efforts.
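The per-branch counters could be sketched as follows, again assuming each sample arrives with its call stack as a list of names, outermost caller first. The stacks and the `Other` function are hypothetical, and the ":" separator simply follows the Main:Fee:Fi:Fo:Fum example.

```python
from collections import Counter


def bill_call_paths(stack_samples):
    """Tick one counter per distinct call path, keyed by a symbolic name."""
    paths = Counter()
    for stack in stack_samples:
        # Counter creates the entry on first use (lookup, create if necessary).
        paths[":".join(stack)] += 1
    return paths


def ticks_under(paths, prefix):
    """Sum the ticks of every call path that is `prefix` or passes through it."""
    return sum(
        ticks
        for path, ticks in paths.items()
        if path == prefix or path.startswith(prefix + ":")
    )


# Hypothetical samples: each entry is the call stack at one sample event.
stacks = [
    ["Main", "Fee", "Fi", "Fo", "Fum"],
    ["Main", "Fee", "Fi"],
    ["Main", "Fee", "Fi", "Fo", "Fum"],
    ["Main", "Other"],
]

paths = bill_call_paths(stacks)
print(paths["Main:Fee:Fi:Fo:Fum"])
print(ticks_under(paths, "Main:Fee"))
```

Keying on the whole path keeps two routes into the same function apart, which a single per-function counter cannot do.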

Jim Dempsey

TimP · 01-02-2008

Jim appears to be describing gprof call graph (which doesn't work with Intel Windows compilers). I started to make a similar answer, but when I read the original post, it seemed to mention 3 conflicting sets of requirements.
I would go so far as to switch compilers temporarily in order to use gprof, when I'm interested in that information.
If I read only part of the post, I would have thought that the overall summary for the .exe module in VTune event based collection might answer the request, or that VTune call graph results might be interesting.