I'm new to VTune; I usually use AQTime. I'm trying to understand the VTune paradigm, and I'm missing a few pieces. In AQTime, when I look for hotspots I get a list of user functions that can be sorted by time with children. I can easily find the function that takes most of the time, and then inside it I can see how much each line takes, in percent (with children).
VTune, on the other hand, recommends that I look for functions that by themselves take a lot of time (without children). Then, inside the function, it looks through the whole file for lines that take a lot of time. So we have moved from a function to the whole file, and the lines VTune points to don't take the children into account. That makes the hotspots VTune points to virtually useless to me. I can't see how to use this approach to find the hotspot that actually spent its time in a child. Seeing the child by itself is also meaningless. If I want to see time with children I can use the tree view and, starting from the first external function, go down, and down, and down... until I might find something useful, provided there is no fork along the way. It's also really inconvenient to compare lines by absolute time. I much prefer percent of the function, and I want it for *any* line.
Okay, percent checked, but that's cosmetic. The bottom-up view still gives me irrelevant functions, and when I click the first one, VTune gives me 3 useless hotspots scattered across the function's whole file, which don't even include the function itself. I need:
1. A list of all user functions sorted by percent with children. I'm not going to get lost in the call-stack tree of user/kernel functions.
2. For each line, a percent with children, relative to the whole function.
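To make request 1 concrete: with a sampling profiler that records a full call stack per sample, both "total" (with children) and "self" (without children) time per function can be derived from the same samples. The sketch below is illustrative only; the sample data, the one-sample-per-interval model, and the function names are hypothetical and do not describe how VTune or AQTime actually compute their numbers. A real report would also filter out system/kernel frames to list only user functions.

```python
from collections import Counter

# Hypothetical sampled call stacks, outermost frame first, leaf last.
# Each sample stands for one sampling interval of CPU time.
SAMPLES = [
    ["main", "solve", "matmul"],
    ["main", "solve", "matmul"],
    ["main", "solve", "matmul"],
    ["main", "solve", "setup"],
    ["main", "report"],
]


def self_and_total(samples):
    """Return (self, total) sample counts per function.

    self  = samples where the function is the leaf (exclusive time)
    total = samples where the function appears anywhere on the stack
            (inclusive time, "with children"); recursion counts once.
    """
    self_counts = Counter(stack[-1] for stack in samples)
    total_counts = Counter()
    for stack in samples:
        total_counts.update(set(stack))
    return self_counts, total_counts


def inclusive_report(samples):
    """List (function, total %, self %) sorted by total % descending."""
    n = len(samples)
    self_counts, total_counts = self_and_total(samples)
    rows = [
        (fn, 100.0 * total / n, 100.0 * self_counts[fn] / n)
        for fn, total in total_counts.items()
    ]
    # Highest inclusive time first; ties broken by name for stable output.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows


if __name__ == "__main__":
    for fn, total_pct, self_pct in inclusive_report(SAMPLES):
        print(f"{fn:8s} total {total_pct:5.1f}%  self {self_pct:5.1f}%")
```

With this made-up data, ranking by self time puts matmul first, while ranking by total time surfaces solve, which has 0% self time but 80% total: exactly the kind of caller that spends its time in a leaf it calls in a loop.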
Without that, VTune is useless. This is not an AQTime invention. The Compuserve profiler and other major ones work the same way. Dear Intel, why reinvent the wheel and make it square?
The bottom-up view shows you the functions that consume the most of the application's execution time. I hardly think those are irrelevant. :\
Sorry, but I don't understand this statement at all: "I'm not going to lose my self in the tree stack of user/kernel functions." :(
Currently, we don't have a display that shows caller/callee information, that is, what percentage of a function's total time is spent in its child functions. In the bottom-up display, if you expand the call stack by clicking the '+' next to a function, you can see its callers and what percentage of the overall application time each call path contributed. Basically, this gives you the hot call paths where you should focus your tuning efforts.
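A rough model of the bottom-up expansion described above, assuming the same kind of sampled call stacks: samples are grouped by the leaf (self-time) function, then split by the caller path that reached it, with every percentage relative to the total number of application samples. The data and names are hypothetical, and this sketches the idea rather than VTune's implementation. In this model a path's percentage is the leaf's own time reached through that path, not the callers' time with children.

```python
from collections import Counter, defaultdict

# Hypothetical sampled call stacks, outermost frame first, leaf last.
SAMPLES = [
    ["main", "solve", "matmul"],
    ["main", "solve", "matmul"],
    ["main", "precond", "matmul"],
    ["main", "solve", "setup"],
    ["main", "report"],
]


def bottom_up(samples):
    """Group samples by leaf function, then by the caller path that reached it.

    Returns {leaf: (leaf %, {caller path: path %})}. Caller paths list the
    nearest caller first, as an expanded bottom-up tree does. Every
    percentage is relative to the whole application's sample count.
    """
    n = len(samples)
    paths = defaultdict(Counter)
    for stack in samples:
        leaf, callers = stack[-1], tuple(reversed(stack[:-1]))
        paths[leaf][callers] += 1
    return {
        leaf: (
            100.0 * sum(counts.values()) / n,
            {path: 100.0 * k / n for path, k in counts.items()},
        )
        for leaf, counts in paths.items()
    }


if __name__ == "__main__":
    report = bottom_up(SAMPLES)
    for leaf, (pct, callers) in sorted(report.items(), key=lambda kv: -kv[1][0]):
        print(f"{leaf}: {pct:.1f}%")
        for path, path_pct in sorted(callers.items(), key=lambda kv: -kv[1]):
            print(f"  <- {' <- '.join(path)}: {path_pct:.1f}%")
```

Here matmul's 60% splits into 40% via solve and 20% via precond, which is how a hot call path, rather than a hot caller, shows up in this kind of view.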
You told me that if I want to find functions sorted by time with children, I should look in the tree view instead of the bottom-up view. Well, the tree view is really a view of the stack, including both kernel and user functions, which makes the search quite tiresome.
Looking at the function where I spend most of my time, not including children, gets me leaf functions such as matrix multiplication, which I can't really optimize. I'm more interested in one of its ancestors, which calls it in a loop and by itself (children excluded) costs nothing. Your tip of looking for the ancestor in the call tree inside the bottom-up view is more helpful. But I don't get it: how does that show me the cost of the whole hot path? Is what I see in the bottom-up view with or without children? According to your docs, it's without.
What about my second question, the time for each line? Tell me the truth: do you really use VTune to profile your own apps, or do you have some more practical in-house profiler at Intel?
Why are you interested in a function that by itself costs nothing but calls the hottest function? If you are looking to parallelize, I don't see what the "cost" of the caller has to do with it. Anyway, you can see the various call paths by expanding the hot function, clicking each one to see its call stack, and double-clicking any function in the stack to examine it and determine whether there is a loop you can parallelize. (BTW, have you checked out Parallel Advisor, which basically does all of this for you?)
To get "time for each line", the tool would have to instrument each source line, and the overhead would be prohibitive. Our tool does not provide that information. Instead, we keep our overhead low and focus on the lines that contribute significantly to application execution time.
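For contrast with per-line instrumentation, here is a sketch of how per-line percentages can be estimated statistically from samples instead. It assumes, hypothetically, that each sample records the source line of every frame on the stack (the call-site line for non-leaf frames); whether a particular tool records that, and how it presents it, is an assumption here, not a description of VTune's internals. The inclusive/self switch mirrors request 2 above (percent with children, relative to the whole function).

```python
from collections import Counter

# Hypothetical samples: each is one sampled call stack of
# (function, source line) frames, outermost first. For a non-leaf frame
# the line is the call site; for the leaf it is where execution was.
SAMPLES = [
    [("main", 10), ("solve", 42), ("matmul", 7)],
    [("main", 10), ("solve", 42), ("matmul", 8)],
    [("main", 10), ("solve", 42), ("matmul", 7)],
    [("main", 10), ("solve", 40)],
    [("main", 12)],
]


def line_percentages(samples, function, inclusive=True):
    """Percent of `function`'s samples attributed to each of its source lines.

    inclusive=True: a sample counts toward every frame of `function` on the
    stack, so a call-site line also gets the time spent in its callees
    ("with children"). inclusive=False: only samples where `function` is the
    leaf count ("self" time). Percentages are relative to the function's own
    sample total; with recursion they can add up to more than 100.
    """
    per_line = Counter()
    total = 0
    for stack in samples:
        frames = stack if inclusive else stack[-1:]
        # Count each distinct line of `function` at most once per sample.
        lines = {line for fn, line in frames if fn == function}
        if lines:
            total += 1
            per_line.update(lines)
    if total == 0:
        return {}
    return {line: 100.0 * k / total for line, k in sorted(per_line.items())}


if __name__ == "__main__":
    print(line_percentages(SAMPLES, "solve"))                   # with children
    print(line_percentages(SAMPLES, "solve", inclusive=False))  # self only
```

For solve, line 42 (the call into matmul) gets 75% with children but does not appear at all in the self-only view, which is the difference being argued about in this thread. Because these are sample counts, a line with only a handful of samples has a large relative error, which is one reason a sampling tool emphasizes the lines that contribute significantly.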
To tell the truth, yes. I'm sorry if the tool does not meet your requirements. We have heard your comments, and others', and are continually working on improvements.