I ran the VTune Analyzer on Linux on the example specified in the help files (gsexample2a). It was able to show the performance data for all the statements at the source level. However, I then tried my own example, which is much bigger than gsexample2a. For that one, I was NOT able to see the performance data for all the statements in the program. I basically want to find out the running times of all the loops in the program, and whether they are slow or not. Please tell me how to do this.
Hi, What do you mean when you say you could not see all the performance data for all statements?
Do you mean that you could not see ANY events correlated with lines in your program, or that you saw events only for some lines?
It is normal to see events only for some lines of code in a program. You will not get an event count for every line. Since modern processors are pipelined and have multiple execution units, and today's processors often have multiple cores or hyper-threads as well, many statements can be executing at once. VTune Analyzer also only interrupts the processor at a certain interval, and it takes a few microseconds for the processor to respond and process the interrupt (which records what code was running at the time). Both of those factors mean that the event counts will not be exact for every line. Often, you will see for a loop that most of the event counts go to one particular line of code, with hardly any on the other lines.
Before going into this further, let me know what exactly you are looking for and we can see if there is an issue. Thanks, Shannon Cepeda
I mean the event data was shown for only some lines.
I basically want to measure the running time of a few selected loops in the program. I am characterizing the loops based on a few patterns and I want to know what percentage of the total time for the program is spent in these loops.
I want to measure the running times of a few selected loops in a C program, so as to see what percentage of the total execution time of the program is spent in these loops. I should be able to specify the loops for which the performance should be measured. VTune's focus is on the performance bottlenecks, and it only shows the time for those. Also, VTune does not support time-based sampling on Linux. Even with event-based sampling, I cannot select a period of less than 1 ms. So if a loop takes less time than that, its execution time won't be reported.
VTune Analyzer does not report event counts for every line in a program. The way that event-based sampling works is that while your application is running, the processor is periodically interrupted and the statement, function, module, and process information is recorded. These interrupts happen whenever it is time to collect data for a particular event. The interval, called the Sample After Value (SAV), is usually determined by VTune, although you can set it yourself if you want. The idea is that by doing this sampling you get a statistical profile of which parts of your code are resulting in the most events.
Because event-based sampling is statistical, you will not get a line-by-line breakdown of your code and what happens for each statement. If you want to measure the running time of one loop against another, the easiest and most precise way to do that is to actually time the code. Intel Threading Building Blocks (TBB) has a great thread-aware timer object called tick_count. It is very easy to use and you can try it using TBB for free. You can either download an evaluation copy of TBB or download the open source version (subject to the GPLv2 license with the runtime exception). Instead of the TBB tick_count timer, you could also use the RDTSC instruction to read the timestamp counter.
VTune Analyzer is meant to be used as a profiling tool, to allow you to determine which parts of your code are responsible for the most events (time, cache misses, etc). It helps you to find the bottlenecks in your code and to know where to focus on optimizations. Hope this helps!