Here is an excerpt from the "VTune affects behavior of API function AllocConsole?" thread.
------------------------------------------------------
Daniel,
Yes, by dragging and dropping I can see clockticks data in the source view now. But I have another problem. I don't know whether I have configured the sampling activity correctly, because some functions have nothing displayed in the "clockticks" column of the source view, even though I'm sure they are called in the program and I can step into them in the Visual C++ debugger. Besides, some parts of other functions also have blank clockticks fields in the source view, and I'm sure those parts of the code (and the corresponding disassembly) are executed.
My questions are: 1. What is the precise meaning of the term "clocktick"? 2. Are the blank "clockticks" fields caused by a misconfiguration of the sampling activity? If so, how should I configure it properly?
I agree with dbricker that it would be better to open a new thread for Sampling, but in any case I'll answer you here.
Sampling is a VERY complicated profiler. It is impossible to describe all its abilities in a short post, so I'll describe only the very basics.
To be able to read sampling results, you need to understand how sampling works. First of all, sampling is a statistical profiler, which means that its results are meaningful only for the most frequently executed code. Each Intel processor has at least one special register, an event counter. This counter is configurable: you can tell the hardware which specific hardware event you want to count and the maximum count value. When the processor has counted the maximum number of events, it generates a hardware interrupt, which is handled by the VTune driver. The VTune driver records the active process, the active thread, and the next instruction to be executed (on Pentium; on Itanium it is the current instruction). All this data is written into the VTune database, post-processed, and presented to you later.
You need to understand two basic issues:
1. Counters are handled by the hardware globally, for all processes and even the OS itself.
2. Interrupts are generated only after the specified number of events has occurred. This means that all of those events are attributed to one single instruction, the one that happened to generate only the last event.
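The second point can be illustrated with a small simulation. This is only a toy model, not VTune's actual driver logic: the `instr` struct, the `simulate_sampling` function, and the per-instruction costs are hypothetical names and numbers chosen for illustration. A counter accumulates clockticks, and every time it reaches the sample-after value, one sample is charged to whichever instruction was running at that moment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one instruction in the profiled code. */
struct instr {
    unsigned cost;     /* clockticks consumed each time it executes */
    unsigned samples;  /* samples charged to it by the simulated driver */
};

/*
 * Execute the instruction sequence `passes` times. Whenever the event
 * counter reaches `sample_after`, charge one sample to the instruction
 * that pushed the counter over the limit, then keep the remainder.
 * Returns the total number of samples taken.
 */
unsigned simulate_sampling(struct instr *code, size_t n,
                           unsigned passes, unsigned sample_after)
{
    assert(code != NULL && sample_after > 0);

    unsigned counter = 0;
    unsigned total = 0;

    for (unsigned p = 0; p < passes; ++p) {
        for (size_t i = 0; i < n; ++i) {
            counter += code[i].cost;
            /* One very slow instruction may overflow more than once. */
            while (counter >= sample_after) {
                counter -= sample_after;
                code[i].samples++;
                total++;
            }
        }
    }
    return total;
}
```

With costs of 1, 50, and 1 clockticks, 1000 passes, and a sample-after value of 1000, the first instruction in this model collects no samples at all even though it executed 1000 times. That is the same kind of blank field described in the question: the code ran, but the counter never happened to overflow there.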
Now let's try to understand what this means for the clockticks event. The clockticks event counts time. Most of these events will be associated with the most frequently executed parts of your code. If some instruction caught the maximum number of clockticks, this means that either this instruction itself or the instructions around it are very heavy or are executed many times. So optimizing this single part of the code will give you the largest possible performance boost (of course, modifying the algorithm will often give you a much higher boost, but we are speaking now about local optimizations only).
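As a rough sketch of how clockticks samples relate to time: each sample stands for one full sample-after interval of events, so the time can be estimated from the sample count, the sample-after value, and the processor frequency. The function names and the example numbers below are assumptions for illustration, not anything VTune exposes.

```c
#include <assert.h>

/* Estimated number of clockticks represented by `samples` samples. */
unsigned long long estimated_clockticks(unsigned long long samples,
                                        unsigned long long sample_after)
{
    assert(sample_after > 0);
    return samples * sample_after;
}

/* Estimated time in seconds, given the processor frequency in Hz. */
double estimated_seconds(unsigned long long samples,
                         unsigned long long sample_after,
                         double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return (double)estimated_clockticks(samples, sample_after) / cpu_hz;
}
```

For example, 48 samples at a sample-after value of 1,000,000 on a 2 GHz processor correspond to roughly 0.024 seconds. Code that runs for less than one sample-after interval in total may well collect no samples, so its clockticks field stays blank.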
What is the most fundamental problem of sampling? Sampling gives you only flat results. For example, let's assume you have put "printf(....)" in a for() loop that is executed 10000000 times. This is your "main". Sampling will point to some code inside the printf() function itself or even inside the disk driver. Does this mean that the problem is in the C library or the OS? NO, the problem is in your code: if you move printf() out of the loop, the problem will disappear. This is where Call Graph comes in. With much more overhead and less accuracy in the time values themselves, you will get much more data: the control flow graph with call counts.
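The printf() example above might look like this in C. This is a hypothetical sketch: the function names and the summation workload are made up, and fprintf() to a caller-supplied stream stands in for printf() so the output destination can be chosen.

```c
#include <assert.h>
#include <stdio.h>

/* Slow variant: one formatted write per loop iteration. */
long report_in_loop(FILE *out, long iterations)
{
    assert(out != NULL);
    long sum = 0;
    for (long i = 0; i < iterations; ++i) {
        sum += i;
        fprintf(out, "partial sum: %ld\n", sum);  /* dominates the run time */
    }
    return sum;
}

/* Faster variant: same computation, a single write after the loop. */
long report_after_loop(FILE *out, long iterations)
{
    assert(out != NULL);
    long sum = 0;
    for (long i = 0; i < iterations; ++i) {
        sum += i;
    }
    fprintf(out, "final sum: %ld\n", sum);
    return sum;
}
```

In a sampling run of the first variant, most samples would be expected to land inside the C library and the I/O path below it rather than in `report_in_loop` itself. Only the call graph shows that this function is the caller responsible.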
By the way, you can drag and drop data from both Call Graph and Sampling activity results into the Source View window. Each drop will add one or more data columns. You can even drag and drop Call Graph results into Sampling tables and vice versa, which lets you view all the relevant data in the same table.