find the hot spot. I use
sampling to locate the hot spot.
At first, the functions are listed according to the time they use
But after I wrote an assembly version of A and ran VTune again,
the result was confusing
XScale hardware provides several performance counters.
Each counter can trigger interrupts, and VTune does the timing statistics
Using sampling, VTune interrupts the program at intervals.
For example, there are two functions in a source file:
-------------- trigger an interrupt
--------------- trigger an interrupt
What does VTune do in this case?
So, in your example, the interrupt in function A() results in a sample being recorded in function A(). Likewise, the interrupt in function B() results in a sample being recorded in function B(). Sampling presents a "statistical" picture of how active the various parts of the code are relative to one another: the more samples land in a function, the more time the program spent there.
Thus, your results confirm that the assembly version of the function drew fewer samples than the high-level-language version did (assuming the list is sorted from highest to lowest and that the function was written in a high-level language initially :-).
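
If it helps, here is a rough sketch of the bookkeeping described above. It is not VTune's actual implementation, only an illustration of the idea: each sampling interrupt captures a program-counter value, the value is matched to the function whose address range contains it, and the functions are then sorted by sample count. The symbol table, the addresses, and the names `function_at` and `hotspots` are all invented for this example.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start_address, end_address_exclusive, name).
# Real tools read this from the binary's debug/symbol information.
SYMBOLS = [
    (0x1000, 0x1400, "A"),
    (0x1400, 0x1500, "B"),
]
STARTS = [start for start, _end, _name in SYMBOLS]


def function_at(pc):
    """Return the name of the function whose address range contains pc."""
    i = bisect_right(STARTS, pc) - 1
    if i >= 0:
        start, end, name = SYMBOLS[i]
        if start <= pc < end:
            return name
    return "<unknown>"


def hotspots(sampled_pcs):
    """Count samples per function, sorted from most to fewest samples."""
    counts = Counter(function_at(pc) for pc in sampled_pcs)
    return counts.most_common()


# Invented program-counter values, one per sampling interrupt.
before = [0x1010, 0x1204, 0x1410, 0x13FC, 0x1220]  # A in a high-level language
after = [0x1010, 0x1410, 0x1420, 0x1430]           # A rewritten in assembly

print(hotspots(before))  # [('A', 4), ('B', 1)]
print(hotspots(after))   # [('B', 3), ('A', 1)]
```

With these made-up numbers, A tops the list in the first run and drops below B in the second, which is the kind of reordering you would expect when A gets faster. Keep in mind the counts are only statistical: a function with few samples can move up or down the list from run to run.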