Using VTune for absolute performance numbers

I'm trying to compare two implementations of a particular function for their performance in terms of CPU time and floating-point instructions retired. I'd prefer not to use any kind of stochastic sampling; I just want to know how many cycles and how many flops elapse between point A and point B in my code, where this fragment is executed many times in a single program run.

Unless I'm misreading everything, VTune's sampling is stochastic, either time-based or event-based. Is there a way to make VTune's sampling _exhaustive_, so I get the total number of instructions/flops in a function?

I am including VTuneApi calls at the beginning and end of the function to resume and pause data collection.

Really I'm looking for something very much like PAPI, which doesn't support Windows/P4 machines. I'm hoping VTune can deliver this functionality.



Dan Morris
