SATISH_K_Intel
Employee

Regarding time taken by a piece of code

Hi Dan,
I have a few questions regarding VTune. They are as follows.
a) Can we get the time taken by a function for one particular call?
For example, a function is called 20 times and I want the time
taken by the function on its 4th call.
b) Are there any products or software packages that use VTune to benchmark their software?
For example, suppose a particular driver has only one entry point/function. Using VTune with the call graph approach, I measure the time taken by this entry point to be 'X'. Can I now publish that my driver takes 'X' amount of time? If that is not possible, what else needs to be considered (for example, overhead)? Assume that I publish the time together with the benchmarked environment or configuration.
c) This question may not be relevant to this group; can someone direct me to a group that answers the type of question below?
What is the best approach to benchmark or publish the time taken by a piece of software/driver during some event?
Thanks
SATISH_K_Intel
Employee

Sorry for the typo in my previous post. It is "Dave" not "Dan". Thanks Dave for all your replies.
Thanks.
David_A_Intel1
Employee

Hi skkumar:
Regarding a), yes and no. :-} With a default call graph configuration, you would not get that information, because the timing information is aggregated over all calls. However, using the Pause/Resume APIs, you could instrument your code to collect call graph information only for the fourth call to your function. You would not get any other call graph info, but it would give you what you asked for. Since I'm not sure about your code, let me explain: you would need to add calls that Resume() data collection just before the fourth call to the function of interest and then Pause() it after the function returns. You would also start the Activity with data collection "paused" (part of the configuration).
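As a rough illustration of the Resume/Pause placement described above, here is a minimal sketch in C. The names `profiler_resume`, `profiler_pause`, `function_of_interest`, and `run_with_selective_collection` are hypothetical stand-ins, not the analyzer's real API; in a real build you would swap the two stubs for whatever Pause/Resume calls your VTune version provides.

```c
#include <assert.h>

/* Hypothetical stand-ins for the analyzer's Pause/Resume API (assumed names).
   The flag mirrors an Activity that is configured to start paused. */
static int collecting = 0;
static int resume_count = 0;

static void profiler_resume(void) { collecting = 1; resume_count++; }
static void profiler_pause(void)  { collecting = 0; }

/* Placeholder for the function whose single call we want to profile. */
static long function_of_interest(int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) sum += i;
    return sum;
}

/* Calls the function total_calls times and enables collection only around
   call number target_call (1-based). Returns how many calls ran while
   collection was active, which should be exactly 1. */
static int run_with_selective_collection(int total_calls, int target_call) {
    assert(target_call >= 1 && target_call <= total_calls);
    int collected_calls = 0;
    for (int call = 1; call <= total_calls; call++) {
        if (call == target_call) profiler_resume();  /* just before the call */
        volatile long result = function_of_interest(1000);
        (void)result;
        if (collecting) collected_calls++;
        if (call == target_call) profiler_pause();   /* right after it returns */
    }
    return collected_calls;
}
```

Calling `run_with_selective_collection(20, 4)` matches the example in the question: 20 calls, with collection active only for the 4th.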
Regarding b), I wouldn't think you would want to use the analyzer for reporting benchmark numbers. Specifically, sampling reports statistical representations, and call graph estimates wait time and its own overhead. Instead, you could use something like the rdtsc instruction to get exact numbers within your code. I'll do some checking around and try to find out what others do. You might check the SPEC web site and see what they do. Anyone else have any ideas?
Regards,

Message Edited by DaveA on 11-05-2004 04:55 PM

TimP
Black Belt

If you're looking for a Windows-specific solution, and don't need the resolution of rdtsc, you might prefer the QueryPerformance... APIs. As I said in my deleted reply, I wouldn't call rdtsc exact, although it might be accurate well within 1e-7 second (vs 1e-5 for QueryPerformance..), if care is taken to measure its overhead. I meant to check whether Microsoft implemented support for their proposed __rdtsc intrinsic call in the current SDK, and for which architectures. Without that, rdtsc calls aren't portable between the Intel and Microsoft 64-bit compilers.
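To make the point about measuring timer overhead concrete, here is a minimal sketch in C. On Windows it uses `QueryPerformanceFrequency` and `QueryPerformanceCounter`; the non-Windows branch (C11 `timespec_get`) is an assumption added only so the sketch builds elsewhere, and it reads the wall clock rather than a monotonic counter. The helper names `now_seconds` and `timer_overhead_seconds` are hypothetical.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Returns the current time in seconds from a high-resolution source. */
static double now_seconds(void) {
#ifdef _WIN32
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);   /* counter ticks per second */
    QueryPerformanceCounter(&count);    /* current tick count */
    return (double)count.QuadPart / (double)freq.QuadPart;
#else
    /* Assumed fallback for non-Windows builds: C11 wall-clock time. */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Estimates the average cost of one timer read by timing a run of
   back-to-back reads and dividing by the number of reads. */
static double timer_overhead_seconds(int reads) {
    assert(reads > 0);
    volatile double sink = 0.0;         /* keeps the reads from being optimized away */
    double start = now_seconds();
    for (int i = 0; i < reads; i++) sink = now_seconds();
    double stop = now_seconds();
    (void)sink;
    return (stop - start) / reads;
}
```

Subtracting the estimated overhead from each measured interval is one way to tighten short measurements; whether it matters depends on how long the timed code runs relative to a single timer read.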
TimP
Black Belt

I verified that the Microsoft way to access rdtsc,
unsigned long long __rdtsc(VOID);
works with Microsoft CL 14.00.40105 and with a current Intel ICL, W_CCE_PC_8.1.013. Both generate inline rdtsc code.
This clears up some messy asm code which has been required, at least for the Windows compilers which support this intrinsic.
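For reference, a minimal sketch of timing a piece of code with the `__rdtsc` intrinsic might look like the following. The header choice (`<intrin.h>` for Microsoft-compatible compilers, `<x86intrin.h>` for GCC and Clang) and the non-x86 fallback are assumptions on my part, and `read_ticks`, `ticks_for`, and `sample_work` are hypothetical names. On the fallback path the result is in nanoseconds, not cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>        /* __rdtsc for Microsoft-compatible compilers */
#define HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>     /* __rdtsc for GCC and Clang on x86 */
#define HAVE_RDTSC 1
#else
#include <time.h>          /* assumed fallback on non-x86 targets */
#define HAVE_RDTSC 0
#endif

/* Reads the time-stamp counter, or nanoseconds on the fallback path. */
static uint64_t read_ticks(void) {
#if HAVE_RDTSC
    return (uint64_t)__rdtsc();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Placeholder workload standing in for the code being measured. */
static void sample_work(void) {
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 100000; i++) sum += i;
}

/* Returns the number of ticks that elapsed while work() ran.
   The cost of the two counter reads is included in the result. */
static uint64_t ticks_for(void (*work)(void)) {
    assert(work != NULL);
    uint64_t start = read_ticks();
    work();
    uint64_t stop = read_ticks();
    return stop - start;
}
```

A call such as `ticks_for(sample_work)` gives a raw tick count; converting it to seconds requires knowing the counter frequency, which this sketch does not attempt.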
David_A_Intel1
Employee

The info I got on timing of benchmarks is:
If one is dealing with an industry-standard benchmark, it should already have a standard way of reporting the numbers. SPEC (www.spec.org) is one good example here. There are other benchmarking bodies out there. If you use one of those benchmarks and misrepresent data, there may be legal implications.
If not, people usually use something humans can understand, such as wall clock time, cycle count (rdtsc), number of processed frames per second, etc. It all depends on what the benchmark is trying to measure.
Hope this helps!
SATISH_K_Intel
Employee

Thanks Tim and Dave.
Regarding RDTSC, the accuracy depends on the clock frequency, which can vary due to thermal throttling and the power optimization techniques involved.
Since I am measuring the time taken by the driver during power state changes (for example, a switch from "Full power state" to a "Low power state" such as standby or hibernate), I see a lot of variation in the measurements with rdtsc. KeQueryPerformanceCounter(), which seems to be a flavor of QueryPerformanceCounter() for kernel code, seems to be good, but sometimes I observe high spikes (due to QPK overhead) in the time measurements.
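One common way to keep occasional spikes from distorting a reported number is to repeat the measurement and report the median (or minimum) instead of a single reading or the mean. This is a general technique, not something specific to KeQueryPerformanceCounter(), and the sketch below is a user-mode illustration only; `compare_doubles` and `median_of` are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending order. */
static int compare_doubles(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts the samples in place and returns their median.
   After the call, samples[0] holds the minimum and samples[n - 1] the maximum. */
static double median_of(double *samples, size_t n) {
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], compare_doubles);
    if (n % 2 == 1) return samples[n / 2];
    return 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}
```

With readings of 1.0, 1.1, 0.9, 25.0, and 1.0 (arbitrary units), the single spike of 25.0 pulls the mean up to 5.8 while the median stays at 1.0.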
Thanks.

Message Edited by DaveA on 11-10-2004 05:37 PM
