Hi Dan,
I have a few questions regarding VTune. They are as follows.
a) Can we get the time taken by a function for one particular call?
For example, a function is called 20 times and I want the time
taken by the function on the fourth call.
b) Are there any products or software that use VTune to benchmark their software?
For example, suppose a particular driver has only one entry point/function. Using VTune with the call graph approach, I measure the time taken by this entry point to be 'X'. Can I now publish that my driver takes 'X' amount of time? If that is not possible, what other things need to be considered (for example, overhead)? Assume that I publish the time together with the benchmark environment or configuration.
c) This question may not be relevant to this group; can someone direct me to a group that answers questions of the type below?
What is the best approach to benchmark or publish the time taken by a piece of software/driver during some event?
Thanks
6 Replies
Sorry for the typo in my previous post. It is "Dave" not "Dan". Thanks Dave for all your replies.
Thanks.
Hi skkumar:
Regarding a), yes and no. :-} With a default call graph configuration, you would not get that information, because the timing information is aggregated. However, using the Pause/Resume APIs, you could instrument your code to collect call graph information only for the fourth call to your function. You would not get any other call graph info, but it would give you what you ask for. Since I'm not sure about your code, let me explain: you would need to add calls that Resume() data collection just before the fourth call to the function of interest and then Pause() it after the function returns. You would also start the Activity with data collection "paused" (part of the configuration).
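As a rough illustration of that instrumentation, here is a minimal sketch that counts calls and brackets only the fourth one. Note that `resume_collection()` and `pause_collection()` are hypothetical stand-ins for the analyzer's Pause/Resume API calls (the real names and header are not given in this thread), and `function_of_interest()` and `run_calls()` are placeholders for your own code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the analyzer's Resume/Pause API calls.
// Here they only flip a flag so that the sketch compiles on its own.
static bool g_collecting = false;   // the Activity is assumed to start paused
static int  g_collected_calls = 0;  // number of calls that ran while collecting

void resume_collection() { g_collecting = true; }
void pause_collection()  { g_collecting = false; }

// Placeholder for the function whose fourth call is of interest.
void function_of_interest() {
    if (g_collecting) ++g_collected_calls;
}

// Calls function_of_interest() total_calls times and enables data
// collection only around the call numbered target_call (1-based).
void run_calls(int total_calls, int target_call) {
    assert(target_call >= 1);
    for (int call = 1; call <= total_calls; ++call) {
        const bool is_target = (call == target_call);
        if (is_target) resume_collection();
        function_of_interest();
        if (is_target) pause_collection();
    }
}
```

Calling `run_calls(20, 4)` would leave collection paused for every call except the fourth. In real code the counter would live wherever the function is called from, and the two stubs would be replaced by the actual API calls.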
Regarding b), I wouldn't think you would want to use the analyzer for reporting benchmark numbers. Specifically, sampling reports statistical representations, and call graph estimates wait time and its overhead. Instead, you could use something like the rdtsc instruction to get exact numbers within your code. I'll do some checking around and try to find out what others do. You might check the SPEC web site and see what they do. Anyone else have any ideas?
Regards,
Message Edited by DaveA on 11-05-2004 04:55 PM
If you're looking for a Windows-specific solution, and don't need the resolution of rdtsc, you might prefer the QueryPerformance... APIs. As I said in my deleted reply, I wouldn't call rdtsc exact, although it might be accurate well within 1e-7 second (vs 1e-5 for QueryPerformance..), if care is taken to measure its overhead. I meant to check whether Microsoft implemented support for their proposed __rdtsc intrinsic call in the current SDK, and for which architectures. Without that, rdtsc calls aren't portable between the Intel and Microsoft 64-bit compilers.
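To make the point about measuring timer overhead concrete, here is a small sketch. It uses `std::chrono::steady_clock` in place of rdtsc or the QueryPerformance... APIs purely so that it stays portable; the function names `timer_overhead_ns` and `timed_ns` are made up for this example, and the same pattern should carry over to whichever timer you actually use.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>

using Clock = std::chrono::steady_clock;

// Estimates the cost of one timer read, in nanoseconds, as the smallest
// difference observed between two back-to-back reads.
std::int64_t timer_overhead_ns(int samples = 1000) {
    if (samples <= 0) return 0;
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < samples; ++i) {
        const auto t0 = Clock::now();
        const auto t1 = Clock::now();
        const std::int64_t d =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (d < best) best = d;
    }
    return best;
}

// Times one invocation of work() and subtracts the estimated timer
// overhead, clamping the result at zero.
template <typename F>
std::int64_t timed_ns(F&& work, std::int64_t overhead_ns) {
    const auto t0 = Clock::now();
    work();
    const auto t1 = Clock::now();
    const std::int64_t elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    return elapsed > overhead_ns ? elapsed - overhead_ns : 0;
}
```

Taking the minimum rather than the average is a design choice here: it treats the fastest back-to-back read as the best estimate of the fixed cost, since interrupts and scheduling can only add time.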
I verified that the Microsoft way to access rdtsc
unsigned long long __rdtsc(VOID);
is working with the Microsoft CL 14.00.40105, and with a current Intel ICL, W_CCE_PC_8.1.013. Both generate in-line rdtsc code.
This clears up some messy asm code which has been required, at least for the Windows compilers which support this intrinsic.
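For anyone who wants to try the intrinsic, a hedged sketch of how it might be wrapped follows. The `<intrin.h>` include is for the Microsoft-compatible compilers discussed above; the `<x86intrin.h>` branch assumes GCC or Clang on x86, and the `steady_clock` fallback for other targets is only there so the sketch builds everywhere. The names `read_ticks`, `ticks_elapsed` and `SKETCH_HAVE_RDTSC` are invented for this example.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>        // declares __rdtsc for Microsoft-compatible compilers
  #define SKETCH_HAVE_RDTSC 1
#elif defined(__i386__) || defined(__x86_64__)
  #include <x86intrin.h>     // assumption: GCC/Clang on x86 provide __rdtsc here
  #define SKETCH_HAVE_RDTSC 1
#else
  #include <chrono>          // fallback: not a cycle counter, just a monotonic clock
  #define SKETCH_HAVE_RDTSC 0
#endif

// Returns the time-stamp counter where available, otherwise the raw
// tick count of std::chrono::steady_clock.
std::uint64_t read_ticks() {
#if SKETCH_HAVE_RDTSC
    return static_cast<std::uint64_t>(__rdtsc());
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Returns the number of ticks that elapsed while work() ran.
template <typename F>
std::uint64_t ticks_elapsed(F&& work) {
    const std::uint64_t start = read_ticks();
    work();
    const std::uint64_t end = read_ticks();
    return end - start;
}
```

The result is in counter ticks, not seconds, so converting it to time needs the counter frequency, and the read itself has an overhead that is not subtracted here.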
The info I got on timing of benchmarks is:
If one is dealing with an industry-standard benchmark, it should already have a standard way of reporting the numbers. SPEC (www.spec.org) is one good example here. There are other benchmarking bodies out there. If you use one of those benchmarks and misrepresent data, there may be legal implications.
If not, people usually use something humans can understand, such as wall clock time, cycle count (rdtsc), number of processed frames per second, etc. It all depends on what the benchmark is trying to measure.
Hope this helps!
Thanks Tim and Dave.
Regarding RDTSC, the accuracy depends on the clock frequency, which can vary due to thermal throttling and the power optimization techniques involved.
Since I am measuring the time taken by the driver during power state changes, for example a switch from the "Full power state" to "Low power states" such as standby or hibernate, I see a lot of variation in the measurements with rdtsc. KeQueryPerformanceCounter(), which seems to be a flavor of QueryPerformanceCounter() for kernel code, seems to be good, but sometimes I observe high spikes (due to QPK overhead) in the time measurements.
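One common way to keep occasional spikes from dominating a reported number is to repeat the measurement and report the minimum and median alongside the maximum. The sketch below is only an illustration of that idea, not something prescribed in this thread; the `Summary` type and `summarize` function are invented names, and the samples can be in any unit (ticks, microseconds, and so on).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimum, median and maximum of a set of repeated timing measurements.
struct Summary {
    double min;
    double median;
    double max;
};

// Sorts a copy of the samples and picks out min, median and max.
// The median is far less sensitive to a few spikes than the mean.
Summary summarize(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
    return Summary{samples.front(), median, samples.back()};
}
```

With samples such as 10.0, 11.0, 250.0, 10.5 and 10.2, the single spike of 250.0 shows up in `max` but leaves `median` at 10.5.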
Thanks.
Message Edited by DaveA on 11-10-2004 05:37 PM