Hi all,
We are now trying to evaluate this tool for our products.
Our main interest in VTune is whether it can profile applications without any overhead.
The problem with profiling is that, in general, the profiling code itself adds overhead:
- Extra cycles are spent on accounting. In tight loops, it is not uncommon for the profiling code to take more time than the code being profiled. This seriously distorts measurements and can make the results very confusing.
- The profiling code may also disturb CPU pipelining and branch prediction, caching, context switches (between threads), and JIT compilation. Again, this can skew the profiling results significantly.
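To make the first point concrete, here is a minimal sketch of instrumentation overhead in a tight loop. The function names and the `counters` dictionary are hypothetical and only for illustration; the accounting work inside the loop is typically larger than the single multiply-add it is measuring.

```python
import time


def sum_squares_plain(n):
    """Tight loop with no profiling code."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_instrumented(n, counters):
    """Same loop, with per-iteration accounting added by hand."""
    total = 0
    for i in range(n):
        start = time.perf_counter_ns()
        total += i * i
        # Accounting: two clock reads and two dictionary updates per iteration.
        counters["calls"] += 1
        counters["ns"] += time.perf_counter_ns() - start
    return total


def timed(fn, *args):
    """Return (result, elapsed wall-clock seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 200_000
    counters = {"calls": 0, "ns": 0}
    _, plain_s = timed(sum_squares_plain, n)
    _, instrumented_s = timed(sum_squares_instrumented, n, counters)
    print(f"plain:        {plain_s:.4f} s")
    print(f"instrumented: {instrumented_s:.4f} s")
```

Both functions return the same value; only the running time differs, and the difference is entirely profiling overhead.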
We are hoping that VTune can help with some of these issues by relying more on CPU counters, something that may allow the code to run at full speed without interruption.
Could you give us helpful comments on this concern?
Thanks in advance,
Kim.
Overhead at default sampling rates is low until the data collection buffers need to be written to disk, so when sampling over several minutes, sample-after values may need to be increased. When sampling over intervals shorter than 5 seconds, rates may need to be high enough that some performance degradation becomes visible.
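As a rough illustration of that trade-off, the arithmetic can be sketched as follows. It assumes one sample is taken every `sample_after` events, so the total number of samples is about event rate × duration / sample-after. The 2 GHz event rate and the sample budgets are made-up numbers, and the helper names are hypothetical, not part of any VTune API.

```python
def min_sample_after(event_rate_hz, duration_s, max_samples):
    """Smallest sample-after value that keeps a run within max_samples."""
    total_events = event_rate_hz * duration_s
    return -(-total_events // max_samples)  # integer ceiling division


def max_sample_after(event_rate_hz, duration_s, min_samples):
    """Largest sample-after value that still yields at least min_samples."""
    total_events = event_rate_hz * duration_s
    return max(1, total_events // min_samples)


def interrupts_per_second(event_rate_hz, sample_after):
    """Sampling interrupt rate implied by a sample-after value."""
    return event_rate_hz / sample_after


if __name__ == "__main__":
    clock_hz = 2_000_000_000  # assumed clock-tick event rate

    # Long run: 30 minutes, capped at a hypothetical 600,000 samples.
    long_sav = min_sample_after(clock_hz, 1800, 600_000)
    print(long_sav, interrupts_per_second(clock_hz, long_sav))

    # Short run: 2 seconds, wanting at least 10,000 samples.
    short_sav = max_sample_after(clock_hz, 2, 10_000)
    print(short_sav, interrupts_per_second(clock_hz, short_sav))
```

The long run ends up with a larger sample-after value (fewer interrupts per second), while the short run needs a smaller one, which is where the extra overhead comes from.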
Generally speaking, performance degradation is below 2% if you use the default sample-after (SA) value, which corresponds to 1000 interrupts per second when sampling on CPU clock ticks. If performance degradation is a serious concern, avoid sampling with stack collection, because it increases the overhead somewhat.
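A back-of-the-envelope check of those figures, assuming the overhead is simply the interrupt rate multiplied by the time spent handling each interrupt. The 5 microsecond handling cost below is an illustrative assumption, not a measured VTune number, and the function names are hypothetical.

```python
def overhead_fraction(interrupts_per_s, cost_per_interrupt_s):
    """Fraction of each second spent servicing sampling interrupts."""
    return interrupts_per_s * cost_per_interrupt_s


def cost_budget_per_interrupt(max_overhead_fraction, interrupts_per_s):
    """Time available per interrupt to stay under a given overhead."""
    return max_overhead_fraction / interrupts_per_s


if __name__ == "__main__":
    rate = 1000  # interrupts per second, as quoted for the default SA value

    # Assumed handling cost of 5 microseconds per interrupt.
    print(f"overhead: {overhead_fraction(rate, 5e-6):.2%}")

    # Per-interrupt time budget implied by a 2% overhead ceiling.
    budget_us = cost_budget_per_interrupt(0.02, rate) * 1e6
    print(f"budget per interrupt: {budget_us:.1f} microseconds")
```

Under this simple model, staying below 2% at 1000 interrupts per second leaves about 20 microseconds per interrupt; anything that lengthens the handler, such as stack collection, eats into that budget.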
@Peter
Do you know whether VTune schedules clock interrupts for execution on a specific core?
@iliyapolak
VTune interrupts the system when a hardware PMU event triggers, then collects data. ISR (interrupt service routine) returns quickly, so overhead is little. VTune uses the clock to interrupt the whole system, not a specific program's execution. VTune supports all processors listed in the release notes.
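The general idea of sampling (interrupt periodically, record where execution currently is, return quickly) can be sketched in user space. This is only an analogy: it uses a background thread and Python's `sys._current_frames()` rather than PMU interrupts or a kernel driver, and it says nothing about how VTune is actually implemented. `sample_while_running` and `busy_loop` are hypothetical names.

```python
import collections
import sys
import threading
import time


def sample_while_running(workload, interval_s=0.005):
    """Run workload() while a sampler thread records the active function."""
    samples = collections.Counter()
    target_id = threading.get_ident()  # the thread that runs the workload
    stop = threading.Event()

    def sampler():
        # Wake up every interval_s seconds until asked to stop.
        while not stop.wait(interval_s):
            frame = sys._current_frames().get(target_id)
            if frame is not None:
                # Keep the per-sample work tiny: one counter update.
                samples[frame.f_code.co_name] += 1

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        workload()
    finally:
        stop.set()
        thread.join()
    return samples


def busy_loop(duration_s=0.3):
    """Uninstrumented workload: spin for roughly duration_s seconds."""
    end = time.perf_counter() + duration_s
    count = 0
    while time.perf_counter() < end:
        count += 1
    return count


if __name__ == "__main__":
    for name, hits in sample_while_running(busy_loop).most_common():
        print(name, hits)
```

The workload itself contains no profiling code; the cost is limited to the brief periodic wake-ups of the sampler, which is the property the answer above attributes to a short ISR.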
Thanks for the answer.
>>>ISR (interrupt service routine) returns quickly, so overhead is little.>>>
On Windows, the bulk of the ISR processing is done by a registered DPC routine, so the overhead is indeed low.
Hi all,
This is Kim.
Thank you for your insightful explanations in response to my question.
Even though I chose the first answer as the best answer to this question (I did not know that I could choose only one answer as best :( ),
all your answers are best answers to me. :)
Best wishes for the holidays and New Year!