Beginner

Performance Overhead Introduced By Tool Itself

Hi all,

We are now trying to evaluate this tool for our products.

Our main interest in VTune is whether it can profile apps without any overhead.

The problem with profiling is that the profiling code itself generally adds overhead:

  • Extra cycles are spent on accounting. In tight loops, it is not uncommon for the profiling code to take more time than the code being profiled. This seriously skews measurements and can make results very confusing.
  • The profiling code may also disturb CPU pipelining/branch prediction, caching, context switches (between threads) and JIT compilation. Again, this can skew the profiling result significantly.
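
To make the first point concrete, here is a minimal sketch in Python (not VTune-specific). The function names `tight_loop`, `tight_loop_instrumented` and `timed` are made up for illustration, and the "profiler" is just a pair of clock reads per iteration. On most machines the instrumented version runs noticeably slower, although the exact ratio depends on the interpreter and hardware.

```python
import time

def tight_loop(n):
    """Uninstrumented workload: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def tight_loop_instrumented(n, stats):
    """Same workload, with hypothetical per-iteration accounting added."""
    total = 0
    for i in range(n):
        t0 = time.perf_counter_ns()                  # accounting: read the clock
        total += i * i                               # the code actually being profiled
        stats["ns"] += time.perf_counter_ns() - t0   # accounting: record elapsed time
        stats["calls"] += 1
    return total

def timed(fn, *args):
    """Run fn(*args) and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    n = 200_000
    stats = {"calls": 0, "ns": 0}
    plain_result, plain_s = timed(tight_loop, n)
    instr_result, instr_s = timed(tight_loop_instrumented, n, stats)
    assert plain_result == instr_result  # same answer, different cost
    print(f"plain: {plain_s:.4f}s  instrumented: {instr_s:.4f}s")
```

The accounting here wraps a single multiply-and-add, so the clock reads can easily cost more than the work they measure.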

We are hoping that VTune can help with some of these issues by relying more on CPU counters, something that may allow the code to run at full speed without interruption.

Could you give us helpful comments on this concern?

Thanks in advance,

Kim.

Accepted Solutions
Black Belt

Overhead at default sampling rates is low until the data collection buffers need to be written to disk, so when sampling over several minutes, the sample-after values may need to be increased. When sampling over intervals shorter than 5 seconds, the rate may need to be high enough to gather sufficient samples that some performance degradation becomes visible.
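
The trade-off between run length and sample-after value can be sketched with some arithmetic. A sample is taken after every `sample_after` events, so the sample rate is the event rate divided by that value. The 2 GHz clock rate, the 300-second run and the 100,000-sample budget below are assumptions chosen for illustration, not VTune defaults, and the helper names are hypothetical.

```python
import math

def samples_per_second(event_rate_hz, sample_after):
    """One sample is taken each time `sample_after` events have been counted."""
    return event_rate_hz / sample_after

def total_samples(event_rate_hz, sample_after, duration_s):
    """Samples collected over a run of `duration_s` seconds on one core."""
    return samples_per_second(event_rate_hz, sample_after) * duration_s

def sample_after_for_budget(event_rate_hz, duration_s, max_samples):
    """Smallest sample-after value that keeps a run within `max_samples`."""
    return math.ceil(event_rate_hz * duration_s / max_samples)

if __name__ == "__main__":
    clock_hz = 2_000_000_000   # assumed 2 GHz clock-tick event rate
    print(samples_per_second(clock_hz, 2_000_000))          # samples per second
    print(total_samples(clock_hz, 2_000_000, 300))          # samples in 5 minutes
    print(sample_after_for_budget(clock_hz, 300, 100_000))  # value for a fixed budget
```

Under these assumptions, a longer run needs a larger sample-after value to stay within the same data volume, while a very short run needs a smaller one to collect enough samples.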

6 Replies
Employee

Generally speaking, the performance degradation is below 2% if you use the default SA (sample-after) value, which corresponds to about 1000 interrupts per second when sampling on CPU clock ticks. If you care seriously about performance degradation, you may prefer not to use sampling with stack collection, since it increases the overhead a little.
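
As a rough sanity check on those figures, the overhead fraction is the interrupt rate times the time spent per interrupt. The sketch below assumes a fixed cost per interrupt on a single core, which is a simplification; the 20 µs value is derived from the 2% and 1000-per-second figures above rather than measured, and the function names are hypothetical.

```python
def sampling_overhead(interrupts_per_sec, cost_per_interrupt_us):
    """Fraction of CPU time spent servicing sampling interrupts."""
    return interrupts_per_sec * cost_per_interrupt_us * 1e-6

def max_cost_per_interrupt_us(interrupts_per_sec, overhead_budget):
    """Largest per-interrupt cost (in microseconds) that stays within the budget."""
    return overhead_budget / interrupts_per_sec * 1e6

if __name__ == "__main__":
    # Implied budget per interrupt for 2% overhead at 1000 interrupts per second
    print(max_cost_per_interrupt_us(1000, 0.02))
    # Overhead if the rate is raised tenfold at the same assumed per-interrupt cost
    print(sampling_overhead(10_000, 20))
```

In this simplified model the overhead scales linearly with the sampling rate, and anything that makes each interrupt more expensive, such as collecting stacks, raises it proportionally.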

Black Belt

@Peter

Do you know whether VTune schedules the clock interrupts to be handled by a specific core?

 

Employee

@iliyapolak

VTune interrupts the system when a hardware PMU event triggers, then collects data. The ISR (interrupt service routine) returns quickly, so the overhead is small. VTune uses the clock to interrupt the whole system, not a specific program's execution. VTune supports all processors listed in the release notes.
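
The general idea of event-based sampling can be shown with a toy model in Python. This is not how VTune's driver is implemented: the `trace` format, the function names in it and `simulate_sampling` are all invented for illustration. A counter accumulates events, and each time it reaches the sample-after value the "interrupt" records which function was running.

```python
from collections import Counter

def simulate_sampling(trace, sample_after):
    """Toy model of counter-overflow sampling.

    trace: iterable of (function_name, cycles) pairs in execution order.
    Returns a Counter mapping function name to number of samples.
    """
    samples = Counter()
    counter = 0  # stands in for the hardware event counter
    for name, cycles in trace:
        counter += cycles
        while counter >= sample_after:
            samples[name] += 1       # "interrupt": record where execution was
            counter -= sample_after  # re-arm the counter
    return samples

if __name__ == "__main__":
    trace = [("parse", 300), ("compute", 1500), ("parse", 200), ("write", 1000)]
    for name, count in simulate_sampling(trace, 100).most_common():
        print(name, count)
```

The profiled code runs unmodified between samples, and the sample counts approximate where the time went, which is why the cost depends on the sampling rate rather than on how tight the loops are.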

Black Belt

Thanks for the answer.

>>>ISR (interrupt service routine) returns quickly, so overhead is little.>>>

On Windows, the bulk of the interrupt processing is done by a registered DPC (deferred procedure call) routine, so the overhead is indeed low.

Beginner

Hi all,

This is Kim.

Thank you for your insightful explanation on my question.

Even though I chose the first answer as the best answer to this question (I did not know that I could choose only one answer as best :( ), all your answers are best answers to me. :)

 

Best wishes for the holidays and New Year!

 
