Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Performance Overhead Introduced By Tool Itself

Jinyong_K_
Beginner
817 Views

Hi all,

We are now trying to evaluate this tool for our products.

Our main interest in VTune is whether it can profile apps without any overhead.

The problem with profiling is that the profiling code itself generally adds overhead:

  • Extra cycles are spent on accounting. In tight loops, it is not uncommon for the profiling code to take more time than the code being profiled. This seriously distorts measurements and can make results very confusing.
  • The profiling code may also disturb CPU pipelining/branch prediction, caching, context switches (between threads), and JIT compilation. Again, this can skew the profiling results significantly.
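
The first point can be illustrated with a minimal Python sketch. It is not VTune-specific, and the function names are hypothetical. It runs the same tight loop with and without a per-iteration accounting step, so the cost added by instrumentation shows up in the timings.

```python
import time

def plain_loop(n):
    """Sum 0..n-1 with no instrumentation."""
    total = 0
    for i in range(n):
        total += i
    return total

def instrumented_loop(n, counters):
    """Same loop, but every iteration also updates a profiling counter."""
    total = 0
    for i in range(n):
        # Accounting work that the uninstrumented loop does not pay for.
        counters["iterations"] = counters.get("iterations", 0) + 1
        total += i
    return total

def time_call(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    n = 1_000_000
    counters = {}
    plain_result, plain_time = time_call(plain_loop, n)
    instr_result, instr_time = time_call(instrumented_loop, n, counters)
    assert plain_result == instr_result  # instrumentation must not change the answer
    print(f"plain: {plain_time:.3f}s, instrumented: {instr_time:.3f}s")
```

The instrumented version typically takes longer, although the exact ratio depends on the machine and the interpreter.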

We are hoping that VTune can help with some of these issues by relying more on CPU counters, something that may allow the code to run at full speed without interruption.

Could you give us helpful comments on this concern?

Thanks in advance,

Kim.

1 Solution
TimP
Honored Contributor III
817 Views

Overhead at default sampling rates is low until the data collection buffers need to be written to disk, so when sampling over several minutes, the sample-after values may need to be increased. When sampling over intervals shorter than 5 seconds, the rates may need to be high enough that some performance degradation becomes visible.
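
To make the trade-off concrete, here is a rough Python sketch of how a sample-after value relates to the sampling rate and the amount of data collected. The 2 GHz clock, the sample-after value of 2,000,000, and the 64-byte sample record are illustrative assumptions, not documented VTune defaults.

```python
def samples_per_second(event_rate_hz, sample_after):
    """One sample is taken every `sample_after` occurrences of the event."""
    return event_rate_hz / sample_after

def bytes_collected(event_rate_hz, sample_after, duration_s, bytes_per_sample):
    """Approximate raw data volume for one core over the whole run."""
    rate = samples_per_second(event_rate_hz, sample_after)
    return rate * duration_s * bytes_per_sample

if __name__ == "__main__":
    clock_hz = 2_000_000_000  # assumed 2 GHz core clock
    sav = 2_000_000           # assumed sample-after value
    record = 64               # assumed bytes per sample record
    for seconds in (5, 300):
        rate = samples_per_second(clock_hz, sav)
        size = bytes_collected(clock_hz, sav, seconds, record)
        print(f"{seconds:>4}s run: {rate:.0f} samples/s, {size / 1024:.0f} KiB")
    # Doubling the sample-after value halves both the rate and the data volume.
```

Under these assumptions, a longer run produces proportionally more data, which is why raising the sample-after value is one way to keep buffers from filling.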

Peter_W_Intel
Employee
817 Views

Generally speaking, the performance degradation is below 2% if you use the default sample-after (SA) value, which gives about 1000 interrupts per second when sampling on CPU clock ticks. If overhead is a serious concern, avoid sampling with stacks, since stack collection increases the overhead somewhat.
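
As a back-of-the-envelope check on these numbers, the sketch below assumes that overhead is roughly the number of interrupts per second multiplied by the time spent handling each one. The per-interrupt costs are hypothetical values, not measurements of VTune.

```python
def overhead_fraction(interrupts_per_second, seconds_per_interrupt):
    """Fraction of each core-second spent servicing sampling interrupts."""
    return interrupts_per_second * seconds_per_interrupt

def max_cost_per_interrupt(interrupts_per_second, overhead_budget):
    """Largest per-interrupt cost (in seconds) that stays within the budget."""
    return overhead_budget / interrupts_per_second

if __name__ == "__main__":
    rate = 1000  # interrupts per second, the figure quoted in this reply
    for cost_us in (5, 20, 50):  # hypothetical per-interrupt costs in microseconds
        frac = overhead_fraction(rate, cost_us * 1e-6)
        print(f"{cost_us:>3} us per interrupt -> {frac:.1%} overhead")
    budget = max_cost_per_interrupt(rate, 0.02)
    print(f"a 2% budget allows up to {budget * 1e6:.0f} us per interrupt")
```

This simple model ignores indirect effects such as cache and branch-predictor disturbance, so it is a lower bound on the real cost rather than a prediction.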

Bernard
Valued Contributor I
817 Views

@Peter

Do you know whether VTune schedules its clock interrupts to be handled by a specific core?

 

Peter_W_Intel
Employee
817 Views

@iliyapolak

VTune interrupts the system when a hardware PMU event triggers, then collects data. The ISR (interrupt service routine) returns quickly, so the overhead is low. VTune uses the clock to interrupt the whole system, not a specific program's execution. VTune supports all processors listed in the release notes.

Bernard
Valued Contributor I
817 Views

Thanks for the answer.

>>>ISR (interrupt service routine) returns quickly, so overhead is little.>>>

On Windows, the bulk of the ISR processing is done by a registered DPC routine, so the overhead is indeed low.

Jinyong_K_
Beginner
817 Views

Hi all,

This is Kim.

Thank you for your insightful explanations.

Even though I chose the first answer as the best answer to this question (I did not know that I could choose only one :( ),

all your answers are best answers to me. :)

 

Best wishes for the holidays and New Year!

 
