On a Windows 10 machine, I just observed significant overhead from VTune 2017 that misled me during a profiling session. The top hotspot VTune reported in the bottom-up view was LoadLibraryExW, which the application calls ~30 times to load plugins at runtime. According to VTune, these calls totalled ~1.5 s of CPU time spent directly within LoadLibraryExW (i.e. self time).
I double-checked these results with xperf/UIforETW and could not reproduce the cost there. In fact, wrapping the calls to LoadLibraryExW in simple timers and printing the measurements directly revealed something interesting:
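A minimal sketch of the kind of timer wrapper described above, in C++. The helper name `timed_call` is hypothetical (it is not taken from the original application); it uses `std::chrono::steady_clock` to measure the wall-clock time of a single call, independently of any profiler. It assumes the wrapped callable returns a value, as LoadLibraryExW does (an HMODULE).

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: invokes `fn` once, stores the wall-clock duration of
// the call (in milliseconds) in `elapsed_ms`, and returns fn's result.
// Assumes fn returns a non-void value.
template <typename Fn>
auto timed_call(Fn&& fn, double& elapsed_ms) -> decltype(fn()) {
    const auto start = std::chrono::steady_clock::now();
    auto result = std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    elapsed_ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return result;
}

// Possible usage on Windows (requires <windows.h> and <cstdio>; `path` is a
// hypothetical const wchar_t* naming the plugin DLL):
//   double ms = 0.0;
//   HMODULE h = timed_call([&] { return LoadLibraryExW(path, nullptr, 0); }, ms);
//   std::printf("%ls: %.2f ms\n", path, ms);
```

Because this measures wall-clock time rather than sampled CPU time, the same build can be run with and without the profiler attached and the printed numbers compared directly.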
When the application runs normally, without VTune, each plugin takes ~5 ms to load. But when the application is profiled by VTune, loading suddenly takes ~50 ms per plugin, roughly 10 times slower.
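As a rough sanity check, assuming all ~30 loads take the average times measured above, the timer numbers line up with the ~1.5 s that VTune attributed to LoadLibraryExW:

```latex
\begin{aligned}
\text{without VTune: } & 30 \times 5\,\text{ms} \approx 0.15\,\text{s} \\
\text{with VTune: }    & 30 \times 50\,\text{ms} \approx 1.5\,\text{s}
\end{aligned}
```

So roughly 1.35 s of the reported self time appears to be overhead that only exists while VTune is collecting.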
Has anyone else noticed this? Is it a known limitation of VTune? I'm surprised that VTune has such a large influence, since it should not need to do much beyond generating an event when a library is loaded...
Any insight would be welcome, thanks.
Do you see the overhead with the software-based analysis types (such as Basic Hotspots, Concurrency, Locks & Waits) or with the hardware-based collectors (Advanced Hotspots, etc.)?