Re: Resolving Symbols Parallelism

JoeH · 01-16-2024

I saw this suggestion on the boards: https://community.intel.com/t5/Analyzers/Configure-VTune-parallelism/m-p/1381370#M22119

I experience the same thing.

We have very large binaries with huge debug info sections (for example, a 1.7 GB binary with debug info shrinks to 0.2 GB after the debug info is stripped).

It takes between 15 and 30 minutes for VTune to resolve symbols (fast mode), and it uses only two cores on a 56-core machine.

This makes profiling iterations using VTune very slow.

The other thread mentioned that you are considering improving this.

Is there currently a way to increase post-processing parallelism?

If not, can you provide an update on the status of this feature request?

Thank you!

Jennifer_D_Intel · 01-21-2024

The development team is still looking into ways to speed up post-processing, but I don't see any updates on the specific request to use more CPU cores during finalization. Other suggestions include giving users more control over exactly which symbol files are used, particularly on Windows, where accessing the symbols may trigger a cache update. That would not, however, solve the problem of slow resolution for large symbol files that are actually needed.

One consideration behind VTune's resource usage is that it should not interfere with other workloads that are running at the same time. In some cases VTune is run on production systems and must keep overhead low. In those cases, or when a system faster than the target has access to the symbol and source files, we've recommended deferring finalization to another system. I'll talk with the development team and see if they can provide more information on why VTune uses a limited number of cores and if/when we can expect an update on this request.
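
For anyone who wants to try deferring finalization, here is a rough sketch in Python that only builds the two command lines involved: one to collect with finalization deferred, one to finalize the result later, possibly on a different machine. It assumes the `vtune` command-line options `-finalization-mode=deferred`, `-finalize`, `-result-dir`, and `-search-dir` behave as described in the VTune command-line help for your version, so please check them with `vtune -help` before relying on this. The result directory name, the application command, and the symbol path are all hypothetical placeholders.

```python
import shlex


def collect_command(result_dir, app_argv, analysis="hotspots"):
    """Build a collection command that postpones symbol resolution.

    Assumption: -finalization-mode=deferred is accepted by your VTune version.
    """
    return [
        "vtune", "-collect", analysis,
        "-finalization-mode=deferred",
        "-result-dir", result_dir,
        "--", *app_argv,
    ]


def finalize_command(result_dir, search_dirs):
    """Build a command that finalizes an existing result directory.

    Each entry in search_dirs is passed as a separate -search-dir option so
    VTune can locate the binaries and symbol files.
    """
    cmd = ["vtune", "-finalize", "-result-dir", result_dir]
    for directory in search_dirs:
        cmd += ["-search-dir", directory]
    return cmd


if __name__ == "__main__":
    # Hypothetical names: r000hs, ./my_app, and /path/to/symbols are placeholders.
    print(shlex.join(collect_command("r000hs", ["./my_app", "--input", "data.bin"])))
    print(shlex.join(finalize_command("r000hs", ["/path/to/symbols"])))
```

The printed commands can be pasted into a shell, or passed to `subprocess.run` as lists. Note that this only moves the finalization work to another machine; it does not make symbol resolution itself use more cores.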