
Huge amount of memory used while processing VTune traces

Etienne
Beginner
5,775 Views

We are planning to use VTune on Chrome to fix performance bottlenecks on Intel. Our first experiments with the tool went well: we were able to use the VTune API to annotate internal Chrome tasks and pinpoint performance issues.

 

Unfortunately, we are now hitting a limit of the tool: it uses too much memory while processing a ~10 second trace (a Chrome startup trace). The tool needs about 2 to 4 hours to process the trace and reaches roughly 200 GB to 400 GB of memory usage (see attachment).

 

I captured an ETW trace of VTune itself to try to find the bottleneck (see attachment). It seems to me there are three phases:

  1) Tool initialisation (CPU bound)

  2) Followed by reading the traces (Disk bound)

  3) Processing? [or maybe symbolisation] (Memory bound)

 

Is it possible to get access to the public symbols for VTune, to help investigate this large memory consumption? Otherwise, could someone from the VTune team try to reproduce it?

 

Part of me suspects the memory usage is related to PDB loading and symbolisation within VTune.

0 Kudos
27 Replies
SreedeviK_Intel
Moderator
1,054 Views

Hi,

 

Sorry for the inconvenience caused. 

We were able to reproduce your issue from our end and our internal team is checking on it.

 

As mentioned earlier, we need the output of the following steps so that we can investigate further.

Please perform the steps below and share the output with us:

 

1) Open a File Explorer window to "%TEMP%" (C:\Users\sreede2x\AppData\Local\Temp) and delete the amplxe-log-%USER% and amplxe-tmp-%USER% directories.

 

2) Run the failing scenario in VTune.

 

3) Send a .zip of "%TEMP%"\amplxe-log-%USER% to us.
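For anyone who has to repeat this collection several times, the steps above can be scripted. The following is only a minimal sketch: the function names are hypothetical, and it assumes the `%USER%` suffix in the directory names is the login name and that the caller passes in the actual `%TEMP%` location.

```python
import shutil
from pathlib import Path


def vtune_temp_dirs(temp_root, user):
    """Return the (log, tmp) directories named in the steps above.

    Assumption: the suffix after 'amplxe-log-' / 'amplxe-tmp-' is the login name.
    """
    root = Path(temp_root)
    return root / f"amplxe-log-{user}", root / f"amplxe-tmp-{user}"


def clean_vtune_temp(temp_root, user):
    """Step 1: delete both directories; missing directories are ignored."""
    for directory in vtune_temp_dirs(temp_root, user):
        shutil.rmtree(directory, ignore_errors=True)


def zip_vtune_logs(temp_root, user, out_base):
    """Step 3: zip the log directory and return the path of the archive.

    out_base is the archive path without the '.zip' extension.
    """
    log_dir, _ = vtune_temp_dirs(temp_root, user)
    return shutil.make_archive(
        str(out_base), "zip", root_dir=log_dir.parent, base_dir=log_dir.name
    )
```

A typical sequence would be to call `clean_vtune_temp(os.environ["TEMP"], getpass.getuser())`, run the failing scenario in VTune by hand (step 2), and then call `zip_vtune_logs(...)` with the same arguments plus an output path.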


 

 

Regards,

Sreedevi

 

0 Kudos
Etienne
Beginner
1,031 Views

Following your last request, I did another run to collect the logs. To be honest, that run was fine. It was slow, but symbolizing a trace is always slow; it took about as long as opening and symbolizing an ETW trace.

 

The log file is attached.

 

I saw that symbolisation of chrome.dll was quite slow, which may be expected since it is huge.

Also, please note that the symbols were already downloaded, since I run other tools that require them.

0 Kudos
Etienne
Beginner
1,026 Views

I ran a different analysis. This one took a bit longer, but it was still within the expected execution time.

0 Kudos
clevels
Moderator
903 Views

Hello- I have accepted this case and will review the thread and investigate this issue further. Thank you for your patience.


0 Kudos
clevels
Moderator
903 Views

Hello- I will review this issue as well and investigate further. Thank you for your patience.


0 Kudos
clevels
Moderator
796 Views

Hello- thank you for your patience. I have escalated this issue to the development team and provided them the necessary reproducers for this issue. I will provide an update with additional insights as soon as they review this issue.


0 Kudos
clevels
Moderator
699 Views

Hello- thank you for your patience. There has been an update from the development team:


One possible cause is that System Overview captures information about every running process on the system, along with all of their object files, whereas Hotspots focuses only on the target workload. When given a symbol server URL, VTune tries to download and resolve everything it can. Depending on what else is running on the system, that may be thousands of DLLs and executables, most of which may have nothing to do with the profiled application.


There is currently no way to explicitly mark executables of interest for resolving in VTune, but please try the following:

  1. Build a limited local cache: launch Hotspots with the full symbol server settings, either in _NT_SYMBOL_PATH or in the project settings ("Search Sources/Binaries" button; VTune respects both), but point it to an empty local symbol cache folder. VTune will download the PDBs only for the objects related to the target app.
  2. Change the environment variable or project setting so that it refers only to the local cache, without any symbol server URLs.
  3. Now launch System Overview. VTune will only be able to pick up symbol information for the same set of modules that was captured during the Hotspots launch. That should significantly limit the scope of symbol resolution and hopefully speed up finalization.
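
To make the two `_NT_SYMBOL_PATH` settings in this workaround concrete, here is a minimal sketch that builds both values. It relies on the standard Microsoft symbol path syntax `srv*<local cache>*<server URL>`. The helper names are hypothetical, and whether VTune accepts the cache-only form `srv*<local cache>` in exactly this way is an assumption not confirmed in this thread.

```python
def symbol_path_with_server(cache_dir, server_url):
    """Step 1: symbol server URL with an (initially empty) local cache folder."""
    return f"srv*{cache_dir}*{server_url}"


def symbol_path_cache_only(cache_dir):
    """Step 2: the same local cache with no server URL (assumed form)."""
    return f"srv*{cache_dir}"


def env_with_symbol_path(base_env, symbol_path):
    """Return a copy of base_env with _NT_SYMBOL_PATH set to symbol_path."""
    env = dict(base_env)
    env["_NT_SYMBOL_PATH"] = symbol_path
    return env
```

The environment returned by `env_with_symbol_path(os.environ, ...)` could be passed to whatever launches VTune: first with the server form for the Hotspots run, then with the cache-only form for the System Overview run.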



0 Kudos