We are planning to use VTune on Chrome to fix performance bottlenecks on Intel hardware. Our first experiments with the tool went well: we were able to use the VTune API to annotate internal Chrome tasks and pinpoint performance issues.
Unfortunately, we are now hitting a limit of the tool: it uses too much memory while processing a ~10 second trace (a Chrome startup trace). The tool needs about 2 to 4 hours to process the trace and reaches ~200 GB to 400 GB of memory usage (see attachment).
I ran ETW on VTune to try to find the bottleneck (see attachment). It seems to me there are three phases:
1) Tool initialisation (CPU bound)
2) Followed by reading the traces (Disk bound)
3) Processing? [or maybe symbolisation] (Memory bound)
I would like to know if it's possible to get access to the public symbols to help investigate this large memory consumption. Otherwise, could someone from the VTune team try to reproduce it?
Part of me suspects the memory usage is related to PDB loading and symbolisation within VTune.
Hi,
Thanks for posting in Intel Communities.
Could you provide the following information so that we can try reproducing the issue on our side:
1. CPU, Processor and OS details
2. Sample reproducer code along with the steps/commands
3. VTune version along with type of analysis performed
Regards,
Sreedevi
1. CPU, Processor and OS details
==================================================
Name: Intel(R) Xeon(R) Processor code named Skylake
Frequency: 3.0 GHz
Logical CPU Count: 72
2x Processor Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, 2993 Mhz, 18 Core(s), 18 Logical Processor(s)
Installed Physical Memory (RAM) 192 GB
OS Name Microsoft Windows 10 Enterprise
Version 10.0.19045 Build 19045
2. Sample reproducer code along with the steps/commands
==================================================
The issue applies to all analysis types, but here are two examples:
1) Hotspots
"C:\Program Files (x86)\Intel\oneAPI\vtune\latest\bin64\vtune" -collect hotspots -no-follow-child "--app-working-dir=C:\Users\etienneb\AppData\Local\Google\Chrome SxS\Application" -- "C:\Users\etienneb\AppData\Local\Google\Chrome SxS\Application\chrome.exe" --user-data-dir=c:\src\dummy --no-sandbox
2) System Overview
"C:\Program Files (x86)\Intel\oneAPI\vtune\latest\bin64\vtune" -collect system-overview -knob analyze-power-usage=true -knob analyze-throttling-reasons=true -no-follow-child "--app-working-dir=C:\Users\etienneb\AppData\Local\Google\Chrome SxS\Application" -- "C:\Users\etienneb\AppData\Local\Google\Chrome SxS\Application\chrome.exe" --user-data-dir=c:\src\dummy --no-sandbox
STEPS:
We launched the collection through the UI, and when Chrome finished loading the first page, we stopped the collection through the UI.
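For anyone reproducing this from a script rather than the UI, here is a minimal Python sketch that separates collection from finalization so the finalization time can be measured on its own. It reuses the install path and the `-collect hotspots -no-follow-child` options from the commands above. The `-finalization-mode=deferred` option, the separate `-finalize` step, and the `result_dir` value are assumptions that were not part of the original steps; check them against `vtune -help` for your version. Note that a scripted collection runs until the application exits, unlike the manual stop used above.

```python
import subprocess
import time
from typing import List

# Install path taken from the commands in this post.
VTUNE = r"C:\Program Files (x86)\Intel\oneAPI\vtune\latest\bin64\vtune"


def collect_cmd(app: str, result_dir: str, app_args: List[str]) -> List[str]:
    """Build a hotspots collection command that postpones finalization."""
    return [
        VTUNE, "-collect", "hotspots", "-no-follow-child",
        "-finalization-mode=deferred",  # assumption: verify with `vtune -help collect`
        "-result-dir", result_dir,      # hypothetical result directory
        "--", app, *app_args,
    ]


def finalize_cmd(result_dir: str) -> List[str]:
    """Build the command that finalizes (symbolizes) an existing result."""
    return [VTUNE, "-finalize", "-result-dir", result_dir]


def timed_run(cmd: List[str]) -> float:
    """Run a command to completion and return the elapsed wall time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

Calling `timed_run(finalize_cmd(result_dir))` after the collection then gives a single number for the phase that is suspected to be slow.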
3. VTune version along with type of analysis performed
==================================================
VTune Profiler 2023.2.0
626047
C:\Program Files (x86)\Intel\oneAPI\vtune\latest
Hi,
We are checking on this internally and will get back to you with an update shortly.
Regards,
Sreedevi
Hi,
Can you please confirm if you are using hardware event-based sampling drivers? If not, could you try as mentioned below and share your result directory.
In the GUI, check the hardware event-based sampling checkbox.
In the CLI, try adding "-knob sampling-mode=hw" to the command.
Regards,
Sreedevi
We tried both options (hardware and user-mode sampling).
We also reduced the sampling rate (a longer sampling interval) to collect less data.
We also tried collecting a really small trace (1 second).
We also had trouble loading results from other analyses. This is why we really suspect it is related to the symbolisation phase.
I just made another attempt. I attached the generated VTune files for a quick trace.
I hope the output file is self-contained.
The trace is a quick Chrome startup. All the child processes are traced.
VTune is launched with admin rights. The Hotspots analysis is used with hardware sampling (5 ms interval).
On a brand new Intel laptop (4 cores, 16 GB RAM, Windows freshly installed), I installed VTune and took a trace of Chrome startup.
The Hotspots analysis completed in less than 1 minute. Unfortunately, the stack frames were not visible and the symbols were not loaded.
I added the environment variable for the symbol path and re-ran the same test. After an hour, the finalization phase is still running.
I am using these symbol servers:
_NT_SYMBOL_PATH=SRV*C:\src\symbols*https://msdl.microsoft.com/download/symbols;SRV*C:\src\symbols*https://chromium-browser-symsrv.commondatastorage.googleapis.com;SRV*C:\src\symbols\*https://download.amd.com/dir/bin;SRV*C:\src\symbols*https://driver-symbols.nvidia.com;SRV*C:\src\symbols\*https://software.intel.com/sites/downloads/symbols/
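To make the symbol path above easier to inspect, here is a small Python sketch that splits an `_NT_SYMBOL_PATH` value into its `SRV*<local cache>*<server>` entries. The helper names (`SymbolEntry`, `parse_symbol_path`, `cache_spellings`) are illustrative and not part of any tool. It also reports the distinct spellings of the local cache directory, since the value above mixes `C:\src\symbols` and `C:\src\symbols\`; whether dbghelp treats these two spellings differently is not something this sketch verifies.

```python
from typing import List, NamedTuple, Optional, Set


class SymbolEntry(NamedTuple):
    kind: str               # "SRV" for symbol-server entries, "DIR" for plain directories
    local: Optional[str]    # local cache (SRV) or the directory itself (DIR)
    server: Optional[str]   # upstream symbol server URL, if any


def parse_symbol_path(value: str) -> List[SymbolEntry]:
    """Split a semicolon-separated symbol path into structured entries."""
    entries = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        fields = part.split("*")
        if fields[0].upper() == "SRV" and len(fields) == 3:
            entries.append(SymbolEntry("SRV", fields[1], fields[2]))
        elif fields[0].upper() == "SRV" and len(fields) == 2:
            entries.append(SymbolEntry("SRV", None, fields[1]))
        else:
            entries.append(SymbolEntry("DIR", part, None))
    return entries


def cache_spellings(entries: List[SymbolEntry]) -> Set[str]:
    """Return the distinct local cache strings used by SRV entries, verbatim."""
    return {e.local for e in entries if e.kind == "SRV" and e.local is not None}


# The value quoted in this post, copied verbatim.
THREAD_SYMBOL_PATH = (
    r"SRV*C:\src\symbols*https://msdl.microsoft.com/download/symbols;"
    r"SRV*C:\src\symbols*https://chromium-browser-symsrv.commondatastorage.googleapis.com;"
    r"SRV*C:\src\symbols\*https://download.amd.com/dir/bin;"
    r"SRV*C:\src\symbols*https://driver-symbols.nvidia.com;"
    r"SRV*C:\src\symbols\*https://software.intel.com/sites/downloads/symbols/"
)

if __name__ == "__main__":
    for entry in parse_symbol_path(THREAD_SYMBOL_PATH):
        print(entry)
    print(sorted(cache_spellings(parse_symbol_path(THREAD_SYMBOL_PATH))))
```

Each server in the list is one more place a symbol lookup may go, so trimming the list to the servers actually needed is a cheap experiment when finalization is slow.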
I highly suspect the performance issues are related to symbol loading. I wouldn't be surprised if chrome.dll.pdb is just too big to be easily processed by VTune.
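One quick way to check the "PDB is too big" hypothesis is to list the largest `.pdb` files in the local symbol cache. The following Python sketch assumes the cache lives at `C:\src\symbols` (the directory used in the symbol path above); the function name `largest_pdbs` is illustrative.

```python
import os
from typing import List, Tuple


def largest_pdbs(root: str, top: int = 10) -> List[Tuple[int, str]]:
    """Return up to `top` (size_in_bytes, path) pairs for .pdb files under root, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdb"):
                path = os.path.join(dirpath, name)
                try:
                    found.append((os.path.getsize(path), path))
                except OSError:
                    # The file may vanish or be locked while a download is in progress.
                    continue
    found.sort(reverse=True)
    return found[:top]


if __name__ == "__main__":
    # Assumed cache location, taken from the symbol path in this post.
    for size, path in largest_pdbs(r"C:\src\symbols"):
        print(f"{size / 2**20:10.1f} MiB  {path}")
```

If one or two files dominate the list, temporarily removing their server from the symbol path and re-running finalization would show how much of the time they account for.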
Hi,
We are checking on this internally and will get back to you with an update shortly.
Regards,
Sreedevi
Hi,
Sorry for the delay in getting back to you.
We checked with our development team, and they informed us that they are working on a fix, which is targeted for a future release.
Also, I could see that you don't have priority support, but our dev team would like to know whether any of your team members have it. With priority support, you could access builds earlier than the targeted release.
Regards,
Sreedevi
Hi,
We have not heard back from you. Could you please provide us an update?
Regards,
Sreedevi
I don't know what you expect as an update.
We were trying to use VTune to see what potential optimisations can be detected with such a low-level tool.
Unfortunately, it doesn't work as-is on our code base. We spent time investigating the source of the issue and came to the conclusion that the limitations are in the tools and we can move forward on tooling analysis. Until the fixes are available, we can't investigate the usefulness of VTune on our code base.
> Our dev team would like to know whether any of your team members have priority support.
I don't know what the 'priority support' service is. This is our first use of the tool, so I doubt we have it.
Hi,
Can you please try running your sample on the updated VTune version (2024.0) and let us know if the issue still persists?
Regards,
Sreedevi
I tried on one of my dev computers and it went smoothly. The Hotspots analysis was able to symbolize the trace quickly.
I tried on my laptop and got the following error in my log. I highly suspect this is related to one of the security tools installed on my laptop (corp policy). I don't know why it was working fine before; is it a VTune regression?
'''
11/24/2023 11:37:16:705 : 14368 : ERROR : Installation of component has failed.
Component id: intel.oneapi.win.oneapi-common.licensing, name: oneAPI Common, version: 2024.0.0+49430.
During the execution of the application 'C:\Windows\system32\WindowsPowerShell\v1.0\powershell.exe' with arguments '-NoLogo -NoProfile -NonInteractive -ExecutionPolicy AllSigned -File C:\ProgramData\Intel\InstallerCache\DownloadCache\intel.oneapi.win.oneapi-common.licensing,v=2024.0.0+49430\latest_symlink_post_install.ps1 -installDir C:\Program Files (x86)\Intel\oneAPI -linkTargetVersion 2024.0 -latestLinkDir licensing' errors were received:
C:\ProgramData\Intel\InstallerCache\DownloadCache\intel.oneapi.win.oneapi-common.licensing,v=2024.0.0+49430\latest_symlink_post_install.ps1 : Cannot dot-source this command because it was defined in a different language mode. To invoke this command without importing its contents, omit the '.' operator.
+ CategoryInfo : InvalidOperation: (:) [latest_symlink_post_install.ps1], NotSupportedException
+ FullyQualifiedErrorId : DotSourceNotSupported,latest_symlink_post_install.ps
'''
I'll give it a try on our lab computers in a few days and see if it is fixed on those computers too.
Hi,
Thank you for sharing your observations.
Can you please share the details of the machines where the Hotspots analysis worked and where it did not (processor details, OS details and, if it is Linux, the kernel version)?
Regards,
Sreedevi
Sorry for the noise, I clicked on the wrong button. I need to re-write my post.
I posted the details for my computer in the 3rd comment above:
Roughly 2 x 18 cores, 192 GB RAM
I still face a really long time to process a VTune trace. The memory consumption seems to have improved a lot, and this makes the whole process work. Unfortunately, it still requires a few hours to open a trace of a few seconds.
I took a look at VTune and my high-level understanding is:
1) Frontend: Chromium-based UI
2) Backend: Node.js based, running server.js
3) Worker: I'm not sure what is being used, but this is the process that performs symbolisation.
4) Communication: gRPC (or protobuf based)
5) Database: sqlite
The bottleneck seems to be with the workers (see vtune_worker.png). The worker is responsible for the symbolisation (see vtune_debug3). Since the symbolisation uses dbghelp, I enabled its debug output with the environment variable, attached a debugger to the worker process and looked at the dbghelp output in windbg (see vtune_debug1).
This makes it clear that the bottleneck is the symbolisation by far. It takes about ~2 seconds to ~30 seconds per file and it goes on for hours. The output in the debugger is aligned with the output in the VTune UI (see screenshot).
As a short-term solution, is it possible to increase the number of workers? I have the CPU power and memory to handle more workers.
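As a back-of-the-envelope check on the numbers above, the Python sketch below estimates how many symbolisation steps a run of a given length implies at ~2 to ~30 seconds per file, and what an ideal split across more workers would give. The 2-hour total and the worker count are illustrative values, and the ideal speedup assumes files can be symbolised independently with no contention, which has not been confirmed for VTune.

```python
def implied_file_count(total_seconds: float, seconds_per_file: float) -> float:
    """Number of per-file symbolisation steps implied by a total run time."""
    return total_seconds / seconds_per_file


def ideal_parallel_seconds(total_seconds: float, workers: int) -> float:
    """Best-case wall time if the work splits evenly across independent workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return total_seconds / workers


if __name__ == "__main__":
    total = 2 * 3600  # illustrative: a 2-hour finalization
    print(implied_file_count(total, 30))           # 240.0 steps at ~30 s each
    print(implied_file_count(total, 2))            # 3600.0 steps at ~2 s each
    print(ideal_parallel_seconds(total, 8) / 60)   # 15.0 minutes with 8 ideal workers
```

Even under these optimistic assumptions, the per-file cost stays the same, so more workers would only hide the slow symbolisation rather than fix it.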
Hi,
Thank you for sharing the information with us.
We are checking on this internally and will get back to you with an update shortly.
To assist you better, kindly perform the following steps and share the output with us:
1) Open a File Explorer window to "%TEMP%" (C:\Users\sreede2x\AppData\Local\Temp) and delete the amplxe-log-%USER% and amplxe-tmp-%USER% directories.
2) Run the failing scenario in VTune.
3) Send a .zip of "%TEMP%"\amplxe-log-%USER% to us.
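Steps 1 and 3 above can be scripted. This is a minimal Python sketch, assuming the directory names `amplxe-log-<user>` and `amplxe-tmp-<user>` under the temp directory as described in the steps; the function names and the output archive name are illustrative.

```python
import os
import shutil
from typing import List


def vtune_temp_dirs(temp_root: str, user: str) -> List[str]:
    """Return the log and tmp directory paths named in the steps above."""
    return [
        os.path.join(temp_root, f"amplxe-log-{user}"),
        os.path.join(temp_root, f"amplxe-tmp-{user}"),
    ]


def clean_vtune_temp(temp_root: str, user: str) -> None:
    """Step 1: delete both directories so the next run starts with empty logs."""
    for path in vtune_temp_dirs(temp_root, user):
        shutil.rmtree(path, ignore_errors=True)


def zip_vtune_logs(temp_root: str, user: str, out_base: str) -> str:
    """Step 3: zip the log directory and return the path of the created archive."""
    log_dir = vtune_temp_dirs(temp_root, user)[0]
    return shutil.make_archive(out_base, "zip", root_dir=log_dir)
```

A typical sequence would be `clean_vtune_temp(...)`, then running the failing scenario in VTune by hand, then `zip_vtune_logs(...)`, with `temp_root` set to the value of `%TEMP%` and `user` to the current user name.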
Regards,
Sreedevi
Hi,
We have not heard back from you. Could you please confirm if your issue is resolved or not?
Regards,
Sreedevi
The issue is still there and it is still related to symbolisation. It is easy to reproduce as documented above in this post.
I added plenty of details, with screenshots, in my comment from 11-28-2023.