Solved: VTune running out of memory when finalizing results

metricv · ‎07-13-2023

VTune runs out of memory (as much as 256GB) when finalizing two 10-minute traces with around 100MB of raw data.

Three machines involved:

"target": Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, 16GB of RAM. Ubuntu Server 22.04 LTS. Intel VTune 2023.1.0.625246

"host": AMD Ryzen 7950X, 64GB of RAM. Ubuntu Desktop 23.04. Intel VTune 2023.1.0.625246

"helper": A node in an HPC cluster, Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz, 256GB memory requested. Springdale Linux. Intel VTune 2022.4.0

The first trace was collected directly on the "target" with the CLI:

vtune -collect uarch-exploration -knob collect-memory-bandwidth=true -analyze-system -duration=600 -data-limit=2500 -r <dir>

VTune was installed with the oneAPI installer. Finalizing the trace ran out of memory on the target, which has 16GB of RAM.

The second trace was collected with the GUI on the "host," configured as "Remote Linux (SSH)" into the "target," with the VTune profiler deployed by the GUI into /tmp. The analysis was Microarchitecture Exploration with a 2500MB data limit, stopped manually after about 10 minutes. After collection stopped, finalization exhausted the 64GB of memory on the "host" machine, and the process was killed by the kernel.

The second trace was then copied onto the "helper" and finalized with "vtune -finalize --finalization-mode=full -r <result>". It exhausted the 256GB of memory there as well. Switching the finalization mode to fast did not help.
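Since two of the runs above ended with the machine out of memory, one option when retrying is to launch the finalize command under an address-space cap so that it fails on its own instead of being killed by the kernel. The following is only a sketch under assumptions: the helper name run_with_memory_cap is made up for illustration, it relies on Linux RLIMIT_AS (which limits virtual address space, not resident memory), and whether VTune exits cleanly when an allocation fails has not been verified.

```python
import resource
import subprocess


def run_with_memory_cap(cmd, cap_bytes):
    """Run cmd with its virtual address space capped at cap_bytes.

    Hypothetical helper (not part of VTune). Returns the command's exit code.
    """

    def apply_cap():
        # Executed in the child process just before the command starts.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            new_soft = cap_bytes
        else:
            # Never ask for more than the existing hard limit allows.
            new_soft = min(cap_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_cap).returncode
```

For the finalize step in this thread, a call could look like run_with_memory_cap(["vtune", "-finalize", "--finalization-mode=full", "-r", result_dir], 48 * 1024**3), where result_dir and the 48GB figure are placeholders to adjust for the machine.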

In all cases, VTune's memory consumption grew steadily until it hit the maximum, while the output was stuck on "vtune: Executing actions 25 % Loading 'system-wide.perf' file".

I can share the trace file if someone is interested in recreating the error.

metricv · ‎07-13-2023

Update: Finalization was completed with 280GB of RAM consumption. Why does it require so much RAM?

metricv · ‎07-13-2023

Some broad metrics collected from the "helper" machine.

There were write operations, but the result directory did not grow in size.
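For anyone who wants to gather comparable numbers, here is a rough sketch of how the memory growth and the result directory size could be sampled side by side. It is not the tool used for the metrics above; the function names are made up for illustration, and it assumes Linux, where /proc/<pid>/status reports VmRSS in kB.

```python
import os
import time


def rss_bytes(pid):
    """Resident set size of pid in bytes, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) * 1024  # the kernel reports kB
    return 0  # kernel threads and zombies have no VmRSS line


def dir_size_bytes(path):
    """Total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file disappeared between listing and stat
    return total


def sample(pid, result_dir, interval_s=5.0, count=None):
    """Yield (elapsed seconds, RSS bytes, directory bytes).

    Stops when the process exits or after count samples, if count is given.
    """
    start = time.monotonic()
    taken = 0
    while count is None or taken < count:
        if taken:
            time.sleep(interval_s)
        try:
            rss = rss_bytes(pid)
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has exited
        yield time.monotonic() - start, rss, dir_size_bytes(result_dir)
        taken += 1
```

Iterating over sample(pid, result_dir) with the PID of the running vtune process and printing each tuple would show whether RSS keeps climbing while the directory size stays flat, which is the pattern described here.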

RajashekarK_Intel · ‎07-17-2023

Hi, thanks for posting in the Intel communities.

We would like more information about this:

1. Is this also the case with other analysis types?

2. Do you observe the same with the matrix sample from the VTune samples, using Microarchitecture Exploration analysis?

/opt/intel/oneapi/vtune/latest/samples/en/C++/matrix

3. Please share a sample reproducer so that we can replicate this on our end.

Regards,

Rajashekar

metricv · ‎07-17-2023

Hi Rajashekar,

Thank you for the response!

1. I did not try any analysis type other than Microarchitecture Exploration.

2. I did not try profiling other workloads. The issue seems to have gone away after a reboot of the target machine, so I cannot replicate it now. Most likely, some workload on the target system was causing it, and that workload is gone after the reboot.

3. Please use this link and request access: https://drive.google.com/file/d/1uVXjlnvqW-3ljpIe8CZCy1B3BLrH92eQ/view?usp=sharing

RajashekarK_Intel · ‎07-17-2023

Hi @metricv ,

Glad to know that your issue is resolved after the reboot. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.

Regards,

Rajashekar