Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Unable to profile DRAM memory bandwidth

Kailash26
Beginner

While running an Intel VTune analysis to measure several performance metrics, including DRAM bandwidth, I noticed that DRAM bandwidth is not printed in the results, even though other metrics such as CPU cycles and instructions are. I passed the -knob collect-memory-bandwidth=true and -knob dram-bandwidth-limits=true options as recommended, but DRAM bandwidth is still missing. Is there a specific reason for this, or is additional configuration required to capture and display DRAM bandwidth? My setup also reports permission warnings, in case those are relevant.


Command:

vtune -collect hpc-performance -knob collect-memory-bandwidth=true -knob dram-bandwidth-limits=true perf stat -a -e cycles,instructions,power/energy-pkg/ python test.py


Output:

vtune: Warning: Cannot collect GPU hardware metrics due to a lack of permissions. Use root privileges (recommended) or re-configure your current permissions to make sure you are a member of the render group and /proc/sys/dev/i915/perf_stream_paranoid value is set to 0.
vtune: Warning: Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel module symbols.
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Peak bandwidth measurement started.
vtune: Peak bandwidth measurement finished.
vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /home/kailash/Desktop/backupdesktop/CORL/r000hpc -command stop.
tensor([4.0000, 0.0100, 0.0400])

Performance counter stats for 'system wide':

7,468,392,866 cpu_core/cycles/ (22.09%)
4,486,149,664 cpu_atom/cycles/ (31.73%)
14,504,617,193 cpu_core/instructions/ # 1.94 insn per cycle (22.08%)
8,334,989,079 cpu_atom/instructions/ # 1.12 insn per cycle (31.74%)
35.03 Joules power/energy-pkg/

0.377239823 seconds time elapsed

vtune: Collection stopped.
vtune: Using result path `/home/kailash/Desktop/backupdesktop/CORL/r000hpc'
vtune: Executing actions 0 % Finalizing results
vtune: Warning: Cannot collect GPU hardware metrics due to a lack of permissions. Use root privileges (recommended) or re-configure your current permissions to make sure you are a member of the render group and /proc/sys/dev/i915/perf_stream_paranoid value is set to 0.
vtune: Executing actions 19 % Resolving information for `bash'
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/torch/lib/libc10.so'.
vtune: Warning: Cannot locate debugging information for file `/usr/bin/bash'.
vtune: Executing actions 19 % Resolving information for `libcom_err.so.2.1'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libcom_err.so.2.1'.
vtune: Executing actions 20 % Resolving information for `libcudart-a7b20f20.so.
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/torch/lib/libcudart-a7b20f20.so.11.0'.
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so'.
vtune: Executing actions 20 % Resolving information for `libstdc++.so.6.0.30'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30'.
vtune: Executing actions 20 % Resolving information for `libopenblas64_p-r0-742
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/numpy.libs/libopenblas64_p-r0-742d56dc.3.20.so'.
vtune: Executing actions 20 % Resolving information for `libtasn1.so.6.6.2'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libtasn1.so.6.6.2'.
vtune: Executing actions 21 % Resolving information for `libtorch_python.so'
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/torch/lib/libtorch_python.so'.
vtune: Executing actions 22 % Resolving information for `libtorch_cpu.so'
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so'.
vtune: Executing actions 75 % Generating a report
Elapsed Time: 0.456s
SP GFLOPS: 0.000
DP GFLOPS: 0.000
x87 GFLOPS: 0.000
CPI Rate: 0.583
Average CPU Frequency: 4.897 GHz
Total Thread Count: 43
Effective Physical Core Utilization: 16.0% (3.838 out of 24)
| The metric value is low, which may signal a poor physical CPU cores
| utilization caused by:
| - load imbalance
| - threading runtime overhead
| - contended synchronization
| - thread/process underutilization
| - incorrect affinity that utilizes logical cores instead of physical
| cores
| Explore sub-metrics to estimate the efficiency of MPI and OpenMP parallelism
| or run the Locks and Waits analysis to identify parallel bottlenecks for
| other parallel runtimes.
|
Effective Logical Core Utilization: 18.5% (5.933 out of 32)
| The metric value is low, which may signal a poor logical CPU cores
| utilization. Consider improving physical core utilization as the first
| step and then look at opportunities to utilize logical cores, which in
| some cases can improve processor throughput and overall performance of
| multi-threaded applications.
|
Memory Bound: 4.7% of Pipeline Slots
Performance-core (P-core)
Memory Bound: 4.7% of Pipeline Slots
Cache Bound: 17.0% of Clockticks
DRAM Bound: 1.7% of Clockticks
Efficient-core (E-core)
Memory Bound: 2.3% of Clockticks
Cache Bound: 1.9% of Clockticks
DRAM Bound: 0.1% of Clockticks
Vectorization: 0.0% of Packed FP Operations
Instruction Mix
SP FLOPs: 0.0% of uOps
Packed: 0.0% from SP FP
128-bit: 0.0% from SP FP
256-bit: 0.0% from SP FP
Scalar: 0.0% from SP FP
DP FLOPs: 0.0% of uOps
Packed: 0.0% from DP FP
128-bit: 0.0% from DP FP
256-bit: 0.0% from DP FP
Scalar: 0.0% from DP FP
x87 FLOPs: 0.0% of uOps
Non-FP: 100.0% of uOps
Collection and Platform Info
Application Command Line: perf "stat" "-a" "-e" "cycles,instructions,power/energy-pkg/" "python" "test.py"
Operating System: 6.5.0-44-generic DISTRIB_ID=Ubuntu DISTRIB_RELEASE=22.04 DISTRIB_CODENAME=jammy DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"
Computer Name: srinija-Precision-3680
Result Size: 28.2 MB
Collection start time: 05:01:11 05/11/2024 UTC
Collection stop time: 05:01:12 05/11/2024 UTC
Collector Type: Driverless Perf system-wide sampling
CPU
Name: Intel(R) microarchitecture code named Raptorlake-DT
Frequency: 3.187 GHz
Logical CPU Count: 32

If you want to skip descriptions of detected performance issues in the report,
enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
Alternatively, you may view the report in the csv format: vtune -report
<report_name> -format=csv.
vtune: Executing actions 100 % done

yuzhang3_intel
Moderator

Run the following commands in a single console as root:

echo 0 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
echo 0 > /proc/sys/kernel/yama/ptrace_scope

echo 0 > /proc/sys/dev/i915/perf_stream_paranoid
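
Before changing these settings, it may be worth recording their current values so they can be restored later. The following is a minimal, read-only sketch. The helper name show_setting is made up for illustration, and the four paths are the ones listed above. Some of them may not exist on every system (for example, the i915 entry on a machine without that driver), in which case the sketch only reports that.

```shell
#!/bin/sh
# Print the current value of one kernel setting, or a note if it cannot be read.
# show_setting is a hypothetical helper name, not part of VTune.
show_setting() {
    if [ -r "$1" ]; then
        printf '%s = %s\n' "$1" "$(cat "$1")"
    else
        printf '%s = (not readable or not present)\n' "$1"
    fi
}

# The four files named in the reply above; nothing is written to them here.
for f in \
    /proc/sys/kernel/perf_event_paranoid \
    /proc/sys/kernel/kptr_restrict \
    /proc/sys/kernel/yama/ptrace_scope \
    /proc/sys/dev/i915/perf_stream_paranoid
do
    show_setting "$f"
done
```

Running it once before and once after the echo commands gives a quick confirmation that the writes took effect.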

Kailash26
Beginner

Hi, the problem persists. I still don't get the Bandwidth Utilization table, which should look something like this:

Bandwidth Utilization
Bandwidth Domain  Platform Maximum  Observed Maximum  Average  % of Elapsed Time with High BW Utilization(%)
----------------  ----------------  ----------------  -------  ---------------------------------------------
DRAM, GB/sec      35                8.700             1.009    0.0%
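
One way to check for this table without reading the whole report by eye is to save the summary to a text file and search it for the DRAM row. This is only a sketch. The function name has_dram_bandwidth and the file name summary.txt are made up for illustration, and the row label 'DRAM, GB/sec' is taken from the example table above, so it may differ in other VTune versions.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the given report text contains a DRAM bandwidth row.
# has_dram_bandwidth is a hypothetical helper; the label it searches for is
# copied from the example table in this post.
has_dram_bandwidth() {
    grep -q 'DRAM, GB/sec' "$1"
}

# Possible usage, with the result directory left as a placeholder:
#   vtune -report summary -r <my_result_dir> > summary.txt
#   if has_dram_bandwidth summary.txt; then
#       echo "DRAM bandwidth row present"
#   else
#       echo "DRAM bandwidth row missing"
#   fi
```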

yuzhang3_intel
Moderator

Which platform and OS are you using?

Could you run the vtune-self-check utility to check your environment first?

Kailash26
Beginner

The OS is Ubuntu 22.04.1.

The CPU is an Intel(R) Core(TM) i9-14900.

yuzhang3_intel
Moderator

Run /opt/intel/oneapi/vtune/latest/vtune-self-check.sh and post the output. I also recommend using the VTune 2025.0 release.

Martin_HZK
Novice

But VTune 2025 does not include that shell script at this location.

yuzhang3_intel
Moderator

/opt/intel/oneapi/vtune/latest/bin64/vtune-self-check.sh 
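
Since two locations for the script have come up in this thread, a small check can report which one exists on a given install. This is a sketch only. The helper name first_existing is made up for illustration, and the two paths are exactly the ones posted above, so an install in a non-default directory would need different paths.

```shell
#!/bin/sh
# Print the first path among the arguments that exists; return 1 if none does.
# first_existing is a hypothetical helper, not something shipped with VTune.
first_existing() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            printf '%s\n' "$p"
            return 0
        fi
    done
    return 1
}

# Try the bin64 location first, then the location given earlier in the thread.
if script=$(first_existing \
    /opt/intel/oneapi/vtune/latest/bin64/vtune-self-check.sh \
    /opt/intel/oneapi/vtune/latest/vtune-self-check.sh)
then
    echo "Found: $script"
else
    echo "vtune-self-check.sh not found in either location"
fi
```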
