Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Unable to profile DRAM memory bandwidth

Kailash26
Beginner

While running an analysis with Intel VTune to measure several performance metrics, including DRAM bandwidth, I noticed that DRAM bandwidth is not printed in the results, even though other metrics such as CPU cycles and instructions are displayed. I used the -knob collect-memory-bandwidth=true and -knob dram-bandwidth-limits=true options as recommended, but DRAM bandwidth is still missing. Is there a specific reason for this, or is additional configuration required to capture and display DRAM bandwidth? My current setup also reports permission warnings, in case that is related.


Command:

vtune -collect hpc-performance -knob collect-memory-bandwidth=true -knob dram-bandwidth-limits=true perf stat -a -e cycles,instructions,power/energy-pkg/ python test.py


Output:

vtune: Warning: Cannot collect GPU hardware metrics due to a lack of permissions. Use root privileges (recommended) or re-configure your current permissions to make sure you are a member of the render group and /proc/sys/dev/i915/perf_stream_paranoid value is set to 0.
vtune: Warning: Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel module symbols.
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Peak bandwidth measurement started.
vtune: Peak bandwidth measurement finished.
vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /home/kailash/Desktop/backupdesktop/CORL/r000hpc -command stop.
tensor([4.0000, 0.0100, 0.0400])

Performance counter stats for 'system wide':

7,468,392,866 cpu_core/cycles/ (22.09%)
4,486,149,664 cpu_atom/cycles/ (31.73%)
14,504,617,193 cpu_core/instructions/ # 1.94 insn per cycle (22.08%)
8,334,989,079 cpu_atom/instructions/ # 1.12 insn per cycle (31.74%)
35.03 Joules power/energy-pkg/

0.377239823 seconds time elapsed

vtune: Collection stopped.
vtune: Using result path `/home/kailash/Desktop/backupdesktop/CORL/r000hpc'
vtune: Executing actions 0 % Finalizing results
vtune: Warning: Cannot collect GPU hardware metrics due to a lack of permissions. Use root privileges (recommended) or re-configure your current permissions to make sure you are a member of the render group and /proc/sys/dev/i915/perf_stream_paranoid value is set to 0.
vtune: Executing actions 19 % Resolving information for `bash'
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/torch/lib/libc10.so'.
vtune: Warning: Cannot locate debugging information for file `/usr/bin/bash'.
vtune: Executing actions 19 % Resolving information for `libcom_err.so.2.1'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libcom_err.so.2.1'.
vtune: Executing actions 20 % Resolving information for `libcudart-a7b20f20.so.
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/torch/lib/libcudart-a7b20f20.so.11.0'.
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so'.
vtune: Executing actions 20 % Resolving information for `libstdc++.so.6.0.30'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30'.
vtune: Executing actions 20 % Resolving information for `libopenblas64_p-r0-742
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/numpy.libs/libopenblas64_p-r0-742d56dc.3.20.so'.
vtune: Executing actions 20 % Resolving information for `libtasn1.so.6.6.2'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libtasn1.so.6.6.2'.
vtune: Executing actions 21 % Resolving information for `libtorch_python.so'
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/torch/lib/libtorch_python.so'.
vtune: Executing actions 22 % Resolving information for `libtorch_cpu.so'
vtune: Warning: Cannot locate debugging information for file `/home/kailash/miniconda3/envs/CORL/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so'.
vtune: Executing actions 75 % Generating a report
Elapsed Time: 0.456s
SP GFLOPS: 0.000
DP GFLOPS: 0.000
x87 GFLOPS: 0.000
CPI Rate: 0.583
Average CPU Frequency: 4.897 GHz
Total Thread Count: 43
Effective Physical Core Utilization: 16.0% (3.838 out of 24)
| The metric value is low, which may signal a poor physical CPU cores
| utilization caused by:
| - load imbalance
| - threading runtime overhead
| - contended synchronization
| - thread/process underutilization
| - incorrect affinity that utilizes logical cores instead of physical
| cores
| Explore sub-metrics to estimate the efficiency of MPI and OpenMP parallelism
| or run the Locks and Waits analysis to identify parallel bottlenecks for
| other parallel runtimes.
|
Effective Logical Core Utilization: 18.5% (5.933 out of 32)
| The metric value is low, which may signal a poor logical CPU cores
| utilization. Consider improving physical core utilization as the first
| step and then look at opportunities to utilize logical cores, which in
| some cases can improve processor throughput and overall performance of
| multi-threaded applications.
|
Memory Bound: 4.7% of Pipeline Slots
Performance-core (P-core)
Memory Bound: 4.7% of Pipeline Slots
Cache Bound: 17.0% of Clockticks
DRAM Bound: 1.7% of Clockticks
Efficient-core (E-core)
Memory Bound: 2.3% of Clockticks
Cache Bound: 1.9% of Clockticks
DRAM Bound: 0.1% of Clockticks
Vectorization: 0.0% of Packed FP Operations
Instruction Mix
SP FLOPs: 0.0% of uOps
Packed: 0.0% from SP FP
128-bit: 0.0% from SP FP
256-bit: 0.0% from SP FP
Scalar: 0.0% from SP FP
DP FLOPs: 0.0% of uOps
Packed: 0.0% from DP FP
128-bit: 0.0% from DP FP
256-bit: 0.0% from DP FP
Scalar: 0.0% from DP FP
x87 FLOPs: 0.0% of uOps
Non-FP: 100.0% of uOps
Collection and Platform Info
Application Command Line: perf "stat" "-a" "-e" "cycles,instructions,power/energy-pkg/" "python" "test.py"
Operating System: 6.5.0-44-generic DISTRIB_ID=Ubuntu DISTRIB_RELEASE=22.04 DISTRIB_CODENAME=jammy DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"
Computer Name: srinija-Precision-3680
Result Size: 28.2 MB
Collection start time: 05:01:11 05/11/2024 UTC
Collection stop time: 05:01:12 05/11/2024 UTC
Collector Type: Driverless Perf system-wide sampling
CPU
Name: Intel(R) microarchitecture code named Raptorlake-DT
Frequency: 3.187 GHz
Logical CPU Count: 32

If you want to skip descriptions of detected performance issues in the report,
enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
Alternatively, you may view the report in the csv format: vtune -report
<report_name> -format=csv.
vtune: Executing actions 100 % done

yuzhang3_intel
Moderator

Run the commands below in a single console as root (administrator).

echo 0 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
echo 0 > /proc/sys/kernel/yama/ptrace_scope

echo 0 > /proc/sys/dev/i915/perf_stream_paranoid
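
To confirm that the settings above actually took effect, the current values can be read back. The following is a minimal sketch, not part of VTune: the helper names `read_setting` and `report` are hypothetical, and the only assumption is that the four paths are the ones written by the commands above.

```python
from pathlib import Path

# Paths taken from the echo commands above; 0 is the value they write.
SETTINGS = [
    "/proc/sys/kernel/perf_event_paranoid",
    "/proc/sys/kernel/kptr_restrict",
    "/proc/sys/kernel/yama/ptrace_scope",
    "/proc/sys/dev/i915/perf_stream_paranoid",
]


def read_setting(path):
    """Return the integer stored in a /proc/sys file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def report(paths=SETTINGS, reader=read_setting):
    """Map each path to 'ok', 'unreadable', or its current non-zero value."""
    result = {}
    for path in paths:
        value = reader(path)
        if value is None:
            result[path] = "unreadable"
        elif value == 0:
            result[path] = "ok"
        else:
            result[path] = f"currently {value}"
    return result


if __name__ == "__main__":
    for path, status in report().items():
        print(f"{path}: {status}")
```

An "unreadable" result for the i915 path may only mean that the i915 driver is not loaded. Note also that `sudo echo 0 > file` does not work, because the redirection is performed by the unprivileged shell; that is why the commands need a root console.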

Kailash26
Beginner

Hi, the problem persists: I still don't get the Bandwidth Utilization table.

I expected something like this:
Bandwidth Utilization
Bandwidth Domain  Platform Maximum  Observed Maximum  Average  % of Elapsed Time with High BW Utilization(%)
----------------  ----------------  ----------------  -------  ---------------------------------------------
DRAM, GB/sec      35                8.700             1.009    0.0%
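
When scripting repeated runs, it can help to test automatically whether a saved summary report contains such a row. The sketch below is an illustration only: `parse_bandwidth_rows` is a hypothetical helper, and the row layout is assumed from the sample table quoted in this post, not from any documented VTune report format.

```python
import re

# Row layout assumed from the sample above:
#   <domain>, GB/sec  <platform max>  <observed max>  <average>  <pct>%
ROW = re.compile(
    r"^(?P<domain>.+?, GB/sec)\s+"
    r"(?P<platform_max>[\d.]+)\s+"
    r"(?P<observed_max>[\d.]+)\s+"
    r"(?P<average>[\d.]+)\s+"
    r"(?P<high_bw_pct>[\d.]+)%\s*$"
)


def parse_bandwidth_rows(report_text):
    """Return one dict per bandwidth row found; an empty list means no table."""
    rows = []
    for line in report_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append({
                "domain": match.group("domain"),
                "platform_max": float(match.group("platform_max")),
                "observed_max": float(match.group("observed_max")),
                "average": float(match.group("average")),
                "high_bw_pct": float(match.group("high_bw_pct")),
            })
    return rows
```

An empty result for a saved summary would correspond to the situation described here, where the report is generated but the table is absent.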

yuzhang3_intel
Moderator

Which platform and OS are you using?

Could you use the utility vtune-self-check to check your environment first?

Kailash26
Beginner

It's 22.04.1-Ubuntu.

The CPU model is Intel(R) Core(TM) i9-14900.

yuzhang3_intel
Moderator

Run /opt/intel/oneapi/vtune/latest/vtune-self-check.sh and post the output. I also recommend using the VTune 2025.0 release.

Martin_HZK
Novice

But VTune 2025 does not include that shell script.

yuzhang3_intel
Moderator

/opt/intel/oneapi/vtune/latest/bin64/vtune-self-check.sh 
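
Since the script location differs between the two paths mentioned in this thread, a small lookup can pick whichever one exists on a given installation. This is a minimal sketch: `find_self_check` is a hypothetical helper, and the candidate list contains only the two paths quoted above, so other install prefixes are not covered.

```python
import os

# The two locations mentioned in this thread, newest layout first.
CANDIDATES = [
    "/opt/intel/oneapi/vtune/latest/bin64/vtune-self-check.sh",
    "/opt/intel/oneapi/vtune/latest/vtune-self-check.sh",
]


def find_self_check(candidates=CANDIDATES, exists=os.path.isfile):
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    found = find_self_check()
    print(found if found else "vtune-self-check.sh not found in the known locations")
```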
