Hello,
I am trying to run an HPC Performance Characterization. I keep running into the same error:
Error: This analysis type requires either an access to kernel-mode monitoring in the Linux perf subsystem or installation of the VTune Amplifier drivers (see the "Sampling Drivers" help topic for further details).
The system I work on:
- CPU : Intel(R) Xeon(R) Platinum 8276 CPU @ 2.20GHz
- OS : CentOS 7
The Intel sampling drivers are enabled, and running insmod-sep -q gives the following output:
pax driver is loaded and owned by group "vtune" with file permissions "660".
socperf3 driver is loaded and owned by group "vtune" with file permissions "660".
sep5 driver is loaded and owned by group "vtune" with file permissions "660".
Warning: skipping SOCWATCH driver, not built
vtsspp driver is loaded and owned by group "vtune" with file permissions "660".
/usr/local/software/intel/oneapi/2024.0/vtune/latest/sepdk/src/insmod-sep: line 327: [: -a: integer expression expected
I ran the profiler self-check utility, which gives the following output:
Intel(R) VTune(TM) Profiler Self Check Utility
Copyright (C) 2009 Intel Corporation. All rights reserved.
Build Number: 626834
HW event-based analysis (counting mode) (Perf)
Example of analysis types: Performance Snapshot
Collection: Ok
Finalization: Ok...
Report: Ok
Instrumentation based analysis check
Example of analysis types: Hotspots and Threading with user-mode sampling
Collection: Ok
vtune: Warning: Only user space will be profiled due to credentials lack. Consider changing /proc/sys/kernel/perf_event_paranoid file for enabling kernel space profiling.
Finalization: Ok...
Report: Ok
HW event-based analysis check
Example of analysis types: Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
Collection: Fail
vtune: Warning: Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel module symbols.
vtune: Warning: To analyze modules at the kernel level in this configuration, load the Intel sampling driver and set an unlimited (0) value for the Stack size option (if you require stack collection). Alternatively, enable access to kernel-mode monitoring by setting the /proc/sys/kernel/perf_event_paranoid value to 1 or lower.
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Warning: Consider reducing possible collection overhead by setting the /proc/sys/kernel/perf_event_paranoid value to 0 (or less).
vtune: Error: This driverless collection is restricted in the OS. Consider setting the /proc/sys/kernel/perf_event_paranoid value to 0 or less.
HW event-based analysis check
Example of analysis types: Microarchitecture Exploration
Collection: Fail
vtune: Error: This analysis requires one of these actions: a) Install Intel Sampling Drivers. b) Configure driverless collection with Perf system-wide profiling. To enable Perf system-wide profiling, set /proc/sys/kernel/perf_event_paranoid to 0 or set up Perf tool capabilities.
vtune: Warning: Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel module symbols.
HW event-based analysis with uncore events
Example of analysis types: Memory Access
Collection: Fail
vtune: Error: Cannot collect memory bandwidth data. Make sure the sampling driver is installed and enabled on your system. See the Sampling Drivers help topic for more details. Note that memory bandwidth collection is not possible if you are profiling inside a virtualized environment.
HW event-based analysis with stacks
Example of analysis types: Hotspots with HW event-based sampling and call stacks
Collection: Fail
vtune: Warning: Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel module symbols.
vtune: Warning: To analyze modules at the kernel level in this configuration, load the Intel sampling driver and set an unlimited (0) value for the Stack size option (if you require stack collection). Alternatively, enable access to kernel-mode monitoring by setting the /proc/sys/kernel/perf_event_paranoid value to 1 or lower.
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Warning: Consider reducing possible collection overhead by setting the /proc/sys/kernel/perf_event_paranoid value to 0 (or less).
vtune: Error: This driverless collection is restricted in the OS. Consider setting the /proc/sys/kernel/perf_event_paranoid value to 0 or less.
HW event-based analysis with context switches
Example of analysis types: Threading with HW event-based sampling
Collection: Fail
vtune: Warning: Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel module symbols.
vtune: Warning: To analyze modules at the kernel level in this configuration, load the Intel sampling driver and set an unlimited (0) value for the Stack size option (if you require stack collection). Alternatively, enable access to kernel-mode monitoring by setting the /proc/sys/kernel/perf_event_paranoid value to 1 or lower.
vtune: Warning: Context switch data cannot be collected using the Perf-based driverless collection if the kernel version is less than 4.3. Consider loading the VTune Profiler sampling driver using the root credentials.
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Warning: Consider reducing possible collection overhead by setting the /proc/sys/kernel/perf_event_paranoid value to 0 (or less).
vtune: Error: This driverless collection is restricted in the OS. Consider setting the /proc/sys/kernel/perf_event_paranoid value to 0 or less.
Checking DPC++ application as prerequisite for GPU analyses: Fail
Unable to run DPC++ application on GPU connected to this system. If you are using an Intel GPU and want to verify profiling support for DPC++ applications, check these requirements:
* Install Intel(R) GPU driver.
* Install Intel(R) Level Zero GPU runtime.
* Install Intel(R) oneAPI DPC++ Runtime and set the environment.
The check observed a product failure on your system.
Review errors in the output above to fix a problem or contact Intel technical support.
The system is ready for the following analyses:
* Performance Snapshot
* Hotspots and Threading with user-mode sampling
The following analyses have failed on the system:
* Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
* Microarchitecture Exploration
* Memory Access
* Hotspots with HW event-based sampling and call stacks
* Threading with HW event-based sampling
* GPU Compute/Media Hotspots (characterization mode)
* GPU Compute/Media Hotspots (source analysis mode)
Based on a previous forum post mentioning a similar issue, I tried running older versions of VTune, since CentOS 7 is no longer supported. I ran 2022, 2021, and even Intel VTune Amplifier 2019, with the same result.
I do not have the rights to change the perf_event_paranoid level on this system.
Does anyone have a clue about what is going on here and how to solve this issue?
Thanks,
Hamza
The self-check utility result shows that the VTune driver doesn't work, so VTune uses driverless Perf collection instead.
Could you run $ emon -v to check the driver status, as below?
yuzhang3@yuzhang3-10710:~$ emon -v
EMON Version .............. V11.45 Beta
Copyright(C) 1993 Intel Corporation. All rights reserved.
Application Build Date: Feb 20 2024 at 06:00:30
SEP Driver Version: 5.45 Beta (public)
PAX Driver Version: 1.0
Linux Kernel Version: 6.5.0-26-generic
Collection Mode: Driver
total_number_of_processors ...... 12
number_of_online_processors ...... 12
cpu_family ................ Intel(R) microarchitecture code named Cometlake U
cpu_model ................. 166 (0xa6)
cpu_stepping .............. 0 (0)
......................
Make the following configuration changes, then run the self-check utility again:
echo 0 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
echo 0 > /proc/sys/kernel/yama/ptrace_scope
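Note that writing these files requires root. A minimal sketch for checking the current values first (read-only, no elevated privileges needed) and, where root is available, making the settings persist across reboots; the drop-in file name below is just an example:

# Check the current values (no root needed):
cat /proc/sys/kernel/perf_event_paranoid
cat /proc/sys/kernel/kptr_restrict
cat /proc/sys/kernel/yama/ptrace_scope

# With root, make the settings persistent via a sysctl drop-in (example file name):
printf 'kernel.perf_event_paranoid = 0\nkernel.kptr_restrict = 0\nkernel.yama.ptrace_scope = 0\n' | sudo tee /etc/sysctl.d/99-vtune.conf
sudo sysctl --system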
Thanks for your answer.
Unfortunately, I do not have enough permissions on the system to make these changes. I thought I would not need to do that if I had access to the sampling drivers.
I see.
Here is the output of emon -v
EMON Version .............. V11.34
Copyright(C) 1993-2020 Intel Corporation. All rights reserved.
Application Build Date: Mar 28 2022 at 23:38:59
SEP Driver Version: 5.31 (public)
PAX Driver Version: 1.0
Linux Kernel Version: 3.10.0-1160.114.2.el7.x86_64
Collection Mode: Driver
total_number_of_processors ...... 56
number_of_online_processors ...... 56
cpu_family ................ Intel(R) Xeon(R) Processor code named Cascadelake
cpu_model ................. 85 (0x55)
cpu_stepping .............. 7 (0x7)
L1 Data Cache ............. 32KB, 8-way, 64-byte line size
2 HW threads share this cache, No SW Init Required
L1 Code Cache ............. 32KB, 8-way, 64-byte line size
2 HW threads share this cache, No SW Init Required
L2 Unified Cache .......... 1MB, 16-way, 64-byte line size
2 HW threads share this cache, No SW Init Required
L3 Unified Cache .......... 38MB, Fully Associative, 64-byte line size
No SW Init Required
Data TLB .................. 4-way, 4K Pages, 64 entries
64-byte Prefetching
Device Type ............... Intel(R) Xeon(R) Processor code named Cascadelake
EMON Database ............. cascadelake_server
Platform type ............. 127
number_of_selectors ....... 8
number_of_var_counters .... 8
number_of_fixed_ctrs....... 3
Fixed Counter Events:
counter 0 ................. INST_RETIRED.ANY
counter 1 ................. CPU_CLK_UNHALTED.THREAD
counter 2 ................. CPU_CLK_UNHALTED.REF_TSC
number of devices ......... 1
number_of_events .......... 2325
Processor Features:
(Thermal Throttling) (Enabled)
(Hyper-Threading) (Disabled)
(MLC Streamer Prefetching) (Enabled)
(MLC Spatial Prefetching) (Enabled)
(DCU Streamer Prefetching) (Enabled)
(DCU IP Prefetching) (Enabled)
(Number of Packages: 2)
(Cores Per Package: 28)
(Threads Per Package: 28)
(Threads Per Core: 1)
Uncore Performance Monitoring Units:
cha : 28
imc : 6
pcu : 1
qpi : 3
r3qpi : 3
ubox : 1
m2pcie : 4
m2m : 2
irp : 5
iio : 5
rdt : 1
hfi_rxe : 0
hfi_txe : 0
RDT H/W Support:
L3 Cache Occupancy : Yes
Total Memory Bandwidth : Yes
Local Memory Bandwidth : Yes
L3 Cache Allocation : Yes
L2 Cache Allocation : No
Highest Available RMID : 223
Sample Multiplier : 114688
GPU Information:
No GPU devices found
RAM Features:
(Package/Memory Controller/Channel)
(0/0/0) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(0/0/1) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(0/0/2) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(0/1/0) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(0/1/1) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(0/1/2) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(1/0/0) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(1/0/1) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(1/0/2) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(1/1/0) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(1/1/1) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
(1/1/2) (Total Number of Ranks on this Channel: 2)
(Dimm0 Info: Empty)
(Dimm1 Info: Empty)
QPI Link Features:
Package 0 :
QPI Link 0 connects to Package 1, Link 1
QPI Link 1 connects to Package 1, Link 0
Package 1 :
QPI Link 0 connects to Package 0, Link 1
QPI Link 1 connects to Package 0, Link 0
IIO Unit Features:
Package 0 :
domain:0 bus:0x00 stack:0 mesh:62
domain:0 bus:0x17 stack:1 mesh:62
domain:0 bus:0x3a stack:2 mesh:63
domain:0 bus:0x5d stack:3 mesh:59
domain:0 bus:0x00 stack:4 mesh:60
Package 1 :
domain:0 bus:0x80 stack:0 mesh:62
domain:0 bus:0x85 stack:1 mesh:62
domain:0 bus:0xae stack:2 mesh:63
domain:0 bus:0xd7 stack:3 mesh:59
domain:0 bus:0x00 stack:4 mesh:60
TSC Freq .................. 2200.00 MHz
UFS Freq (limit) .......... 2400.00 MHz
Processor Base Freq ....... 2200.00 MHz
Processor Maximum Freq .... 4000.00 MHz
Bus Reference Freq ........ 100.00 MHz
MAX TURBO RATIO (limit)
1C .......... 4000.00 MHz
2C .......... 4000.00 MHz
3C .......... 3800.00 MHz
4C .......... 3800.00 MHz
5C .......... 3700.00 MHz
6C .......... 3700.00 MHz
7C .......... 3700.00 MHz
8C .......... 3700.00 MHz
9C .......... 3700.00 MHz
10C ......... 3700.00 MHz
11C ......... 3700.00 MHz
12C ......... 3700.00 MHz
13C ......... 3700.00 MHz
14C ......... 3700.00 MHz
15C ......... 3700.00 MHz
16C ......... 3700.00 MHz
17C ......... 3400.00 MHz
18C ......... 3400.00 MHz
19C ......... 3400.00 MHz
20C ......... 3400.00 MHz
21C ......... 3100.00 MHz
22C ......... 3100.00 MHz
23C ......... 3100.00 MHz
24C ......... 3100.00 MHz
25C ......... 3000.00 MHz
26C ......... 3000.00 MHz
27C ......... 3000.00 MHz
28C ......... 3000.00 MHz
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55
OS Processor <-> Physical/Logical Mapping
-----------------------------------------
OS Processor Phys. Package Core Logical Processor Core Type
0 0 0 0 core
1 1 0 0 core
2 0 6 0 core
3 1 6 0 core
4 0 1 0 core
5 1 1 0 core
6 0 5 0 core
7 1 5 0 core
8 0 2 0 core
9 1 2 0 core
10 0 4 0 core
11 1 4 0 core
12 0 3 0 core
13 1 3 0 core
14 0 14 0 core
15 1 14 0 core
16 0 8 0 core
17 1 8 0 core
18 0 13 0 core
19 1 13 0 core
20 0 9 0 core
21 1 9 0 core
22 0 12 0 core
23 1 12 0 core
24 0 10 0 core
25 1 10 0 core
26 0 11 0 core
27 1 11 0 core
28 0 16 0 core
29 1 16 0 core
30 0 22 0 core
31 1 22 0 core
32 0 17 0 core
33 1 17 0 core
34 0 21 0 core
35 1 21 0 core
36 0 18 0 core
37 1 18 0 core
38 0 20 0 core
39 1 20 0 core
40 0 19 0 core
41 1 19 0 core
42 0 30 0 core
43 1 30 0 core
44 0 24 0 core
45 1 24 0 core
46 0 29 0 core
47 1 29 0 core
48 0 25 0 core
49 1 25 0 core
50 0 28 0 core
51 1 28 0 core
52 0 26 0 core
53 1 26 0 core
54 0 27 0 core
55 1 27 0 core
-----------------------------------------
It looks like the drivers are ready. You can profile a simple sample as below and check the output.
$ vtune -collect hotspots -knob sampling-mode=hw -- /usr/bin/ls
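If the collection succeeds, the result can be inspected from the command line; the result directory name used below (r000hs) follows VTune's default naming and may differ on your system:

$ vtune -report summary -r r000hs
$ vtune -report hotspots -r r000hs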
I found out what was happening. The drivers were not enabled on every partition of the cluster I am working on. I was able to identify which ones had the drivers enabled thanks to your tests.
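For reference, a rough sketch of how such a per-partition check could look, assuming a Slurm-managed cluster (the partition names are placeholders):

# Check on each partition whether the VTune sampling drivers are loaded.
for p in partA partB; do
  echo "=== partition: $p ==="
  srun -p "$p" -N 1 -n 1 sh -c 'lsmod | grep -E "sep5|pax|vtsspp|socperf"' \
    || echo "VTune drivers not loaded on $p"
done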
Thanks a lot!