Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Unable to use HW mode sampling with VTune

HamzaC
Novice

Hello,

I am trying to run an HPC Performance Characterization. I keep running into the same error:

Error: This analysis type requires either an access to kernel-mode monitoring in the Linux perf subsystem or installation of the VTune Amplifier drivers (see the "Sampling Drivers" help topic for further details).

The system I work on:

  • CPU : Intel(R) Xeon(R) Platinum 8276 CPU @ 2.20GHz
  • OS : CentOS 7

The Intel Sampling Drivers are enabled, and running insmod-sep -q gives the following output:

pax driver is loaded and owned by group "vtune" with file permissions "660".
socperf3 driver is loaded and owned by group "vtune" with file permissions "660".
sep5 driver is loaded and owned by group "vtune" with file permissions "660".
Warning: skipping SOCWATCH driver, not built
vtsspp driver is loaded and owned by group "vtune" with file permissions "660".
/usr/local/software/intel/oneapi/2024.0/vtune/latest/sepdk/src/insmod-sep: line 327: [: -a: integer expression expected

I also ran the profiler self-check utility, which gives the following output:

Intel(R) VTune(TM) Profiler Self Check Utility
Copyright (C) 2009 Intel Corporation. All rights reserved.
Build Number: 626834

HW event-based analysis (counting mode) (Perf)   
Example of analysis types: Performance Snapshot
    Collection: Ok
    Finalization: Ok...
    Report: Ok

Instrumentation based analysis check   
Example of analysis types: Hotspots and Threading with user-mode sampling
    Collection: Ok
vtune: Warning: Only user space will be profiled due to credentials lack. Consider changing /proc/sys/kernel/perf_event_paranoid file for enabling kernel space profiling.
    Finalization: Ok...
    Report: Ok

HW event-based analysis check   
Example of analysis types: Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
    Collection: Fail
vtune: Warning: Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel module symbols.
vtune: Warning: To analyze modules at the kernel level in this configuration, load the Intel sampling driver and set an unlimited (0) value for the Stack size option (if you require stack collection). Alternatively, enable access to kernel-mode monitoring by setting the /proc/sys/kernel/perf_event_paranoid value to 1 or lower.
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Warning: Consider reducing possible collection overhead by setting the /proc/sys/kernel/perf_event_paranoid value to 0 (or less).
vtune: Error: This driverless collection is restricted in the OS. Consider setting the /proc/sys/kernel/perf_event_paranoid value to 0 or less.

HW event-based analysis check   
Example of analysis types: Microarchitecture Exploration
    Collection: Fail
vtune: Error: This analysis requires one of these actions: a) Install Intel Sampling Drivers. b) Configure driverless collection with Perf system-wide profiling. To enable Perf system-wide profiling, set /proc/sys/kernel/perf_event_paranoid to 0 or set up Perf tool capabilities.
vtune: Warning: Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel module symbols.

HW event-based analysis with uncore events   
Example of analysis types: Memory Access
    Collection: Fail
vtune: Error: Cannot collect memory bandwidth data. Make sure the sampling driver is installed and enabled on your system. See the Sampling Drivers help topic for more details. Note that memory bandwidth collection is not possible if you are profiling inside a virtualized environment.

HW event-based analysis with stacks   
Example of analysis types: Hotspots with HW event-based sampling and call stacks
    Collection: Fail
vtune: Warning: Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel module symbols.
vtune: Warning: To analyze modules at the kernel level in this configuration, load the Intel sampling driver and set an unlimited (0) value for the Stack size option (if you require stack collection). Alternatively, enable access to kernel-mode monitoring by setting the /proc/sys/kernel/perf_event_paranoid value to 1 or lower.
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Warning: Consider reducing possible collection overhead by setting the /proc/sys/kernel/perf_event_paranoid value to 0 (or less).
vtune: Error: This driverless collection is restricted in the OS. Consider setting the /proc/sys/kernel/perf_event_paranoid value to 0 or less.

HW event-based analysis with context switches   
Example of analysis types: Threading with HW event-based sampling
    Collection: Fail
vtune: Warning: Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel module symbols.
vtune: Warning: To analyze modules at the kernel level in this configuration, load the Intel sampling driver and set an unlimited (0) value for the Stack size option (if you require stack collection). Alternatively, enable access to kernel-mode monitoring by setting the /proc/sys/kernel/perf_event_paranoid value to 1 or lower.
vtune: Warning: Context switch data cannot be collected using the Perf-based driverless collection if the kernel version is less than 4.3. Consider loading the VTune Profiler sampling driver using the root credentials.
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Warning: Consider reducing possible collection overhead by setting the /proc/sys/kernel/perf_event_paranoid value to 0 (or less).
vtune: Error: This driverless collection is restricted in the OS. Consider setting the /proc/sys/kernel/perf_event_paranoid value to 0 or less.

Checking DPC++ application as prerequisite for GPU analyses: Fail
Unable to run DPC++ application on GPU connected to this system. If you are using an Intel GPU and want to verify profiling support for DPC++ applications, check these requirements:
* Install Intel(R) GPU driver.
* Install Intel(R) Level Zero GPU runtime.
* Install Intel(R) oneAPI DPC++ Runtime and set the environment.

The check observed a product failure on your system.
Review errors in the output above to fix a problem or contact Intel technical support.

The system is ready for the following analyses:
* Performance Snapshot
* Hotspots and Threading with user-mode sampling

The following analyses have failed on the system:
* Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
* Microarchitecture Exploration
* Memory Access
* Hotspots with HW event-based sampling and call stacks
* Threading with HW event-based sampling
* GPU Compute/Media Hotspots (characterization mode)
* GPU Compute/Media Hotspots (source analysis mode)

Based on a previous forum post describing a similar issue, I tried older versions of VTune, since CentOS 7 is no longer supported. I ran 2022, 2021, and even Intel VTune Amplifier 2019, all with the same result.
I do not have the rights to change the perf_event_paranoid level on this system.

Does anyone have a clue about what is going on here and how to solve this issue?

Thanks,
Hamza

yuzhang3_intel
Moderator

Apply the following settings (as root), then run the self-check utility again.

echo 0 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict
echo 0 > /proc/sys/kernel/yama/ptrace_scope
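
Before asking an administrator to change these settings, the current values can be read without root. The following is a minimal sketch for a POSIX shell, not part of VTune; the helper name show_setting is made up for illustration, and it takes a path argument only so that it can be tried on an ordinary file:

```shell
#!/usr/bin/env sh
# Print "<path> = <value>" for a kernel setting, or note that it cannot be read.
# show_setting is a hypothetical helper name, not a VTune or system command.
show_setting() {
    if [ -r "$1" ]; then
        printf '%s = %s\n' "$1" "$(cat "$1")"
    else
        printf '%s = (not readable)\n' "$1"
    fi
}

# The three settings listed above; a file that does not exist on a given
# kernel is reported as not readable instead of causing an error.
for f in /proc/sys/kernel/perf_event_paranoid \
         /proc/sys/kernel/kptr_restrict \
         /proc/sys/kernel/yama/ptrace_scope; do
    show_setting "$f"
done
```

The self-check messages quoted earlier in this thread ask for perf_event_paranoid to be 0 or less for driverless collection, so a higher value here means the Perf fallback will stay restricted unless the sampling drivers are actually used.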

HamzaC
Novice

Thanks for your answer.

Unfortunately, I do not have sufficient permissions on the system to make these changes. I thought I would not need to if I had access to the sampling drivers.

yuzhang3_intel
Moderator

The self-check output shows that the VTune sampling driver is not working, so VTune falls back to driverless (Perf-based) collection.

Could you run $ emon -v to check the driver status? For comparison, here is what it prints on my machine:

yuzhang3@yuzhang3-10710:~$ emon -v
EMON Version .............. V11.45 Beta
Copyright(C) 1993 Intel Corporation. All rights reserved.
Application Build Date: Feb 20 2024 at 06:00:30
SEP Driver Version: 5.45 Beta (public)
PAX Driver Version: 1.0
Linux Kernel Version: 6.5.0-26-generic
Collection Mode: Driver
total_number_of_processors ...... 12
number_of_online_processors ...... 12
cpu_family ................ Intel(R) microarchitecture code named Cometlake U
cpu_model ................. 166 (0xa6)
cpu_stepping .............. 0 (0)

......................

HamzaC
Novice

I see.

Here is the output of emon -v:

EMON Version .............. V11.34  
Copyright(C) 1993-2020 Intel Corporation. All rights reserved.
Application Build Date: Mar 28 2022 at 23:38:59
SEP Driver Version: 5.31  (public)
PAX Driver Version: 1.0
Linux Kernel Version: 3.10.0-1160.114.2.el7.x86_64
Collection Mode: Driver
total_number_of_processors  ...... 56
number_of_online_processors ...... 56
cpu_family ................ Intel(R) Xeon(R) Processor code named Cascadelake
cpu_model ................. 85 (0x55)
cpu_stepping .............. 7 (0x7)
L1 Data Cache ............. 32KB, 8-way, 64-byte line size
                            2 HW threads share this cache, No SW Init Required
L1 Code Cache ............. 32KB, 8-way, 64-byte line size
                            2 HW threads share this cache, No SW Init Required
L2 Unified Cache .......... 1MB, 16-way, 64-byte line size
                            2 HW threads share this cache, No SW Init Required
L3 Unified Cache .......... 38MB, Fully Associative, 64-byte line size
                            No SW Init Required
Data TLB .................. 4-way, 4K Pages, 64 entries
64-byte Prefetching

Device Type ............... Intel(R) Xeon(R) Processor code named Cascadelake
EMON Database ............. cascadelake_server
Platform type ............. 127
number_of_selectors ....... 8
number_of_var_counters .... 8
number_of_fixed_ctrs....... 3
Fixed Counter Events:
counter 0 ................. INST_RETIRED.ANY
counter 1 ................. CPU_CLK_UNHALTED.THREAD
counter 2 ................. CPU_CLK_UNHALTED.REF_TSC
number of devices ......... 1
number_of_events .......... 2325

Processor Features:
    (Thermal Throttling) (Enabled)
    (Hyper-Threading) (Disabled)
    (MLC Streamer Prefetching) (Enabled)
    (MLC Spatial Prefetching) (Enabled)
    (DCU Streamer Prefetching) (Enabled)
    (DCU IP Prefetching) (Enabled)
    (Number of Packages:    2)
    (Cores Per Package:    28)
    (Threads Per Package:  28)
    (Threads Per Core:      1)

Uncore Performance Monitoring Units:
    cha             : 28
    imc             : 6
    pcu             : 1
    qpi             : 3
    r3qpi           : 3
    ubox            : 1
    m2pcie          : 4
    m2m             : 2
    irp             : 5
    iio             : 5
    rdt             : 1
    hfi_rxe         : 0
    hfi_txe         : 0

RDT H/W Support:
    L3 Cache Occupancy          : Yes
    Total Memory Bandwidth      : Yes
    Local Memory Bandwidth      : Yes
    L3 Cache Allocation         : Yes
    L2 Cache Allocation         : No
    Highest Available RMID      : 223
    Sample Multiplier           : 114688

GPU Information:
    No GPU devices found

RAM Features:
    (Package/Memory Controller/Channel)
        (0/0/0) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (0/0/1) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (0/0/2) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (0/1/0) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (0/1/1) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (0/1/2) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (1/0/0) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (1/0/1) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (1/0/2) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (1/1/0) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (1/1/1) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)
        (1/1/2) (Total Number of Ranks on this Channel: 2)
                 (Dimm0 Info: Empty)
                 (Dimm1 Info: Empty)

QPI Link Features:
        Package 0 :
                QPI Link 0 connects to Package 1, Link 1
                QPI Link 1 connects to Package 1, Link 0
        Package 1 :
                QPI Link 0 connects to Package 0, Link 1
                QPI Link 1 connects to Package 0, Link 0

IIO Unit Features:
        Package 0 :
                domain:0 bus:0x00 stack:0 mesh:62
                domain:0 bus:0x17 stack:1 mesh:62
                domain:0 bus:0x3a stack:2 mesh:63
                domain:0 bus:0x5d stack:3 mesh:59
                domain:0 bus:0x00 stack:4 mesh:60
        Package 1 :
                domain:0 bus:0x80 stack:0 mesh:62
                domain:0 bus:0x85 stack:1 mesh:62
                domain:0 bus:0xae stack:2 mesh:63
                domain:0 bus:0xd7 stack:3 mesh:59
                domain:0 bus:0x00 stack:4 mesh:60

TSC Freq .................. 2200.00 MHz

UFS Freq (limit) .......... 2400.00 MHz

Processor Base Freq ....... 2200.00 MHz

Processor Maximum Freq .... 4000.00 MHz

Bus Reference Freq ........ 100.00 MHz

MAX TURBO RATIO (limit)
1C .......... 4000.00 MHz
2C .......... 4000.00 MHz
3C .......... 3800.00 MHz
4C .......... 3800.00 MHz
5C .......... 3700.00 MHz
6C .......... 3700.00 MHz
7C .......... 3700.00 MHz
8C .......... 3700.00 MHz
9C .......... 3700.00 MHz
10C ......... 3700.00 MHz
11C ......... 3700.00 MHz
12C ......... 3700.00 MHz
13C ......... 3700.00 MHz
14C ......... 3700.00 MHz
15C ......... 3700.00 MHz
16C ......... 3700.00 MHz
17C ......... 3400.00 MHz
18C ......... 3400.00 MHz
19C ......... 3400.00 MHz
20C ......... 3400.00 MHz
21C ......... 3100.00 MHz
22C ......... 3100.00 MHz
23C ......... 3100.00 MHz
24C ......... 3100.00 MHz
25C ......... 3000.00 MHz
26C ......... 3000.00 MHz
27C ......... 3000.00 MHz
28C ......... 3000.00 MHz

NUMA node(s):          2
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55

                   OS Processor <-> Physical/Logical Mapping
                   -----------------------------------------
          OS Processor    Phys. Package       Core      Logical Processor       Core Type
                0               0               0               0               core
                1               1               0               0               core
                2               0               6               0               core
                3               1               6               0               core
                4               0               1               0               core
                5               1               1               0               core
                6               0               5               0               core
                7               1               5               0               core
                8               0               2               0               core
                9               1               2               0               core
                10              0               4               0               core
                11              1               4               0               core
                12              0               3               0               core
                13              1               3               0               core
                14              0               14              0               core
                15              1               14              0               core
                16              0               8               0               core
                17              1               8               0               core
                18              0               13              0               core
                19              1               13              0               core
                20              0               9               0               core
                21              1               9               0               core
                22              0               12              0               core
                23              1               12              0               core
                24              0               10              0               core
                25              1               10              0               core
                26              0               11              0               core
                27              1               11              0               core
                28              0               16              0               core
                29              1               16              0               core
                30              0               22              0               core
                31              1               22              0               core
                32              0               17              0               core
                33              1               17              0               core
                34              0               21              0               core
                35              1               21              0               core
                36              0               18              0               core
                37              1               18              0               core
                38              0               20              0               core
                39              1               20              0               core
                40              0               19              0               core
                41              1               19              0               core
                42              0               30              0               core
                43              1               30              0               core
                44              0               24              0               core
                45              1               24              0               core
                46              0               29              0               core
                47              1               29              0               core
                48              0               25              0               core
                49              1               25              0               core
                50              0               28              0               core
                51              1               28              0               core
                52              0               26              0               core
                53              1               26              0               core
                54              0               27              0               core
                55              1               27              0               core
                   -----------------------------------------

yuzhang3_intel
Moderator

It looks like the drivers are ready. Try profiling a simple application as shown below and check the output.

$ vtune -collect hotspots -knob sampling-mode=hw -- /usr/bin/ls

HamzaC
Novice

I found out what was happening. The drivers were not enabled on every partition of the cluster I am working on. I was able to identify which ones had the drivers enabled thanks to your tests.
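
For anyone hitting the same problem on a cluster, a per-node check along these lines can be run on each partition to see where the drivers are missing. This is only a sketch, not exactly what I ran. The helper name missing_drivers is made up, the driver names are the ones printed by insmod-sep -q in my first post, and I am assuming they show up under the same names in /proc/modules:

```shell
#!/usr/bin/env sh
# Print the expected sampling drivers that are absent from a module list.
# missing_drivers is a hypothetical helper; the optional argument lets it be
# tried on a saved copy of /proc/modules instead of the live file.
missing_drivers() {
    modules_file="${1:-/proc/modules}"
    for drv in pax socperf3 sep5 vtsspp; do
        # The first field of each /proc/modules line is the module name.
        if ! awk -v m="$drv" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            printf '%s\n' "$drv"
        fi
    done
}

# Report the state of the node this script runs on.
missing=$(missing_drivers)
if [ -z "$missing" ]; then
    echo "$(hostname): all sampling drivers loaded"
else
    echo "$(hostname): missing drivers:" $missing
fi
```

Running it through the cluster's job launcher on one node per partition gives a quick list of the nodes where HW event-based sampling should work.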

Thanks a lot!
