
Problem Loading SEP Module (Sampling Drivers) CentOS Remote Capture

randall__brent
Beginner

 

Version: Linux version 3.10.0-693.21.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Wed Mar 7 19:03:37 UTC 2018

I am able to successfully compile the drivers using ./build-drivers, pointing it at /usr/src/kernels/3.10.0-693.21.1.el7.x86_64

However, when I go to install the modules, I get the following output:

./insmod-sep -r

Warning:  the following driver(s) were not found loaded in the kernel:  sep4_1.

Warning:  no vtsspp driver was found loaded in the kernel.
Removing socperf2_0 driver from the kernel ... done.
Deleting /dev/socperf2_0 devices ... done.
The socperf2_0 driver has been successfully unloaded.
Attempting to stop PAX service ...
Removing pax driver from the kernel ... done.
Deleting previously created /dev/pax device ... done.
The pax driver has been successfully unloaded.
PAX service has been stopped.
Checking for PMU arbitration service (PAX) ... not detected.
Attempting to start PAX service ...
Executing: insmod ./pax/pax-x32_64-3.10.0-693.21.1.el7.x86_64smp.ko
Creating /dev/pax device with major number 246 ... done.
Setting group ownership of devices to group "vtune" ... done.
Setting file permissions on devices to "666" ... done.
The pax driver has been successfully loaded.
PAX service has been started.
Checking for socperf driver ... not detected.
Executing: insmod ./socperf/src/socperf2_0-x32_64-3.10.0-693.21.1.el7.x86_64smp.ko
Creating /dev/socperf2_0 base devices with major number 245 ... done.
Setting group ownership of devices to group "vtune" ... done.
Setting file permissions on devices to "666" ... done.
The socperf2_0 driver has been successfully loaded.
Executing: insmod ./sep4_1-x32_64-3.10.0-693.21.1.el7.x86_64smp.ko
insmod: ERROR: could not insert module ./sep4_1-x32_64-3.10.0-693.21.1.el7.x86_64smp.ko: Invalid parameters

Error:  sep4_1 driver failed to load!

You may need to build sep4_1 driver for your kernel.
Please see the sep4_1 driver README for instructions.

 

Running dmesg, I can see that some symbols are reported with mismatched versions:

dmesg | tail
[1767003.842646] sep4_1: disagrees about version of symbol trace_event_raw_init
[1767003.842648] sep4_1: Unknown symbol trace_event_raw_init (err -22)
[1767003.842686] sep4_1: disagrees about version of symbol ftrace_event_reg
[1767003.842687] sep4_1: Unknown symbol ftrace_event_reg (err -22)
[1767003.842706] sep4_1: disagrees about version of symbol trace_define_field
[1767003.842708] sep4_1: Unknown symbol trace_define_field (err -22)
[1767003.842709] sep4_1: disagrees about version of symbol trace_event_buffer_lock_reserve
[1767003.842711] sep4_1: Unknown symbol trace_event_buffer_lock_reserve (err -22)
[1767003.842724] sep4_1: disagrees about version of symbol filter_current_check_discard
[1767003.842725] sep4_1: Unknown symbol filter_current_check_discard (err -22)
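
One way to cross-check messages like these is to look at the kernel release recorded in the module itself. The following is a minimal sketch, assuming `modinfo` is available on the target; the helper names `vermagic_release` and `check_vermagic` are hypothetical, and the vermagic string in the comment is only an illustration, not output captured in this thread.

```shell
#!/bin/sh
# Sketch: compare the kernel release a module was built for with the running kernel.

# Extract the kernel release (first word) from a vermagic string such as
# "3.10.0-862.3.2.el7.x86_64 SMP mod_unload modversions" (illustrative value).
vermagic_release() {
    printf '%s\n' "$1" | awk '{print $1}'
}

# Print "match" or "mismatch" for a vermagic string ($1) and a kernel release ($2).
check_vermagic() {
    if [ "$(vermagic_release "$1")" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Module path taken from the insmod output above; skipped if it is not present.
module=./sep4_1-x32_64-3.10.0-693.21.1.el7.x86_64smp.ko
if command -v modinfo >/dev/null 2>&1 && [ -f "$module" ]; then
    check_vermagic "$(modinfo -F vermagic "$module")" "$(uname -r)"
fi
```

A "mismatch" here would suggest the module was compiled against headers for a different kernel than the one currently running, even if the file name carries the running kernel's release.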

 

Are there additional steps I need to take to get remote capture working? I went ahead and tried to execute a remote capture anyway and got this error in the GUI (from a Windows machine, if that matters). I am attempting to do a "Profile System" capture.

Collection failed

Jun 06 2018 21:12:31 Collection failed. The data cannot be displayed.

To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
The following events cannot be collected: CPU_CLK_UNHALTED.THREAD_P_ANY. Consider removing the events from the collection, loading the VTune Amplifier sampling driver using the root credentials, or updating the OS kernel.

 

Thanks!

PAVEL_G_Intel
Employee

Hi, 

It looks like a configuration mismatch occurred. Have you patched your kernel?
Please verify that the required kernel options are set. The list of options may be found here.
Could you also send the output of "uname -a"?

- Pavel

 

randall__brent
Beginner

uname -a

Linux host 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I should have mentioned this in the original post, but I did validate that the config matches the values listed in the documentation. As far as I know this kernel isn't patched, but I didn't provision this host myself. I can try to find out. Do you know of an easy way to find that out from the terminal?
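
A rough first check from the terminal might look like the sketch below. It only shows who built the running kernel and which package owns its image; it cannot prove the kernel is unmodified. The helper name `build_host_of` is hypothetical, and the `rpm` step assumes an RPM-based system such as CentOS.

```shell
#!/bin/sh
# Sketch: gather hints about where the running kernel came from.

# Extract the "user@host" build identity from a /proc/version style string.
build_host_of() {
    printf '%s\n' "$1" | sed -n 's/^Linux version [^ ]* (\([^)]*\)).*/\1/p'
}

# Who built the running kernel (for a stock CentOS kernel this is a CentOS build host).
if [ -r /proc/version ]; then
    build_host_of "$(cat /proc/version)"
fi

# Which installed package owns the running kernel image, if rpm is available.
image="/boot/vmlinuz-$(uname -r)"
if command -v rpm >/dev/null 2>&1 && [ -e "$image" ]; then
    rpm -qf "$image" || true
fi
```

For the version string quoted at the top of this thread, the first step prints builder@kbuilder.dev.centos.org, which is at least consistent with a distribution-built kernel.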

 

EDIT: I reached out to the person who ordered the host and we don't believe it should be a customized kernel. Should just be vanilla.

PAVEL_G_Intel
Employee

Let's validate it ourselves.

On CentOS, the kernel configuration can be found in the /boot/config-{kernel-version} file. Check that the options are set there. Are they correct?

- Pavel

 

randall__brent
Beginner

I just re-validated that all the config options found here are set in /boot/config-3.10.0-693.21.1.el7.x86_64

PAVEL_G_Intel
Employee

It's definitely a kernel mismatch problem.

The kernel you are running is different from the kernel the drivers were built against.
I don't know a good way to check this, except to build the kernel in /usr/src/kernels/ and compare the checksums of the running kernel and the built one.

Unfortunately, the /boot/config-3.10.0-693.21.1.el7.x86_64 file may contain the options of an old kernel (if it was patched), and CentOS 7.x has no /proc/config.gz to provide the correct option list.

- Pavel

PAVEL_G_Intel
Employee

It looks like there is a simpler method.

Just list the files in /boot with their modification dates. If the modification date of the vmlinuz file is close to that of the config file, it is a sign that the config is correct.
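
The date comparison could be sketched as follows, assuming GNU `stat` (as shipped on CentOS) and the usual vmlinuz-{kernel-version} and config-{kernel-version} file names in /boot. The function name `boot_mtimes` is hypothetical.

```shell
#!/bin/sh
# Sketch: print modification times of the kernel image and its config file.

# boot_mtimes DIR RELEASE: show mtime and path for DIR/vmlinuz-RELEASE and
# DIR/config-RELEASE, or report a file as missing.
boot_mtimes() {
    for f in "$1/vmlinuz-$2" "$1/config-$2"; do
        if [ -e "$f" ]; then
            stat -c '%y  %n' "$f"
        else
            echo "missing: $f"
        fi
    done
}

boot_mtimes /boot "$(uname -r)"
```

If the two timestamps are far apart, the config file may not describe the kernel that is actually running.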

- Pavel 

randall__brent
Beginner

[root@host ~]# ls -l /boot/ | grep config
-rw-r--r--. 1 root root   123891 Aug  5  2015 config-3.10.0-229.11.1.el7.x86_64
-rw-r--r--  1 root root   123891 Sep 15  2015 config-3.10.0-229.14.1.el7.x86_64
-rw-r--r--. 1 root root   123838 Mar  6  2015 config-3.10.0-229.el7.x86_64
-rw-r--r--  1 root root   140971 Mar  7 11:16 config-3.10.0-693.21.1.el7.x86_64
[root@host ~]# ls -l /usr/src/kernels/
total 4
drwxr-xr-x 22 root root 4096 Jun  6 16:25 3.10.0-862.3.2.el7.x86_64

Looking at this, I think I've found my mistake. Although I am running 3.10.0-693.21.1.el7.x86_64, it seems I installed the source for 3.10.0-862.3.2.el7.x86_64.

The build is probably referencing the newer source tree, so I should just need to find the right rpm, download it, and install it. I'm an idiot. I'll let you know what happens once I find the right source.
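
The check that would have caught this up front can be sketched as below. It assumes the CentOS layout where kernel-devel packages install under /usr/src/kernels/{kernel-version}; the function name `has_matching_headers` is hypothetical, and the suggested package may no longer be in the default repositories for older kernel releases.

```shell
#!/bin/sh
# Sketch: confirm that a build tree exists for the running kernel before building drivers.

# has_matching_headers DIR RELEASE: succeed if DIR/RELEASE is a directory.
has_matching_headers() {
    [ -d "$1/$2" ]
}

rel=$(uname -r)
if has_matching_headers /usr/src/kernels "$rel"; then
    echo "build tree for $rel is installed"
else
    echo "no build tree for $rel; installed trees:"
    ls /usr/src/kernels 2>/dev/null || true
    echo "try: yum install kernel-devel-$rel"
fi
```

Once the matching tree is installed, the drivers need to be rebuilt with ./build-drivers pointed at it before running ./insmod-sep again.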

randall__brent
Beginner

Success! It looks like the modules are loading now.

The VTune remote collection seems to be working now. It's still finalizing the report, so we'll see.

 

[root@host src]# ./insmod-sep -r

Warning:  the following driver(s) were not found loaded in the kernel:  sep4_1.

Warning:  no vtsspp driver was found loaded in the kernel.
Removing socperf2_0 driver from the kernel ... done.
Deleting /dev/socperf2_0 devices ... done.
The socperf2_0 driver has been successfully unloaded.
Attempting to stop PAX service ...
Removing pax driver from the kernel ... done.
Deleting previously created /dev/pax device ... done.
The pax driver has been successfully unloaded.
PAX service has been stopped.
Checking for PMU arbitration service (PAX) ... not detected.
Attempting to start PAX service ...
Executing: insmod ./pax/pax-x32_64-3.10.0-693.21.1.el7.x86_64smp.ko
Creating /dev/pax device with major number 246 ... done.
Setting group ownership of devices to group "vtune" ... done.
Setting file permissions on devices to "666" ... done.
The pax driver has been successfully loaded.
PAX service has been started.
Checking for socperf driver ... not detected.
Executing: insmod ./socperf/src/socperf2_0-x32_64-3.10.0-693.21.1.el7.x86_64smp.ko
Creating /dev/socperf2_0 base devices with major number 245 ... done.
Setting group ownership of devices to group "vtune" ... done.
Setting file permissions on devices to "666" ... done.
The socperf2_0 driver has been successfully loaded.
Executing: insmod ./sep4_1-x32_64-3.10.0-693.21.1.el7.x86_64smp.ko
Creating /dev/sep4_1 base devices with major number 244 ... done.
Creating /dev/sep4_1 percpu devices with major number 243 ... done.
Creating /dev/sep4_1 percpu devices with major number 242 ... done.
Creating /dev/sep4_1 per package devices with major number 241 ... done.
Setting group ownership of devices to group "vtune" ... done.
Setting file permissions on devices to "666" ... done.
The sep4_1 driver has been successfully loaded.
Checking for vtsspp driver ... not detected.
Executing: insmod ./vtsspp/vtsspp-x32_64-3.10.0-693.21.1.el7.x86_64smp.ko gid=1002 mode=0666
The vtsspp driver has been successfully loaded.

PAVEL_G_Intel
Employee

To summarize:

You built the driver with the ./build-drivers script.
When the driver is loaded, insmod reports an error:
insmod: ERROR: could not insert module ./sep{version}-{kernel-version}.x86_64smp.ko: Invalid parameters

dmesg contains messages like:
[timestamp] sep{version}: disagrees about version of symbol 'xxxxxx'

This is definitely a kernel mismatch problem. You need to verify that the kernel headers you built against match the running kernel.
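
To spot this symptom quickly, the dmesg output can be filtered for the symbols that failed the version check. This is a minimal sketch; the function name `mismatched_symbols` is hypothetical, and it assumes the message wording shown earlier in this thread.

```shell
#!/bin/sh
# Sketch: list the symbols that a module "disagrees about" in kernel log output.

# Reads kernel log lines on stdin and prints each mismatched symbol once, sorted.
mismatched_symbols() {
    sed -n 's/.*disagrees about version of symbol \(.*\)$/\1/p' | sort -u
}

# Typical use (dmesg may require root on some systems):
#   dmesg | mismatched_symbols
```

Any output at all points to a module built against a different kernel than the one it is being loaded into.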

randall__brent
Beginner

Hi Pavel,

The remote capture works now. I got a capture that was even able to resolve kernel symbols. I think what I had done was mistakenly compile against the headers for a newer version of the kernel. Now I just need to work out how to get symbols to resolve properly under our containerization, which causes the command name to be different from the filesystem location (with the "perf" tool you handle this with --symfs=newroot).

PAVEL_G_Intel
Employee

Hi,

What type of container do you have? VTune Amplifier supports Docker and LXC.

- Pavel

randall__brent
Beginner

Hi Pavel,

Thanks for the link to the doc. This is exactly what I needed. Depending on the environment we use Docker or Mesos containers. It's actually deployed by this wrapper system we have in place. To change the launch parameters I'll need to track down the specific team that owns that system and show them this documentation. I have a small test host on the side that I can at least test this with.

Thanks,
