VTune sampling drivers cause kernel bug

hakostra1
New Contributor II
3,639 Views

I have a brand new workstation, with an i9-13900K CPU and ECC memory. I run Ubuntu 22.04.

I have installed oneAPI 2023.0 and followed the guide for the VTune sampling driver installation:

https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/set-up-analysis-target/linux-targets/build-install-sampling-drivers-for-linux-targets.html

When I load the kernel driver and run VTune, the kernel crashes. It's really unfortunate and inconvenient. So far I have not been able to complete a single analysis with the kernel driver loaded without a crash. When I run VTune without the driver, it works just fine.

Here is a sample from /var/log/kern.log for one particular event:

 

Jan 30 15:09:31 kmt-trd2 kernel: [ 4514.827648] pax: module verification failed: signature and/or required key missing - tainting kernel
Jan 30 15:09:31 kmt-trd2 kernel: [ 4514.828706] PAX: PMU arbitration service v1.0.2 has been started.
Jan 30 15:09:32 kmt-trd2 kernel: [ 4515.918099] socperf3_0: SocPerf Driver loading...
Jan 30 15:09:32 kmt-trd2 kernel: [ 4515.918105] socperf3_0: SocPerf Driver about to register chrdev...
Jan 30 15:09:32 kmt-trd2 kernel: [ 4515.918108] socperf3_0: SocPerf Driver: result of alloc_chrdev_region is 0
Jan 30 15:09:32 kmt-trd2 kernel: [ 4515.918111] socperf3_0: SocPerf Driver: major number is 507
Jan 30 15:09:32 kmt-trd2 kernel: [ 4515.918114] socperf3_0: SocPerf Driver: detected 32 CPUs in lwpmudrv_Load
Jan 30 15:09:32 kmt-trd2 kernel: [ 4515.918116] socperf3_0: SocPerf Driver: creating device socperf3!c...
Jan 30 15:09:32 kmt-trd2 kernel: [ 4515.918410] socperf3_0: PMU check enabled! F6.Mb7.S1 index=-1
Jan 30 15:09:32 kmt-trd2 kernel: [ 4515.918416] socperf3_0: No MMIO list information detected!
Jan 30 15:09:32 kmt-trd2 kernel: [ 4515.918418] socperf3_0: SocPerf Driver v3.0.0 has been loaded.
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.004134] sep5_38: Driver loading... sym_lookup_func_addr=ffffffffb77ba7f0
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.004430] sep5_38: [load] [UTILITY_Driver_Log_Init@1132]: Initialized driver log using contiguous physical memory.
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.004432] sep5_38: [load] [lwpmu_Load@7483]: Major number is 506
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.004433] sep5_38: [load] [lwpmu_Load@7491]: Detected 32 total CPUs and 32 active CPUs.
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.005829] sep5_38: [warning] [lwpmudrv_Detect_PMT_Endpoints@6605]: Address of PMT function is invalid
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.005829] 
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.005831] sep5_38: [load] [lwpmu_Load@7882]: PMU collection driver v5.38.13 Beta has been loaded.
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.005832] sep5_38: [load] [lwpmu_Load@7892]: NMI will be used for handling PMU interrupts.
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.005834] sep5_38: [load] [PMU_LIST_Initialize@644]: PMU check enabled! F6.Mb7.S1 index=48 drv_type=PUBLIC arch_pmu_info_used=no
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.005834] 
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.005844] sep5_38: [load] [PMU_LIST_Build_PCI_List@709]: No PCI list information detected!
Jan 30 15:09:34 kmt-trd2 kernel: [ 4517.005844] 
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.132986] vtsspp: Driver version: 1.8.380-624757
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.132988] vtsspp: Driver options: uid: 0, gid: 1000, mode: 660
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.132989] vtsspp: Kernel version: 5.19.0-28-generic
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.132990] vtsspp: Detected 32 CPU(s) and 2 thread(s) per core
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.132991] vtsspp: CPU family: 0x06, model: 0xb7, stepping: 01
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.132991] vtsspp: CPU freq: 2995200KHz, timer freq: 1000000KHz
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.132998] vtsspp: CPU hybrid mode detected
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.133000] vtsspp: Driver options: ksyms: ffffffffb77ba7f0
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.133002] vtsspp: PERFMONv5: fixed counters: 4, general counters: 8
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.133002] vtsspp: PERFMONv5: fixed counters: 3, general counters: 6
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.133429] vtsspp: Kernel: KPTI detected
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.133429] vtsspp: Kernel: KASLR detected
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.133429] vtsspp: Driver has been loaded
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.140454] socwatch2_15: -----------------------------------------
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.140455] socwatch2_15: OK: LOADED SoC Watch Driver
Jan 30 15:09:35 kmt-trd2 kernel: [ 4518.140456] socwatch2_15: -----------------------------------------
Jan 30 15:11:24 kmt-trd2 kernel: [ 4627.599622] ptrace attach of "mglet_gc_acoustic"[20690] was attempted by "/home/hakostra/KMT/mglet-8.5/bin/mglet_gc_acoustic"[20749]
Jan 30 15:11:24 kmt-trd2 kernel: [ 4627.599623] ptrace attach of "mglet_gc_acoustic"[20691] was attempted by "mglet_gc_acoustic"[20690]
Jan 30 15:11:24 kmt-trd2 kernel: [ 4627.599624] ptrace attach of "mglet_gc_acoustic"[20692] was attempted by "mglet_gc_acoustic"[20691]
Jan 30 15:11:24 kmt-trd2 kernel: [ 4627.599628] ptrace attach of "/home/hakostra/KMT/mglet-8.5/bin/mglet_gc_acoustic"[20749] was attempted by "mglet_gc_acoustic"[20696]
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277638] BUG: scheduling while atomic: DOM Worker/12984/0x00000002
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277646] Modules linked in: socwatch2_15(OE) vtsspp(OE) sep5(OE) socperf3(OE) pax(OE) ib_core xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay sunrpc binfmt_misc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_hda_codec_realtek snd_soc_core snd_hda_codec_generic ledtrig_audio snd_compress snd_hda_codec_hdmi ac97_bus intel_rapl_msr snd_pcm_dmaengine intel_rapl_common intel_tcc_cooling snd_hda_intel x86_pkg_temp_thermal sch_fq_codel snd_intel_dspcfg intel_powerclamp snd_intel_sdw_acpi snd_hda_codec snd_usb_audio uvcvideo
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277699]  snd_hda_core snd_usbmidi_lib nct6775 videobuf2_vmalloc kvm_intel snd_hwdep nct6775_core snd_seq_midi videobuf2_memops hwmon_vid snd_seq_midi_event videobuf2_v4l2 snd_rawmidi coretemp videobuf2_common mei_hdcp mei_pxp kvm snd_seq cmdlinepart videodev snd_pcm spi_nor joydev snd_seq_device msr nls_iso8859_1 intel_cstate eeepc_wmi wmi_bmof mtd mc snd_timer input_leds mei_me snd mei soundcore mac_hid acpi_pad acpi_tad parport_pc ppdev lp parport pstore_blk ramoops reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c dm_mirror dm_region_hash dm_log hid_logitech_hidpp hid_logitech_dj r8153_ecm cdc_ether usbnet r8152 mii hid_jabra hid_generic usbhid uas hid usb_storage amdgpu iommu_v2 gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_display_helper cec rc_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_kms_helper aesni_intel syscopyarea mfd_aaeon sysfillrect asus_wmi sysimgblt fb_sys_fops sparse_keymap crypto_simd
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277757]  platform_profile cryptd i40e igc ahci nvme drm spi_intel_pci i2c_i801 spi_intel xhci_pci i2c_smbus intel_lpss_pci libahci nvme_core intel_lpss xhci_pci_renesas idma64 vmd wmi video pinctrl_alderlake
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277770] CPU: 24 PID: 12984 Comm: DOM Worker Tainted: P           OE     5.19.0-28-generic #29~22.04.1-Ubuntu
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277772] Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE, BIOS 0203 11/15/2022
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277773] Call Trace:
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277775]  <TASK>
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277778]  show_stack+0x4e/0x61
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277783]  dump_stack_lvl+0x4a/0x6f
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277786]  dump_stack+0x10/0x18
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277787]  __schedule_bug.cold+0x4f/0x6b
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277791]  __schedule+0x451/0x5f0
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277793]  schedule+0x63/0x110
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277795]  rwsem_down_read_slowpath+0x3a1/0x4c0
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277797]  down_read+0x41/0xa0
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277800]  UTILITY_down_read_mm+0x12/0x20 [sep5]
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277810]  linuxos_Exec_Unmap_Notify+0xbd/0x180 [sep5]
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277816]  ? __vm_munmap+0x1/0x150
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277820]  kprobe_ftrace_handler+0x113/0x1e0
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277822]  ? __vm_munmap+0x5/0x150
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277824]  0xffffffffc09a30e3
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277826] RIP: 0010:__vm_munmap+0x1/0x150
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277828] Code: e8 24 0b 11 00 4c 8b 4d d0 4c 8b 55 b8 85 c0 41 89 c7 0f 84 33 fe ff ff e9 79 fd ff ff 41 bf f4 ff ff ff e9 6e fd ff ff 90 e8 <fb> 19 05 09 55 48 89 e5 41 57 41 56 41 89 d6 41 55 49 89 f5 41 54
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277829] RSP: 0018:ffffb9d703b5bf18 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277831] RAX: ffffffffb7951770 RBX: ffffb9d703b5bf58 RCX: 0000000000000000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277832] RDX: 0000000000000001 RSI: 0000000000004000 RDI: 00007fb876cc5000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277833] RBP: ffffb9d703b5bf20 R08: 0000000000000000 R09: 0000000000000000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277834] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277834] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277837]  ? vm_munmap+0x20/0x20
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277839]  ? __vm_munmap+0x5/0x150
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277840]  ? __x64_sys_munmap+0x1b/0x30
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277841]  ? __vm_munmap+0x5/0x150
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277842]  ? __x64_sys_munmap+0x1b/0x30
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277843]  do_syscall_64+0x58/0x90
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277845]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277847] RIP: 0033:0x7fb88d91ec2b
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277848] Code: 8b 15 09 a2 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 a1 0f 00 f7 d8 64 89 01 48
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277849] RSP: 002b:00007fb86cefed48 EFLAGS: 00000217 ORIG_RAX: 000000000000000b
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277850] RAX: ffffffffffffffda RBX: 00007fb876cc5000 RCX: 00007fb88d91ec2b
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277851] RDX: 00007fb86ceff640 RSI: 0000000000004000 RDI: 00007fb876cc5000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277852] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277852] R10: 0000000000000000 R11: 0000000000000217 R12: 00007fb88dbe36f0
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277853] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fb86cefed80
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277855]  </TASK>
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277868] BUG: scheduling while atomic: DOM Worker/12984/0x00000000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277870] Modules linked in: socwatch2_15(OE) vtsspp(OE) sep5(OE) socperf3(OE) pax(OE) ib_core xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay sunrpc binfmt_misc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_hda_codec_realtek snd_soc_core snd_hda_codec_generic ledtrig_audio snd_compress snd_hda_codec_hdmi ac97_bus intel_rapl_msr snd_pcm_dmaengine intel_rapl_common intel_tcc_cooling snd_hda_intel x86_pkg_temp_thermal sch_fq_codel snd_intel_dspcfg intel_powerclamp snd_intel_sdw_acpi snd_hda_codec snd_usb_audio uvcvideo
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277895]  snd_hda_core snd_usbmidi_lib nct6775 videobuf2_vmalloc kvm_intel snd_hwdep nct6775_core snd_seq_midi videobuf2_memops hwmon_vid snd_seq_midi_event videobuf2_v4l2 snd_rawmidi coretemp videobuf2_common mei_hdcp mei_pxp kvm snd_seq cmdlinepart videodev snd_pcm spi_nor joydev snd_seq_device msr nls_iso8859_1 intel_cstate eeepc_wmi wmi_bmof mtd mc snd_timer input_leds mei_me snd mei soundcore mac_hid acpi_pad acpi_tad parport_pc ppdev lp parport pstore_blk ramoops reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c dm_mirror dm_region_hash dm_log hid_logitech_hidpp hid_logitech_dj r8153_ecm cdc_ether usbnet r8152 mii hid_jabra hid_generic usbhid uas hid usb_storage amdgpu iommu_v2 gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_display_helper cec rc_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_kms_helper aesni_intel syscopyarea mfd_aaeon sysfillrect asus_wmi sysimgblt fb_sys_fops sparse_keymap crypto_simd
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277925]  platform_profile cryptd i40e igc ahci nvme drm spi_intel_pci i2c_i801 spi_intel xhci_pci i2c_smbus intel_lpss_pci libahci nvme_core intel_lpss xhci_pci_renesas idma64 vmd wmi video pinctrl_alderlake
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277932] CPU: 24 PID: 12984 Comm: DOM Worker Tainted: P        W  OE     5.19.0-28-generic #29~22.04.1-Ubuntu
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277933] Hardware name: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE, BIOS 0203 11/15/2022
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277934] Call Trace:
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277935]  <TASK>
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277935]  show_stack+0x4e/0x61
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277936]  dump_stack_lvl+0x4a/0x6f
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277938]  dump_stack+0x10/0x18
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277939]  __schedule_bug.cold+0x4f/0x6b
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277940]  __schedule+0x451/0x5f0
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277942]  schedule+0x63/0x110
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277943]  rwsem_down_write_slowpath+0x2f7/0x5b0
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277944]  ? kprobe_ftrace_handler+0x113/0x1e0
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277946]  ? __vm_munmap+0x5/0x150
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277947]  down_write_killable+0x4c/0x60
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277948]  __vm_munmap+0x5c/0x150
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277950]  __x64_sys_munmap+0x1b/0x30
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277951]  do_syscall_64+0x58/0x90
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277952]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277954] RIP: 0033:0x7fb88d91ec2b
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277954] Code: 8b 15 09 a2 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 a1 0f 00 f7 d8 64 89 01 48
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277955] RSP: 002b:00007fb86cefed48 EFLAGS: 00000217 ORIG_RAX: 000000000000000b
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277956] RAX: ffffffffffffffda RBX: 00007fb876cc5000 RCX: 00007fb88d91ec2b
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277957] RDX: 00007fb86ceff640 RSI: 0000000000004000 RDI: 00007fb876cc5000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277958] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277958] R10: 0000000000000000 R11: 0000000000000217 R12: 00007fb88dbe36f0
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277959] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fb86cefed80
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277960]  </TASK>

 

The second kernel log is too large to paste into this post, and the file upload function does not work, so I uploaded it here: https://gist.github.com/hakostra/96e3ed4e095d998ca89f8edc7e3fb9e3
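
For anyone reading the trace above, here is a minimal sketch of how the key facts might be pulled out of such a kern.log excerpt. It finds each "BUG: scheduling while atomic" line, treats the trailing hex value as the preemption counter, and lists the stack frames that belong to loadable modules (those followed by a name in square brackets, such as [sep5]). The bit layout in PREEMPT_FIELDS is an assumption based on mainline include/linux/preempt.h and should be checked against the running kernel. The names summarize_bugs and decode_preempt_count are illustrative only.

```python
import re

# Assumed bit layout of the kernel preemption counter as (shift, width),
# based on mainline include/linux/preempt.h; verify against your kernel headers.
PREEMPT_FIELDS = {
    "preempt_depth": (0, 8),
    "softirq": (8, 8),
    "hardirq": (16, 4),
    "nmi": (20, 4),
}

# Matches "BUG: scheduling while atomic: <comm>/<pid>/0x<preempt count>".
BUG_RE = re.compile(
    r"BUG: scheduling while atomic: (?P<comm>.+)/(?P<pid>\d+)/0x(?P<count>[0-9a-fA-F]+)"
)
# Stack frames from loadable modules are followed by "[module_name]".
MODULE_FRAME_RE = re.compile(
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]"
)


def decode_preempt_count(value):
    """Split a raw preemption counter into its assumed bit fields."""
    return {
        name: (value >> shift) & ((1 << width) - 1)
        for name, (shift, width) in PREEMPT_FIELDS.items()
    }


def summarize_bugs(log_text):
    """Return one dict per BUG line, with the module frames that follow it."""
    bugs = []
    for line in log_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            bugs.append({
                "comm": bug["comm"],
                "pid": int(bug["pid"]),
                "preempt": decode_preempt_count(int(bug["count"], 16)),
                "module_frames": [],
            })
            continue
        frame = MODULE_FRAME_RE.search(line)
        if frame and bugs:
            bugs[-1]["module_frames"].append((frame["module"], frame["symbol"]))
    return bugs


# A few lines copied from the log excerpt above.
SAMPLE = """\
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277638] BUG: scheduling while atomic: DOM Worker/12984/0x00000002
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277797]  down_read+0x41/0xa0
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277800]  UTILITY_down_read_mm+0x12/0x20 [sep5]
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277810]  linuxos_Exec_Unmap_Notify+0xbd/0x180 [sep5]
Jan 30 15:12:06 kmt-trd2 kernel: [ 4669.277820]  kprobe_ftrace_handler+0x113/0x1e0
"""

if __name__ == "__main__":
    for entry in summarize_bugs(SAMPLE):
        print(entry)
```

For the first BUG entry in the excerpt, this reports a preemption depth of 2 and the sep5 frames UTILITY_down_read_mm and linuxos_Exec_Unmap_Notify. That is consistent with (but does not prove) the driver's munmap hook taking a sleeping lock while preemption is disabled.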

 

Any ideas?

12 Replies
AthiraM_Intel
Moderator
3,554 Views

Hi,


Thank you for posting in Intel Communities.


Could you please share the self-checker output? You can obtain it by running the command below:


<VTune_installation_directory>/2023.0.0/bin64/vtune-self-checker.sh


  example: /opt/intel/oneapi/vtune/2023.0.0/bin64/vtune-self-checker.sh




Thanks


hakostra1
New Contributor II
3,531 Views

I loaded the kernel drivers first and then ran the self-checker tool. Here is the terminal output:

Intel(R) VTune(TM) Profiler Self Check Utility
Copyright (C) 2009 Intel Corporation. All rights reserved.
Build Number: 624757

HW event-based analysis (counting mode) (Intel driver)   
Example of analysis types: Performance Snapshot
    Collection: Ok
vtune: Warning: EMON Collector Message: Event TOPDOWN.SLOTS:perf_metrics discarded since the event is invalid or the device does not exist.
    Finalization: Ok...
    Report: Ok

Instrumentation based analysis check   
Example of analysis types: Hotspots and Threading with user-mode sampling
    Collection: Ok
vtune: Warning: EMON Collector Message: Event TOPDOWN.SLOTS:perf_metrics discarded since the event is invalid or the device does not exist.
    Finalization: Ok...
    Report: Ok

HW event-based analysis check (Intel driver)   
Example of analysis types: Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok...
vtune: Warning: Cannot read load addresses of sections from `/sys/module/nf_conntrack/sections'. This may affect the correctness of symbol resolution for `nf_conntrack'. Make sure this directory exists and all files in this directory have read permissions.
vtune: Warning: Cannot read load addresses of sections from `/sys/module/i40e/sections'. This may affect the correctness of symbol resolution for `i40e'. Make sure this directory exists and all files in this directory have read permissions.

vtune: Warning: Cannot read load addresses of sections from `/sys/module/kvm/sections'. This may affect the correctness of symbol resolution for `kvm'. Make sure this directory exists and all files in this directory have read permissions.

vtune: Warning: Function and source-level analysis for the Linux kernel will not be possible since neither debug version of the kernel nor kernel symbol tables are found. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

vtune: Warning: Cannot read load addresses of sections from `/sys/module/amdgpu/sections'. This may affect the correctness of symbol resolution for `amdgpu'. Make sure this directory exists and all files in this directory have read permissions.

    Report: Ok

HW event-based analysis check (Intel driver)   
Example of analysis types: Microarchitecture Exploration
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok...
vtune: Warning: Function and source-level analysis for the Linux kernel will not be possible since neither debug version of the kernel nor kernel symbol tables are found. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

    Report: Ok

HW event-based analysis with uncore events (Intel driver)   
Example of analysis types: Memory Access
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok...
vtune: Warning: Function and source-level analysis for the Linux kernel will not be possible since neither debug version of the kernel nor kernel symbol tables are found. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

    Report: Ok

HW event-based analysis with stacks (Intel driver)   
Example of analysis types: Hotspots with HW event-based sampling and call stacks
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok...
vtune: Warning: Function and source-level analysis for the Linux kernel will not be possible since neither debug version of the kernel nor kernel symbol tables are found. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

    Report: Ok

HW event-based analysis with context switches (Intel driver)   
Example of analysis types: Threading with HW event-based sampling
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
    Finalization: Ok...
vtune: Warning: Function and source-level analysis for the Linux kernel will not be possible since neither debug version of the kernel nor kernel symbol tables are found. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

    Report: Ok

Checking DPC++ application as prerequisite for GPU analyses: Fail
Unable to run DPC++ application on GPU connected to this system. If you are using an Intel GPU and want to verify profiling support for DPC++ applications, check these requirements:
* Install Intel(R) GPU driver.
* Install Intel(R) Level Zero GPU runtime.
* Install Intel(R) oneAPI DPC++ Runtime and set the environment.

The system is ready to be used for performance analysis with Intel VTune Profiler.
Review warnings in the output above to find product limitations, if any.

The system is ready for the following analyses:
* Performance Snapshot
* Hotspots and Threading with user-mode sampling
* Hotspots with HW event-based sampling, HPC Performance Characterization, etc.
* Microarchitecture Exploration
* Memory Access
* Hotspots with HW event-based sampling and call stacks
* Threading with HW event-based sampling

The following analyses have failed on the system:
* GPU Compute/Media Hotspots (characterization mode)
* GPU Compute/Media Hotspots (source analysis mode)

 The 'log.txt' is attached.
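
A minimal sketch for condensing self-checker output like the above into a per-analysis status table. It assumes the layout seen in this particular paste (an "Example of analysis types:" line followed by indented Collection/Finalization/Report lines, plus "Checking ...: Ok/Fail" lines). That layout is inferred from the output above, not from any documented format, and the function names are hypothetical.

```python
import re

# Patterns inferred from the pasted self-checker output; not a documented format.
EXAMPLE_RE = re.compile(r"^Example of analysis types: (.+?)\s*$")
STEP_RE = re.compile(r"^\s+(Collection|Finalization|Report): (\w+)")
CHECK_RE = re.compile(r"^(Checking .+): (\w+)\s*$")


def summarize_self_check(text):
    """Map each analysis (or prerequisite check) to its step results."""
    results = {}
    current = None
    for line in text.splitlines():
        example = EXAMPLE_RE.match(line)
        if example:
            current = example.group(1)
            results[current] = {}
            continue
        step = STEP_RE.match(line)
        if step and current is not None:
            # "Ok..." is recorded as "Ok" because only word characters are captured.
            results[current][step.group(1)] = step.group(2)
            continue
        check = CHECK_RE.match(line)
        if check:
            results[check.group(1)] = {"Result": check.group(2)}
    return results


def failures(results):
    """Return the names of entries where any step is not 'Ok'."""
    return [
        name for name, steps in results.items()
        if any(status != "Ok" for status in steps.values())
    ]


# A short excerpt copied from the output above.
SAMPLE = """\
HW event-based analysis with uncore events (Intel driver)
Example of analysis types: Memory Access
    Collection: Ok
vtune: Warning: To enable hardware event-based sampling, VTune Profiler has disabled the NMI watchdog timer.
    Finalization: Ok...
    Report: Ok

Checking DPC++ application as prerequisite for GPU analyses: Fail
"""

if __name__ == "__main__":
    summary = summarize_self_check(SAMPLE)
    for name, steps in summary.items():
        print(name, steps)
    print("failed:", failures(summary))
```

On the excerpt, only the DPC++ prerequisite check is listed as failed, which matches the self-checker's own summary: the driver-based analyses pass and only the GPU analyses fail.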

AthiraM_Intel
Moderator
3,505 Views

Hi,


We can see from your self-checker log file that all the drivers are loaded correctly and that the analyses are working fine.


Could you please let us know when the kernel crash occurs?


Please share the exact steps so that we can reproduce the issue on our end.



Thanks


AthiraM_Intel
Moderator
3,470 Views

Hi,


We have not heard back from you. Could you please give us an update?



Thanks


hakostra1
New Contributor II
3,446 Views
  1. I load the kernel drivers: ./insmod-sep -g hakostra
  2. I run VTune under mpirun, for instance with the hpc-performance analysis: mpirun -n 8 vtune -collect hpc-performance -knob collect-memory-bandwidth=true -data-limit=10000 -finalization-mode=full -r $PWD/hpc-performance -- mglet_cc_acoustic

where mglet_cc_acoustic is my application.
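
The two steps above can be sketched as a small pre-flight script. This is only an illustration: it checks /proc/modules for the driver modules named in the "Modules linked in" line of the kernel log and assembles the same mpirun command line, printing it rather than running it. The helper names (loaded_modules, missing_drivers, build_command) are hypothetical, and the list of expected modules is an assumption based on that one log.

```python
import shlex

# Driver modules named in the "Modules linked in" line of the kernel log
# (assumed to be the full set that insmod-sep loads on this system).
EXPECTED_DRIVERS = ("pax", "socperf3", "sep5", "vtsspp", "socwatch2_15")


def loaded_modules(proc_modules_text):
    """Return the module names listed in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_drivers(loaded):
    """List the expected driver modules that are not currently loaded."""
    return [name for name in EXPECTED_DRIVERS if name not in loaded]


def build_command(ranks, result_dir, app):
    """Assemble the mpirun/vtune command line quoted in step 2."""
    return [
        "mpirun", "-n", str(ranks),
        "vtune", "-collect", "hpc-performance",
        "-knob", "collect-memory-bandwidth=true",
        "-data-limit=10000",
        "-finalization-mode=full",
        "-r", result_dir,
        "--", app,
    ]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            loaded = loaded_modules(handle.read())
    except OSError:
        loaded = set()
    print("missing drivers:", missing_drivers(loaded))
    # Printed only; the command is deliberately not executed here.
    print(shlex.join(build_command(8, "hpc-performance", "mglet_cc_acoustic")))
```

Printing the command instead of executing it keeps the sketch safe to run on a machine where the collection is known to bring the kernel down.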

 

I made a screen recording with my phone, where I take my freshly rebooted computer (uptime 5 min) and perform the two actions above:

https://www.jottacloud.com/s/2575e0d5ba23d584cf49915d5c93acdda01

 

VTune starts and my simulation runs for a few seconds until the kernel crashes. Sometimes the computer reboots itself, but that did not happen when I recorded this event. In the end I had to power cycle it.

 

I also attach the kernel log for the particular event of the screen capture.

hakostra1
New Contributor II
3,446 Views

The log was not attached properly, so I will try again. If it does not show up, here is a link: https://www.jottacloud.com/s/25786e25979a17b4670a4643ae2df4276c4

AthiraM_Intel
Moderator
3,391 Views

Hi,


Could you please let us know whether your program runs successfully without VTune?


Please also share sample reproducer code and the exact steps you followed so that we can reproduce the issue on our end.



Thanks


AthiraM_Intel
Moderator
3,273 Views

Hi,


We have not heard back from you. Could you please give us an update?



Thanks


hakostra1
New Contributor II
3,230 Views

My application runs fine without the kernel drivers. I can even run it in VTune with the default sampling methods that do not use the kernel drivers.

The problem only appears when I use the kernel drivers together with VTune. I have not experienced kernel crashes when the drivers are loaded but VTune is not running.

I have no reproducer for you right now. I could potentially try a generic application like "perf", but I am reluctant to crash my kernel so many times. I'm always concerned about filesystem integrity etc. afterwards, so I haven't tried anything since my last post here.

However, I want to emphasize that the Linux kernel should not crash because of bugs in a userspace application (i.e. potential bugs in my application). If the kernel crashes, for whatever reason, even when triggered by a userspace application, there is also a kernel bug present. Think how big a problem it would be for all kinds of shared systems if a random application could bring the kernel to a stop; the entire internet would break down. So, in the absence of hardware issues, kernel crash == kernel bug.

I realize that you might not get this solved right now, but at least you are aware of it, and other users can find this thread if they encounter something similar.

Thanks for your effort.

AthiraM_Intel
Moderator
3,066 Views

Hi,


We are checking on this internally and will get back to you soon with an update.



Thanks


gbaraldi1
Beginner
2,445 Views

Hi,

Do you have any updates on this?
I see behaviour very similar to what hakostra reported.

 

Thanks!

AthiraM_Intel
Moderator
1,843 Views

Hi,


Thank you for your patience. The issue you raised has been fixed in VTune 2024.0. Please download it and let us know if this resolves your issue. We will be closing this thread from our side. If the issue persists with the new release, please create a new thread for us to investigate.



Thanks

