Intel® QuickAssist Technology (Intel® QAT)
For questions and discussions related to Intel® QuickAssist Technology (Intel® QAT).
41 Discussions

Kernel Panic when we disable pcryt module in our kernel.

lpereira
New Contributor I
2,026 Views
After some tests, we have a Kernel Panic, when we disable pcryt module in our kernel.
 
We must not use pcrypt module because this issue: https://bugzilla.kernel.org/show_bug.cgi?id=217654
 
 
# lsmod | grep pcrypt
pcrypt                 16384  12
# lsmod | grep qat
qat_c3xxx              20480  1
intel_qat             303104  18 usdm_drv,qat_c3xxx
uio                    20480  1 intel_qat
authenc                16384  1 intel_qat
 
Linux  5.4.113-1.el7.elrepo.x86_64 #1 SMP Fri Apr 16 09:41:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
 
Please, can you help us?
 
Details of kernel panic:
 
 

Please, can you help us?

Details of kernel panic:


[ 796.119373] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1H:29293]
[ 796.127078] Modules linked in: echainiv esp4 xt_addrtype ip_set_hash_net xt_NFLOG xt_devgroup xt_hashlimit xt_CT xt_REDIRECT xt_multiport xfrm_interface twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common ip6table_nat ip6table_mangle ip6table_raw ip6table_filter ip6_tables xt_MASQUERADE xt_conntrack xt_set ip_set_hash_ip ip_set serpent_sse2_x86_64 serpent_generic cast5_generic cast_common xt_connmark xt_mark xt_connlabel iptable_nat iptable_mangle iptable_raw des_generic libdes crypto_user camellia_generic camellia_x86_64 xcbc md4 iptable_filter nf_nat_ftp nf_conntrack_ftp nf_nat_sip nf_conntrack_sip nf_nat_tftp nf_conntrack_tftp nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_conntrack_pptp nf_nat ip_gre ip_tunnel gre tun macvlan qat_api(O) usdm_drv(O) nfnetlink_log nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink sunrpc sha512_ssse3 sha512_generic qat_c3xxx(O) pnd2_edac intel_qat(O) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
[ 796.127123] irqbypass iTCO_wdt rapl iTCO_vendor_support intel_cstate uio pcspkr i2c_i801 pinctrl_denverton authenc pinctrl_intel tpm_infineon i2c_ismt acpi_cpufreq tcp_htcp ip_tables ext4 mbcache jbd2 dm_crypt blowfish_generic blowfish_x86_64 blowfish_common mmc_block crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel glue_helper crypto_simd cryptd sdhci_pci cqhci sdhci ixgbe(O) mmc_core igb(O) dca ptp pps_core ahci libahci libata dm_mirror dm_region_hash dm_log dm_mod fuse
[ 796.127154] CPU: 2 PID: 29293 Comm: kworker/2:1H Tainted: G O L 5.4.113-1.el7.elrepo.x86_64 #1
[ 796.127155] Hardware name: Silicom 90500-0151-G01/90500-0151-G01, BIOS MADRID-01.00.18.06 06/01/2020
[ 796.127174] Workqueue: adf_pf_resp_wq_0 adf_response_handler_wq [intel_qat]
[ 796.127180] RIP: 0010:native_queued_spin_lock_slowpath+0x60/0x1d0
[ 796.127183] Code: 6e f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 48 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 81 fe 00 01
[ 796.127185] RSP: 0018:ffffbaa3c00eca90 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 796.127188] RAX: 0000000000000101 RBX: 0000000000000000 RCX: 0000000000000007
[ 796.127189] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9157d9ae02fc
[ 796.127191] RBP: ffffbaa3c00eca90 R08: 0000000000000032 R09: ffff9157d9ae02c0
[ 796.127193] R10: 0000000000000002 R11: 0000000000000032 R12: 0000000000000002
[ 796.127194] R13: ffff9157e3443400 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 796.127197] FS: 0000000000000000(0000) GS:ffff915837b00000(0000) knlGS:0000000000000000
[ 796.127199] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 796.127200] CR2: 00007f1958e5d0a0 CR3: 000000019c20a000 CR4: 00000000003406e0
[ 796.127202] Call Trace:
[ 796.127204] <IRQ>
[ 796.127208] _raw_spin_lock+0x1e/0x30
[ 796.127211] xfrm_input+0x1b0/0xa00
[ 796.127215] xfrm4_rcv+0x3b/0x40
[ 796.127218] xfrm4_esp_rcv+0x39/0x50
[ 796.127222] ip_protocol_deliver_rcu+0x1a6/0x1b0
[ 796.127226] ip_local_deliver_finish+0x48/0x50
[ 796.127229] ip_local_deliver+0xe5/0xf0
[ 796.127233] ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[ 796.127236] ip_sublist_rcv_finish+0x5e/0x70
[ 796.127240] ip_sublist_rcv+0x219/0x2b0
[ 796.127244] ? ip_rcv_finish_core.isra.0+0x3c0/0x3c0
[ 796.127248] ip_list_rcv+0x134/0x160
[ 796.127252] __netif_receive_skb_list_core+0x28d/0x2b0
[ 796.127256] netif_receive_skb_list_internal+0x1d5/0x300
[ 796.127271] ? ixgbe_clean_rx_irq+0x2cd/0xbb0 [ixgbe]
[ 796.127275] gro_normal_list.part.0+0x1e/0x40
[ 796.127278] napi_complete_done+0x91/0x140
[ 796.127293] ixgbe_poll+0x413/0x650 [ixgbe]
[ 796.127297] net_rx_action+0x147/0x3b0
[ 796.127300] __do_softirq+0xe1/0x2d6
[ 796.127304] irq_exit+0xe5/0xf0
[ 796.127308] do_IRQ+0x5a/0xf0
[ 796.127311] common_interrupt+0xf/0xf
[ 796.127313] </IRQ>
[ 796.127317] RIP: 0010:netlink_has_listeners+0xc/0x60
[ 796.127319] Code: 41 bc ea ff ff ff e9 b0 fe ff ff 31 f6 eb 9f 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 8b 87 fc 02 00 00 <f7> d0 48 89 e5 83 e0 01 75 3d 0f b6 97 11 02 00 00 48 8d 0c 52 48
[ 796.127321] RSP: 0018:ffffbaa3c238fcf8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffd3
[ 796.127324] RAX: 0000000000000001 RBX: 00000000ffffffff RCX: 0000000000000001
[ 796.127325] RDX: 000000000001ffff RSI: 0000000000000005 RDI: ffff9158313c3800
[ 796.127327] RBP: ffffbaa3c238fd10 R08: 0000000000000000 R09: ffff9157d9ae02c0
[ 796.127329] R10: 0000000000000000 R11: 0000000000007e04 R12: ffff9157d9ae02c0
[ 796.127331] R13: ffff9157e3443b00 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 796.127336] ? xfrm_replay_advance+0x52/0xc0
[ 796.127339] xfrm_input+0x559/0xa00
[ 796.127342] xfrm_input_resume+0x15/0x20
[ 796.127346] esp_input_done+0x21/0x30 [esp4]
[ 796.127366] qat_aead_alg_callback+0x9b/0xb0 [intel_qat]
[ 796.127386] qat_alg_callback+0x22/0x30 [intel_qat]
[ 796.127403] adf_handle_response+0x4b/0xd0 [intel_qat]
[ 796.127421] adf_response_handler_wq+0x84/0xe0 [intel_qat]
[ 796.127424] process_one_work+0x1b5/0x370
[ 796.127428] worker_thread+0x50/0x3d0
[ 796.127432] kthread+0x106/0x140
[ 796.127434] ? process_one_work+0x370/0x370
[ 796.127437] ? kthread_park+0x90/0x90
[ 796.127441] ret_from_fork+0x35/0x40
[ 802.714196] rcu: INFO: rcu_sched self-detected stall on CPU
[ 802.714203] rcu: 2-....: (1 GPs behind) idle=be2/1/0x4000000000000004 softirq=144695/144696 fqs=14965
[ 802.714206] (t=60000 jiffies g=359009 q=147256)
[ 802.714209] NMI backtrace for cpu 2
[ 802.714213] CPU: 2 PID: 29293 Comm: kworker/2:1H Tainted: G O L 5.4.113-1.el7.elrepo.x86_64 #1
[ 802.714215] Hardware name: Silicom 90500-0151-G01/90500-0151-G01, BIOS MADRID-01.00.18.06 06/01/2020
[ 802.714235] Workqueue: adf_pf_resp_wq_0 adf_response_handler_wq [intel_qat]
[ 802.714238] Call Trace:
[ 802.714240] <IRQ>
[ 802.714245] dump_stack+0x6d/0x8b
[ 802.714249] ? lapic_can_unplug_cpu+0x80/0x80
[ 802.714253] nmi_cpu_backtrace.cold+0x14/0x53
[ 802.714258] nmi_trigger_cpumask_backtrace+0xd9/0xe0
[ 802.714262] arch_trigger_cpumask_backtrace+0x19/0x20
[ 802.714266] rcu_dump_cpu_stacks+0x9c/0xce
[ 802.714270] rcu_sched_clock_irq.cold+0x1dc/0x3c4
[ 802.714276] update_process_times+0x2c/0x60
[ 802.714281] tick_sched_handle+0x29/0x60
[ 802.714284] tick_sched_timer+0x3d/0x80
[ 802.714287] __hrtimer_run_queues+0xf7/0x270
[ 802.714291] ? tick_sched_do_timer+0x70/0x70
[ 802.714294] hrtimer_interrupt+0x109/0x220
[ 802.714298] smp_apic_timer_interrupt+0x71/0x140
[ 802.714303] apic_timer_interrupt+0xf/0x20
[ 802.714308] RIP: 0010:native_queued_spin_lock_slowpath+0x60/0x1d0
[ 802.714311] Code: 6e f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 48 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 81 fe 00 01

[ 802.714313] RSP: 0018:ffffbaa3c00eca90 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 802.714316] RAX: 0000000000000101 RBX: 0000000000000000 RCX: 0000000000000007
[ 802.714317] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9157d9ae02fc
[ 802.714319] RBP: ffffbaa3c00eca90 R08: 0000000000000032 R09: ffff9157d9ae02c0
[ 802.714321] R10: 0000000000000002 R11: 0000000000000032 R12: 0000000000000002
[ 802.714323] R13: ffff9157e3443400 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 802.714327] ? apic_timer_interrupt+0xa/0x20
[ 802.714332] _raw_spin_lock+0x1e/0x30
[ 802.714335] xfrm_input+0x1b0/0xa00
[ 802.714340] xfrm4_rcv+0x3b/0x40
[ 802.714344] xfrm4_esp_rcv+0x39/0x50
[ 802.714348] ip_protocol_deliver_rcu+0x1a6/0x1b0
[ 802.714352] ip_local_deliver_finish+0x48/0x50
[ 802.714355] ip_local_deliver+0xe5/0xf0
[ 802.714359] ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[ 802.714363] ip_sublist_rcv_finish+0x5e/0x70
[ 802.714367] ip_sublist_rcv+0x219/0x2b0
[ 802.714372] ? ip_rcv_finish_core.isra.0+0x3c0/0x3c0
[ 802.714376] ip_list_rcv+0x134/0x160
[ 802.714380] __netif_receive_skb_list_core+0x28d/0x2b0
[ 802.714384] netif_receive_skb_list_internal+0x1d5/0x300
[ 802.714400] ? ixgbe_clean_rx_irq+0x2cd/0xbb0 [ixgbe]
[ 802.714405] gro_normal_list.part.0+0x1e/0x40
[ 802.714408] napi_complete_done+0x91/0x140
[ 802.714424] ixgbe_poll+0x413/0x650 [ixgbe]
[ 802.714428] net_rx_action+0x147/0x3b0
[ 802.714432] __do_softirq+0xe1/0x2d6
[ 802.714436] irq_exit+0xe5/0xf0
[ 802.714441] do_IRQ+0x5a/0xf0
[ 802.714444] common_interrupt+0xf/0xf
[ 802.714446] </IRQ>
[ 802.714450] RIP: 0010:netlink_has_listeners+0xc/0x60
[ 802.714453] Code: 41 bc ea ff ff ff e9 b0 fe ff ff 31 f6 eb 9f 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 8b 87 fc 02 00 00 <f7> d0 48 89 e5 83 e0 01 75 3d 0f b6 97 11 02 00 00 48 8d 0c 52 48
[ 802.714455] RSP: 0018:ffffbaa3c238fcf8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffd3
[ 802.714457] RAX: 0000000000000001 RBX: 00000000ffffffff RCX: 0000000000000001
[ 802.714459] RDX: 000000000001ffff RSI: 0000000000000005 RDI: ffff9158313c3800
[ 802.714461] RBP: ffffbaa3c238fd10 R08: 0000000000000000 R09: ffff9157d9ae02c0
[ 802.714463] R10: 0000000000000000 R11: 0000000000007e04 R12: ffff9157d9ae02c0
[ 802.714465] R13: ffff9157e3443b00 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 802.714470] ? xfrm_replay_advance+0x52/0xc0
[ 802.714473] xfrm_input+0x559/0xa00
[ 802.714477] xfrm_input_resume+0x15/0x20
[ 802.714482] esp_input_done+0x21/0x30 [esp4]
[ 802.714503] qat_aead_alg_callback+0x9b/0xb0 [intel_qat]
[ 802.714524] qat_alg_callback+0x22/0x30 [intel_qat]
[ 802.714543] adf_handle_response+0x4b/0xd0 [intel_qat]
[ 802.714563] adf_response_handler_wq+0x84/0xe0 [intel_qat]
[ 802.714567] process_one_work+0x1b5/0x370
[ 802.714570] worker_thread+0x50/0x3d0
[ 802.714574] kthread+0x106/0x140
[ 802.714577] ? process_one_work+0x370/0x370
[ 802.714580] ? kthread_park+0x90/0x90
[ 802.714584] ret_from_fork+0x35/0x40
[ 803.916248] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 2-... } 60364 jiffies s: 1097 root: 0x4/.
[ 803.916261] rcu: blocking rcu_node structures:
[ 803.916265] Task dump for CPU 2:
[ 803.916269] kworker/2:1H R running task 0 29293 2 0x80004088
[ 803.916315] Workqueue: adf_pf_resp_wq_0 adf_response_handler_wq [intel_qat]
[ 803.916319] Call Trace:
[ 803.916354] ? adf_handle_response+0x4b/0xd0 [intel_qat]
[ 803.916383] ? adf_response_handler_wq+0x84/0xe0 [intel_qat]
[ 803.916392] ? process_one_work+0x1b5/0x370
[ 803.916397] ? worker_thread+0x50/0x3d0
[ 803.916404] ? kthread+0x106/0x140
[ 803.916408] ? process_one_work+0x370/0x370
[ 803.916412] ? kthread_park+0x90/0x90
[ 803.916420] ? ret_from_fork+0x35/0x40

0 Kudos
25 Replies
Ronny_G_Intel
Moderator
258 Views

Hello Lucas,

 

Thanks for reporting the latest update regarding this issue.

Based on the scenarios that you reported, this is definitely an issue with the environment in which you are running QAT, with that in mind, is it possible for you to test a regular Linux distribution? CentOS 7* or Red Hat* downloaded from the official website? I am trying to rule out any particular customization done by ELRepo Project. 

Also, are you aware of any particular difference between having pcryto not enabled vs blacklisted?

 

Regards,

Ronny G

 

0 Kudos
Ronny_G_Intel
Moderator
218 Views

Hello Lucas,


Do you have any update by any chance? 

Did you have a chance to test an official CentOS 7* or Red Hat* downloaded from the official website?


Regards,

Ronny G


0 Kudos
lpereira
New Contributor I
200 Views

Dear Ronny,

Not yet, we will recompile your kernel 5.4.113 without pcrypt, in your ".configure".

 

However, due to some internal issues, it will take some time, I believe only in July.

 

Best Regards

Lucas Pereira

0 Kudos
Ronny_G_Intel
Moderator
194 Views

Hi Lucas,


Thanks for the update, if more testing is only going to be possible in July timeframe, can we close this case and reopen or create a new one whenever you have the opportunity for more debugging?


Regards,

Ronny G


0 Kudos
Ronny_G_Intel
Moderator
115 Views

Hi Lucas,


Thanks for replying back, I am going to close this case for now and we will resume debugging in July time frame if needed.


Regards,

Ronny G


0 Kudos
Reply