Intel® QuickAssist Technology (Intel® QAT)
For questions and discussions related to Intel® QuickAssist Technology (Intel® QAT).

Kernel panic when we disable the pcrypt module in our kernel.

lpereira
New Contributor I
After some tests, we get a kernel panic when we disable the pcrypt module in our kernel.

We cannot use the pcrypt module because of this issue: https://bugzilla.kernel.org/show_bug.cgi?id=217654

# lsmod | grep pcrypt
pcrypt                 16384  12
# lsmod | grep qat
qat_c3xxx              20480  1
intel_qat             303104  18 usdm_drv,qat_c3xxx
uio                    20480  1 intel_qat
authenc                16384  1 intel_qat
 
Linux  5.4.113-1.el7.elrepo.x86_64 #1 SMP Fri Apr 16 09:41:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
 
Please, can you help us?
 
Details of kernel panic:

[ 796.119373] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1H:29293]
[ 796.127078] Modules linked in: echainiv esp4 xt_addrtype ip_set_hash_net xt_NFLOG xt_devgroup xt_hashlimit xt_CT xt_REDIRECT xt_multiport xfrm_interface twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common ip6table_nat ip6table_mangle ip6table_raw ip6table_filter ip6_tables xt_MASQUERADE xt_conntrack xt_set ip_set_hash_ip ip_set serpent_sse2_x86_64 serpent_generic cast5_generic cast_common xt_connmark xt_mark xt_connlabel iptable_nat iptable_mangle iptable_raw des_generic libdes crypto_user camellia_generic camellia_x86_64 xcbc md4 iptable_filter nf_nat_ftp nf_conntrack_ftp nf_nat_sip nf_conntrack_sip nf_nat_tftp nf_conntrack_tftp nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_conntrack_pptp nf_nat ip_gre ip_tunnel gre tun macvlan qat_api(O) usdm_drv(O) nfnetlink_log nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink sunrpc sha512_ssse3 sha512_generic qat_c3xxx(O) pnd2_edac intel_qat(O) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
[ 796.127123] irqbypass iTCO_wdt rapl iTCO_vendor_support intel_cstate uio pcspkr i2c_i801 pinctrl_denverton authenc pinctrl_intel tpm_infineon i2c_ismt acpi_cpufreq tcp_htcp ip_tables ext4 mbcache jbd2 dm_crypt blowfish_generic blowfish_x86_64 blowfish_common mmc_block crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel glue_helper crypto_simd cryptd sdhci_pci cqhci sdhci ixgbe(O) mmc_core igb(O) dca ptp pps_core ahci libahci libata dm_mirror dm_region_hash dm_log dm_mod fuse
[ 796.127154] CPU: 2 PID: 29293 Comm: kworker/2:1H Tainted: G O L 5.4.113-1.el7.elrepo.x86_64 #1
[ 796.127155] Hardware name: Silicom 90500-0151-G01/90500-0151-G01, BIOS MADRID-01.00.18.06 06/01/2020
[ 796.127174] Workqueue: adf_pf_resp_wq_0 adf_response_handler_wq [intel_qat]
[ 796.127180] RIP: 0010:native_queued_spin_lock_slowpath+0x60/0x1d0
[ 796.127183] Code: 6e f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 48 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 81 fe 00 01
[ 796.127185] RSP: 0018:ffffbaa3c00eca90 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 796.127188] RAX: 0000000000000101 RBX: 0000000000000000 RCX: 0000000000000007
[ 796.127189] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9157d9ae02fc
[ 796.127191] RBP: ffffbaa3c00eca90 R08: 0000000000000032 R09: ffff9157d9ae02c0
[ 796.127193] R10: 0000000000000002 R11: 0000000000000032 R12: 0000000000000002
[ 796.127194] R13: ffff9157e3443400 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 796.127197] FS: 0000000000000000(0000) GS:ffff915837b00000(0000) knlGS:0000000000000000
[ 796.127199] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 796.127200] CR2: 00007f1958e5d0a0 CR3: 000000019c20a000 CR4: 00000000003406e0
[ 796.127202] Call Trace:
[ 796.127204] <IRQ>
[ 796.127208] _raw_spin_lock+0x1e/0x30
[ 796.127211] xfrm_input+0x1b0/0xa00
[ 796.127215] xfrm4_rcv+0x3b/0x40
[ 796.127218] xfrm4_esp_rcv+0x39/0x50
[ 796.127222] ip_protocol_deliver_rcu+0x1a6/0x1b0
[ 796.127226] ip_local_deliver_finish+0x48/0x50
[ 796.127229] ip_local_deliver+0xe5/0xf0
[ 796.127233] ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[ 796.127236] ip_sublist_rcv_finish+0x5e/0x70
[ 796.127240] ip_sublist_rcv+0x219/0x2b0
[ 796.127244] ? ip_rcv_finish_core.isra.0+0x3c0/0x3c0
[ 796.127248] ip_list_rcv+0x134/0x160
[ 796.127252] __netif_receive_skb_list_core+0x28d/0x2b0
[ 796.127256] netif_receive_skb_list_internal+0x1d5/0x300
[ 796.127271] ? ixgbe_clean_rx_irq+0x2cd/0xbb0 [ixgbe]
[ 796.127275] gro_normal_list.part.0+0x1e/0x40
[ 796.127278] napi_complete_done+0x91/0x140
[ 796.127293] ixgbe_poll+0x413/0x650 [ixgbe]
[ 796.127297] net_rx_action+0x147/0x3b0
[ 796.127300] __do_softirq+0xe1/0x2d6
[ 796.127304] irq_exit+0xe5/0xf0
[ 796.127308] do_IRQ+0x5a/0xf0
[ 796.127311] common_interrupt+0xf/0xf
[ 796.127313] </IRQ>
[ 796.127317] RIP: 0010:netlink_has_listeners+0xc/0x60
[ 796.127319] Code: 41 bc ea ff ff ff e9 b0 fe ff ff 31 f6 eb 9f 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 8b 87 fc 02 00 00 <f7> d0 48 89 e5 83 e0 01 75 3d 0f b6 97 11 02 00 00 48 8d 0c 52 48
[ 796.127321] RSP: 0018:ffffbaa3c238fcf8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffd3
[ 796.127324] RAX: 0000000000000001 RBX: 00000000ffffffff RCX: 0000000000000001
[ 796.127325] RDX: 000000000001ffff RSI: 0000000000000005 RDI: ffff9158313c3800
[ 796.127327] RBP: ffffbaa3c238fd10 R08: 0000000000000000 R09: ffff9157d9ae02c0
[ 796.127329] R10: 0000000000000000 R11: 0000000000007e04 R12: ffff9157d9ae02c0
[ 796.127331] R13: ffff9157e3443b00 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 796.127336] ? xfrm_replay_advance+0x52/0xc0
[ 796.127339] xfrm_input+0x559/0xa00
[ 796.127342] xfrm_input_resume+0x15/0x20
[ 796.127346] esp_input_done+0x21/0x30 [esp4]
[ 796.127366] qat_aead_alg_callback+0x9b/0xb0 [intel_qat]
[ 796.127386] qat_alg_callback+0x22/0x30 [intel_qat]
[ 796.127403] adf_handle_response+0x4b/0xd0 [intel_qat]
[ 796.127421] adf_response_handler_wq+0x84/0xe0 [intel_qat]
[ 796.127424] process_one_work+0x1b5/0x370
[ 796.127428] worker_thread+0x50/0x3d0
[ 796.127432] kthread+0x106/0x140
[ 796.127434] ? process_one_work+0x370/0x370
[ 796.127437] ? kthread_park+0x90/0x90
[ 796.127441] ret_from_fork+0x35/0x40
[ 802.714196] rcu: INFO: rcu_sched self-detected stall on CPU
[ 802.714203] rcu: 2-....: (1 GPs behind) idle=be2/1/0x4000000000000004 softirq=144695/144696 fqs=14965
[ 802.714206] (t=60000 jiffies g=359009 q=147256)
[ 802.714209] NMI backtrace for cpu 2
[ 802.714213] CPU: 2 PID: 29293 Comm: kworker/2:1H Tainted: G O L 5.4.113-1.el7.elrepo.x86_64 #1
[ 802.714215] Hardware name: Silicom 90500-0151-G01/90500-0151-G01, BIOS MADRID-01.00.18.06 06/01/2020
[ 802.714235] Workqueue: adf_pf_resp_wq_0 adf_response_handler_wq [intel_qat]
[ 802.714238] Call Trace:
[ 802.714240] <IRQ>
[ 802.714245] dump_stack+0x6d/0x8b
[ 802.714249] ? lapic_can_unplug_cpu+0x80/0x80
[ 802.714253] nmi_cpu_backtrace.cold+0x14/0x53
[ 802.714258] nmi_trigger_cpumask_backtrace+0xd9/0xe0
[ 802.714262] arch_trigger_cpumask_backtrace+0x19/0x20
[ 802.714266] rcu_dump_cpu_stacks+0x9c/0xce
[ 802.714270] rcu_sched_clock_irq.cold+0x1dc/0x3c4
[ 802.714276] update_process_times+0x2c/0x60
[ 802.714281] tick_sched_handle+0x29/0x60
[ 802.714284] tick_sched_timer+0x3d/0x80
[ 802.714287] __hrtimer_run_queues+0xf7/0x270
[ 802.714291] ? tick_sched_do_timer+0x70/0x70
[ 802.714294] hrtimer_interrupt+0x109/0x220
[ 802.714298] smp_apic_timer_interrupt+0x71/0x140
[ 802.714303] apic_timer_interrupt+0xf/0x20
[ 802.714308] RIP: 0010:native_queued_spin_lock_slowpath+0x60/0x1d0
[ 802.714311] Code: 6e f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 48 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 81 fe 00 01

[ 802.714313] RSP: 0018:ffffbaa3c00eca90 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 802.714316] RAX: 0000000000000101 RBX: 0000000000000000 RCX: 0000000000000007
[ 802.714317] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9157d9ae02fc
[ 802.714319] RBP: ffffbaa3c00eca90 R08: 0000000000000032 R09: ffff9157d9ae02c0
[ 802.714321] R10: 0000000000000002 R11: 0000000000000032 R12: 0000000000000002
[ 802.714323] R13: ffff9157e3443400 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 802.714327] ? apic_timer_interrupt+0xa/0x20
[ 802.714332] _raw_spin_lock+0x1e/0x30
[ 802.714335] xfrm_input+0x1b0/0xa00
[ 802.714340] xfrm4_rcv+0x3b/0x40
[ 802.714344] xfrm4_esp_rcv+0x39/0x50
[ 802.714348] ip_protocol_deliver_rcu+0x1a6/0x1b0
[ 802.714352] ip_local_deliver_finish+0x48/0x50
[ 802.714355] ip_local_deliver+0xe5/0xf0
[ 802.714359] ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[ 802.714363] ip_sublist_rcv_finish+0x5e/0x70
[ 802.714367] ip_sublist_rcv+0x219/0x2b0
[ 802.714372] ? ip_rcv_finish_core.isra.0+0x3c0/0x3c0
[ 802.714376] ip_list_rcv+0x134/0x160
[ 802.714380] __netif_receive_skb_list_core+0x28d/0x2b0
[ 802.714384] netif_receive_skb_list_internal+0x1d5/0x300
[ 802.714400] ? ixgbe_clean_rx_irq+0x2cd/0xbb0 [ixgbe]
[ 802.714405] gro_normal_list.part.0+0x1e/0x40
[ 802.714408] napi_complete_done+0x91/0x140
[ 802.714424] ixgbe_poll+0x413/0x650 [ixgbe]
[ 802.714428] net_rx_action+0x147/0x3b0
[ 802.714432] __do_softirq+0xe1/0x2d6
[ 802.714436] irq_exit+0xe5/0xf0
[ 802.714441] do_IRQ+0x5a/0xf0
[ 802.714444] common_interrupt+0xf/0xf
[ 802.714446] </IRQ>
[ 802.714450] RIP: 0010:netlink_has_listeners+0xc/0x60
[ 802.714453] Code: 41 bc ea ff ff ff e9 b0 fe ff ff 31 f6 eb 9f 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 8b 87 fc 02 00 00 <f7> d0 48 89 e5 83 e0 01 75 3d 0f b6 97 11 02 00 00 48 8d 0c 52 48
[ 802.714455] RSP: 0018:ffffbaa3c238fcf8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffd3
[ 802.714457] RAX: 0000000000000001 RBX: 00000000ffffffff RCX: 0000000000000001
[ 802.714459] RDX: 000000000001ffff RSI: 0000000000000005 RDI: ffff9158313c3800
[ 802.714461] RBP: ffffbaa3c238fd10 R08: 0000000000000000 R09: ffff9157d9ae02c0
[ 802.714463] R10: 0000000000000000 R11: 0000000000007e04 R12: ffff9157d9ae02c0
[ 802.714465] R13: ffff9157e3443b00 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 802.714470] ? xfrm_replay_advance+0x52/0xc0
[ 802.714473] xfrm_input+0x559/0xa00
[ 802.714477] xfrm_input_resume+0x15/0x20
[ 802.714482] esp_input_done+0x21/0x30 [esp4]
[ 802.714503] qat_aead_alg_callback+0x9b/0xb0 [intel_qat]
[ 802.714524] qat_alg_callback+0x22/0x30 [intel_qat]
[ 802.714543] adf_handle_response+0x4b/0xd0 [intel_qat]
[ 802.714563] adf_response_handler_wq+0x84/0xe0 [intel_qat]
[ 802.714567] process_one_work+0x1b5/0x370
[ 802.714570] worker_thread+0x50/0x3d0
[ 802.714574] kthread+0x106/0x140
[ 802.714577] ? process_one_work+0x370/0x370
[ 802.714580] ? kthread_park+0x90/0x90
[ 802.714584] ret_from_fork+0x35/0x40
[ 803.916248] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 2-... } 60364 jiffies s: 1097 root: 0x4/.
[ 803.916261] rcu: blocking rcu_node structures:
[ 803.916265] Task dump for CPU 2:
[ 803.916269] kworker/2:1H R running task 0 29293 2 0x80004088
[ 803.916315] Workqueue: adf_pf_resp_wq_0 adf_response_handler_wq [intel_qat]
[ 803.916319] Call Trace:
[ 803.916354] ? adf_handle_response+0x4b/0xd0 [intel_qat]
[ 803.916383] ? adf_response_handler_wq+0x84/0xe0 [intel_qat]
[ 803.916392] ? process_one_work+0x1b5/0x370
[ 803.916397] ? worker_thread+0x50/0x3d0
[ 803.916404] ? kthread+0x106/0x140
[ 803.916408] ? process_one_work+0x370/0x370
[ 803.916412] ? kthread_park+0x90/0x90
[ 803.916420] ? ret_from_fork+0x35/0x40

Ronny_G_Intel
Moderator

Hi lpereira,

 

Thanks for reaching out to Intel Communities.

I see that you are reporting a kernel panic when you disable the pcrypt module in your kernel.

Can you please provide more details so that we can understand and troubleshoot this issue?

When is the kernel panic happening?

What hardware and operating system are you using? What is the QAT driver version?

Can you please provide the necessary steps to replicate this issue? 

Are you integrating QAT with StrongSwan? 

 

Thanks,

Ronny G

lpereira
New Contributor I

Dear Ronny,

 

Replies inline.

Can you please provide more details so that we can understand and troubleshoot this issue?

I have 2 network appliances:

Model: https://www.silicom-usa.com/pr/x86-open-appliances/networking-appliances/ucpe-madrid-desktop/

CPU: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz


Linux lucas-node2.blockbit.com 5.4.113-1.el7.elrepo.x86_64 #1 SMP Fri Apr 16 09:41:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

[root@lucas-node2 ~]# lsmod | grep qat
qat_api 573440 0
qat_c3xxx 20480 1
intel_qat 303104 13 qat_api,usdm_drv,qat_c3xxx
uio 20480 1 intel_qat
authenc 16384 1 intel_qat


[root@lucas-node2 ~]# lsmod | grep pcry
pcrypt 16384 6


[root@lucas-node2 crypto]# systemctl status qat
● qat.service - QAT service
Loaded: loaded (/usr/lib/systemd/system/qat.service; enabled; vendor preset: disabled)
Active: active (exited) since Fri 2024-03-22 11:57:30 -03; 32s ago
Process: 31581 ExecStop=/etc/init.d/qat_service shutdown (code=exited, status=0/SUCCESS)
Process: 31639 ExecStart=/etc/init.d/qat_service start (code=exited, status=0/SUCCESS)
Main PID: 31639 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/qat.service

Mar 22 11:57:29 lucas-node2.blockbit.com systemd[1]: Starting QAT service...
Mar 22 11:57:30 lucas-node2.blockbit.com qat_service[31639]: Restarting all devices.
Mar 22 11:57:30 lucas-node2.blockbit.com qat_service[31639]: Processing /etc/c3xxx_dev0.conf
Mar 22 11:57:30 lucas-node2.blockbit.com qat_service[31639]: Checking status of all devices.
Mar 22 11:57:30 lucas-node2.blockbit.com qat_service[31639]: There is 1 QAT acceleration device(s) in the system:
Mar 22 11:57:30 lucas-node2.blockbit.com qat_service[31639]: qat_dev0 - type: c3xxx, inst_id: 0, node_id: 0, bsf: 0000:01:00.0, #accel: 3 #engines: 6 state: up
Mar 22 11:57:30 lucas-node2.blockbit.com systemd[1]: Started QAT service.


[root@lucas-node2 ~]# strongswan version
Linux strongSwan U5.9.11/K5.4.113-1.el7.elrepo.x86_64
University of Applied Sciences Rapperswil, Switzerland


Security Associations (3 up, 0 connecting):
tun3[9]: ESTABLISHED 5 hours ago, 100.100.100.2[100.100.100.2]...100.100.100.1[100.100.100.1]
tun3[9]: IKEv2 SPIs: 0d4e10d9168ca799_i bd6272d6659970f5_r*, rekeying in 18 hours
tun3[9]: IKE proposal: AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_1024
tun3{21}: INSTALLED, TUNNEL, reqid 1, ESP SPIs: cb52ee7d_i c3d03cdd_o
tun3{21}: AES_CBC_128/HMAC_SHA2_256_128/MODP_768, 0 bytes_i, 0 bytes_o, rekeying in 2 hours
tun3{21}: 172.25.0.0/24 === 172.46.0.0/24
tun2[7]: ESTABLISHED 5 hours ago, 200.200.200.2[200.200.200.2]...200.200.200.1[200.200.200.1]
tun2[7]: IKEv2 SPIs: 55ab6381d87ac2c3_i 09f8556a5d6eec63_r*, rekeying in 18 hours
tun2[7]: IKE proposal: AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_1024
tun2{19}: INSTALLED, TUNNEL, reqid 2, ESP SPIs: c75a891a_i ce7d4cf0_o
tun2{19}: AES_CBC_128/HMAC_SHA2_256_128/MODP_768, 0 bytes_i, 0 bytes_o, rekeying in 2 hours
tun2{19}: 172.35.0.0/24 === 172.36.0.0/24


I have two IPsec tunnels.

IP addressing:

LAN <172.36.0.0/24> -- [Appliance 1] -- <200.200.200.1> -- <200.200.200.2> -- [Appliance 2] -- LAN <172.35.0.0/24>
LAN <172.46.0.0/24> -- [Appliance 1] -- <100.100.100.1> -- <100.100.100.2> -- [Appliance 2] -- LAN <172.25.0.0/24>

------

When is the kernel panic happening?

 

On node 1, we run the iperf3 servers:

iperf3 -s -p 5201 -D >/dev/null 2>&1
iperf3 -s -p 5202 -D >/dev/null 2>&1

On node 2, we run the clients:

iperf3 -c 172.46.0.1 -p 5202 -t 120 &
iperf3 -c 172.36.0.1 -p 5203 -t 120 &

PS: we run iperf3 on the two network appliances themselves.

[root@lucas-node2 ~]# sh iperf_client.sh
[root@lucas-node2 ~]# Connecting to host 172.46.0.1, port 5202
Connecting to host 172.36.0.1, port 5203
[ 4] local 172.25.0.3 port 59186 connected to 172.46.0.1 port 5202
[ 4] local 172.35.0.1 port 54274 connected to 172.36.0.1 port 5203
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 92.3 MBytes 772 Mbits/sec 254 321 KBytes
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.01 sec 76.1 MBytes 630 Mbits/sec 172 387 KBytes
[ 4] 1.00-2.00 sec 90.0 MBytes 757 Mbits/sec 0 433 KBytes
...
[ 4] 118.00-119.00 sec 80.0 MBytes 670 Mbits/sec 83 441 KBytes
[ 4] 119.00-120.00 sec 88.8 MBytes 744 Mbits/sec 0 517 KBytes

- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-120.00 sec 10.2 GBytes 728 Mbits/sec 4087 sender
[ 4] 0.00-120.00 sec 10.2 GBytes 728 Mbits/sec receiver

iperf Done.
[ 4] 119.00-120.01 sec 70.0 MBytes 583 Mbits/sec 0 587 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-120.01 sec 9.23 GBytes 661 Mbits/sec 4296 sender
[ 4] 0.00-120.01 sec 9.23 GBytes 661 Mbits/sec receiver
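As a rough cross-check of these numbers, the bandwidth column can be recomputed from the transfer and interval columns. This is only an illustrative sketch: the helper names are hypothetical, and it assumes that iperf3's transfer column uses binary units (1 MByte = 2^20 bytes) while the bandwidth column uses decimal megabits. That assumption is consistent with the output above, where the rounded 10.2 GBytes over 120 s works out to roughly 730 Mbit/s against the reported 728 Mbit/s.

```python
# Sketch: recompute iperf3 bandwidth from a summary line.
# Assumption: transfer units are binary (KBytes = 2**10 bytes, etc.) and
# bandwidth is decimal Mbit/s (10**6 bits per second).
BYTE_UNITS = {"KBytes": 2**10, "MBytes": 2**20, "GBytes": 2**30}


def parse_summary(line: str) -> tuple[float, str, float]:
    """Return (transfer, unit, duration_seconds) from a line such as
    '[ 4] 0.00-120.00 sec 10.2 GBytes 728 Mbits/sec 4087 sender'."""
    fields = line.split()
    i = fields.index("sec")  # anchor on the standalone 'sec' column
    start, end = (float(x) for x in fields[i - 1].split("-"))
    return float(fields[i + 1]), fields[i + 2], end - start


def mbits_per_sec(transfer: float, unit: str, seconds: float) -> float:
    """Convert a transfer size over a duration into decimal Mbit/s."""
    return transfer * BYTE_UNITS[unit] * 8 / seconds / 1e6


if __name__ == "__main__":
    senders = [
        "[ 4] 0.00-120.00 sec 10.2 GBytes 728 Mbits/sec 4087 sender",
        "[ 4] 0.00-120.01 sec 9.23 GBytes 661 Mbits/sec 4296 sender",
    ]
    rates = [mbits_per_sec(*parse_summary(s)) for s in senders]
    print([round(r, 1) for r in rates], round(sum(rates), 1))
```

Adding the two sender rates gives an aggregate of roughly 1.39 Gbit/s across both tunnels in this run.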

But if we use QAT without pcrypt, the kernel panics after a few seconds of traffic.

The panic appears once a lot of traffic is flowing; I use iperf3 to send packets over the IPsec tunnels.


------

What hardware and operating system are you using? What is the QAT driver version?

Model: https://www.silicom-usa.com/pr/x86-open-appliances/networking-appliances/ucpe-madrid-desktop/

CPU: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz


Linux lucas-node2.blockbit.com 5.4.113-1.el7.elrepo.x86_64 #1 SMP Fri Apr 16 09:41:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

[root@lucas-node2 ~]# lsmod | grep qat
qat_api 573440 0
qat_c3xxx 20480 1
intel_qat 303104 13 qat_api,usdm_drv,qat_c3xxx
uio 20480 1 intel_qat
authenc 16384 1 intel_qat

[root@lucas-node2 ~]# adf_ctl status
Checking status of all devices.
There is 1 QAT acceleration device(s) in the system:
qat_dev0 - type: c3xxx, inst_id: 0, node_id: 0, bsf: 0000:01:00.0, #accel: 3 #engines: 6 state: up


We used this package: https://downloadmirror.intel.com/795697/QAT.L.4.24.0-00005.tar.gz

We compiled it with:

driver_install.sh from the repo https://github.com/intel/QAT_Engine.git, branch master.

 

------

Can you please provide the necessary steps to replicate this issue? 

It is simple:

Create an IPsec VPN tunnel between two Linux hosts with strongSwan;
Use iperf3 to measure the throughput.

QAT without pcrypt causes a panic after a few seconds of traffic.

 

Are you integrating QAT with StrongSwan? 

Yes. We compile the QAT driver with --enable-qat-lkcf.

[root@lucas-node2 ~]# cat /proc/crypto | grep qat
driver : echainiv(pcrypt(qat_aes_cbc_hmac_sha256))
driver : pcrypt(authenc(hmac(sha512-ssse3),qat_aes_ctr))
driver : authenc(hmac(sha512-ssse3),qat_aes_ctr)
driver : pcrypt(authenc(hmac(sha384-ssse3),qat_aes_ctr))
driver : authenc(hmac(sha384-ssse3),qat_aes_ctr)
driver : pcrypt(authenc(hmac(sha256-generic),qat_aes_ctr))
driver : authenc(hmac(sha256-generic),qat_aes_ctr)
driver : pcrypt(authenc(hmac(sha1-generic),qat_aes_ctr))
driver : authenc(hmac(sha1-generic),qat_aes_ctr)
driver : pcrypt(authenc(hmac(md5-generic),qat_aes_ctr))
driver : authenc(hmac(md5-generic),qat_aes_ctr)
driver : pcrypt(qat_aes_cbc_hmac_sha512)
driver : pcrypt(authenc(hmac(sha384-ssse3),qat_aes_cbc))
driver : authenc(hmac(sha384-ssse3),qat_aes_cbc)
driver : pcrypt(qat_aes_cbc_hmac_sha256)
driver : pcrypt(authenc(hmac(sha1-generic),qat_aes_cbc))
driver : authenc(hmac(sha1-generic),qat_aes_cbc)
driver : pcrypt(authenc(hmac(md5-generic),qat_aes_cbc))
driver : authenc(hmac(md5-generic),qat_aes_cbc)
driver : rfc3686(qat_aes_ctr)
driver : qat-rsa
module : intel_qat
driver : qat_aes_gcm
module : intel_qat
driver : qat_aes_cbc_hmac_sha512
module : intel_qat
driver : qat_aes_cbc_hmac_sha256
module : intel_qat
driver : qat_aes_xts
module : intel_qat
driver : qat_aes_ctr
module : intel_qat
driver : qat_aes_cbc
module : intel_qat

lpereira
New Contributor I

Without pcrypt:

[root@lucas-node2 ~]# cat /proc/crypto | grep qat
driver : echainiv(qat_aes_cbc_hmac_sha256)
driver : rfc3686(qat_aes_ctr)
driver : qat-rsa
module : intel_qat
driver : qat_aes_gcm
module : intel_qat
driver : qat_aes_cbc_hmac_sha512
module : intel_qat
driver : qat_aes_cbc_hmac_sha256
module : intel_qat
driver : qat_aes_xts
module : intel_qat
driver : qat_aes_ctr
module : intel_qat
driver : qat_aes_cbc
module : intel_qat
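Comparing the two /proc/crypto listings, every pcrypt(...) instance disappears once pcrypt is blacklisted, along with the authenc(...) instances that had been built underneath them; the instance in use for the AES-CBC/HMAC-SHA256 SAs then appears to be echainiv(qat_aes_cbc_hmac_sha256) with no pcrypt layer. A minimal sketch for extracting that difference from captured text, assuming the "driver : name" lines shown above (the function names are hypothetical):

```python
def parse_drivers(proc_crypto: str) -> list[str]:
    """Return the value of every 'driver : ...' line, in order."""
    drivers = []
    for line in proc_crypto.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "driver":
            drivers.append(value.strip())
    return drivers


def pcrypt_wrapped(drivers: list[str]) -> list[str]:
    """Drivers with a pcrypt(...) layer anywhere in their template chain."""
    return [d for d in drivers if "pcrypt(" in d]


def removed_when_disabled(with_pcrypt: str, without_pcrypt: str) -> set[str]:
    """Driver names present in the first listing but not in the second."""
    return set(parse_drivers(with_pcrypt)) - set(parse_drivers(without_pcrypt))


if __name__ == "__main__":
    with_p = ("driver : echainiv(pcrypt(qat_aes_cbc_hmac_sha256))\n"
              "driver : qat_aes_cbc_hmac_sha256\nmodule : intel_qat\n")
    without_p = ("driver : echainiv(qat_aes_cbc_hmac_sha256)\n"
                 "driver : qat_aes_cbc_hmac_sha256\nmodule : intel_qat\n")
    print(sorted(removed_when_disabled(with_p, without_p)))
```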

lpereira
New Contributor I

Some more information:


[root@lucas-node2 ~]# cat /sys/kernel/debug/qat_c3xxx_0000\:01\:00.0/version/fw
4.19.0
[root@lucas-node2 ~]# cat /sys/kernel/debug/qat_c3xxx_0000\:01\:00.0/version/hw
17
[root@lucas-node2 ~]# cat /sys/kernel/debug/qat_c3xxx_0000\:01\:00.0/version/mmp
6.0.0


[root@lucas-node2 ~]# cat /sys/kernel/debug/qat_c3xxx_0000\:01\:00.0/heartbeat
0
[root@lucas-node2 ~]# cat /sys/kernel/debug/qat_c3xxx_0000\:01\:00.0/heartbeat_failed
0

----

modprobe netconsole netconsole=6666@172.16.13.210/eth2,514@172.31.0.203/00:e2:69:0d:9a:88
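The netconsole= value follows the format described in the kernel's netconsole documentation: [+][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]. Read that way, the line above sends kernel messages from 172.16.13.210 (UDP port 6666, via eth2) to 172.31.0.203 on UDP port 514, addressed to MAC 00:e2:69:0d:9a:88. A small sketch that splits the value into those fields; it assumes IPv4 addresses, only strips a leading '+' (extended format), and the class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class NetconsoleTarget:
    src_port: str
    src_ip: str
    dev: str
    tgt_port: str
    tgt_ip: str
    tgt_mac: str


def parse_netconsole(param: str) -> NetconsoleTarget:
    """Split '[+][src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]'."""
    value = param.split("=", 1)[1] if param.startswith("netconsole=") else param
    value = value.lstrip("+")                 # '+' selects the extended format
    local, _, remote = value.partition(",")   # sender side, receiver side
    src_port, _, rest = local.partition("@")
    src_ip, _, dev = rest.partition("/")
    tgt_port, _, rest = remote.partition("@")
    tgt_ip, _, tgt_mac = rest.partition("/")
    return NetconsoleTarget(src_port, src_ip, dev, tgt_port, tgt_ip, tgt_mac)


if __name__ == "__main__":
    print(parse_netconsole(
        "netconsole=6666@172.16.13.210/eth2,514@172.31.0.203/00:e2:69:0d:9a:88"))
```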

---
[root@lucas-node2 ~]# lsmod | grep cryp
crypto_user 16384 0
dm_crypt 49152 5
crypto_simd 16384 2 serpent_sse2_x86_64,aesni_intel
cryptd 24576 2 crypto_simd,ghash_clmulni_intel
dm_mod 131072 13 dm_crypt,dm_log,dm_mirror

Ronny_G_Intel
Moderator

Hi lpereira,


I need some clarification: if you use QAT with pcrypt disabled at the kernel level, create an IPsec VPN tunnel between two Linux hosts with strongSwan, and use iperf3 to measure the throughput, the system crashes with a kernel panic. Is this correct?

The reason to disable the pcrypt module is documented here https://patchwork.kernel.org/project/linux-crypto/patch/20171220222825.207321-1-ebiggers3@gmail.com/ but it is not clear to me; can you please provide more details?


My understanding is that pcrypt module (CONFIG_CRYPTO_PCRYPT) allows parallelizing this to all available cores when encrypting and decrypting IPsec packets. What is the reason for you to disable it? What kind of errors do you see with pcrypt module enabled?


Regards,

Ronny G


lpereira
New Contributor I

Hi Ronny,

 

I need some clarification: if you use QAT with pcrypt disabled at the kernel level, create an IPsec VPN tunnel between two Linux hosts with strongSwan, and use iperf3 to measure the throughput, the system crashes with a kernel panic. Is this correct?

Yes

 

I wasn't aware of this patch; we will look into it. However, Steffen Klassert, a Linux kernel maintainer, recommended that we disable this module. After we did that, the problem was resolved.

Our issue can be tracked here: https://bugzilla.kernel.org/show_bug.cgi?id=217654

 

 

My understanding is that pcrypt module (CONFIG_CRYPTO_PCRYPT) allows parallelizing this to all available cores when encrypting and decrypting IPsec packets.

Exactly.

 

What is the reason for you to disable it?

Due to the issue mentioned here: https://bugzilla.kernel.org/show_bug.cgi?id=217654

 

 

What kind of errors do you see with pcrypt module enabled?

There are no errors; both the VPN and throughput acceleration work very well.

 

Ronny_G_Intel
Moderator

Hi lpereira,


Thanks for the additional information provided.

I haven't had any luck in my research regarding this issue: I can't find any previous issue involving QAT and pcrypt, nor any reference to kernel panics with pcrypt and QAT running together.

I also checked the link you provided, https://bugzilla.kernel.org/show_bug.cgi?id=217654, but I am still not clear about the reason to have pcrypt disabled; can you please clarify?


On the other hand, how did you disable pcrypt? Is it disabled at the kernel or at the algorithm level? Can you please share how you did this? Did you use crconf?

As mentioned before, by default the Linux kernel encrypts and decrypts IPsec packets on a single CPU core only. Since 2.6.34, the pcrypt module (CONFIG_CRYPTO_PCRYPT) allows parallelizing this across all available cores, and you are on kernel version 5.4.113-1.
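Conceptually, pcrypt hands each request to the padata framework, which processes requests in parallel on several CPUs and then delivers the completions serially, in their original submission order, so packets are not reordered. The Python sketch below only illustrates that "parallel, then serial in order" guarantee with a thread pool; it is an analogy rather than how padata is implemented in the kernel, and the function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_then_serial(items: Iterable[T], work: Callable[[T], R],
                         workers: int = 4) -> List[R]:
    """Run `work` on all items concurrently, but return the results in the
    order the items were submitted (the padata-style serial phase)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work, item) for item in items]  # parallel phase
        # Serial phase: collect results in submission order, regardless of
        # which worker finished first.
        return [f.result() for f in futures]


def slow_square(n: int) -> int:
    # Later items sleep less, so they tend to finish before earlier ones.
    time.sleep(max(0, 8 - n) * 0.005)
    return n * n


if __name__ == "__main__":
    print(parallel_then_serial(range(8), slow_square))
```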


Regards,

Ronny G


lpereira
New Contributor I

Hi Ronny

 

I haven't had any luck in my research regarding this issue: I can't find any previous issue involving QAT and pcrypt, nor any reference to kernel panics with pcrypt and QAT running together.

I also checked the link you provided, https://bugzilla.kernel.org/show_bug.cgi?id=217654, but I am still not clear about the reason to have pcrypt disabled; can you please clarify?

We followed Steffen Klassert's recommendation; after disabling pcrypt, the problem did not occur again.

From: Steffen Klassert <steffen.klassert@secunet.com>
Sent: Monday, December 18, 2023 08:33
To: Lucas Vicente Pereira <lpereira@blockbit.com>

Subject: Re: Linux Kernel Consulting Request - IPSec VPN

Hi Lucas,

I've seen you use pcrypt. Please try without it. The pcrypt
parallelization backend (padata) was reused by some other
subsystem and changed to their needs, so I guess it is a
bug there.

Steffen

---


On the other hand, how did you disable pcrypt? Is it disabled at the kernel or at the algorithm level? Can you please share how you did this? Did you use crconf?

We disabled the module in the kernel:

cryptsetup luksOpen /dev/mmcblk0p2 onboot

mount /dev/mapper/onboot /boot

vi /boot/grub2/grub.cfg

add: pcrypt.blacklist=yes rdblacklist=pcrypt module_blacklist=pcrypt


vi /etc/modprobe.d/blacklist.conf

add: blacklist pcrypt

depmod

dracut -f /boot/initramfs-4.19.12-1.el7.elrepo.x86_64.img

reboot
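After rebooting, a quick way to confirm the blacklist took effect is to check that pcrypt no longer shows up in lsmod. A minimal sketch that parses captured lsmod output, assuming its usual columns (Module, Size, Used by: a use count optionally followed by a comma-separated list of dependent modules); the class and function names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class LoadedModule:
    name: str
    size: int                     # module size in bytes, as printed by lsmod
    use_count: int                # first number in the "Used by" column
    used_by: list[str] = field(default_factory=list)  # dependents, if listed


def parse_lsmod(text: str) -> dict[str, LoadedModule]:
    """Parse lsmod output (header optional) into a dict keyed by module name."""
    modules: dict[str, LoadedModule] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # blank or truncated line
        try:
            size, count = int(parts[1]), int(parts[2])
        except ValueError:
            continue  # header ("Module Size Used by") or a shell prompt
        users = parts[3].split(",") if len(parts) > 3 else []
        modules[parts[0]] = LoadedModule(parts[0], size, count,
                                         [u for u in users if u])
    return modules


def is_loaded(lsmod_text: str, name: str) -> bool:
    return name in parse_lsmod(lsmod_text)


if __name__ == "__main__":
    sample = "pcrypt 16384 8\nintel_qat 303104 13 qat_api,usdm_drv,qat_c3xxx\n"
    print(is_loaded(sample, "pcrypt"), parse_lsmod(sample)["intel_qat"].used_by)
```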
--


As mentioned before, by default the Linux kernel encrypts and decrypts IPsec packets on a single CPU core only. Since 2.6.34, the pcrypt module (CONFIG_CRYPTO_PCRYPT) allows parallelizing this across all available cores, and you are on kernel version 5.4.113-1.

Yes. However, in environments with hundreds of tunnels and heavy traffic, we hit the bug, and disabling pcrypt resolved it. We ran throughput tests and, surprisingly, the throughput loss without pcrypt was only 5%.

 

Now we would like to use the QAT module, but unfortunately, without pcrypt, it causes this kernel panic.

Ronny_G_Intel
Moderator

Hi lpereira,


I am still investigating this issue, thank you for the details provided.

I will get back to you as soon as possible.


Regards,

Ronny G


lpereira
New Contributor I

Thank you, Ronny.

I will keep the lab set up; if you want more information or access to the environment, I am at your disposal.

 

Best regards. 

Ronny_G_Intel
Moderator

Hi lpereira,


I am currently continuing my investigation into this matter and I am in consultation with the QAT engineering team.

I will provide you with an update at the earliest opportunity.


Thanks,

Ronny G




Ronny_G_Intel
Moderator

Hi lpereira,

 

Can you please share the icp_dump output? 

We would also like to check the output of “cat /proc/crypto” and “cat /proc/cmdline”.

I’m assuming that QAT_engine is not relevant, please confirm.

Do you see the same issue if you set CONFIG_CRYPTO_PCRYPT=n in the kernel config and rebuild the kernel? There could be a tricky bug in the pcrypt code, and the workaround might depend on the particular way pcrypt is disabled.

On the other hand, pcrypt is not typically enabled by default in most standard kernel configurations because it is a specialized feature that not all users require. Instead, it is often provided as a module that can be loaded if needed.

The kernel version you've mentioned, 5.4.113-1.el7.elrepo.x86_64, suggests that you are using a kernel provided by the ELRepo repository for an Enterprise Linux 7 (RHEL 7 or CentOS 7) system and my understanding is that in kernel 5.4.113, pcrypt is not enabled by default. Please confirm.

If my assumptions are correct, what would be the distinction between not enabling pcrypt if it is not enabled by default versus disabling pcrypt as you are doing it?

To check whether pcrypt is available on your system, run # modinfo pcrypt. Of course, I don't see pcrypt enabled when you run # lsmod | grep pcrypt.

If you disable pcrypt as you are doing and also disable QAT+LKCF, do you still see the kernel panic and watchdog bug/soft lockup?

 

Regards,

Ronny G

 

lpereira
New Contributor I

Hi Ronny,

Answers inline.

 

Can you please share the icp_dump output?

R:

How can I get this dump?

 

We would like to check “cat /proc/crypto”, and “cat /proc/cmdline”.
R:

/proc/crypto is attached.

 

Without pcrypt:

[root@lucas-node2 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.4.113-1.el7.elrepo.x86_64 root=UUID=3697e30f-dec1-4a52-ad7a-2db6f8f06610 ro vconsole.font=latarcyrheb-sun16 vconsole.keymap=us rd.luks.uuid=luks-b221099d-60bd-44bf-8f98-c2147a4b22cb rd.luks.key=/etc/._key rd.luks.options=allow-discards splash=silent quiet pcrypt.blacklist=yes rdblacklist=pcrypt module_blacklist=pcrypt elevator=noop rd.plymouth=0 plymouth.enable=0 console=tty0 console=ttyS0,115200

With pcrypt:

[root@lucas-node2 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.4.113-1.el7.elrepo.x86_64 root=UUID=3697e30f-dec1-4a52-ad7a-2db6f8f06610 ro vconsole.font=latarcyrheb-sun16 vconsole.keymap=us rd.luks.uuid=luks-b221099d-60bd-44bf-8f98-c2147a4b22cb rd.luks.key=/etc/._key rd.luks.options=allow-discards splash=silent quiet elevator=noop rd.plymouth=0 plymouth.enable=0 console=tty0 console=ttyS0,115200
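Apart from the three pcrypt tokens, the two command lines are identical. The sketch below extracts module names from the list-style blacklist keys it knows about: module_blacklist= (the kernel's own parameter), rd.driver.blacklist= and its older spelling rdblacklist= (handled by dracut), and modprobe.blacklist= (read by modprobe). It deliberately does not count pcrypt.blacklist=yes, since that token is not a list of modules. The function names are hypothetical.

```python
# Keys whose value is a comma-separated list of modules that should not load.
BLACKLIST_KEYS = {"module_blacklist", "modprobe.blacklist",
                  "rd.driver.blacklist", "rdblacklist"}


def blacklisted_modules(cmdline: str) -> set[str]:
    """Collect module names from list-style blacklist keys in a cmdline."""
    found = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in BLACKLIST_KEYS:
            found.update(m for m in value.split(",") if m)
    return found


def extra_tokens(cmdline_a: str, cmdline_b: str) -> list[str]:
    """Tokens present in cmdline_a but not in cmdline_b, in original order."""
    b = set(cmdline_b.split())
    return [t for t in cmdline_a.split() if t not in b]


if __name__ == "__main__":
    without_p = ("ro quiet pcrypt.blacklist=yes rdblacklist=pcrypt "
                 "module_blacklist=pcrypt elevator=noop console=ttyS0,115200")
    with_p = "ro quiet elevator=noop console=ttyS0,115200"
    print(blacklisted_modules(without_p), extra_tokens(without_p, with_p))
```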

I’m assuming that QAT_engine is not relevant, please confirm.

 

Do you see the same issue if you set CONFIG_CRYPTO_PCRYPT=n in the kernel config and rebuild the kernel? There could be a tricky bug in the pcrypt code, and the workaround might depend on the particular way pcrypt is disabled.

R:
I understand, but this test will be more complicated to carry out. Do you believe that the way pcrypt is disabled specifically affects QAT?

The default config from ELRepo has:

CONFIG_CRYPTO_PCRYPT=m
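CONFIG_CRYPTO_PCRYPT=m means pcrypt is built as a loadable module, which is why blacklisting can keep it from loading; =y would build it into the kernel image, where module blacklisting does not apply. In a .config file the disabled case (=n) is written as "# CONFIG_CRYPTO_PCRYPT is not set". A minimal sketch for reading an option from a config file such as /boot/config-5.4.113-1.el7.elrepo.x86_64 (assuming the kernel package installs one there); the function name is hypothetical:

```python
import re


def config_value(config_text: str, option: str) -> str:
    """Return 'y', 'm' or the literal value of `option`, or 'n' when the
    option is explicitly unset or absent from the config text."""
    set_re = re.compile(rf"^{re.escape(option)}=(.*)$")
    unset = f"# {option} is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset:
            return "n"
        m = set_re.match(line)
        if m:
            return m.group(1)
    return "n"  # absent options are treated as disabled


if __name__ == "__main__":
    print(config_value("CONFIG_CRYPTO_PCRYPT=m\n", "CONFIG_CRYPTO_PCRYPT"))
```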

 

On the other hand, pcrypt is not typically enabled by default in most standard kernel configurations because it is a specialized feature that not all users require. Instead, it is often provided as a module that can be loaded if needed.

The kernel version you've mentioned, 5.4.113-1.el7.elrepo.x86_64, suggests that you are using a kernel provided by the ELRepo repository for an Enterprise Linux 7 (RHEL 7 or CentOS 7) system and my understanding is that in kernel 5.4.113, pcrypt is not enabled by default. Please confirm.
R:
Correct, we use the ELRepo kernel, but pcrypt is enabled by default there. We also tested newer versions (6.x), and pcrypt is enabled by default in those as well.


If my assumptions are correct, what would be the distinction between not enabling pcrypt if it is not enabled by default versus disabling pcrypt as you are doing it?

 

Because of the VPN crash issue I mentioned previously, we cannot use pcrypt. We were advised to simply disable the module, and that resolved the issue.

But we can try disabling it in the kernel configuration and recompiling.

 

To check whether pcrypt is available on your system, run # modinfo pcrypt. Of course, I don't see pcrypt enabled when you run # lsmod | grep pcrypt.
R:
Without pcrypt:

[root@lucas-node2 ~]# modinfo pcrypt
filename: /lib/modules/5.4.113-1.el7.elrepo.x86_64/kernel/crypto/pcrypt.ko
alias: crypto-pcrypt
alias: pcrypt
description: Parallel crypto wrapper
author: Steffen Klassert <steffen.klassert@secunet.com>
license: GPL
srcversion: D67E05C972393B4D09689F2
depends:
retpoline: Y
intree: Y
name: pcrypt
vermagic: 5.4.113-1.el7.elrepo.x86_64 SMP mod_unload modversions

[root@lucas-node2 ~]# lsmod | grep pcrypt

With pcrypt:

[root@ngfw-qa admin]# modinfo pcrypt
filename: /lib/modules/5.4.113-1.el7.elrepo.x86_64/kernel/crypto/pcrypt.ko
alias: crypto-pcrypt
alias: pcrypt
description: Parallel crypto wrapper
author: Steffen Klassert <steffen.klassert@secunet.com>
license: GPL
srcversion: D67E05C972393B4D09689F2
depends:
retpoline: Y
intree: Y
name: pcrypt
vermagic: 5.4.113-1.el7.elrepo.x86_64 SMP mod_unload modversions

[root@ngfw-qa admin]# lsmod | grep pcrypt
pcrypt 16384 8

 

If you disable pcrypt as you are doing and also disable QAT+LKCF, do you still see the kernel panic and watchdog bug/soft lockup?
R:
No. Without QAT+LKCF, there is no kernel panic and no watchdog bug/soft lockup. We run these tests every time to measure the figures for our datasheet.


Ronny_G_Intel
Moderator

Hi lpereira,


Thanks for the additional information provided.

To execute an icp_dump, please run the script located here: $ICP_ROOT/quickassist/utilities/release-files/debug_tool/icp_dump.sh
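A small wrapper that checks the path before running the script might look like the sketch below. It assumes ICP_ROOT is exported and points at the directory where the QAT package was extracted; run_icp_dump is a hypothetical name, extra arguments are passed through unchanged, and the script will most likely need to be run as root to collect the full system configuration.

```shell
# Hypothetical wrapper around the QAT debug collection script.
# Assumes ICP_ROOT points at the extracted QAT package tree.
run_icp_dump() {
    if [ -z "${ICP_ROOT:-}" ]; then
        echo "ICP_ROOT is not set" >&2
        return 1
    fi
    local script="$ICP_ROOT/quickassist/utilities/release-files/debug_tool/icp_dump.sh"
    if [ ! -f "$script" ]; then
        echo "icp_dump.sh not found: $script" >&2
        return 1
    fi
    if [ -x "$script" ]; then
        "$script" "$@"
    else
        # Assumption: the script is bash-compatible if it lacks the execute bit.
        bash "$script" "$@"
    fi
}
```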

This will generate a tar file that will provide us with your full system setup, including your configuration files.


Thanks,

Ronny G


lpereira
New Contributor I

Dear Ronny,

I am conducting new tests on the same machines, now without QAT, both with and without pcrypt, to confirm whether the issue occurs only when QAT is used without pcrypt.

I will return with an update in a few days.

Sincerely,

Ronny_G_Intel
Moderator

Hi lpereira,


Thank you for letting me know that you are going to be conducting additional testing and verification.

In addition, we have also done some testing of our own; please see below:


AlmaLinux 9.3, which is a Red Hat clone:


[root@JA-NC-Alma93 ~]# lsmod | grep pcrypt

[root@JA-NC-Alma93 ~]# modinfo pcrypt
filename:       /lib/modules/5.14.0-362.8.1.el9_3.x86_64/kernel/crypto/pcrypt.ko.xz
alias:          crypto-pcrypt
alias:          pcrypt
description:    Parallel crypto wrapper
author:         Steffen Klassert <steffen.klassert@secunet.com>
license:        GPL
rhelversion:    9.3
srcversion:     0A189F0F00A2CC8B8914FCA
depends:
retpoline:      Y
intree:         Y
name:           pcrypt
vermagic:       5.14.0-362.8.1.el9_3.x86_64 SMP preempt mod_unload modversions
sig_id:         PKCS#7
signer:         AlmaLinux kernel signing key
sig_key:        34:2E:71:51:24:9F:6A:BA:45:DD:5A:37:FE:4C:EF:C2:69:14:D3:C4
sig_hashalgo:   sha256
signature:      78:98:B5:17:28:07:83:34:C9:35:04:0D:EC:DA:C2:38:0D:97:40:62:
                92:9D:28:A0:17:46:A3:E8:B4:D9:E6:18:6A:E1:2C:25:85:B8:98:3E:
                8C:66:64:83:6B:FD:15:7E:16:6B:08:4B:B0:5C:50:B3:18:F9:F3:6C:
                47:6E:E1:78:9C:08:65:A3:7C:B4:43:27:8B:A4:45:54:2C:05:03:11:
                1B:C0:5F:E3:74:55:B9:94:D0:0F:5A:CD:F0:FE:B2:C9:76:A6:64:0D:
                7E:93:05:C1:24:01:E1:27:05:70:5C:72:6B:1C:91:96:98:D9:79:88:
                FE:49:79:32:17:71:A2:0D:72:E7:FF:12:73:49:87:FD:BE:C7:05:59:
                2F:D9:3E:55:CE:02:86:41:10:A3:BF:27:9B:C9:51:21:6F:06:98:9B:
                EC:D0:7C:83:0D:85:2E:1B:8F:18:AF:71:16:4C:EA:7C:72:1B:60:E3:
                C4:D0:E9:9F:93:95:06:33:51:81:CE:A5:9B:30:2C:F0:46:81:D1:68:
                EE:B6:B2:AA:35:72:C5:92:70:A2:C4:CE:83:35:72:ED:7F:54:40:94:
                BD:06:02:DE:4D:CE:B7:74:46:60:23:D4:6C:6A:F7:1C:7F:C8:20:45:
                2D:06:23:7A:DF:E4:20:C7:B2:A5:22:24:1F:3E:EE:90:FF:83:13:E5:
                9F:7B:FB:E1:A2:BA:3A:13:42:21:47:22:16:24:5E:D9:D9:FF:55:A7:
                4B:64:4B:01:F3:27:E1:09:B3:7D:3F:C9:D2:D5:2A:79:B0:91:98:58:
                5E:EC:20:D1:0D:A2:6E:E0:A9:33:68:97:55:EF:78:1C:6A:BD:DD:CB:
                01:B2:96:51:58:E2:49:CC:E3:2D:71:64:26:17:45:81:ED:13:AC:D5:
                22:7B:67:C4:7C:49:2F:5C:CA:74:56:56:6A:C4:A3:69:2E:CB:16:1B:
                6E:CA:E0:08:6D:54:C7:AF:97:B4:1A:FC:B5:03:78:44:9F:31:24:FE:
                1D:A8:8E:87

[root@JA-NC-Alma93 ~]# uname -a
Linux JA-NC-Alma93 5.14.0-362.8.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Nov 7 14:54:22 EST 2023 x86_64 x86_64 x86_64 GNU/Linux


Note that the pcrypt kernel module is present on this system but not loaded (lsmod | grep pcrypt returned nothing), even though I have not blacklisted it. I have also done some research, and my understanding is that pcrypt is not loaded by default.

Did you enable it for a particular reason and then blacklist it?

You could also test this by installing CentOS 7 or another Red Hat clone and then checking for pcrypt with lsmod.
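Part of the disagreement may come from what "enabled" means. lsmod only shows whether pcrypt is loaded right now; the kernel build configuration shows whether it was built at all, either as a module (m) or into the kernel (y). EL-family kernels, including ELRepo builds, normally ship their build configuration as /boot/config-<release>, so the option can be compared across the CentOS 7/ELRepo and AlmaLinux kernels. A minimal sketch, with pcrypt_kconfig as a hypothetical name:

```shell
# Hypothetical helper: print the value of CONFIG_CRYPTO_PCRYPT in a kernel
# build configuration: "y" (built in), "m" (loadable module) or "not set".
# Defaults to /boot/config-<running release>.
pcrypt_kconfig() {
    local config="${1:-/boot/config-$(uname -r)}"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi
    local value
    value=$(sed -n 's/^CONFIG_CRYPTO_PCRYPT=//p' "$config")
    echo "${value:-not set}"
}
```

Given the modinfo outputs in this thread, both the ELRepo 5.4.113 kernel and the AlmaLinux 9.3 kernel would be expected to report m: the module is built and installed on both, and whether it ends up loaded depends on whether something on the system requests it.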


Thanks,

Ronny G


lpereira
New Contributor I

Thank you, Mr. Ronny,

 

I don't know the details, but every kernel from 5.4.113 upward that we tested from ELRepo for CentOS 7 had pcrypt activated.

I carried out the tests on the same two network appliances without QAT, both with and without pcrypt, and the problem did not appear.

I will now enable QAT without pcrypt and repeat the tests; I will get back to you in about 48 hours.

Thank you very much

 

Best regards

Lucas Pereira

Ronny_G_Intel
Moderator

Hi Lucas,


Thank you for the new details. Just a reminder, I have not conducted tests on ELRepo Project* images. My earlier comment about pcrypt not being enabled by default was in reference to the standard Operating System images that are directly obtained from the official websites of CentOS 7* or Red Hat*.


Regards,

Ronny G


Ronny_G_Intel
Moderator

Hi Lucas,


I am reaching out to see if there is any update on this issue.


Thanks,

Ronny G


lpereira
New Contributor I

Dear Ronny,

I performed the following tests:

FW 1 <--> FW 2 (with pcrypt & without QAT): 48 h of traffic: OK
FW 1 <--> FW 2 (without pcrypt & without QAT): 48 h of traffic: OK
FW 1 <--> FW 2 (with pcrypt & with QAT): 48 h of traffic: OK
FW 1 <--> FW 2 (without pcrypt & with QAT): panic after 30 seconds of traffic

Attached is the panic.

Thank you very much
Lucas Pereira
