Intel® QuickAssist Technology (Intel® QAT)
For questions and discussions related to Intel® QuickAssist Technology (Intel® QAT).

Kernel panic when we disable the pcrypt module in our kernel.

lpereira
Novice
After some tests, we get a kernel panic when we disable the pcrypt module in our kernel.
 
We must not use the pcrypt module because of this issue: https://bugzilla.kernel.org/show_bug.cgi?id=217654
 
 
# lsmod | grep pcrypt
pcrypt                 16384  12
# lsmod | grep qat
qat_c3xxx              20480  1
intel_qat             303104  18 usdm_drv,qat_c3xxx
uio                    20480  1 intel_qat
authenc                16384  1 intel_qat
 
Linux  5.4.113-1.el7.elrepo.x86_64 #1 SMP Fri Apr 16 09:41:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
 
Please, can you help us?
 
Details of kernel panic:
 
 


[ 796.119373] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1H:29293]
[ 796.127078] Modules linked in: echainiv esp4 xt_addrtype ip_set_hash_net xt_NFLOG xt_devgroup xt_hashlimit xt_CT xt_REDIRECT xt_multiport xfrm_interface twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common ip6table_nat ip6table_mangle ip6table_raw ip6table_filter ip6_tables xt_MASQUERADE xt_conntrack xt_set ip_set_hash_ip ip_set serpent_sse2_x86_64 serpent_generic cast5_generic cast_common xt_connmark xt_mark xt_connlabel iptable_nat iptable_mangle iptable_raw des_generic libdes crypto_user camellia_generic camellia_x86_64 xcbc md4 iptable_filter nf_nat_ftp nf_conntrack_ftp nf_nat_sip nf_conntrack_sip nf_nat_tftp nf_conntrack_tftp nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_conntrack_pptp nf_nat ip_gre ip_tunnel gre tun macvlan qat_api(O) usdm_drv(O) nfnetlink_log nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink sunrpc sha512_ssse3 sha512_generic qat_c3xxx(O) pnd2_edac intel_qat(O) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
[ 796.127123] irqbypass iTCO_wdt rapl iTCO_vendor_support intel_cstate uio pcspkr i2c_i801 pinctrl_denverton authenc pinctrl_intel tpm_infineon i2c_ismt acpi_cpufreq tcp_htcp ip_tables ext4 mbcache jbd2 dm_crypt blowfish_generic blowfish_x86_64 blowfish_common mmc_block crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel glue_helper crypto_simd cryptd sdhci_pci cqhci sdhci ixgbe(O) mmc_core igb(O) dca ptp pps_core ahci libahci libata dm_mirror dm_region_hash dm_log dm_mod fuse
[ 796.127154] CPU: 2 PID: 29293 Comm: kworker/2:1H Tainted: G O L 5.4.113-1.el7.elrepo.x86_64 #1
[ 796.127155] Hardware name: Silicom 90500-0151-G01/90500-0151-G01, BIOS MADRID-01.00.18.06 06/01/2020
[ 796.127174] Workqueue: adf_pf_resp_wq_0 adf_response_handler_wq [intel_qat]
[ 796.127180] RIP: 0010:native_queued_spin_lock_slowpath+0x60/0x1d0
[ 796.127183] Code: 6e f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 48 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 81 fe 00 01
[ 796.127185] RSP: 0018:ffffbaa3c00eca90 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 796.127188] RAX: 0000000000000101 RBX: 0000000000000000 RCX: 0000000000000007
[ 796.127189] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9157d9ae02fc
[ 796.127191] RBP: ffffbaa3c00eca90 R08: 0000000000000032 R09: ffff9157d9ae02c0
[ 796.127193] R10: 0000000000000002 R11: 0000000000000032 R12: 0000000000000002
[ 796.127194] R13: ffff9157e3443400 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 796.127197] FS: 0000000000000000(0000) GS:ffff915837b00000(0000) knlGS:0000000000000000
[ 796.127199] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 796.127200] CR2: 00007f1958e5d0a0 CR3: 000000019c20a000 CR4: 00000000003406e0
[ 796.127202] Call Trace:
[ 796.127204] <IRQ>
[ 796.127208] _raw_spin_lock+0x1e/0x30
[ 796.127211] xfrm_input+0x1b0/0xa00
[ 796.127215] xfrm4_rcv+0x3b/0x40
[ 796.127218] xfrm4_esp_rcv+0x39/0x50
[ 796.127222] ip_protocol_deliver_rcu+0x1a6/0x1b0
[ 796.127226] ip_local_deliver_finish+0x48/0x50
[ 796.127229] ip_local_deliver+0xe5/0xf0
[ 796.127233] ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[ 796.127236] ip_sublist_rcv_finish+0x5e/0x70
[ 796.127240] ip_sublist_rcv+0x219/0x2b0
[ 796.127244] ? ip_rcv_finish_core.isra.0+0x3c0/0x3c0
[ 796.127248] ip_list_rcv+0x134/0x160
[ 796.127252] __netif_receive_skb_list_core+0x28d/0x2b0
[ 796.127256] netif_receive_skb_list_internal+0x1d5/0x300
[ 796.127271] ? ixgbe_clean_rx_irq+0x2cd/0xbb0 [ixgbe]
[ 796.127275] gro_normal_list.part.0+0x1e/0x40
[ 796.127278] napi_complete_done+0x91/0x140
[ 796.127293] ixgbe_poll+0x413/0x650 [ixgbe]
[ 796.127297] net_rx_action+0x147/0x3b0
[ 796.127300] __do_softirq+0xe1/0x2d6
[ 796.127304] irq_exit+0xe5/0xf0
[ 796.127308] do_IRQ+0x5a/0xf0
[ 796.127311] common_interrupt+0xf/0xf
[ 796.127313] </IRQ>
[ 796.127317] RIP: 0010:netlink_has_listeners+0xc/0x60
[ 796.127319] Code: 41 bc ea ff ff ff e9 b0 fe ff ff 31 f6 eb 9f 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 8b 87 fc 02 00 00 <f7> d0 48 89 e5 83 e0 01 75 3d 0f b6 97 11 02 00 00 48 8d 0c 52 48
[ 796.127321] RSP: 0018:ffffbaa3c238fcf8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffd3
[ 796.127324] RAX: 0000000000000001 RBX: 00000000ffffffff RCX: 0000000000000001
[ 796.127325] RDX: 000000000001ffff RSI: 0000000000000005 RDI: ffff9158313c3800
[ 796.127327] RBP: ffffbaa3c238fd10 R08: 0000000000000000 R09: ffff9157d9ae02c0
[ 796.127329] R10: 0000000000000000 R11: 0000000000007e04 R12: ffff9157d9ae02c0
[ 796.127331] R13: ffff9157e3443b00 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 796.127336] ? xfrm_replay_advance+0x52/0xc0
[ 796.127339] xfrm_input+0x559/0xa00
[ 796.127342] xfrm_input_resume+0x15/0x20
[ 796.127346] esp_input_done+0x21/0x30 [esp4]
[ 796.127366] qat_aead_alg_callback+0x9b/0xb0 [intel_qat]
[ 796.127386] qat_alg_callback+0x22/0x30 [intel_qat]
[ 796.127403] adf_handle_response+0x4b/0xd0 [intel_qat]
[ 796.127421] adf_response_handler_wq+0x84/0xe0 [intel_qat]
[ 796.127424] process_one_work+0x1b5/0x370
[ 796.127428] worker_thread+0x50/0x3d0
[ 796.127432] kthread+0x106/0x140
[ 796.127434] ? process_one_work+0x370/0x370
[ 796.127437] ? kthread_park+0x90/0x90
[ 796.127441] ret_from_fork+0x35/0x40
[ 802.714196] rcu: INFO: rcu_sched self-detected stall on CPU
[ 802.714203] rcu: 2-....: (1 GPs behind) idle=be2/1/0x4000000000000004 softirq=144695/144696 fqs=14965
[ 802.714206] (t=60000 jiffies g=359009 q=147256)
[ 802.714209] NMI backtrace for cpu 2
[ 802.714213] CPU: 2 PID: 29293 Comm: kworker/2:1H Tainted: G O L 5.4.113-1.el7.elrepo.x86_64 #1
[ 802.714215] Hardware name: Silicom 90500-0151-G01/90500-0151-G01, BIOS MADRID-01.00.18.06 06/01/2020
[ 802.714235] Workqueue: adf_pf_resp_wq_0 adf_response_handler_wq [intel_qat]
[ 802.714238] Call Trace:
[ 802.714240] <IRQ>
[ 802.714245] dump_stack+0x6d/0x8b
[ 802.714249] ? lapic_can_unplug_cpu+0x80/0x80
[ 802.714253] nmi_cpu_backtrace.cold+0x14/0x53
[ 802.714258] nmi_trigger_cpumask_backtrace+0xd9/0xe0
[ 802.714262] arch_trigger_cpumask_backtrace+0x19/0x20
[ 802.714266] rcu_dump_cpu_stacks+0x9c/0xce
[ 802.714270] rcu_sched_clock_irq.cold+0x1dc/0x3c4
[ 802.714276] update_process_times+0x2c/0x60
[ 802.714281] tick_sched_handle+0x29/0x60
[ 802.714284] tick_sched_timer+0x3d/0x80
[ 802.714287] __hrtimer_run_queues+0xf7/0x270
[ 802.714291] ? tick_sched_do_timer+0x70/0x70
[ 802.714294] hrtimer_interrupt+0x109/0x220
[ 802.714298] smp_apic_timer_interrupt+0x71/0x140
[ 802.714303] apic_timer_interrupt+0xf/0x20
[ 802.714308] RIP: 0010:native_queued_spin_lock_slowpath+0x60/0x1d0
[ 802.714311] Code: 6e f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 48 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 81 fe 00 01

[ 802.714313] RSP: 0018:ffffbaa3c00eca90 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 802.714316] RAX: 0000000000000101 RBX: 0000000000000000 RCX: 0000000000000007
[ 802.714317] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9157d9ae02fc
[ 802.714319] RBP: ffffbaa3c00eca90 R08: 0000000000000032 R09: ffff9157d9ae02c0
[ 802.714321] R10: 0000000000000002 R11: 0000000000000032 R12: 0000000000000002
[ 802.714323] R13: ffff9157e3443400 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 802.714327] ? apic_timer_interrupt+0xa/0x20
[ 802.714332] _raw_spin_lock+0x1e/0x30
[ 802.714335] xfrm_input+0x1b0/0xa00
[ 802.714340] xfrm4_rcv+0x3b/0x40
[ 802.714344] xfrm4_esp_rcv+0x39/0x50
[ 802.714348] ip_protocol_deliver_rcu+0x1a6/0x1b0
[ 802.714352] ip_local_deliver_finish+0x48/0x50
[ 802.714355] ip_local_deliver+0xe5/0xf0
[ 802.714359] ? ip_protocol_deliver_rcu+0x1b0/0x1b0
[ 802.714363] ip_sublist_rcv_finish+0x5e/0x70
[ 802.714367] ip_sublist_rcv+0x219/0x2b0
[ 802.714372] ? ip_rcv_finish_core.isra.0+0x3c0/0x3c0
[ 802.714376] ip_list_rcv+0x134/0x160
[ 802.714380] __netif_receive_skb_list_core+0x28d/0x2b0
[ 802.714384] netif_receive_skb_list_internal+0x1d5/0x300
[ 802.714400] ? ixgbe_clean_rx_irq+0x2cd/0xbb0 [ixgbe]
[ 802.714405] gro_normal_list.part.0+0x1e/0x40
[ 802.714408] napi_complete_done+0x91/0x140
[ 802.714424] ixgbe_poll+0x413/0x650 [ixgbe]
[ 802.714428] net_rx_action+0x147/0x3b0
[ 802.714432] __do_softirq+0xe1/0x2d6
[ 802.714436] irq_exit+0xe5/0xf0
[ 802.714441] do_IRQ+0x5a/0xf0
[ 802.714444] common_interrupt+0xf/0xf
[ 802.714446] </IRQ>
[ 802.714450] RIP: 0010:netlink_has_listeners+0xc/0x60
[ 802.714453] Code: 41 bc ea ff ff ff e9 b0 fe ff ff 31 f6 eb 9f 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 8b 87 fc 02 00 00 <f7> d0 48 89 e5 83 e0 01 75 3d 0f b6 97 11 02 00 00 48 8d 0c 52 48
[ 802.714455] RSP: 0018:ffffbaa3c238fcf8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffd3
[ 802.714457] RAX: 0000000000000001 RBX: 00000000ffffffff RCX: 0000000000000001
[ 802.714459] RDX: 000000000001ffff RSI: 0000000000000005 RDI: ffff9158313c3800
[ 802.714461] RBP: ffffbaa3c238fd10 R08: 0000000000000000 R09: ffff9157d9ae02c0
[ 802.714463] R10: 0000000000000000 R11: 0000000000007e04 R12: ffff9157d9ae02c0
[ 802.714465] R13: ffff9157e3443b00 R14: 0000000000000000 R15: ffff9157d9ae02fc
[ 802.714470] ? xfrm_replay_advance+0x52/0xc0
[ 802.714473] xfrm_input+0x559/0xa00
[ 802.714477] xfrm_input_resume+0x15/0x20
[ 802.714482] esp_input_done+0x21/0x30 [esp4]
[ 802.714503] qat_aead_alg_callback+0x9b/0xb0 [intel_qat]
[ 802.714524] qat_alg_callback+0x22/0x30 [intel_qat]
[ 802.714543] adf_handle_response+0x4b/0xd0 [intel_qat]
[ 802.714563] adf_response_handler_wq+0x84/0xe0 [intel_qat]
[ 802.714567] process_one_work+0x1b5/0x370
[ 802.714570] worker_thread+0x50/0x3d0
[ 802.714574] kthread+0x106/0x140
[ 802.714577] ? process_one_work+0x370/0x370
[ 802.714580] ? kthread_park+0x90/0x90
[ 802.714584] ret_from_fork+0x35/0x40
[ 803.916248] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 2-... } 60364 jiffies s: 1097 root: 0x4/.
[ 803.916261] rcu: blocking rcu_node structures:
[ 803.916265] Task dump for CPU 2:
[ 803.916269] kworker/2:1H R running task 0 29293 2 0x80004088
[ 803.916315] Workqueue: adf_pf_resp_wq_0 adf_response_handler_wq [intel_qat]
[ 803.916319] Call Trace:
[ 803.916354] ? adf_handle_response+0x4b/0xd0 [intel_qat]
[ 803.916383] ? adf_response_handler_wq+0x84/0xe0 [intel_qat]
[ 803.916392] ? process_one_work+0x1b5/0x370
[ 803.916397] ? worker_thread+0x50/0x3d0
[ 803.916404] ? kthread+0x106/0x140
[ 803.916408] ? process_one_work+0x370/0x370
[ 803.916412] ? kthread_park+0x90/0x90
[ 803.916420] ? ret_from_fork+0x35/0x40

10 Replies
Ronny_G_Intel
Moderator

Hi lpereira,

 

Thanks for reaching out to Intel Communities.

I see that you are reporting a kernel panic when you disable the pcrypt module in your kernel.

Can you please provide more details so that we can understand and troubleshoot this issue?

When is the kernel panic happening?

What hardware and operating system are you using? What is the QAT driver version?

Can you please provide the necessary steps to replicate this issue? 

Are you integrating QAT with StrongSwan? 

 

Thanks,

Ronny G

lpereira
Novice

Dear Ronny,

 

Reply in line.

 

Can you please provide more details so that we can understand and troubleshoot this issue?

I have 2 network appliances:

Model: https://www.silicom-usa.com/pr/x86-open-appliances/networking-appliances/ucpe-madrid-desktop/

CPU: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz


Linux lucas-node2.blockbit.com 5.4.113-1.el7.elrepo.x86_64 #1 SMP Fri Apr 16 09:41:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

[root@lucas-node2 ~]# lsmod | grep qat
qat_api 573440 0
qat_c3xxx 20480 1
intel_qat 303104 13 qat_api,usdm_drv,qat_c3xxx
uio 20480 1 intel_qat
authenc 16384 1 intel_qat


[root@lucas-node2 ~]# lsmod | grep pcry
pcrypt 16384 6


[root@lucas-node2 crypto]# systemctl status qat
● qat.service - QAT service
Loaded: loaded (/usr/lib/systemd/system/qat.service; enabled; vendor preset: disabled)
Active: active (exited) since Fri 2024-03-22 11:57:30 -03; 32s ago
Process: 31581 ExecStop=/etc/init.d/qat_service shutdown (code=exited, status=0/SUCCESS)
Process: 31639 ExecStart=/etc/init.d/qat_service start (code=exited, status=0/SUCCESS)
Main PID: 31639 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/qat.service

Mar 22 11:57:29 lucas-node2.blockbit.com systemd[1]: Starting QAT service...
Mar 22 11:57:30 lucas-node2.blockbit.com qat_service[31639]: Restarting all devices.
Mar 22 11:57:30 lucas-node2.blockbit.com qat_service[31639]: Processing /etc/c3xxx_dev0.conf
Mar 22 11:57:30 lucas-node2.blockbit.com qat_service[31639]: Checking status of all devices.
Mar 22 11:57:30 lucas-node2.blockbit.com qat_service[31639]: There is 1 QAT acceleration device(s) in the system:
Mar 22 11:57:30 lucas-node2.blockbit.com qat_service[31639]: qat_dev0 - type: c3xxx, inst_id: 0, node_id: 0, bsf: 0000:01:00.0, #accel: 3 #engines: 6 state: up
Mar 22 11:57:30 lucas-node2.blockbit.com systemd[1]: Started QAT service.


[root@lucas-node2 ~]# strongswan version
Linux strongSwan U5.9.11/K5.4.113-1.el7.elrepo.x86_64
University of Applied Sciences Rapperswil, Switzerland


Security Associations (3 up, 0 connecting):
tun3[9]: ESTABLISHED 5 hours ago, 100.100.100.2[100.100.100.2]...100.100.100.1[100.100.100.1]
tun3[9]: IKEv2 SPIs: 0d4e10d9168ca799_i bd6272d6659970f5_r*, rekeying in 18 hours
tun3[9]: IKE proposal: AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_1024
tun3{21}: INSTALLED, TUNNEL, reqid 1, ESP SPIs: cb52ee7d_i c3d03cdd_o
tun3{21}: AES_CBC_128/HMAC_SHA2_256_128/MODP_768, 0 bytes_i, 0 bytes_o, rekeying in 2 hours
tun3{21}: 172.25.0.0/24 === 172.46.0.0/24
tun2[7]: ESTABLISHED 5 hours ago, 200.200.200.2[200.200.200.2]...200.200.200.1[200.200.200.1]
tun2[7]: IKEv2 SPIs: 55ab6381d87ac2c3_i 09f8556a5d6eec63_r*, rekeying in 18 hours
tun2[7]: IKE proposal: AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_1024
tun2{19}: INSTALLED, TUNNEL, reqid 2, ESP SPIs: c75a891a_i ce7d4cf0_o
tun2{19}: AES_CBC_128/HMAC_SHA2_256_128/MODP_768, 0 bytes_i, 0 bytes_o, rekeying in 2 hours
tun2{19}: 172.35.0.0/24 === 172.36.0.0/24


I have 2 IPsec tunnels.

IP addressing:

LAN <172.36.0.0/24> -- [Appliance 1] -- <200.200.200.1> -- <200.200.200.2> -- [Appliance 2] -- LAN <172.35.0.0/24>
LAN <172.46.0.0/24> -- [Appliance 1] -- <100.100.100.1> -- <100.100.100.2> -- [Appliance 2] -- LAN <172.25.0.0/24>
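For reference, one of the two tunnels above could be expressed as a strongSwan configuration along these lines. This is only a hedged sketch in swanctl.conf format: the connection/child names and file layout are hypothetical, authentication (local/remote auth sections and secrets) is omitted, and the proposals simply mirror the SA listing above.

```
# Hypothetical sketch of tun2 as seen from Appliance 2, matching the SA output above.
connections {
    tun2 {
        local_addrs  = 200.200.200.2
        remote_addrs = 200.200.200.1
        version = 2                          # IKEv2, as in the SA listing
        proposals = aes128-sha256-modp1024   # AES_CBC_128/HMAC_SHA2_256/MODP_1024
        children {
            tun2 {
                local_ts  = 172.35.0.0/24
                remote_ts = 172.36.0.0/24
                esp_proposals = aes128-sha256-modp768
                mode = tunnel
            }
        }
    }
}
```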

------

When is the kernel panic happening?

 

On node 1 we have the iperf3 server:

iperf3 -s -p 5201 -D >/dev/null 2>&1
iperf3 -s -p 5202 -D >/dev/null 2>&1

Node 2:

iperf3 -c 172.46.0.1 -p 5202 -t 120 &
iperf3 -c 172.36.0.1 -p 5203 -t 120 &

PS: we are running iperf3 between the 2 network appliances.

[root@lucas-node2 ~]# sh iperf_client.sh
[root@lucas-node2 ~]# Connecting to host 172.46.0.1, port 5202
Connecting to host 172.36.0.1, port 5203
[ 4] local 172.25.0.3 port 59186 connected to 172.46.0.1 port 5202
[ 4] local 172.35.0.1 port 54274 connected to 172.36.0.1 port 5203
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 92.3 MBytes 772 Mbits/sec 254 321 KBytes
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.01 sec 76.1 MBytes 630 Mbits/sec 172 387 KBytes
[ 4] 1.00-2.00 sec 90.0 MBytes 757 Mbits/sec 0 433 KBytes
...
[ 4] 118.00-119.00 sec 80.0 MBytes 670 Mbits/sec 83 441 KBytes
[ 4] 119.00-120.00 sec 88.8 MBytes 744 Mbits/sec 0 517 KBytes

- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-120.00 sec 10.2 GBytes 728 Mbits/sec 4087 sender
[ 4] 0.00-120.00 sec 10.2 GBytes 728 Mbits/sec receiver

iperf Done.
[ 4] 119.00-120.01 sec 70.0 MBytes 583 Mbits/sec 0 587 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-120.01 sec 9.23 GBytes 661 Mbits/sec 4296 sender
[ 4] 0.00-120.01 sec 9.23 GBytes 661 Mbits/sec receiver

But QAT without pcrypt causes a panic after a few seconds of traffic.

The panic happens after I start a lot of traffic; I am using iperf3 to send packets over the IPsec tunnels.


------

What hardware and operating system are you using? What is the QAT driver version?

Model: https://www.silicom-usa.com/pr/x86-open-appliances/networking-appliances/ucpe-madrid-desktop/

CPU: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz


Linux lucas-node2.blockbit.com 5.4.113-1.el7.elrepo.x86_64 #1 SMP Fri Apr 16 09:41:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

[root@lucas-node2 ~]# lsmod | grep qat
qat_api 573440 0
qat_c3xxx 20480 1
intel_qat 303104 13 qat_api,usdm_drv,qat_c3xxx
uio 20480 1 intel_qat
authenc 16384 1 intel_qat

[root@lucas-node2 ~]# adf_ctl status
Checking status of all devices.
There is 1 QAT acceleration device(s) in the system:
qat_dev0 - type: c3xxx, inst_id: 0, node_id: 0, bsf: 0000:01:00.0, #accel: 3 #engines: 6 state: up
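For scripted health checks, the device state can be pulled out of an adf_ctl status line like the one above. A minimal sketch (the sample line is copied from the output above; the sed pattern is an assumption about the fixed "state:" suffix, not an adf_ctl feature):

```shell
# Extract the trailing 'state' field from an adf_ctl status line.
status_line='qat_dev0 - type: c3xxx, inst_id: 0, node_id: 0, bsf: 0000:01:00.0, #accel: 3 #engines: 6 state: up'
state=$(printf '%s\n' "$status_line" | sed -n 's/.*state: \([a-z]*\)$/\1/p')
echo "$state"    # prints "up"
```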


We used this package: https://downloadmirror.intel.com/795697/QAT.L.4.24.0-00005.tar.gz

We compiled it using driver_install.sh from the repo https://github.com/intel/QAT_Engine.git, branch master.

 

------

Can you please provide the necessary steps to replicate this issue? 

It is quite simple:

Create an IPsec VPN tunnel between 2 Linux hosts with strongSwan;
Use iperf3 to measure the throughput.

QAT without pcrypt causes a panic after a few seconds of traffic.

 

Are you integrating QAT with StrongSwan? 

Yes. We compiled QAT with --enable-qat-lkcf.

[root@lucas-node2 ~]# cat /proc/crypto | grep qat
driver : echainiv(pcrypt(qat_aes_cbc_hmac_sha256))
driver : pcrypt(authenc(hmac(sha512-ssse3),qat_aes_ctr))
driver : authenc(hmac(sha512-ssse3),qat_aes_ctr)
driver : pcrypt(authenc(hmac(sha384-ssse3),qat_aes_ctr))
driver : authenc(hmac(sha384-ssse3),qat_aes_ctr)
driver : pcrypt(authenc(hmac(sha256-generic),qat_aes_ctr))
driver : authenc(hmac(sha256-generic),qat_aes_ctr)
driver : pcrypt(authenc(hmac(sha1-generic),qat_aes_ctr))
driver : authenc(hmac(sha1-generic),qat_aes_ctr)
driver : pcrypt(authenc(hmac(md5-generic),qat_aes_ctr))
driver : authenc(hmac(md5-generic),qat_aes_ctr)
driver : pcrypt(qat_aes_cbc_hmac_sha512)
driver : pcrypt(authenc(hmac(sha384-ssse3),qat_aes_cbc))
driver : authenc(hmac(sha384-ssse3),qat_aes_cbc)
driver : pcrypt(qat_aes_cbc_hmac_sha256)
driver : pcrypt(authenc(hmac(sha1-generic),qat_aes_cbc))
driver : authenc(hmac(sha1-generic),qat_aes_cbc)
driver : pcrypt(authenc(hmac(md5-generic),qat_aes_cbc))
driver : authenc(hmac(md5-generic),qat_aes_cbc)
driver : rfc3686(qat_aes_ctr)
driver : qat-rsa
module : intel_qat
driver : qat_aes_gcm
module : intel_qat
driver : qat_aes_cbc_hmac_sha512
module : intel_qat
driver : qat_aes_cbc_hmac_sha256
module : intel_qat
driver : qat_aes_xts
module : intel_qat
driver : qat_aes_ctr
module : intel_qat
driver : qat_aes_cbc
module : intel_qat

lpereira
Novice

Without pcrypt:

[root@lucas-node2 ~]# cat /proc/crypto | grep qat
driver : echainiv(qat_aes_cbc_hmac_sha256)
driver : rfc3686(qat_aes_ctr)
driver : qat-rsa
module : intel_qat
driver : qat_aes_gcm
module : intel_qat
driver : qat_aes_cbc_hmac_sha512
module : intel_qat
driver : qat_aes_cbc_hmac_sha256
module : intel_qat
driver : qat_aes_xts
module : intel_qat
driver : qat_aes_ctr
module : intel_qat
driver : qat_aes_cbc
module : intel_qat

lpereira
Novice

Some information:


[root@lucas-node2 ~]# cat /sys/kernel/debug/qat_c3xxx_0000\:01\:00.0/version/fw
4.19.0
[root@lucas-node2 ~]# cat /sys/kernel/debug/qat_c3xxx_0000\:01\:00.0/version/hw
17
[root@lucas-node2 ~]# cat /sys/kernel/debug/qat_c3xxx_0000\:01\:00.0/version/mmp
6.0.0


[root@lucas-node2 ~]# cat /sys/kernel/debug/qat_c3xxx_0000\:01\:00.0/heartbeat
0
[root@lucas-node2 ~]# cat /sys/kernel/debug/qat_c3xxx_0000\:01\:00.0/heartbeat_failed
0

----

modprobe netconsole netconsole=6666@172.16.13.210/eth2,514@172.31.0.203/00:e2:69:0d:9a:88
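For context, the netconsole module parameter above follows the format documented in the kernel's netconsole documentation, <src-port>@<src-ip>/<dev>,<dst-port>@<dst-ip>/<dst-mac>. Annotated with the values from the command above (field meanings per that documentation):

```
# netconsole=<src-port>@<src-ip>/<dev>,<dst-port>@<dst-ip>/<dst-mac>
netconsole=6666@172.16.13.210/eth2,514@172.31.0.203/00:e2:69:0d:9a:88
#   6666               local UDP source port
#   172.16.13.210      local source IP address
#   eth2               local interface the log messages are sent from
#   514                remote UDP port on the log collector (syslog)
#   172.31.0.203       remote log collector IP address
#   00:e2:69:0d:9a:88  remote (or gateway) MAC address
```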

---
[root@lucas-node2 ~]# lsmod | grep cryp
crypto_user 16384 0
dm_crypt 49152 5
crypto_simd 16384 2 serpent_sse2_x86_64,aesni_intel
cryptd 24576 2 crypto_simd,ghash_clmulni_intel
dm_mod 131072 13 dm_crypt,dm_log,dm_mirror

Ronny_G_Intel
Moderator

Hi lpereira,


I need some clarification: if you use QAT with pcrypt disabled at the kernel level, create an IPsec VPN tunnel between 2 Linux hosts with strongSwan, and use iperf3 to measure the throughput, the system crashes with a kernel panic, is this correct?

The reason to disable the pcrypt module is documented here: https://patchwork.kernel.org/project/linux-crypto/patch/20171220222825.207321-1-ebiggers3@gmail.com/ but it is not clear to me; can you please provide more details?


My understanding is that the pcrypt module (CONFIG_CRYPTO_PCRYPT) allows parallelizing the encryption and decryption of IPsec packets across all available cores. What is the reason for you to disable it? What kind of errors do you see with the pcrypt module enabled?


Regards,

Ronny G


lpereira
Novice

Hi Ronny,

 

I need some clarification: if you use QAT with pcrypt disabled at the kernel level, create an IPsec VPN tunnel between 2 Linux hosts with strongSwan, and use iperf3 to measure the throughput, the system crashes with a kernel panic, is this correct?

Yes

 

I wasn't aware of this patch; we will look into it. However, Steffen Klassert, a Linux kernel maintainer, recommended that we disable this module. After we did that, the problem was resolved.

Our issue can be tracked here: https://bugzilla.kernel.org/show_bug.cgi?id=217654

 

 

My understanding is that the pcrypt module (CONFIG_CRYPTO_PCRYPT) allows parallelizing the encryption and decryption of IPsec packets across all available cores.

Exactly.

 

What is the reason for you to disable it?

Due to the issue mentioned here: https://bugzilla.kernel.org/show_bug.cgi?id=217654

 

 

What kind of errors do you see with pcrypt module enabled?

There are no errors; both the VPN and throughput acceleration work very well.

 

Ronny_G_Intel
Moderator

Hi lpereira,


Thanks for the additional information provided.

I haven't had any luck in my research regarding this issue. I can't find any previous issue involving QAT and pcrypt, nor any reference to kernel panics with pcrypt and QAT running together.

I also checked the link you provided (https://bugzilla.kernel.org/show_bug.cgi?id=217654), but I am still not clear on the reason to have pcrypt disabled; can you please clarify?


On the other hand, how did you disable pcrypt? Is it disabled at the kernel or at the algorithm level? Can you please share how you did this? Did you use crconf?

As mentioned before, the Linux kernel encrypts and decrypts IPsec packets on a single CPU core by default. Since 2.6.34, the pcrypt module (CONFIG_CRYPTO_PCRYPT) has allowed parallelizing this across all available cores, and you are at kernel version 5.4.113-1.


Regards,

Ronny G


lpereira
Novice

Hi Ronny

 

I haven't had any luck in my research regarding this issue. I can't find any previous issue involving QAT and pcrypt, nor any reference to kernel panics with pcrypt and QAT running together.

I also checked the link you provided (https://bugzilla.kernel.org/show_bug.cgi?id=217654), but I am still not clear on the reason to have pcrypt disabled; can you please clarify?

We followed Steffen Klassert's recommendation; after disabling pcrypt, the problem did not occur again.

From: Steffen Klassert <steffen.klassert@secunet.com>
Sent: Monday, December 18, 2023, 08:33
To: Lucas Vicente Pereira <lpereira@blockbit.com>

Subject: Re: Linux Kernel Consulting Request - IPSec VPN

Hi Lucas,

I've seen you use pcrypt. Please try without it. The pcrypt
parallelization backend (padata) was reused by some other
subsystem and changed to their needs, so I guess it is a
bug there.

Steffen

---


On the other hand, how did you disable pcrypt? Is it disabled at the kernel or at the algorithm level? Can you please share how you did this? Did you use crconf?

We disabled the module in the Kernel:

cryptsetup luksOpen /dev/mmcblk0p2 onboot

mount /dev/mapper/onboot /boot

vi /boot/grub2/grub.cfg

add: pcrypt.blacklist=yes rdblacklist=pcrypt module_blacklist=pcrypt


vi /etc/modprobe.d/blacklist.conf

add: blacklist pcrypt

depmod

dracut -f /boot/initramfs-4.19.12-1.el7.elrepo.x86_64.img

reboot
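Condensed, the persistent pieces the steps above create are two small configuration fragments (contents taken from the post; note that module_blacklist= is the documented upstream kernel parameter, while pcrypt.blacklist=yes and rdblacklist= appear to be distro/initramfs conventions, so keeping all three is belt-and-braces):

```
# /etc/modprobe.d/blacklist.conf -- stops modprobe from ever loading pcrypt
blacklist pcrypt

# Appended to the kernel command line in /boot/grub2/grub.cfg:
#   pcrypt.blacklist=yes rdblacklist=pcrypt module_blacklist=pcrypt
```

The initramfs then has to be rebuilt (the dracut -f step above) so the blacklist also takes effect during early boot.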
--


As mentioned before, the Linux kernel encrypts and decrypts IPsec packets on a single CPU core by default. Since 2.6.34, the pcrypt module (CONFIG_CRYPTO_PCRYPT) has allowed parallelizing this across all available cores, and you are at kernel version 5.4.113-1.

Yes. However, in environments with hundreds of tunnels and heavy traffic, we encountered the bug, which was resolved by disabling pcrypt. We ran throughput tests and, surprisingly, the loss without pcrypt was only 5%.

 

But now we would like to use the QAT module; unfortunately, without pcrypt, it causes this kernel panic...

Ronny_G_Intel
Moderator

Hi lpereira,


I am still investigating this issue; thank you for the details provided.

I will get back to you as soon as possible.


Regards,

Ronny G


lpereira
Novice

Thank you, Ronny.

I will keep the laboratory set up; if you want more information or access to the environment, I am at your disposal.

 

Best regards. 
