
ICE kernel module starvation with DPDK and RT kernel

fedecive
New Contributor I

Dear Community,

I am working on an application that requires DPDK processes pinned to two cores isolated from the rest of the OS.
On my server I have an Intel E810 card with two SFP+ interfaces used by the application. The OS is Ubuntu 20.04 with kernel 6.2.16 patched for RT. The ice driver is version 1.12.7 and the firmware version is 4.40.
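
For completeness, the core isolation is done from the kernel command line, along these lines (a sketch; the core IDs 2-3 are placeholders for the ones actually isolated on my system):

# Example boot parameters isolating two cores for DPDK on a 24-CPU machine
isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3 irqaffinity=0-1,4-23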

The application also requires SR-IOV on the interface enp1s0f0np0, with two different VFs attached, and the use of the interface enp1s0f1np1 without SR-IOV.
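
For reference, the VFs are created through the standard sysfs interface, roughly as follows (a sketch; the VF PCI addresses are examples, and the binding step assumes DPDK's dpdk-devbind.py tool):

# Create two VFs on the first port; the second port stays a plain PF
echo 2 | sudo tee /sys/class/net/enp1s0f0np0/device/sriov_numvfs

# Bind the VFs to vfio-pci for DPDK (example PCI addresses)
sudo dpdk-devbind.py --bind=vfio-pci 0000:01:01.0 0000:01:01.1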

After around one hour of running, the kernel complains about the following issue:

 

[Thu Jun 20 10:48:46 2024] INFO: task kworker/11:1:307 blocked for more than 606 seconds.
[Thu Jun 20 10:48:46 2024]       Tainted: G        W  O       6.2.16-rt3 #3
[Thu Jun 20 10:48:46 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 20 10:48:46 2024] task:kworker/11:1    state:D stack:0     pid:307   ppid:2      flags:0x00004000
[Thu Jun 20 10:48:46 2024] Workqueue: ice ice_service_task [ice]
[Thu Jun 20 10:48:46 2024] Call Trace:
[Thu Jun 20 10:48:46 2024]  <TASK>
[Thu Jun 20 10:48:46 2024]  __schedule+0x3bb/0x1650
[Thu Jun 20 10:48:46 2024]  ? preempt_schedule_thunk+0x1a/0x20
[Thu Jun 20 10:48:46 2024]  schedule+0x6f/0x120
[Thu Jun 20 10:48:46 2024]  synchronize_irq+0x7c/0xb0
[Thu Jun 20 10:48:46 2024]  ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Jun 20 10:48:46 2024]  ice_vsi_dis_irq+0x17e/0x1a0 [ice]
[Thu Jun 20 10:48:46 2024]  ice_down+0x55/0x2e0 [ice]
[Thu Jun 20 10:48:46 2024]  ice_vsi_close+0xb8/0xc0 [ice]
[Thu Jun 20 10:48:46 2024]  ice_dis_vsi+0x4c/0x80 [ice]
[Thu Jun 20 10:48:46 2024]  ice_pf_dis_all_vsi.constprop.0+0x35/0xf0 [ice]
[Thu Jun 20 10:48:46 2024]  ice_prepare_for_reset+0x1c3/0x420 [ice]
[Thu Jun 20 10:48:46 2024]  ice_do_reset+0x35/0x150 [ice]
[Thu Jun 20 10:48:46 2024]  ice_service_task+0x489/0x19c0 [ice]
[Thu Jun 20 10:48:46 2024]  ? __schedule+0x3c3/0x1650
[Thu Jun 20 10:48:46 2024]  ? psi_avgs_work+0x65/0xd0
[Thu Jun 20 10:48:46 2024]  process_one_work+0x21c/0x490
[Thu Jun 20 10:48:46 2024]  worker_thread+0x54/0x3e0
[Thu Jun 20 10:48:46 2024]  ? __pfx_worker_thread+0x10/0x10
[Thu Jun 20 10:48:46 2024]  kthread+0x11c/0x140
[Thu Jun 20 10:48:46 2024]  ? __pfx_kthread+0x10/0x10
[Thu Jun 20 10:48:46 2024]  ret_from_fork+0x29/0x50
[Thu Jun 20 10:48:46 2024]  </TASK>
[Thu Jun 20 10:48:46 2024] INFO: task kworker/16:1:311 blocked for more than 606 seconds.
[Thu Jun 20 10:48:46 2024]       Tainted: G        W  O       6.2.16-rt3 #3
[Thu Jun 20 10:48:46 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 20 10:48:46 2024] task:kworker/16:1    state:D stack:0     pid:311   ppid:2      flags:0x00004000
[Thu Jun 20 10:48:46 2024] Workqueue: ipv6_addrconf addrconf_verify_work
[Thu Jun 20 10:48:46 2024] Call Trace:
[Thu Jun 20 10:48:46 2024]  <TASK>
[Thu Jun 20 10:48:46 2024]  __schedule+0x3bb/0x1650
[Thu Jun 20 10:48:46 2024]  schedule+0x6f/0x120
[Thu Jun 20 10:48:46 2024]  rt_mutex_slowlock_block.constprop.0+0x3a/0x190
[Thu Jun 20 10:48:46 2024]  __rt_mutex_slowlock.constprop.0+0x83/0x210
[Thu Jun 20 10:48:46 2024]  mutex_lock+0x91/0xb0
[Thu Jun 20 10:48:46 2024]  rtnl_lock+0x19/0x20
[Thu Jun 20 10:48:46 2024]  addrconf_verify_work+0x16/0x40
[Thu Jun 20 10:48:46 2024]  process_one_work+0x21c/0x490
[Thu Jun 20 10:48:46 2024]  worker_thread+0x54/0x3e0
[Thu Jun 20 10:48:46 2024]  ? __pfx_worker_thread+0x10/0x10
[Thu Jun 20 10:48:46 2024]  kthread+0x11c/0x140
[Thu Jun 20 10:48:46 2024]  ? __pfx_kthread+0x10/0x10
[Thu Jun 20 10:48:46 2024]  ret_from_fork+0x29/0x50
[Thu Jun 20 10:48:46 2024]  </TASK>
[Thu Jun 20 10:48:46 2024] INFO: task kworker/23:1:317 blocked for more than 606 seconds.
[Thu Jun 20 10:48:46 2024]       Tainted: G        W  O       6.2.16-rt3 #3
[Thu Jun 20 10:48:46 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 20 10:48:46 2024] task:kworker/23:1    state:D stack:0     pid:317   ppid:2      flags:0x00004000
[Thu Jun 20 10:48:46 2024] Workqueue: ipv6_addrconf addrconf_verify_work
[Thu Jun 20 10:48:46 2024] Call Trace:
[Thu Jun 20 10:48:46 2024]  <TASK>
[Thu Jun 20 10:48:46 2024]  __schedule+0x3bb/0x1650
[Thu Jun 20 10:48:46 2024]  ? update_load_avg+0x84/0x840
[Thu Jun 20 10:48:46 2024]  schedule+0x6f/0x120
[Thu Jun 20 10:48:46 2024]  rt_mutex_slowlock_block.constprop.0+0x3a/0x190
[Thu Jun 20 10:48:46 2024]  __rt_mutex_slowlock.constprop.0+0x83/0x210
[Thu Jun 20 10:48:46 2024]  mutex_lock+0x91/0xb0
[Thu Jun 20 10:48:46 2024]  rtnl_lock+0x19/0x20
[Thu Jun 20 10:48:46 2024]  addrconf_verify_work+0x16/0x40
[Thu Jun 20 10:48:46 2024]  process_one_work+0x21c/0x490
[Thu Jun 20 10:48:46 2024]  worker_thread+0x54/0x3e0
[Thu Jun 20 10:48:46 2024]  ? __pfx_worker_thread+0x10/0x10
[Thu Jun 20 10:48:46 2024]  kthread+0x11c/0x140
[Thu Jun 20 10:48:46 2024]  ? __pfx_kthread+0x10/0x10
[Thu Jun 20 10:48:46 2024]  ret_from_fork+0x29/0x50
[Thu Jun 20 10:48:46 2024]  </TASK>
[Thu Jun 20 10:48:46 2024] INFO: task ptp4l:1677 blocked for more than 606 seconds.
[Thu Jun 20 10:48:46 2024]       Tainted: G        W  O       6.2.16-rt3 #3
[Thu Jun 20 10:48:46 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 20 10:48:46 2024] task:ptp4l           state:D stack:0     pid:1677  ppid:1676   flags:0x00000002
[Thu Jun 20 10:48:46 2024] Call Trace:
[Thu Jun 20 10:48:46 2024]  <TASK>
[Thu Jun 20 10:48:46 2024]  __schedule+0x3bb/0x1650
[Thu Jun 20 10:48:46 2024]  ? debug_smp_processor_id+0x1b/0x30
[Thu Jun 20 10:48:46 2024]  ? migrate_enable+0xe7/0x170
[Thu Jun 20 10:48:46 2024]  schedule+0x6f/0x120
[Thu Jun 20 10:48:46 2024]  rt_mutex_slowlock_block.constprop.0+0x3a/0x190
[Thu Jun 20 10:48:46 2024]  __rt_mutex_slowlock.constprop.0+0x83/0x210
[Thu Jun 20 10:48:46 2024]  mutex_lock+0x91/0xb0
[Thu Jun 20 10:48:46 2024]  ? debug_smp_processor_id+0x1b/0x30
[Thu Jun 20 10:48:46 2024]  rtnl_lock+0x19/0x20
[Thu Jun 20 10:48:46 2024]  packet_release+0x12d/0x440
[Thu Jun 20 10:48:46 2024]  ? debug_smp_processor_id+0x1b/0x30
[Thu Jun 20 10:48:46 2024]  __sock_release+0x3f/0xc0
[Thu Jun 20 10:48:46 2024]  sock_close+0x1c/0x30
[Thu Jun 20 10:48:46 2024]  __fput+0x93/0x270
[Thu Jun 20 10:48:46 2024]  ____fput+0x12/0x20
[Thu Jun 20 10:48:46 2024]  task_work_run+0x62/0x90
[Thu Jun 20 10:48:46 2024]  exit_to_user_mode_prepare+0x1fc/0x210
[Thu Jun 20 10:48:46 2024]  syscall_exit_to_user_mode+0x20/0x50
[Thu Jun 20 10:48:46 2024]  do_syscall_64+0x6d/0x90
[Thu Jun 20 10:48:46 2024]  ? syscall_exit_to_user_mode+0x3f/0x50
[Thu Jun 20 10:48:46 2024]  ? do_syscall_64+0x6d/0x90
[Thu Jun 20 10:48:46 2024]  ? do_syscall_64+0x6d/0x90
[Thu Jun 20 10:48:46 2024]  ? __ct_user_enter+0xc1/0x1a0
[Thu Jun 20 10:48:46 2024]  ? syscall_exit_to_user_mode+0x3f/0x50
[Thu Jun 20 10:48:46 2024]  ? do_syscall_64+0x6d/0x90
[Thu Jun 20 10:48:46 2024]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[Thu Jun 20 10:48:46 2024] RIP: 0033:0x7f679ed38f67
[Thu Jun 20 10:48:46 2024] RSP: 002b:00007ffdf0adc6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[Thu Jun 20 10:48:46 2024] RAX: 0000000000000000 RBX: 000055d943853df4 RCX: 00007f679ed38f67
[Thu Jun 20 10:48:46 2024] RDX: 000055d941e0f534 RSI: 000055d943853df4 RDI: 000000000000000e
[Thu Jun 20 10:48:46 2024] RBP: 000055d943853dfc R08: 0000000000000001 R09: 0000000000000000
[Thu Jun 20 10:48:46 2024] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d943853e1c
[Thu Jun 20 10:48:46 2024] R13: 000055d943853df4 R14: 000055d941e109a8 R15: 00007ffdf0adc7f0
[Thu Jun 20 10:48:46 2024]  </TASK>
[Thu Jun 20 10:48:46 2024] INFO: task kworker/11:0:3943 blocked for more than 606 seconds.
[Thu Jun 20 10:48:46 2024]       Tainted: G        W  O       6.2.16-rt3 #3
[Thu Jun 20 10:48:46 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 20 10:48:46 2024] task:kworker/11:0    state:D stack:0     pid:3943  ppid:2      flags:0x00004000
[Thu Jun 20 10:48:46 2024] Workqueue: events linkwatch_event
[Thu Jun 20 10:48:46 2024] Call Trace:
[Thu Jun 20 10:48:46 2024]  <TASK>
[Thu Jun 20 10:48:46 2024]  __schedule+0x3bb/0x1650
[Thu Jun 20 10:48:46 2024]  ? raw_spin_rq_unlock+0x1f/0x70
[Thu Jun 20 10:48:46 2024]  ? rt_mutex_setprio+0x18d/0x4a0
[Thu Jun 20 10:48:46 2024]  schedule+0x6f/0x120
[Thu Jun 20 10:48:46 2024]  rt_mutex_slowlock_block.constprop.0+0x3a/0x190
[Thu Jun 20 10:48:46 2024]  __rt_mutex_slowlock.constprop.0+0x83/0x210
[Thu Jun 20 10:48:46 2024]  mutex_lock+0x91/0xb0
[Thu Jun 20 10:48:46 2024]  rtnl_lock+0x19/0x20
[Thu Jun 20 10:48:46 2024]  linkwatch_event+0x12/0x40
[Thu Jun 20 10:48:46 2024]  process_one_work+0x21c/0x490
[Thu Jun 20 10:48:46 2024]  worker_thread+0x54/0x3e0
[Thu Jun 20 10:48:46 2024]  ? __pfx_worker_thread+0x10/0x10
[Thu Jun 20 10:48:46 2024]  kthread+0x11c/0x140
[Thu Jun 20 10:48:46 2024]  ? __pfx_kthread+0x10/0x10
[Thu Jun 20 10:48:46 2024]  ret_from_fork+0x29/0x50
[Thu Jun 20 10:48:46 2024]  </TASK>

 

From the kernel log, it looks as if the ice service task is being starved: it sleeps in synchronize_irq() while preparing a reset, and everything else that needs the rtnl lock (ipv6 addrconf, linkwatch, ptp4l) piles up behind it.
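
As a first check, it is worth confirming where the ice interrupts are allowed to run; a diagnostic sketch, assuming the driver's vectors show up with an "ice" label in /proc/interrupts (the usual naming):

# List the ice interrupts and the CPUs each one may run on
grep ice /proc/interrupts
for irq in $(grep ice /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
    printf 'IRQ %s -> CPUs ' "$irq"
    cat /proc/irq/$irq/smp_affinity_list
done

If any of these are pinned to the isolated cores, they can never be serviced while a poll-mode thread spins there at 100%.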
A few minutes later, I can also see another problem:

 

[Thu Jun 20 11:22:31 2024] ice 0000:01:00.1 enp1s0f1np1: tx_timeout: VSI_num: 14, Q 23, NTC: 0x5, HW_HEAD: 0x5, NTU: 0x6, INT: 0x0
[Thu Jun 20 11:22:31 2024] ice 0000:01:00.1 enp1s0f1np1: tx_timeout recovery level 3, txqueue 23
[Thu Jun 20 11:22:37 2024] ice 0000:01:00.1 enp1s0f1np1: tx_timeout: VSI_num: 14, Q 23, NTC: 0x5, HW_HEAD: 0x5, NTU: 0x6, INT: 0x0
[Thu Jun 20 11:22:37 2024] ice 0000:01:00.1 enp1s0f1np1: tx_timeout recovery level 4, txqueue 23
[Thu Jun 20 11:22:37 2024] ice 0000:01:00.1 enp1s0f1np1: tx_timeout recovery unsuccessful, device is in unrecoverable state.

 

After 256 attempts, the ice module fails with the following error and resets:

 

[Thu Jun 20 11:59:18 2024] ice 0000:01:00.1: Rebuild failed, unload and reload driver

 

The reset does not work properly: the interface enp1s0f1np1 is no longer detected by the kernel. Indeed, issuing the ethtool command returns an error:

 

~$ sudo ethtool enp1s0f1np1

netlink error: failed to retrieve link settings
netlink error: Input/output error
netlink error: failed to retrieve link settings
netlink error: Input/output error
Settings for enp1s0f1np1:
	Supports Wake-on: d
	Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
	Link detected: no
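
The only recovery path I see is the one the log itself suggests, unloading and reloading the driver (a workaround at best; it takes down every port on the card, and the VFs may need to be removed first):

echo 0 | sudo tee /sys/class/net/enp1s0f0np0/device/sriov_numvfs
sudo rmmod ice
sudo modprobe ice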

 


What is going on here? Do you have any idea how to resolve the issue?

Thank you

33 Replies
fedecive
New Contributor I

Hi Amina, Irwan,

After porting the application to DPDK 24.03, ice module v1.14.9, iavf module v4.11.1, and E810 FW version 4.50, I still see starvation issues. Here is what the kernel log prints from time to time:

 

[ 7668.523842] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 7668.523844] rcu: 	Tasks blocked on level-1 rcu_node (CPUs 0-11): P93/3:b..l
[ 7668.523846] 	(detected by 0, t=15002 jiffies, g=112345, q=133455 ncpus=24)
[ 7668.523848] task:ktimers/7       state:R  running task     stack:0     pid:93    ppid:2      flags:0x00004000
[ 7668.523850] Call Trace:
[ 7668.523852]  <TASK>
[ 7668.523854]  __schedule+0x3bb/0x1650
[ 7668.523858]  ? debug_smp_processor_id+0x1b/0x30
[ 7668.523860]  ? __rcu_irq_enter_check_tick+0x1e/0x1b0
[ 7668.523863]  ? irqentry_exit+0x24/0x50
[ 7668.523864]  ? preempt_schedule_thunk+0x1a/0x20
[ 7668.523866]  ? __pfx_delayed_work_timer_fn+0x10/0x10
[ 7668.523868]  preempt_schedule+0x75/0xd0
[ 7668.523869]  preempt_schedule_thunk+0x1a/0x20
[ 7668.523870]  _raw_spin_unlock_irq+0x5a/0x60
[ 7668.523871]  __run_timers.part.0+0x118/0x370
[ 7668.523873]  ? finish_task_switch.isra.0+0xa5/0x380
[ 7668.523875]  run_timer_softirq+0x43/0xb0
[ 7668.523876]  __do_softirq+0xf6/0x36f
[ 7668.523877]  run_timersd+0x67/0xb0
[ 7668.523879]  smpboot_thread_fn+0x1cf/0x2c0
[ 7668.523880]  ? __pfx_smpboot_thread_fn+0x10/0x10
[ 7668.523881]  kthread+0x11c/0x140
[ 7668.523883]  ? __pfx_kthread+0x10/0x10
[ 7668.523884]  ret_from_fork+0x29/0x50
[ 7668.523886]  </TASK>

[ 8525.356295] INFO: task rcub/1:20 blocked for more than 606 seconds.
[ 8525.356302]       Tainted: G           O       6.2.16-rt3 #3
[ 8525.356304] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8525.356306] task:rcub/1          state:D stack:0     pid:20    ppid:2      flags:0x00004000
[ 8525.356309] Call Trace:
[ 8525.356310]  <TASK>
[ 8525.356313]  __schedule+0x3bb/0x1650
[ 8525.356317]  ? raw_spin_rq_unlock+0x1f/0x70
[ 8525.356320]  ? rt_mutex_setprio+0x18d/0x4a0
[ 8525.356322]  schedule+0x6f/0x120
[ 8525.356324]  rt_mutex_slowlock_block.constprop.0+0x3a/0x190
[ 8525.356325]  __rt_mutex_slowlock.constprop.0+0x83/0x210
[ 8525.356327]  rt_mutex_lock+0x91/0xb0
[ 8525.356328]  rcu_boost_kthread+0xf1/0x410
[ 8525.356330]  ? __pfx_rcu_boost_kthread+0x10/0x10
[ 8525.356332]  kthread+0x11c/0x140
[ 8525.356334]  ? __pfx_kthread+0x10/0x10
[ 8525.356336]  ret_from_fork+0x29/0x50
[ 8525.356339]  </TASK>

 

 

It does not seem to be related to PTP. The DPDK tasks probably occupy 100% of their CPUs, so the interrupts are not served correctly.
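
One mitigation worth trying, assuming the isolated DPDK cores are 2-3 (placeholders, as in the earlier sketch) and the rest of the 24 CPUs (ncpus=24 in the stall report) do the housekeeping: steer every movable IRQ onto the housekeeping cores.

# Move all IRQs that allow it off the isolated cores; write errors for immovable IRQs are expected
for irq in /proc/irq/[0-9]*; do
    echo 0-1,4-23 | sudo tee "$irq/smp_affinity_list" >/dev/null 2>&1
done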

Do you have any suggestions?

Thank you

Poojitha
Employee

Hello fedecive,


Greetings for the day!


Thank you for sharing the requested details. Please allow us some time to review them and provide you with an update.


Thank you for your understanding!


Regards,

Poojitha N


kharo22
New Contributor I

Hello @fedecive 

 

Have you tried the latest ice driver, 1.14.13?
My OS is Ubuntu 22.04.
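
If it helps, I build and load it from source roughly like this (a sketch; the archive name and version are just what I expect from the Intel download page):

tar xf ice-1.14.13.tar.gz
cd ice-1.14.13/src
make
sudo make install
sudo rmmod ice && sudo modprobe ice
modinfo ice | grep ^version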

I cannot write to the SMA1 and SMA2 pins:


$ echo 0 2 | /usr/bin/sudo tee /sys/class/net/enp3s0f0np0/device/ptp/ptp1/pins/SMA2
failed

$ echo 0 1 | /usr/bin/sudo tee /sys/class/net/enp3s0f0np0/device/ptp/ptp1/pins/SMA1
failed

 

$ dmesg -l err

 

[Wed Jul 24 14:53:49 2024] driver cannot use function 0 on pin 2
[Wed Jul 24 14:53:49 2024] driver cannot use function 0 on pin 1

Thanks,

Karim

Ragulan_Intel
Employee

Hi fedecive,


Greetings!


We would appreciate it if you could try increasing the priority of the ice driver tasks using nice/renice and let us know the outcome.


With the command below, the priority can be configured for the PIDs that are using the ice driver.


renice -n 20 -p <PID>
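
To find candidate PIDs, something like the sketch below may help; it assumes the relevant kernel threads carry "ice" in their thread name (on an RT kernel the threaded IRQ handlers are usually named irq/<N>-ice...). Note that nice values below zero raise priority rather than lower it:

# List kernel threads whose name mentions ice, then renice each one
ps -eo pid,comm | awk '$2 ~ /ice/'
for pid in $(ps -eo pid,comm | awk '$2 ~ /ice/ {print $1}'); do
    sudo renice -n -20 -p "$pid"
done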


Regards,

Ragulan_Intel


Amina_Sadiya
Employee

Hello Team,  

  

Greetings from Intel! 

  

This is a follow-up regarding the issue you reported to us. Were you able to perform the plan of action (POA) shared previously? Please reply to confirm, so we can continue working toward a resolution. Looking forward to your reply.

  

Regards, 

Amina 

Intel Server Support  

Intel.com/vroc  

 


Amina_Sadiya
Employee

Hello Team, 

 

Thank you for contacting Intel. 

 

This is a follow-up regarding the reported issue. We're eager to ensure a swift resolution and would appreciate any updates or additional information you can provide.  

If we don't hear back from you soon, we'll assume the issue has been resolved and will proceed to close the case. 

 

Please feel free to respond to this email at your earliest convenience. 

 

Regards, 

Amina 

Intel Server Support  

Intel.com/vroc  

 


fedecive
New Contributor I

Hi @kharo22,

 

Sorry for the late reply; I did not receive a notification of your answer.

I do not see this kind of error in my setup.
My suggestion is to check the Intel® Ethernet Network Adapter E810-XXVDA4T User Guide, section 4.1, where it says: "The E810 only has one hardware timer shared between all ports. As a result, users find the PTP clock number only on Port 0."
Probably ptp1 is not the correct PTP device for configuring the SMA pins.
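
A quick way to check which PTP clock a given port is actually tied to (a sketch; substitute your interface name, here the enp3s0f0np0 from your output):

sudo ethtool -T enp3s0f0np0 | grep "PTP Hardware Clock"
ls /sys/class/ptp/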

 

 

kharo22
New Contributor I

Hello

On my side, the driver creates the pins only on ptp1; ptp0 is ignored:

 

 

ls /sys/class/ptp/ptp0
clock_name device max_vclocks n_external_timestamps n_programmable_pins power subsystem
dev max_adjustment n_alarms n_periodic_outputs n_vclocks pps_available uevent

 

 ls /sys/class/ptp/ptp1
clock_name device fifo max_vclocks n_external_timestamps n_programmable_pins period power subsystem
dev extts_enable max_adjustment n_alarms n_periodic_outputs n_vclocks pins pps_available uevent

 

The pins folder is only present on ptp1.

 

I will check. Thank you for the clue.

 

 

fedecive
New Contributor I

Hi Amina,

 

Using the command "renice -n 20 -p <PID>" does not resolve the issue; the problem persists.

 

I was able to resolve the issue by handling the DPDK interrupts, which were probably causing the starvation of the kernel module.

Amina_Sadiya
Employee

Hello Team,

Greetings from Intel!

Thank you for the update! I’m delighted to hear that the issue has been resolved. Please let me know if you need any further assistance from our side.

Feel free to contact us for any assistance.


Regards,

Amina


Amina_Sadiya
Employee

Hi fedecive,


Greetings from Intel!


Since you mentioned that the issue is resolved, could you please confirm that we can proceed with closing the case?


Regards,

Amina


fedecive
New Contributor I

Hi Amina,

 

At the moment it seems to be working. If the need arises, I will open another thread.

 

Thanks

Amina_Sadiya
Employee

Hi Fedecive,

Thank you for the confirmation; we will go ahead and close the case.


If you have any further queries feel free to contact us.


Regards,

Amina


