Dear Community,
I am working with an application that requires DPDK processes pinned to 2 cores isolated from the OS.
On my server, I have an Intel E810 card with 2 SFP+ interfaces used by the application. The OS is Ubuntu 20.04 with kernel version 6.2.16 patched for RT. The ice driver version is 1.12.7 and the FW version is 4.40.
The application also requires SR-IOV virtualization on the interface enp1s0f0np0, with two different VFs attached, and the use of the interface enp1s0f1np1 without SR-IOV.
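For reference, the SR-IOV half of this setup is usually done through the standard sysfs interface; a minimal sketch, assuming the interface name above and the two VFs described (run as root):

```shell
#!/bin/sh
# Sketch: create the two VFs on the PF described above.
# Interface name is from this thread; adjust for your system.
PF=enp1s0f0np0
NUMVFS="/sys/class/net/$PF/device/sriov_numvfs"
if [ -w "$NUMVFS" ]; then
    echo 0 > "$NUMVFS"   # clear any existing VFs first
    echo 2 > "$NUMVFS"   # create the two VFs used by the application
    cat "$NUMVFS"        # confirm the VF count
else
    echo "PF $PF not present on this host; skipping"
fi
```

Note that the kernel requires resetting `sriov_numvfs` to 0 before writing a different non-zero value, hence the two-step write.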
After around one hour of running, the kernel complains about the following issue:
[Thu Jun 20 10:48:46 2024] INFO: task kworker/11:1:307 blocked for more than 606 seconds.
[Thu Jun 20 10:48:46 2024] Tainted: G W O 6.2.16-rt3 #3
[Thu Jun 20 10:48:46 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 20 10:48:46 2024] task:kworker/11:1 state:D stack:0 pid:307 ppid:2 flags:0x00004000
[Thu Jun 20 10:48:46 2024] Workqueue: ice ice_service_task [ice]
[Thu Jun 20 10:48:46 2024] Call Trace:
[Thu Jun 20 10:48:46 2024] <TASK>
[Thu Jun 20 10:48:46 2024] __schedule+0x3bb/0x1650
[Thu Jun 20 10:48:46 2024] ? preempt_schedule_thunk+0x1a/0x20
[Thu Jun 20 10:48:46 2024] schedule+0x6f/0x120
[Thu Jun 20 10:48:46 2024] synchronize_irq+0x7c/0xb0
[Thu Jun 20 10:48:46 2024] ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Jun 20 10:48:46 2024] ice_vsi_dis_irq+0x17e/0x1a0 [ice]
[Thu Jun 20 10:48:46 2024] ice_down+0x55/0x2e0 [ice]
[Thu Jun 20 10:48:46 2024] ice_vsi_close+0xb8/0xc0 [ice]
[Thu Jun 20 10:48:46 2024] ice_dis_vsi+0x4c/0x80 [ice]
[Thu Jun 20 10:48:46 2024] ice_pf_dis_all_vsi.constprop.0+0x35/0xf0 [ice]
[Thu Jun 20 10:48:46 2024] ice_prepare_for_reset+0x1c3/0x420 [ice]
[Thu Jun 20 10:48:46 2024] ice_do_reset+0x35/0x150 [ice]
[Thu Jun 20 10:48:46 2024] ice_service_task+0x489/0x19c0 [ice]
[Thu Jun 20 10:48:46 2024] ? __schedule+0x3c3/0x1650
[Thu Jun 20 10:48:46 2024] ? psi_avgs_work+0x65/0xd0
[Thu Jun 20 10:48:46 2024] process_one_work+0x21c/0x490
[Thu Jun 20 10:48:46 2024] worker_thread+0x54/0x3e0
[Thu Jun 20 10:48:46 2024] ? __pfx_worker_thread+0x10/0x10
[Thu Jun 20 10:48:46 2024] kthread+0x11c/0x140
[Thu Jun 20 10:48:46 2024] ? __pfx_kthread+0x10/0x10
[Thu Jun 20 10:48:46 2024] ret_from_fork+0x29/0x50
[Thu Jun 20 10:48:46 2024] </TASK>
[Thu Jun 20 10:48:46 2024] INFO: task kworker/16:1:311 blocked for more than 606 seconds.
[Thu Jun 20 10:48:46 2024] Tainted: G W O 6.2.16-rt3 #3
[Thu Jun 20 10:48:46 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 20 10:48:46 2024] task:kworker/16:1 state:D stack:0 pid:311 ppid:2 flags:0x00004000
[Thu Jun 20 10:48:46 2024] Workqueue: ipv6_addrconf addrconf_verify_work
[Thu Jun 20 10:48:46 2024] Call Trace:
[Thu Jun 20 10:48:46 2024] <TASK>
[Thu Jun 20 10:48:46 2024] __schedule+0x3bb/0x1650
[Thu Jun 20 10:48:46 2024] schedule+0x6f/0x120
[Thu Jun 20 10:48:46 2024] rt_mutex_slowlock_block.constprop.0+0x3a/0x190
[Thu Jun 20 10:48:46 2024] __rt_mutex_slowlock.constprop.0+0x83/0x210
[Thu Jun 20 10:48:46 2024] mutex_lock+0x91/0xb0
[Thu Jun 20 10:48:46 2024] rtnl_lock+0x19/0x20
[Thu Jun 20 10:48:46 2024] addrconf_verify_work+0x16/0x40
[Thu Jun 20 10:48:46 2024] process_one_work+0x21c/0x490
[Thu Jun 20 10:48:46 2024] worker_thread+0x54/0x3e0
[Thu Jun 20 10:48:46 2024] ? __pfx_worker_thread+0x10/0x10
[Thu Jun 20 10:48:46 2024] kthread+0x11c/0x140
[Thu Jun 20 10:48:46 2024] ? __pfx_kthread+0x10/0x10
[Thu Jun 20 10:48:46 2024] ret_from_fork+0x29/0x50
[Thu Jun 20 10:48:46 2024] </TASK>
[Thu Jun 20 10:48:46 2024] INFO: task kworker/23:1:317 blocked for more than 606 seconds.
[Thu Jun 20 10:48:46 2024] Tainted: G W O 6.2.16-rt3 #3
[Thu Jun 20 10:48:46 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 20 10:48:46 2024] task:kworker/23:1 state:D stack:0 pid:317 ppid:2 flags:0x00004000
[Thu Jun 20 10:48:46 2024] Workqueue: ipv6_addrconf addrconf_verify_work
[Thu Jun 20 10:48:46 2024] Call Trace:
[Thu Jun 20 10:48:46 2024] <TASK>
[Thu Jun 20 10:48:46 2024] __schedule+0x3bb/0x1650
[Thu Jun 20 10:48:46 2024] ? update_load_avg+0x84/0x840
[Thu Jun 20 10:48:46 2024] schedule+0x6f/0x120
[Thu Jun 20 10:48:46 2024] rt_mutex_slowlock_block.constprop.0+0x3a/0x190
[Thu Jun 20 10:48:46 2024] __rt_mutex_slowlock.constprop.0+0x83/0x210
[Thu Jun 20 10:48:46 2024] mutex_lock+0x91/0xb0
[Thu Jun 20 10:48:46 2024] rtnl_lock+0x19/0x20
[Thu Jun 20 10:48:46 2024] addrconf_verify_work+0x16/0x40
[Thu Jun 20 10:48:46 2024] process_one_work+0x21c/0x490
[Thu Jun 20 10:48:46 2024] worker_thread+0x54/0x3e0
[Thu Jun 20 10:48:46 2024] ? __pfx_worker_thread+0x10/0x10
[Thu Jun 20 10:48:46 2024] kthread+0x11c/0x140
[Thu Jun 20 10:48:46 2024] ? __pfx_kthread+0x10/0x10
[Thu Jun 20 10:48:46 2024] ret_from_fork+0x29/0x50
[Thu Jun 20 10:48:46 2024] </TASK>
[Thu Jun 20 10:48:46 2024] INFO: task ptp4l:1677 blocked for more than 606 seconds.
[Thu Jun 20 10:48:46 2024] Tainted: G W O 6.2.16-rt3 #3
[Thu Jun 20 10:48:46 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 20 10:48:46 2024] task:ptp4l state:D stack:0 pid:1677 ppid:1676 flags:0x00000002
[Thu Jun 20 10:48:46 2024] Call Trace:
[Thu Jun 20 10:48:46 2024] <TASK>
[Thu Jun 20 10:48:46 2024] __schedule+0x3bb/0x1650
[Thu Jun 20 10:48:46 2024] ? debug_smp_processor_id+0x1b/0x30
[Thu Jun 20 10:48:46 2024] ? migrate_enable+0xe7/0x170
[Thu Jun 20 10:48:46 2024] schedule+0x6f/0x120
[Thu Jun 20 10:48:46 2024] rt_mutex_slowlock_block.constprop.0+0x3a/0x190
[Thu Jun 20 10:48:46 2024] __rt_mutex_slowlock.constprop.0+0x83/0x210
[Thu Jun 20 10:48:46 2024] mutex_lock+0x91/0xb0
[Thu Jun 20 10:48:46 2024] ? debug_smp_processor_id+0x1b/0x30
[Thu Jun 20 10:48:46 2024] rtnl_lock+0x19/0x20
[Thu Jun 20 10:48:46 2024] packet_release+0x12d/0x440
[Thu Jun 20 10:48:46 2024] ? debug_smp_processor_id+0x1b/0x30
[Thu Jun 20 10:48:46 2024] __sock_release+0x3f/0xc0
[Thu Jun 20 10:48:46 2024] sock_close+0x1c/0x30
[Thu Jun 20 10:48:46 2024] __fput+0x93/0x270
[Thu Jun 20 10:48:46 2024] ____fput+0x12/0x20
[Thu Jun 20 10:48:46 2024] task_work_run+0x62/0x90
[Thu Jun 20 10:48:46 2024] exit_to_user_mode_prepare+0x1fc/0x210
[Thu Jun 20 10:48:46 2024] syscall_exit_to_user_mode+0x20/0x50
[Thu Jun 20 10:48:46 2024] do_syscall_64+0x6d/0x90
[Thu Jun 20 10:48:46 2024] ? syscall_exit_to_user_mode+0x3f/0x50
[Thu Jun 20 10:48:46 2024] ? do_syscall_64+0x6d/0x90
[Thu Jun 20 10:48:46 2024] ? do_syscall_64+0x6d/0x90
[Thu Jun 20 10:48:46 2024] ? __ct_user_enter+0xc1/0x1a0
[Thu Jun 20 10:48:46 2024] ? syscall_exit_to_user_mode+0x3f/0x50
[Thu Jun 20 10:48:46 2024] ? do_syscall_64+0x6d/0x90
[Thu Jun 20 10:48:46 2024] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[Thu Jun 20 10:48:46 2024] RIP: 0033:0x7f679ed38f67
[Thu Jun 20 10:48:46 2024] RSP: 002b:00007ffdf0adc6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[Thu Jun 20 10:48:46 2024] RAX: 0000000000000000 RBX: 000055d943853df4 RCX: 00007f679ed38f67
[Thu Jun 20 10:48:46 2024] RDX: 000055d941e0f534 RSI: 000055d943853df4 RDI: 000000000000000e
[Thu Jun 20 10:48:46 2024] RBP: 000055d943853dfc R08: 0000000000000001 R09: 0000000000000000
[Thu Jun 20 10:48:46 2024] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d943853e1c
[Thu Jun 20 10:48:46 2024] R13: 000055d943853df4 R14: 000055d941e109a8 R15: 00007ffdf0adc7f0
[Thu Jun 20 10:48:46 2024] </TASK>
[Thu Jun 20 10:48:46 2024] INFO: task kworker/11:0:3943 blocked for more than 606 seconds.
[Thu Jun 20 10:48:46 2024] Tainted: G W O 6.2.16-rt3 #3
[Thu Jun 20 10:48:46 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jun 20 10:48:46 2024] task:kworker/11:0 state:D stack:0 pid:3943 ppid:2 flags:0x00004000
[Thu Jun 20 10:48:46 2024] Workqueue: events linkwatch_event
[Thu Jun 20 10:48:46 2024] Call Trace:
[Thu Jun 20 10:48:46 2024] <TASK>
[Thu Jun 20 10:48:46 2024] __schedule+0x3bb/0x1650
[Thu Jun 20 10:48:46 2024] ? raw_spin_rq_unlock+0x1f/0x70
[Thu Jun 20 10:48:46 2024] ? rt_mutex_setprio+0x18d/0x4a0
[Thu Jun 20 10:48:46 2024] schedule+0x6f/0x120
[Thu Jun 20 10:48:46 2024] rt_mutex_slowlock_block.constprop.0+0x3a/0x190
[Thu Jun 20 10:48:46 2024] __rt_mutex_slowlock.constprop.0+0x83/0x210
[Thu Jun 20 10:48:46 2024] mutex_lock+0x91/0xb0
[Thu Jun 20 10:48:46 2024] rtnl_lock+0x19/0x20
[Thu Jun 20 10:48:46 2024] linkwatch_event+0x12/0x40
[Thu Jun 20 10:48:46 2024] process_one_work+0x21c/0x490
[Thu Jun 20 10:48:46 2024] worker_thread+0x54/0x3e0
[Thu Jun 20 10:48:46 2024] ? __pfx_worker_thread+0x10/0x10
[Thu Jun 20 10:48:46 2024] kthread+0x11c/0x140
[Thu Jun 20 10:48:46 2024] ? __pfx_kthread+0x10/0x10
[Thu Jun 20 10:48:46 2024] ret_from_fork+0x29/0x50
[Thu Jun 20 10:48:46 2024] </TASK>
From the kernel log, it seems that the ice module is being starved.
A few minutes later, I can also see another problem:
[Thu Jun 20 11:22:31 2024] ice 0000:01:00.1 enp1s0f1np1: tx_timeout: VSI_num: 14, Q 23, NTC: 0x5, HW_HEAD: 0x5, NTU: 0x6, INT: 0x0
[Thu Jun 20 11:22:31 2024] ice 0000:01:00.1 enp1s0f1np1: tx_timeout recovery level 3, txqueue 23
[Thu Jun 20 11:22:37 2024] ice 0000:01:00.1 enp1s0f1np1: tx_timeout: VSI_num: 14, Q 23, NTC: 0x5, HW_HEAD: 0x5, NTU: 0x6, INT: 0x0
[Thu Jun 20 11:22:37 2024] ice 0000:01:00.1 enp1s0f1np1: tx_timeout recovery level 4, txqueue 23
[Thu Jun 20 11:22:37 2024] ice 0000:01:00.1 enp1s0f1np1: tx_timeout recovery unsuccessful, device is in unrecoverable state.
After 256 attempts, the ice module fails with the following error and resets:
[Thu Jun 20 11:59:18 2024] ice 0000:01:00.1: Rebuild failed, unload and reload driver
The reset does not work properly, since the interface enp1s0f1np1 is no longer detected by the kernel. Indeed, if I issue the ethtool command, it returns an error:
~$ sudo ethtool enp1s0f1np1
netlink error: failed to retrieve link settings
netlink error: Input/output error
netlink error: failed to retrieve link settings
netlink error: Input/output error
Settings for enp1s0f1np1:
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: no
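Since the log explicitly asks to "unload and reload driver", one possible recovery sketch is to rebind just the failed function through sysfs (the PCI address 0000:01:00.1 is taken from the tx_timeout messages above; this is a workaround sketch, not a fix for the underlying hang):

```shell
#!/bin/sh
# Sketch: recover the dead port after "Rebuild failed, unload and reload
# driver". PCI address is from the log above; run as root.
DEV=0000:01:00.1
if [ -e "/sys/bus/pci/devices/$DEV" ]; then
    # Rebind only the failed function, leaving the other port alone
    echo "$DEV" > /sys/bus/pci/drivers/ice/unbind
    echo "$DEV" > /sys/bus/pci/drivers/ice/bind
else
    echo "device $DEV not present; nothing to do"
fi
# Full module reload (affects every port driven by ice):
#   rmmod ice && modprobe ice
```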
What is going on here? Do you have any idea how to resolve the issue?
Thank you
Hi Amina, Irwan,
After porting the application to DPDK 24.03, ice module v1.14.9, iavf module v4.11.1, and E810 FW version 4.50, I still have starvation issues. Here is what the kernel log prints from time to time:
[ 7668.523842] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 7668.523844] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-11): P93/3:b..l
[ 7668.523846] (detected by 0, t=15002 jiffies, g=112345, q=133455 ncpus=24)
[ 7668.523848] task:ktimers/7 state:R running task stack:0 pid:93 ppid:2 flags:0x00004000
[ 7668.523850] Call Trace:
[ 7668.523852] <TASK>
[ 7668.523854] __schedule+0x3bb/0x1650
[ 7668.523858] ? debug_smp_processor_id+0x1b/0x30
[ 7668.523860] ? __rcu_irq_enter_check_tick+0x1e/0x1b0
[ 7668.523863] ? irqentry_exit+0x24/0x50
[ 7668.523864] ? preempt_schedule_thunk+0x1a/0x20
[ 7668.523866] ? __pfx_delayed_work_timer_fn+0x10/0x10
[ 7668.523868] preempt_schedule+0x75/0xd0
[ 7668.523869] preempt_schedule_thunk+0x1a/0x20
[ 7668.523870] _raw_spin_unlock_irq+0x5a/0x60
[ 7668.523871] __run_timers.part.0+0x118/0x370
[ 7668.523873] ? finish_task_switch.isra.0+0xa5/0x380
[ 7668.523875] run_timer_softirq+0x43/0xb0
[ 7668.523876] __do_softirq+0xf6/0x36f
[ 7668.523877] run_timersd+0x67/0xb0
[ 7668.523879] smpboot_thread_fn+0x1cf/0x2c0
[ 7668.523880] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 7668.523881] kthread+0x11c/0x140
[ 7668.523883] ? __pfx_kthread+0x10/0x10
[ 7668.523884] ret_from_fork+0x29/0x50
[ 7668.523886] </TASK>
[ 8525.356295] INFO: task rcub/1:20 blocked for more than 606 seconds.
[ 8525.356302] Tainted: G O 6.2.16-rt3 #3
[ 8525.356304] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8525.356306] task:rcub/1 state:D stack:0 pid:20 ppid:2 flags:0x00004000
[ 8525.356309] Call Trace:
[ 8525.356310] <TASK>
[ 8525.356313] __schedule+0x3bb/0x1650
[ 8525.356317] ? raw_spin_rq_unlock+0x1f/0x70
[ 8525.356320] ? rt_mutex_setprio+0x18d/0x4a0
[ 8525.356322] schedule+0x6f/0x120
[ 8525.356324] rt_mutex_slowlock_block.constprop.0+0x3a/0x190
[ 8525.356325] __rt_mutex_slowlock.constprop.0+0x83/0x210
[ 8525.356327] rt_mutex_lock+0x91/0xb0
[ 8525.356328] rcu_boost_kthread+0xf1/0x410
[ 8525.356330] ? __pfx_rcu_boost_kthread+0x10/0x10
[ 8525.356332] kthread+0x11c/0x140
[ 8525.356334] ? __pfx_kthread+0x10/0x10
[ 8525.356336] ret_from_fork+0x29/0x50
[ 8525.356339] </TASK>
It seems it is not related to PTP. Probably the DPDK tasks require 100% of their CPUs, and the interrupts are not served correctly.
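If the suspicion is that the ice interrupts land on the isolated DPDK cores and are never serviced, their affinity can be inspected and steered to the housekeeping CPUs; a sketch, where the CPU range 0-9 is only a placeholder for your non-isolated cores (run as root):

```shell
#!/bin/sh
# Sketch: move all ice interrupts onto housekeeping CPUs so the driver's
# service task is not starved by the isolated DPDK cores.
HOUSEKEEPING="0-9"   # example range: replace with your non-isolated CPUs
grep ice /proc/interrupts | awk -F: '{print $1}' | while read -r irq; do
    AFF="/proc/irq/$irq/smp_affinity_list"
    [ -w "$AFF" ] || continue           # skip IRQs we cannot retarget
    echo "$HOUSEKEEPING" > "$AFF"
    echo "IRQ $irq affinity -> $(cat "$AFF")"
done
```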
Do you have any suggestions?
Thank you
Hello fedecive,
Greetings for the day!
Thank you for sharing the requested details. Please allow us some time to review them and provide you with an update.
Thank you for your understanding!
Regards,
Poojitha N
Hello @fedecive
Have you tried with the latest ice driver, 1.14.13?
My OS is also Ubuntu 22.04.
I cannot write to the SMA1 and SMA2 pins:
$ echo 0 2 | /usr/bin/sudo tee /sys/class/net/enp3s0f0np0/device/ptp/ptp1/pins/SMA2
failed
$ echo 0 1 | /usr/bin/sudo tee /sys/class/net/enp3s0f0np0/device/ptp/ptp1/pins/SMA1
Failed
$ dmesg -l err
[wed. juil. 24 14:53:49 2024] driver cannot use function 0 on pin 2
[wed. juil. 24 14:53:49 2024] driver cannot use function 0 on pin 1
thanks
Karim
Hi fedecive,
Greetings!
We would appreciate it if you could try increasing the priority of the ice driver threads using nice/renice and let us know the outcome.
With the command below, the priority can be configured for the PIDs that are using the ice driver.
renice -n 20 -p <PID>
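A related knob on a PREEMPT_RT kernel: the ice interrupt handlers run as kernel threads named irq/<n>-ice..., whose real-time priority can be raised with chrt instead of renice; a hedged sketch, where the FIFO priority 50 is only an example value (run as root):

```shell
#!/bin/sh
# Sketch: raise the scheduling priority of the threaded ice IRQ handlers
# on a PREEMPT_RT kernel. Priority 50 is an example value, not a tuned one.
for d in /proc/[0-9]*; do
    comm=$(cat "$d/comm" 2>/dev/null) || continue
    case "$comm" in
        irq/*ice*)                       # threaded IRQ handlers for ice
            pid=${d#/proc/}
            chrt -f -p 50 "$pid" && echo "boosted $comm (pid $pid)" ;;
    esac
done
```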
Regards,
Ragulan_Intel
Hello Team,
Greetings from Intel!
This is a follow-up regarding the issue you reported to us. Were you able to perform the POA we shared previously? Please reply to confirm, so we can continue helping you toward a resolution. Looking forward to your reply.
Regards,
Amina
Intel Server Support
Hello Team,
Thank you for contacting Intel.
This is a follow-up regarding the reported issue. We're eager to ensure a swift resolution and would appreciate any updates or additional information you can provide.
If we don't hear back from you soon, we'll assume the issue has been resolved and will proceed to close the case.
Please feel free to respond to this email at your earliest convenience.
Regards,
Amina
Intel Server Support
Hi @kharo22,
Sorry for the late reply; I did not receive the notification of your answer.
I do not have this kind of error in my setup.
My suggestion is to check the Intel® Ethernet Network Adapter E810-XXVDA4T User Guide, section 4.1, where it says: "The E810 only has one hardware timer shared between all ports. As a result, users find the PTP clock number only on Port 0."
Probably ptp1 is not the correct port for the SMA setup.
Hello,
On my side the driver creates the pins only on ptp1; ptp0 is ignored:
ls /sys/class/ptp/ptp0
clock_name device max_vclocks n_external_timestamps n_programmable_pins power subsystem
dev max_adjustment n_alarms n_periodic_outputs n_vclocks pps_available uevent
ls /sys/class/ptp/ptp1
clock_name device fifo max_vclocks n_external_timestamps n_programmable_pins period power subsystem
dev extts_enable max_adjustment n_alarms n_periodic_outputs n_vclocks pins pps_available uevent
The pins folder is only present on ptp1.
I will check.
Thank you for the clue.
Hi Amina,
Using the command "renice -n 20 -p <PID>" does not resolve the issue; the problem persists.
I was able to resolve the issue by handling the DPDK interrupts, which were probably causing the starvation of the kernel module.
Hello Team,
Greetings from Intel!
Thank you for the update! We are delighted to hear that the issue has been resolved. Feel free to contact us if you need any further assistance.
Regards,
Amina
Hi fedecive,
Greetings from Intel!
Since you mentioned that the issue is resolved, could you please confirm whether we can proceed with closing the case?
Regards,
Amina
Hi Amina,
At the moment it seems to be working. If needed, I will open another thread.
Thanks
Hi Fedecive,
Thank you for the confirmation; we will go ahead and close the case.
If you have any further queries, feel free to contact us.
Regards,
Amina
