We are experiencing packet buffer corruption issues on Intel X710 and X550 cards when using hardware multiqueue with AF_XDP and XDP_SHARED_UMEM.
Our product uses AF_XDP to move packets from the kernel to a user-space application. We have used Intel NICs (X710 and X550) exclusively. Our code runs on Fedora Linux.
We recently converted our code from a single RSS queue (Combined = 1) to multiple queues (Combined = 8).
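With XDP_SHARED_UMEM spread over 8 queues, each UMEM frame should be in at most one ring at a time; if the same frame address ends up on two fill rings, two queues can DMA into the same buffer. One way to rule out a user-space bookkeeping error is to give every queue a fixed, disjoint slice of the UMEM and check on receive that each descriptor address falls inside the slice of the queue it arrived on. A minimal sketch, assuming an aligned UMEM of 4096 frames of 2048 bytes; NUM_FRAMES, FRAME_SIZE, queue_first_addr and owner_queue are hypothetical names for illustration, not part of libbpf/libxdp or of the original application.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration; not taken from the original code. */
#define NUM_FRAMES 4096u /* total frames in the shared UMEM */
#define FRAME_SIZE 2048u /* bytes per frame (aligned chunk mode) */
#define NUM_QUEUES 8u    /* one AF_XDP socket per combined channel */
#define FRAMES_PER_QUEUE (NUM_FRAMES / NUM_QUEUES)

static_assert(NUM_FRAMES % NUM_QUEUES == 0,
              "frames must divide evenly among queues");

/* UMEM address of the first frame in the slice reserved for queue_id.
 * Only frames in [first, first + FRAMES_PER_QUEUE * FRAME_SIZE) should
 * ever be placed on that queue's fill ring. */
static uint64_t queue_first_addr(uint32_t queue_id)
{
    return (uint64_t)queue_id * FRAMES_PER_QUEUE * FRAME_SIZE;
}

/* Queue whose slice contains addr, or -1 if addr lies outside the UMEM.
 * Also works for RX descriptor addresses that include a headroom offset,
 * because that offset stays inside the same frame. */
static int owner_queue(uint64_t addr)
{
    uint64_t frame = addr / FRAME_SIZE;
    if (frame >= NUM_FRAMES)
        return -1;
    return (int)(frame / FRAMES_PER_QUEUE);
}
```

On the RX path for queue q, a descriptor for which owner_queue(desc.addr) != q would point to a frame that leaked between queues.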
During testing, we found that received packets are corrupted whenever the number of queues is greater than one. With a single queue, packets are not corrupted. In some cases, the corrupted buffer contains all zeros.
We tested both the Intel X550 (ixgbe driver) and the Intel X710 (i40e driver) and saw the same failure on both when XDP_ZEROCOPY is enabled. When the mode is forced to XDP_COPY instead, the corruption does not occur with multiple queues. However, we need the performance of XDP_ZEROCOPY to sustain 10 Gbps throughput in the application.
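The COPY vs. ZEROCOPY choice is made through the sxdp_flags field of struct sockaddr_xdp when each socket is bound. A minimal sketch, assuming the first socket registers the UMEM and the other seven attach to it with XDP_SHARED_UMEM: the kernel rejects XDP_COPY, XDP_ZEROCOPY and XDP_USE_NEED_WAKEUP on a shared-UMEM bind, and the attaching socket inherits the owner's mode, so forcing copy mode on the owner is what switches every queue to XDP_COPY. On kernels that allow sharing a UMEM across different queue IDs (Linux 5.10 and later, to our knowledge), each attaching socket must also create its own fill and completion rings. The flag values mirror <linux/if_xdp.h>; xsk_bind_flags is a hypothetical helper, not a libbpf/libxdp function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror <linux/if_xdp.h>; defined locally so the sketch builds
 * without kernel headers. */
#ifndef XDP_SHARED_UMEM
#define XDP_SHARED_UMEM (1 << 0)
#endif
#ifndef XDP_COPY
#define XDP_COPY (1 << 1)
#endif
#ifndef XDP_ZEROCOPY
#define XDP_ZEROCOPY (1 << 2)
#endif
#ifndef XDP_USE_NEED_WAKEUP
#define XDP_USE_NEED_WAKEUP (1 << 3)
#endif

static_assert((XDP_COPY & XDP_ZEROCOPY) == 0, "mode flags are distinct bits");

/* Hypothetical helper: sxdp_flags for one socket in a shared-UMEM set.
 * The UMEM owner picks the mode (plus need_wakeup if wanted). Attaching
 * sockets pass only XDP_SHARED_UMEM: the kernel returns EINVAL if they add
 * XDP_COPY, XDP_ZEROCOPY or XDP_USE_NEED_WAKEUP, and they inherit the
 * owner's settings instead. */
static uint16_t xsk_bind_flags(bool is_umem_owner, bool force_copy,
                               bool need_wakeup)
{
    if (!is_umem_owner)
        return XDP_SHARED_UMEM;

    uint16_t flags = force_copy ? XDP_COPY : XDP_ZEROCOPY;
    if (need_wakeup)
        flags |= XDP_USE_NEED_WAKEUP;
    return flags;
}
```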
We have tried the following to debug the issue:
- Used a single spinlock to protect the queues in all ParserThreads, just to see whether this is a user-space threading issue. No improvement.
- Compiled and tested libbpf-0.7.0. No improvement. Reverted to libbpf-0.4.0, because the AF_XDP functions were moved to libxdp and deprecated in libbpf-0.7.0.
- Added debug code that sets the buffer contents to 0xff before placing each buffer into the receiver's Fill queue. The failed packets from the receiver contain all 0xff. We sampled a dozen or so failed packets, and none of them came from queue 0.
- Not only are the packet contents corrupted, but the RX descriptor (from the AF_XDP receive queue) is corrupted as well: many of the received packets have a length of zero.
- Forced XDP_COPY mode on all sockets in 8 queue mode, and it works! I have packets going both ways through WAN and LAN with no corruption.
- Tested with irqbalance disabled and IRQ affinity set to a single CPU per queue using Intel's set_irq_affinity.sh script. No change.
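The 0xff poisoning test and the zero-length descriptors above can be combined into one check on every RX descriptor. A minimal sketch, assuming an aligned UMEM mapped at umem and frames poisoned with 0xff before they go on the fill ring: struct rx_desc mirrors the layout of struct xdp_desc from <linux/if_xdp.h>, and poison_frame, check_rx_desc and the rx_verdict values are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_BYTE 0xffu

/* Local mirror of struct xdp_desc in <linux/if_xdp.h>. */
struct rx_desc {
    uint64_t addr;    /* byte offset into the UMEM */
    uint32_t len;     /* packet length reported by the driver */
    uint32_t options;
};
static_assert(sizeof(struct rx_desc) == 16, "matches struct xdp_desc size");

enum rx_verdict {
    RX_OK,            /* length sane and packet data was written */
    RX_ZERO_LEN,      /* descriptor reports an empty packet */
    RX_OUT_OF_BOUNDS, /* addr/len point outside the UMEM */
    RX_POISONED       /* every byte still holds the poison pattern */
};

/* Overwrite a frame with the poison pattern before its address is placed
 * on the fill ring. */
static void poison_frame(uint8_t *umem, uint64_t frame_addr, size_t frame_size)
{
    memset(umem + frame_addr, POISON_BYTE, frame_size);
}

/* Classify one RX descriptor. A real packet could in theory be all 0xff,
 * but that is implausible for Ethernet/IP traffic. */
static enum rx_verdict check_rx_desc(const uint8_t *umem, uint64_t umem_size,
                                     const struct rx_desc *d)
{
    if (d->len == 0)
        return RX_ZERO_LEN;
    if (d->addr >= umem_size || d->len > umem_size - d->addr)
        return RX_OUT_OF_BOUNDS;
    for (uint32_t i = 0; i < d->len; i++)
        if (umem[d->addr + i] != POISON_BYTE)
            return RX_OK; /* at least one byte was written by the NIC */
    return RX_POISONED;
}
```

Counting RX_ZERO_LEN and RX_POISONED per queue ID would show whether the failures are confined to queues 1 through 7, as the sampled packets suggest.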
It is unclear whether this is a kernel XDP issue, an i40e/ixgbe driver issue, or a firmware/support issue. At this point, our assumption is that the driver code, specifically the xdq_queue code, is at fault, as it is common to the ixgbe and i40e drivers.
What we have tried thus far:
- Tried kernels 5.16.20 and 5.18.5.
- Intel X710-2
- Upgraded to firmware 8.50
- Updated i40e driver to 2.17.15
- Same issue with Intel X550
- ixgbe driver 5.1.0-k
- Firmware 1.1276.0
Your assistance is appreciated.
Thank you for posting in Intel Ethernet Communities.
Please share the following information to help us look into your request.
1. Exact model of your X710 and X550 card. You may share photos of the adapter for us to confirm.
2. When did you first encounter the issue?
3. Have you tried the latest drivers from our site below?
Intel® Network Adapter Driver for PCIe* 40 Gigabit Ethernet Network Connections under Linux*
Intel® Network Adapter Driver for PCIe* Intel® 10 Gigabit Ethernet Network Connections under Linux*
4. Exact operating system used
Awaiting your reply.
We will follow up after 3 business days in case we don't hear from you.
Intel® Customer Support
This is just a follow up for the information requested so we can proceed checking your query. If you need more time on this, please let us know.
Looking forward to your reply.
Should there be no response from you, we will follow up after 3 business days.
Intel® Customer Support
I hope this message finds you well!
Please be informed that we will now close this request since we have not received a response to our previous follow-ups. Feel free to post a new question if you have any other inquiries in the future, as this thread will no longer be monitored.
Thank you for choosing Intel and stay safe!
Intel Customer Support