Embedded Connectivity
Intel network controllers, Firmware, and drivers support systems

UDP packet loss on Elkhart Lake

sam-bristow-rl
Beginner

I am trying to configure networking on an Elkhart Lake development board running Linux, but am seeing quite high packet loss when sending UDP packets to the system.

Our test setup is two development boards directly connected with a ~1 m Ethernet cable. We are testing with iPerf3, trying to push 900 Mbit/s of UDP traffic in a single direction.

After tuning the `net.core.wmem_max` and `net.core.rmem_max` tunables we were able to reduce the loss somewhat, but it is still higher than I'd expect for a point-to-point link.

# sysctl net.core | sort
net.core.dev_weight = 64
net.core.dev_weight_rx_bias = 1
net.core.dev_weight_tx_bias = 1
net.core.devconf_inherit_init_net = 0
net.core.fb_tunnels_only_for_init_net = 0
net.core.flow_limit_cpu_bitmap = 0
net.core.flow_limit_table_len = 4096
net.core.gro_normal_batch = 8
net.core.high_order_alloc_disable = 0
net.core.max_skb_frags = 17
net.core.message_burst = 10
net.core.message_cost = 5
net.core.netdev_budget = 300
net.core.netdev_budget_usecs = 8000
net.core.netdev_max_backlog = 10000
net.core.netdev_rss_key = 51:7b:4d:6c:af:ad:f9:5c:07:1e:e6:36:1b:81:31:90:a2:aa:45:3e:3e:ad:c0:df:9e:96:d0:0a:43:f9:ec:c6:2b:43:1f:84:27:3b:20:f8:e9:25:66:12:5e:2f:14:85:29:63:18:3b
net.core.netdev_tstamp_prequeue = 1
net.core.netdev_unregister_timeout_secs = 10
net.core.optmem_max = 20480
net.core.rmem_default = 26214400
net.core.rmem_max = 26214400
net.core.rps_sock_flow_entries = 0
net.core.skb_defer_max = 64
net.core.somaxconn = 4096
net.core.tstamp_allow_data = 1
net.core.txrehash = 1
net.core.warnings = 0
net.core.wmem_default = 26214400
net.core.wmem_max = 26214400
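As a rough sanity check on the buffer tuning, the minimal sketch below converts a socket buffer size into how long that buffer can absorb traffic at a given rate (bits divided by Mbit/s gives microseconds). The helper name `buffer_us` is hypothetical, and the calculation ignores per-packet overhead: the kernel charges socket buffers by skb truesize rather than payload bytes, so the real headroom is lower than this figure.

```shell
# Hypothetical helper: microseconds of traffic a buffer can hold at a given rate.
#   $1 = buffer size in bytes (e.g. the net.core.rmem_max value)
#   $2 = offered load in Mbit/s (e.g. the iPerf3 target rate)
# bits / (Mbit/s) = microseconds, so bytes * 8 / rate needs no further scaling.
# Shell arithmetic is integer-only, so the result is truncated.
# Ignores skb truesize overhead, so treat the result as an upper bound.
buffer_us() {
  echo $(( $1 * 8 / $2 ))
}
```

With the values above, `buffer_us 26214400 900` prints `233016`, roughly 233 ms of 900 Mbit/s traffic, provided the receiving application keeps draining the socket. For comparison, a common stock `rmem_max` of 212992 bytes gives `1893`, under 2 ms.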

 

 

The remaining packet loss seems to be related to RX FIFO overflows in the NIC.

screenshot-1.png
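One way to check that the drops line up with FIFO overflows is to snapshot the NIC statistics before and after an iPerf3 run and compare them. The sketch below is a hypothetical pair of shell helpers (`fifo_counters` and `fifo_delta` are illustrative names, not existing tools). It assumes `ethtool -S` prints one `name: value` counter per line, and it matches any counter whose name contains "fifo", because exact counter names differ between drivers.

```shell
# Hypothetical helper: read `ethtool -S <iface>` output on stdin and print
# "name value" for every counter whose name contains "fifo" (case-insensitive).
fifo_counters() {
  awk -F: 'tolower($1) ~ /fifo/ {
    gsub(/[ \t]/, "", $1)   # strip indentation from the counter name
    gsub(/[ \t]/, "", $2)   # strip padding from the value
    print $1, $2
  }'
}

# Hypothetical helper: given two files written by fifo_counters (before, after),
# print how much each counter increased between the two snapshots.
fifo_delta() {
  awk 'FILENAME == ARGV[1] { before[$1] = $2; next }
       { print $1, $2 - before[$1] }' "$1" "$2"
}
```

For example, run `ethtool -S enp0s29f1 | fifo_counters > before.txt`, then the iPerf3 test, then `ethtool -S enp0s29f1 | fifo_counters > after.txt`, and finally `fifo_delta before.txt after.txt`.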

Configuring the driver to use only a single RX queue seems to eliminate the overflows, but it doesn't make sense to me why this would help.

ethtool -L enp0s29f1 rx 1
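Before changing the queue count it can help to record the current setting so it can be restored later; `ethtool -l` (lower-case) prints the channel parameters. The helper below is a minimal hypothetical sketch that extracts the current RX channel count from that output, assuming the usual layout where a "Current hardware settings:" section follows the "Pre-set maximums:" section and contains an `RX:` line.

```shell
# Hypothetical helper: read `ethtool -l <iface>` output on stdin and print the
# RX value from the "Current hardware settings:" section, skipping the
# "Pre-set maximums:" section that appears first.
current_rx_channels() {
  awk '/^Current hardware settings/ { cur = 1 }
       cur && $1 == "RX:" { print $2; exit }'
}
```

For example, `orig_rx=$(ethtool -l enp0s29f1 | current_rx_channels)` before running `ethtool -L enp0s29f1 rx 1`, and `ethtool -L enp0s29f1 rx "$orig_rx"` to undo the change afterwards.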

 

We have reproduced this behaviour with both our custom Buildroot-based image and an Ubuntu image.

Is there something obvious I'm missing?

--------------

I did find mention of erratum EHL22 in https://cdrdv2-public.intel.com/636674/636674_Intel_Atom_Pentium_Celeron_Public_SpecUpdate_rev2p1.pdf relating to a bug with the TX/RX FIFO size. The incorrect FIFO size shows up on our hardware, but it's not clear whether this is the cause of our issue or unrelated.

image-2023-05-31-12-28-41-038.jpeg

I also can't work out where to get the mitigation mentioned in that document.

screenshot-2.png

CarlosAM_INTEL
Moderator

Hello, @sam-bristow-rl:

Thank you for contacting the Intel Embedded Community.

We would like to ask the following questions to understand this situation:

Could you please clarify whether this request relates to an Elkhart Lake (EHL) design developed by you, or to an EHL board, Network Interface Card (NIC), or add-in card developed by a third-party company?

If this request relates to a third-party design, could you please let us know the name of the manufacturer, the part number, and where we can find information about it?

We are waiting for your answer.

Best regards,

@CarlosAM_INTEL.

sam-bristow-rl
Beginner

Hi Carlos,

We have reproduced the issue on two different Elkhart Lake boards from different manufacturers. The main board we have been testing on is the I-PI SMARC Elkhart Lake but we're seeing identical behaviour on the other board too.


Sam

CarlosAM_INTEL
Moderator

Hello, @sam-bristow-rl:

Thanks for your update.

Based on the provided information, we need to address the following questions:

Could you please list the Operating Systems (OSs) related to the reported situation?

Could you please provide the name of the manufacturer and the part number of the other board on which you reproduced the reported condition?

We are waiting for your answer.

Best regards,

@CarlosAM_INTEL.

sam-bristow-rl
Beginner

Just to clarify, our test setup is two of the i-Pi SMARC boards connected together. There are no other boards in the system that shows the packet loss.

The other board we have tested on is a pre-release sample from a manufacturer who we don't want to publicize at this point.

We have reproduced the issue with Fedora 38, Ubuntu 20.04 LTS, Ubuntu Core 22 (Intel Atom® X6000E Series Processors), and our custom Buildroot-based OS image running the Linux 6.1.26-rt8 kernel. Ubuntu and Fedora show about 10x worse packet loss than the Buildroot image.

CarlosAM_INTEL
Moderator

Hello, @sam-bristow-rl:

Thanks for your clarification.

As a reference, you could raise the questions stated in this thread through the channels listed on the following website:

https://ubuntuforums.org/

Best regards,

@CarlosAM_INTEL

sam-bristow-rl
Beginner

I'm not sure why the Ubuntu forums would be more likely to have an answer, since a possible cause appears to be the problem with the Intel PSE's embedded processor (EHL22). We've also reproduced the issue on more than just Ubuntu.

Can you point me to where I can find the workaround for Intel erratum EHL22, which I mentioned in the original post?

CarlosAM_INTEL
Moderator

Hello, @sam-bristow-rl:

Thanks for your reply.

We need to clarify our previous answer.

Having reviewed the note on the cited workaround, we see that Intel drivers are suggested to avoid implementing the mitigation. Intel's drivers are generic, so they are not guaranteed to work properly on third-party devices (such as the ones you are using to reproduce the reported situation).

For this reason, as a first option, we think you could contact the developer of the OS to clarify this situation.

Another option, which is not guaranteed to be covered by the workaround note, is to request the proper drivers from the developer of the third-party devices you are using. They can be contacted through the channels listed on the following website:

https://www.ipi.wiki/community/forum

Alternatively, keeping in mind the advice in the first part of this message, you can find Intel drivers using the tool on the following website:

https://www.intel.com/content/www/us/en/support/detect.html

Best regards,

@CarlosAM_INTEL.
