Intel® SoC FPGA Embedded Development Suite

High-Latency Ethernet on Arria 10 SoC device using HPS EMAC and KSZ9031

Silvan
Novice

Hello,

I'm observing a strange issue with the 1 Gbps Ethernet on the Arria 10 SoC HPS. The issue shows up on our custom board and also on the Arria 10 Dev Kit. For simpler debugging, explanation, and potential reproducibility, I will focus on the Dev Kit here: https://www.intel.com/content/www/us/en/products/details/fpga/development-kits/arria/10-sx.html

 

I built kernel 6.6 for the development board and use the tool `iperf3` for an Ethernet stress test. The setup is simple: a direct connection from the Dev Kit to a computer. On the Dev Kit I start the server with `iperf3 -s`, and on the PC side I run the client in bidirectional mode like this: `iperf3 -c <IP> --bidir`.

 

In parallel, I run a ping from my PC console to observe the Ethernet latency. Every now and then, the ping latency increases to 500-1000 ms, and it stays that way even after the iperf3 test has finished and no data transfer is going on.

It is really sporadic and does not happen on every run, but it is a serious issue for our product based on the Arria 10 device.

 

I suspect it is related to an old observation: https://community.intel.com/t5/Intel-SoC-FPGA-Embedded/is-there-a-bug-in-the-Ethernet-MAC-of-Arria10-SoC-devices/m-p/1238772#M866

At that time, no specific answer to the issue was found, and now the issue is back after the kernel update.

 

So what is happening here, and how can we get a stable Ethernet connection? Are there any patches or bug fixes available for the EMAC driver?

 

Cheers

Silvan

 

Girisha_Dengi
Employee

Hello Silvan,

 

We are looking into the details of this issue and will get back to you.

Thank you.

 

Regards,

Girisha Dengi

Girisha_Dengi
Employee

Hello Silvan,

 

Altera has recently upgraded the Linux kernel from 6.6 to 6.12.11 LTS as part of our previous quarterly release.

Please upgrade the Linux kernel to 6.12.11 LTS and test your use case.

Link: https://github.com/altera-fpga/linux-socfpga.git

 

Thank you.

 

Regards,

Girisha Dengi

Jeet14
Employee

Hi Silvan,


Please let us know if you have any further queries on this.


Regards

Tiwari


Silvan
Novice

Hi Girisha Dengi,

Hi Tiwari

 

I will run the test with 6.12 on Monday. In the meantime, I have found some more information.

It seems that the issue is not simply a delay in processing the Ethernet frames; it looks more like a one-frame offset in the processing.

 

I checked the network packets in Wireshark. In the error case, I send a first (single) ping packet and wait for an answer.

As long as nothing else is sent over the Ethernet link, no answer is returned. As soon as I send a second ping packet, I receive the answer to the first ping packet on my host PC.

 

When I observe tcpdump on the Arria 10 device, I see the first ping packet and its response only after sending the second one.

 

Based on this observation, I tried to get more information about the HPS MAC status. The gmacgrp_debug register (https://www.intel.com/content/www/us/en/programmable/hps/arria-10/hps.html#topic/sfo1429889363845.html) reports 0x120, which means that data is present in the Rx FIFO and that the read controller is in the "Reading Frame Data" state.
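
For reference, here is a minimal user-space sketch that decodes the Rx-side fields of that value. The bit positions follow the Synopsys GMAC debug register layout as documented in the Arria 10 HPS register map; the printed names are my own shorthand, so please double-check against the databook:

```c
#include <stdio.h>
#include <stdint.h>

/* Decode the Rx-path fields of gmacgrp_debug (GMAC offset 0x24). */
static void decode_rx_debug(uint32_t dbg)
{
    static const char *rdctl[] = { "Idle", "Reading frame data",
                                   "Reading frame status/timestamp", "Flushing" };
    static const char *fill[]  = { "Empty", "Below deactivate threshold",
                                   "Above activate threshold", "Full" };

    printf("Rx FIFO read controller state: %s\n", rdctl[(dbg >> 5) & 0x3]);
    printf("Rx FIFO write controller active: %u\n", (dbg >> 4) & 0x1);
    printf("Rx FIFO fill level: %s\n", fill[(dbg >> 8) & 0x3]);
}

int main(void)
{
    decode_rx_debug(0x120); /* value observed in the stuck state */
    return 0;
}
```

For 0x120 this prints a non-empty FIFO with the read controller in "Reading frame data", matching the description above.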

 

It seems the read controller is somehow stuck in this state, and no data is transferred to the kernel descriptors. (I also checked the descriptors in the kernel; they are all provided and ready for use by the Ethernet DMA.)

 

So in my opinion, the MAC's FIFO read controller is somehow out of sync and is still waiting for an end-of-frame trigger before transferring the data. The operation mode is "store and forward", which waits for the end of the frame.

 

Is there a way to get more information about the read controller, or to trigger the readout manually? I don't think the issue is necessarily related to the kernel version, but I will run the test with 6.12 next week.

 

Regards,

Silvan

Silvan
Novice

Hi, I did some tests with kernel 6.12 and see the same issue. I still think the issue is somewhere in the read controller of the Rx FIFO inside the MAC IP.

 

Are there any further debug registers available, or any state information that could help to find and solve the issue?

 

Regards,

Silvan

MattG_Altera
Employee

Hi Silvan,

 

When you get the devkit "stuck" waiting for a receive, try issuing `ethtool -d <eth_interface>` to get a dump of all the registers. This might provide deeper insight into the problem.

 

An experiment worth trying is to change the size of the ICMP packets sent by the ping command with the -s option. I suggest trying values of 128, 129, 130, and 131 to explore whether alignment is part of the problem.

 

Another experiment to try is to enable "busy polling" every millisecond with the following command:

# echo 1000 > /proc/sys/net/core/busy_poll

The value of the parameter is in microseconds, so I would expect the RX path to get unstuck after 1 millisecond.
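
If you prefer to scope the experiment to a single socket instead of the global sysctl, the per-socket equivalent is the SO_BUSY_POLL socket option. A minimal sketch (note that raising this value may require CAP_NET_ADMIN):

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int usecs = 1000; /* busy-wait budget in microseconds */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0) { perror("socket"); return 1; }

    /* Per-socket counterpart of /proc/sys/net/core/busy_poll. */
    if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs)) < 0)
        perror("setsockopt(SO_BUSY_POLL)");

    close(fd);
    return 0;
}
```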

 

Regards,

Matthew

 

Silvan
Novice

Hi Matthew,

 

Here is the additional information. First, the register values I got with `ethtool -d eth0` in the error state:

root@arria10:~# ethtool -d eth0
ST GMAC Registers
GMAC Registers
Reg0  0x00610C0C
Reg1  0x00000404
Reg2  0x00000000
Reg3  0x00000000
Reg4  0x00003A90
Reg5  0x00003C00
Reg6  0xFFFF000E
Reg7  0x00000000
Reg8  0x00001037
Reg9  0x00000120
Reg10  0x00000000
Reg11  0x00000000
Reg12  0x00020000
Reg13  0x03E80000
Reg14  0x00000001
Reg15  0x00000201
Reg16  0x80008EAC
Reg17  0x9242062E
Reg18  0x00000000
Reg19  0x00000000
Reg20  0x00000000
Reg21  0x00000000
Reg22  0x00000000
Reg23  0x00000000
Reg24  0x00000000
Reg25  0x00000000
Reg26  0x00000000
Reg27  0x00000000
Reg28  0x00000000
Reg29  0x00000000
Reg30  0x00000000
Reg31  0x00000000
Reg32  0x00000000
Reg33  0x00000000
Reg34  0x00000000
Reg35  0x00000000
Reg36  0x00000000
Reg37  0x00000000
Reg38  0x00000000
Reg39  0x00000000
Reg40  0x00000000
Reg41  0x00000000
Reg42  0x00000000
Reg43  0x00000000
Reg44  0x00000000
Reg45  0x00000000
Reg46  0x00000000
Reg47  0x00000000
Reg48  0x00000000
Reg49  0x00000000
Reg50  0x00000000
Reg51  0x00000000
Reg52  0x00000000
Reg53  0x00000000
Reg54  0x0000000D

DMA Registers
Reg0  0x01900880
Reg1  0x00000000
Reg2  0x00000000
Reg3  0x027B0000
Reg4  0x027B8000
Reg5  0x00660404
Reg6  0x02202906
Reg7  0x0001A061
Reg8  0x000000B7
Reg9  0x00000000
Reg10  0x00FF0009
Reg11  0x00000000
Reg12  0x00000000
Reg13  0x00000000
Reg14  0x00000000
Reg15  0x00000000
Reg16  0x00000000
Reg17  0x00000000
Reg18  0x027B9FE0
Reg19  0x027B3DC0
Reg20  0x030047CA
Reg21  0x02E0F000
Reg22  0x170D69BF
root@arria10:~#

 

I did the tests with different ping message sizes (-s option); they have no impact on the behavior. On my host PC I started a tcpdump, which produced the following output:

11:59:55.158944 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21008, seq 1, length 136
11:59:55.159162 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 20934, seq 1, length 64
12:00:02.411714 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21139, seq 1, length 137
12:00:02.412154 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21008, seq 1, length 136
12:00:09.874869 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21356, seq 1, length 138
12:00:09.875066 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21139, seq 1, length 137
12:00:27.903913 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21687, seq 1, length 139
12:00:27.904126 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21356, seq 1, length 138
12:00:32.869880 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21766, seq 1, length 64
12:00:32.870074 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21687, seq 1, length 139

 

The reply is always the one for the previous request, not for the current one, and this is independent of when the new request is sent.

 

In parallel, I executed tcpdump on the Arria 10 device as well:

08:03:56.351322 IP 192.168.2.201 > arria10: ICMP echo request, id 20934, seq 1, length 64
08:03:56.351389 IP arria10 > 192.168.2.201: ICMP echo reply, id 20934, seq 1, length 64
08:04:03.604107 IP 192.168.2.201 > arria10: ICMP echo request, id 21008, seq 1, length 136
08:04:03.604171 IP arria10 > 192.168.2.201: ICMP echo reply, id 21008, seq 1, length 136
08:04:11.067233 IP 192.168.2.201 > arria10: ICMP echo request, id 21139, seq 1, length 137
08:04:11.067297 IP arria10 > 192.168.2.201: ICMP echo reply, id 21139, seq 1, length 137
08:04:29.096255 IP 192.168.2.201 > arria10: ICMP echo request, id 21356, seq 1, length 138
08:04:29.096331 IP arria10 > 192.168.2.201: ICMP echo reply, id 21356, seq 1, length 138
08:04:34.062204 IP 192.168.2.201 > arria10: ICMP echo request, id 21687, seq 1, length 139
08:04:34.062267 IP arria10 > 192.168.2.201: ICMP echo reply, id 21687, seq 1, length 139

It seems the Arria 10 device receives the ping messages and replies to them immediately, but the one-packet offset is visible: the Arria 10 device receives the "old" 64-byte message and replies to it at the point where the host is already sending the 136-byte message.

 

Based on that observation, combined with the value of Register 9 (gmacgrp_debug), it seems that the source of the offset is in the Rx path and that the Rx FIFO readout controller is responsible for it. Do you know of any registers that provide more information about the MAC's readout controller?

 

I think in this case there is no option at the OS level to influence this directly, because the offset occurs down in the MAC hardware.

I also tried the polling configuration you suggested, and additionally set the same value for /proc/sys/net/core/busy_read. Neither of them changed anything.

If the issue really is in the FIFO readout controller, I would not expect the polling settings to change the behavior anyway, since they operate on the descriptors in kernel memory and not on the FIFO state.

 

Any idea how we can solve the issue, or how we can get more information about the root cause? Maybe it is possible to trigger the FIFO read controller manually, to get it back in sync?

 

Thank you for your support and best regards,

Silvan

rgthomas
Employee

Hi Silvan,

I think I was able to reproduce the issue consistently on our setup, just by running the iperf3 server with UDP 64-byte packets sent from the host machine at line rate.

root@arria10:~# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 192.168.2.100, port 37872
[  5] local 192.168.2.40 port 5201 connected to 192.168.2.100 port 57154
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  6.25 MBytes  52.4 Mbits/sec  0.006 ms  154296/256751 (60%)  
[  5]   1.00-2.00   sec  6.27 MBytes  52.6 Mbits/sec  0.006 ms  156917/259573 (60%)  
[  5]   2.00-3.00   sec  6.26 MBytes  52.5 Mbits/sec  0.007 ms  157229/259797 (61%)  
[  5]   3.00-4.00   sec  6.26 MBytes  52.5 Mbits/sec  0.006 ms  157367/259983 (61%)  
[  5]   4.00-5.00   sec  6.26 MBytes  52.5 Mbits/sec  0.006 ms  157039/259623 (60%)  
[  5]   5.00-6.00   sec  6.25 MBytes  52.4 Mbits/sec  0.009 ms  158735/261174 (61%)  
[  5]   6.00-7.00   sec  6.25 MBytes  52.5 Mbits/sec  0.012 ms  157964/260420 (61%)  
[  5]   7.00-8.00   sec  6.26 MBytes  52.5 Mbits/sec  0.006 ms  162000/264503 (61%)  
[  5]   8.00-9.00   sec  6.26 MBytes  52.5 Mbits/sec  0.011 ms  162404/265020 (61%)  
[  5]   9.00-10.00  sec  6.27 MBytes  52.6 Mbits/sec  0.006 ms  162334/265030 (61%)  
[  5]  10.00-10.42  sec  22.8 KBytes   445 Kbits/sec  12.748 ms  611/976 (63%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.42  sec  62.6 MBytes  50.4 Mbits/sec  12.748 ms  1586896/2612850 (61%)  receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
^Ciperf3: interrupt - the server has terminated
root@arria10:~# ping 192.168.2.100 -c 10
PING 192.168.2.100 (192.168.2.100): 56 data bytes
64 bytes from 192.168.2.100: seq=0 ttl=64 time=1000.462 ms
64 bytes from 192.168.2.100: seq=1 ttl=64 time=1000.344 ms
64 bytes from 192.168.2.100: seq=2 ttl=64 time=1000.377 ms
64 bytes from 192.168.2.100: seq=3 ttl=64 time=1000.363 ms
64 bytes from 192.168.2.100: seq=4 ttl=64 time=1000.367 ms
64 bytes from 192.168.2.100: seq=5 ttl=64 time=1000.374 ms
64 bytes from 192.168.2.100: seq=6 ttl=64 time=1000.347 ms
64 bytes from 192.168.2.100: seq=7 ttl=64 time=1000.349 ms
64 bytes from 192.168.2.100: seq=8 ttl=64 time=1000.363 ms

--- 192.168.2.100 ping statistics ---
10 packets transmitted, 9 packets received, 10% packet loss
round-trip min/avg/max = 1000.344/1000.371/1000.462 ms

As you pointed out earlier in this thread, the gmacgrp_debug register has a value of 0x00000120 on my setup as well. With this, I think there is an issue with the EMAC IP on Arria 10, and the issue seems reproducible when the Rx-buffer-unavailable condition is hit repeatedly. We need to follow up with Synopsys on this issue.

 

I tried a different configuration for the MTL Rx FIFO and Rx DMA, and the issue is not reproducible in this case. What I've done is set the dff bit of the dmagrp_operation_mode register. Setting this bit disables flushing of received frames when the Rx-buffer-unavailable condition is met; with this setting, the driver needs to write to the dmagrp_receive_poll_demand register after refilling the descriptor ring. I have attached a patch with these changes. With the patch applied, I can see flow control working effectively, with almost 0% packet loss for the same iperf3 test, and the issue is no longer reproducible.
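
For context, here is a conceptual register-level sketch of what the two changes amount to. This is not the attached patch itself; the offsets follow the Synopsys DWC GMAC 3.7x DMA register map, with the operation-mode register at DMA base + 0x18 and receive poll demand at DMA base + 0x08:

```c
#include <linux/bits.h>
#include <linux/io.h>

#define DMA_CONTROL          0x18      /* dmagrp_operation_mode */
#define DMA_CONTROL_DFF      BIT(24)   /* disable flushing of received frames */
#define DMA_RCV_POLL_DEMAND  0x08      /* dmagrp_receive_poll_demand */

/* Stop the MAC from flushing frames on "Rx buffer unavailable". */
static void emac_set_dff(void __iomem *dma_base)
{
	u32 ctrl = readl(dma_base + DMA_CONTROL);

	writel(ctrl | DMA_CONTROL_DFF, dma_base + DMA_CONTROL);
}

/* With dff set, the Rx DMA must be re-armed after the driver has
 * refilled the descriptor ring; any write to the poll-demand
 * register does that. */
static void emac_rx_poll_demand(void __iomem *dma_base)
{
	writel(1, dma_base + DMA_RCV_POLL_DEMAND);
}
```

This also matches the register values in this thread: the pre-patch ethtool dump showed DMA Reg6 = 0x02202906, and with bit 24 set the value becomes 0x03202906, as seen in the devmem2 readback below.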

root@arria10:~# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 192.168.2.100, port 43988
[  5] local 192.168.2.25 port 5201 connected to 192.168.2.100 port 57459
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  6.52 MBytes  54.6 Mbits/sec  0.007 ms  858/107663 (0.8%)  
[  5]   1.00-2.00   sec  6.58 MBytes  55.2 Mbits/sec  0.007 ms  0/107776 (0%)  
[  5]   2.00-3.00   sec  6.50 MBytes  54.5 Mbits/sec  0.012 ms  694/107232 (0.65%)  
[  5]   3.00-4.00   sec  6.57 MBytes  55.1 Mbits/sec  0.007 ms  0/107688 (0%)  
[  5]   4.00-5.00   sec  6.55 MBytes  55.0 Mbits/sec  0.007 ms  0/107392 (0%)  
[  5]   5.00-6.00   sec  6.55 MBytes  55.0 Mbits/sec  0.018 ms  0/107343 (0%)  
[  5]   6.00-7.00   sec  6.56 MBytes  55.0 Mbits/sec  0.006 ms  0/107448 (0%)  
[  5]   7.00-8.00   sec  6.56 MBytes  55.1 Mbits/sec  0.005 ms  0/107536 (0%)  
[  5]   8.00-9.00   sec  6.55 MBytes  55.0 Mbits/sec  0.008 ms  0/107344 (0%)  
[  5]   9.00-10.00  sec  6.56 MBytes  55.0 Mbits/sec  0.027 ms  0/107488 (0%)  
[  5]  10.00-10.01  sec  86.6 KBytes  54.1 Mbits/sec  0.008 ms  0/1386 (0%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.01  sec  65.6 MBytes  54.9 Mbits/sec  0.008 ms  1552/1076296 (0.14%)  receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
^Ciperf3: interrupt - the server has terminated
root@arria10:~# ping 192.168.2.100 -c 10
PING 192.168.2.100 (192.168.2.100): 56 data bytes
64 bytes from 192.168.2.100: seq=0 ttl=64 time=0.336 ms
64 bytes from 192.168.2.100: seq=1 ttl=64 time=0.315 ms
64 bytes from 192.168.2.100: seq=2 ttl=64 time=0.294 ms
64 bytes from 192.168.2.100: seq=3 ttl=64 time=0.286 ms
64 bytes from 192.168.2.100: seq=4 ttl=64 time=0.243 ms
64 bytes from 192.168.2.100: seq=5 ttl=64 time=0.237 ms
64 bytes from 192.168.2.100: seq=6 ttl=64 time=0.317 ms
64 bytes from 192.168.2.100: seq=7 ttl=64 time=0.276 ms
64 bytes from 192.168.2.100: seq=8 ttl=64 time=0.249 ms
64 bytes from 192.168.2.100: seq=9 ttl=64 time=0.254 ms

--- 192.168.2.100 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 0.237/0.280/0.336 ms

The attached patch is intended for checking the behavior on your side; it is based on the socfpga-6.12.11-lts branch and is not fully tested. Once you have booted the image with the patch applied, you can use the devmem2 tool to make sure the dff bit (bit 24) is set in the dmagrp_operation_mode register, and after that you can repeat the test above and confirm whether the issue is still reproducible. At least on my side, it is not.

root@arria10:~# devmem2 0xFF801018
/dev/mem opened.
Memory mapped at address 0xb6fad000.
Read at address  0xFF801018 (0xb6fad018): 0x03202906

Please try and confirm the behavior on your setup.

 

Best Regards,

Rohan

Silvan
Novice

Hi Rohan,

 

Thank you so much for the description and the provided patch. I applied the patch on the dev board, and it works as expected.

Additionally, I ported the same patch to kernel 6.6, and a first short test on our hardware was also successful. I will run more detailed tests on our hardware this week, but I am optimistic that it will work.

 

For us it would be beneficial if the patch became available in the mainline Linux kernel. Do you think this could be possible? If required, I will support you with that.

 

For the discussion with Synopsys, I have some additional information:

We also have an old Cyclone V dev board. According to the device documentation, it contains a similar MAC: version 3.70a in the Cyclone V versus 3.72a in the Arria 10.

I did the same test on the Cyclone V and observed the same issue there as well.

rgthomas
Employee

Hi Silvan,

Thanks for sharing the update; glad to hear the patch resolves the issue. It seems the EMACs in Cyclone V, Arria 10, and Agilex 7 all exhibit this behavior. We'll follow up with Synopsys for further clarification.

We're planning to integrate the workaround in our next quarterly release. If you're considering upstreaming it to the mainline Linux kernel, you're welcome to do so. Just make sure to update the commit subject and description according to the upstream guidelines (currently the commit message is written as a workaround), and please also ensure the patch adheres to the kernel coding standards.

Also, please let us know if you observe any regressions during your testing.

Best regards,
Rohan

JingyangTeh_Altera

Hi


I'm glad that your question has been addressed; I am now transitioning this thread to community support. If you have a follow-up question, please log in to 'https://supporttickets.intel.com/s/?language=en_US', view the details of the request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support, and the community users will be able to help you with your follow-up questions.


Regards

Jingyang, Teh

