Intel® SoC FPGA Embedded Development Suite
Support for SoC FPGA Software Development, SoC FPGA HPS Architecture, HPS SoC Boot and Configuration, Operating Systems

Is there a bug in the Ethernet MAC of Arria10 SoC devices?

Silvan
Novice
4,497 Views

Hi all,

I have an Arria10 SoC device running embedded Linux (4.14). It uses EMAC1 and an external PHY (Micrel KSZ9031RNX) connected through RGMII.

After the transfer of a huge amount of data from the SoC to a PC through the Gigabit Ethernet interface the following observations can be made:

  • Large communication latency. Pinging the device takes around 1 s
  • The EMAC1 gmacgrp_debug register (0xFF802024) has a value of 0x120

It seems that something is wrong with the FIFO state in the MAC or with the FIFO flush mechanism.
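To interpret the debug register value quoted above, here is a minimal Python sketch that splits it into status fields. The bit positions are an assumption taken from my reading of the Linux stmmac header dwmac1000.h (GMAC_DEBUG_* definitions) and should be checked against the Arria 10 HPS register map; the function and dictionary names are made up for illustration.

```python
# Field layout assumed from the GMAC_DEBUG_* definitions in the Linux
# stmmac driver (dwmac1000.h). Verify against the Arria 10 HPS register map.
RX_FIFO_FILL = {
    0: "empty",
    1: "below flow-control deactivate threshold",
    2: "above flow-control activate threshold",
    3: "full",
}
RX_READ_CTRL = {
    0: "idle",
    1: "reading frame data",
    2: "reading frame status",
    3: "flushing frame data and status",
}
TX_READ_CTRL = {
    0: "idle",
    1: "reading (sending data to the MAC transmitter)",
    2: "waiting for tx status",
    3: "writing tx status or flushing the tx FIFO",
}


def decode_gmac_debug(value: int) -> dict:
    """Split a gmacgrp_debug register value into named status fields."""
    return {
        "rx_fifo_fill": RX_FIFO_FILL[(value >> 8) & 0x3],
        "rx_read_controller": RX_READ_CTRL[(value >> 5) & 0x3],
        "rx_write_controller_active": bool(value & (1 << 4)),
        "tx_paused": bool(value & (1 << 19)),
        "tx_read_controller": TX_READ_CTRL[(value >> 20) & 0x3],
        "tx_fifo_not_empty": bool(value & (1 << 24)),
        "tx_status_fifo_full": bool(value & (1 << 25)),
    }


if __name__ == "__main__":
    for name, state in decode_gmac_debug(0x120).items():
        print(f"{name}: {state}")
```

If the assumed layout is right, 0x120 would mean that the Rx FIFO is not empty and the Rx read controller is reading frame data, while the Tx side is idle.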

The data transfer was done through the iperf3 tool with the PC as server and the SoC as client.

Does anyone know about an issue in the Ethernet drivers (MAC or PHY) when a lot of data is transferred? Or what could be the next step to solve the issue?

By the way: the high-latency error state can be left by reinitializing the network connection on the SoC device with ifconfig eth0 down followed by ifconfig eth0 up.
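The workaround could be automated on the SoC roughly as follows. This is only a sketch: the peer IP, the 100 ms threshold, and all function names are hypothetical, and it assumes that ping prints the usual "min/avg/max" summary line and that ifconfig is available on the target.

```python
import re
import subprocess


def parse_avg_rtt_ms(ping_output: str):
    """Return the average RTT in ms from a ping summary line, or None."""
    # Matches e.g. "rtt min/avg/max/mdev = 0.321/0.402/0.512/0.071 ms"
    match = re.search(r"=\s*[\d.]+/([\d.]+)/", ping_output)
    return float(match.group(1)) if match else None


def bounce_interface(ifname: str = "eth0") -> None:
    """Take the interface down and up again, as described in the post."""
    for state in ("down", "up"):
        subprocess.run(["ifconfig", ifname, state], check=True)


def check_and_recover(peer_ip: str, threshold_ms: float = 100.0,
                      ifname: str = "eth0") -> bool:
    """Ping the peer; bounce the interface if the average RTT is too high."""
    result = subprocess.run(["ping", "-c", "3", peer_ip],
                            capture_output=True, text=True)
    rtt = parse_avg_rtt_ms(result.stdout)
    if rtt is not None and rtt > threshold_ms:
        bounce_interface(ifname)
        return True
    return False
```

Calling check_and_recover("192.168.1.10") periodically (the address is a placeholder) would only hide the symptom; it does not explain the root cause.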

Thanks for any hints, proposed solutions, or debugging suggestions.

0 Kudos
19 Replies
EBERLAZARE_I_Intel
4,422 Views

Hi,

I have not seen such an issue before. Did you change or modify any U-Boot or device tree configuration?

Also, you mentioned that you are using EMAC1, but you run ifconfig eth0 down and ifconfig eth0 up to leave the high-latency state. Why do you do this on eth0 instead of eth1?

Can you check if you have set the correct EMAC* in the device tree:

"Ubootdirectory\arch\arm\dts"

0 Kudos
Silvan
Novice
4,405 Views

Hi Eberlazare,

Thanks for your feedback.

 

This issue is not observed in U-Boot but on a running Linux system. Our hardware has only one Ethernet connection, which uses EMAC1 (EMAC0 is disabled). In the Linux device tree, the EMAC1 node is assigned to ethernet0 (aliases section), which results in the name eth0 on Linux.

 

Below you will find the Linux device tree entry for the Ethernet configuration.

Do you think there is something wrong? Or what could be the reason for this high Ethernet latency?

 

hps_gmac: ethernet@ff802000 {
	compatible = "altr,socfpga-stmmac", "snps,dwmac-3.72a", "snps,dwmac";
	altr,sysmgr-syscon = <&sysmgr 0x48 8>;
	reg = <0xff802000 0x2000>;
	interrupts = <0 93 4>;
	interrupt-names = "macirq";
	/* Filled in by bootloader */
	mac-address = [00 00 00 00 00 00];
	snps,multicast-filter-bins = <256>;
	snps,perfect-filter-entries = <128>;
	snps,axi-config = <&socfpga_axi_setup>;
	tx-fifo-depth = <4096>;
	rx-fifo-depth = <16384>;

	clocks = <&l4_mp_clk>, <&peri_emac_ptp_clk>;
	clock-names = "stmmaceth", "ptp_ref";

	resets = <&rst 33>, <&rst 41>;
	reset-names = "stmmaceth", "stmmaceth-ocp";
		
	phy-mode = "rgmii-id";
	max-frame-size = <3800>;
	/* probe for phy addr */
	phy-addr = <0xffffffff>;

	txd0-skew-ps = <420>; /* 0ps */
	txd1-skew-ps = <420>; /* 0ps */
	txd2-skew-ps = <420>; /* 0ps */
	txd3-skew-ps = <420>; /* 0ps */
	rxd0-skew-ps = <420>; /* 0ps */
	rxd1-skew-ps = <420>; /* 0ps */
	rxd2-skew-ps = <420>; /* 0ps */
	rxd3-skew-ps = <420>; /* 0ps */
	txen-skew-ps = <420>; /* 0ps */
	rxdv-skew-ps = <420>; /* 0ps */
	txc-skew-ps = <1440>; /* 540ps */
	rxc-skew-ps = <1680>; /* 780ps */

	status = "okay";
};
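The comments next to the skew properties follow from how the KSZ9031 encodes pad skew: the device tree value is an absolute setting in 60 ps steps, and the effective delay is that value minus a neutral midpoint. The sketch below assumes a midpoint of 420 ps (maximum 900 ps) for data and control pads and 900 ps (maximum 1860 ps) for clock pads, as I recall them from the Linux micrel-ksz90x1 binding documentation; these numbers match the comments above, but the names are made up for illustration.

```python
STEP_PS = 60                                # skew resolution of the KSZ9031
NEUTRAL_PS = {"data": 420, "clock": 900}    # assumed value for 0 ps offset
MAX_PS = {"data": 900, "clock": 1860}       # assumed largest legal value


def effective_skew_ps(dt_value_ps: int, pad: str = "data") -> int:
    """Convert a *-skew-ps device tree value into the effective delay."""
    if dt_value_ps % STEP_PS != 0:
        raise ValueError(f"{dt_value_ps} is not a multiple of {STEP_PS} ps")
    if not 0 <= dt_value_ps <= MAX_PS[pad]:
        raise ValueError(f"{dt_value_ps} is out of range for a {pad} pad")
    return dt_value_ps - NEUTRAL_PS[pad]


if __name__ == "__main__":
    print(effective_skew_ps(420, "data"))    # txd/rxd/txen/rxdv lines
    print(effective_skew_ps(1440, "clock"))  # txc
    print(effective_skew_ps(1680, "clock"))  # rxc
```

A check like this can catch values that are not a multiple of 60 ps or that exceed the pad's range before they end up in the device tree.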
0 Kudos
EBERLAZARE_I_Intel
4,389 Views

Hi,

How did you define the clock skews? Are they the default values? Where did you get them?

Have you tried a different Linux version, preferably 5.4?

0 Kudos
Silvan
Novice
4,381 Views

Hi Eberlazare,

The skews are calculated from the trace lengths and FPGA pin delays of our custom board and firmware.

I also tried it once with a very early version of kernel 5.4, and at least in the MAC driver I haven't seen any changes since then.

Maybe it is possible to reproduce the issue with an Arria10 development board? Unfortunately, our development board still has an ES2 device, which isn't available in Quartus to build the GHRD. If you have an image for this development board, I could also try to reproduce the issue on that hardware.

0 Kudos
EBERLAZARE_I_Intel
4,367 Views

Hi,

Yes, you may want to reproduce using the default settings from the GHRD. 

Can you share the device part number and the Quartus version you are working on?

0 Kudos
Silvan
Novice
4,346 Views

We use the device 10AS057K4F40E3SG and Quartus 19.1.0.

0 Kudos
EBERLAZARE_I_Intel
4,227 Views

Hello,

Is there any update from your side?

0 Kudos
Silvan
Novice
4,221 Views

Hi, Thanks for asking.

I ordered the current version of the Arria 10 development board, which allows me to use the provided image directly. On this development kit I will try to reproduce the issue and then send you a step-by-step description of how to trigger it. This should make it possible for you to reproduce and fix the issue on your side.

The development kit is expected to arrive at the end of next week. As soon as I have any news, I will let you know.

Thanks

0 Kudos
EBERLAZARE_I_Intel
4,191 Views

Hi,

My recommendation is to use our GHRD and the latest version of U-Boot with its default device tree settings. I previously tested on the dev kit with the latest U-Boot and kernel from the link below and did not see the high latency you mentioned:

https://rocketboards.org/foswiki/Documentation/BuildingBootloader#Arria_10_SoC

 

0 Kudos
Silvan
Novice
4,040 Views

Hi Eberlazare,

It seems that the issue is still present in GSRD release 2020.11.

It occurs less often than before, but it is still observable. Right now, I don't understand the exact root cause. I am trying to find a setup in which the issue is reproducible.

Best regards,

Silvan

0 Kudos
EBERLAZARE_I_Intel
3,977 Views

Hi,

Could you share the setup and test procedure needed to reproduce the error using the GHRD on our dev kit, if possible?

 

0 Kudos
Silvan
Novice
3,938 Views

Hi Eberlazare,

The hardware setup is quite simple. The DevKit is directly connected to a PC. Both have a static network configuration, and the data traffic is generated with the tool "iperf3". The PC acts as the server, where the tool is started with the command "iperf3 -s". The DevKit is the client and runs the command "iperf3 -c <PC-IP>". The attached file "Overview.pdf" shows the setup in more detail.
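For anyone who wants to track retransmits per run on the client side, here is a minimal sketch that is independent of the attached script. It starts iperf3 with JSON output (-J) and reads the sender summary; the function names and the 10 s duration are assumptions, and the "retransmits" field is only reported for TCP tests on platforms that expose it.

```python
import json
import subprocess


def summarize_iperf3(json_text: str) -> dict:
    """Extract throughput and retransmit count from `iperf3 -J` output."""
    report = json.loads(json_text)
    sent = report["end"]["sum_sent"]
    return {
        "mbit_per_s": sent["bits_per_second"] / 1e6,
        # Missing on platforms that do not report TCP retransmits.
        "retransmits": sent.get("retransmits"),
    }


def run_client(server_ip: str, seconds: int = 10) -> dict:
    """Run one iperf3 client test against the PC and summarize the result."""
    proc = subprocess.run(
        ["iperf3", "-c", server_ip, "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    )
    return summarize_iperf3(proc.stdout)
```

Logging the result of run_client("<PC-IP>") after every run makes it easier to see whether the retransmit count rises before the latency problem appears.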

I created a Python test script, executed on the DevKit, for additional tests. The attached archive "TestSequence.zip" contains the script "TestSequence.py"; additional test documentation is included in the script. Right now, I am working with this script and trying to figure out how to reproduce the issue. In some test runs, I see errors (ErrorCnt variable). A general observation is that many "Retransmits" occur, which I don't understand.

Thanks for your support, Silvan

0 Kudos
EBERLAZARE_I_Intel
3,875 Views

Hi Silvan,

This is an uncommon issue. I will try to reproduce it using our GHRD this week.

0 Kudos
EBERLAZARE_I_Intel
3,602 Views

Hi Silvan,

 

I ran a couple of tests. The issue is very rare to see, and sometimes it does not occur at all.

 

Also, could you elaborate on the "ErrorCnt" variable and the "Retransmits" that you are seeing?

0 Kudos
Silvan
Novice
3,491 Views

Hi Eberlazare,

 


@EBERLAZARE_I_Intel wrote:

Hi Silvan,

 

I ran a couple of tests. The issue is very rare to see, and sometimes it does not occur at all.

 

Also, could you elaborate on the "ErrorCnt" variable and the "Retransmits" that you are seeing?


 

Do you have a solution for this issue? Or are you still working on it?

0 Kudos
EBERLAZARE_I_Intel
2,622 Views

Hi,

 

As per our communication, we will close this topic. You may open a new one with a reference to this one once your testing has been completed.

0 Kudos
qwitza
Beginner
1,701 Views

Hello,

I think I have a similar issue.

Here we go:

tcp retransmission @ intel forum 

0 Kudos