Re: Re: Cisco - Altera PHY issue

Himabindu · 11-13-2020

We are using your PHY in one of our IMs, the 8x10G (IMA8Z).

SDK version : 4.0

CHIP id: TpDrv-tpo225_c1-0_13.tar.gz

ISSUE:

A customer reports that on the 8x10G IM, the interface link stays down after a system reload.

From the optics point of view, the port is receiving a good signal, but the link is still down.

TOPO:

ASR903 (IMA8Z: TenGig port) ---------------------------------- Test device (similar to an Ixia or Spirent test generator)

In the ASR903, 2 IMA8Zs (slot-0 and slot-4) are connected to the same test device, and after reload the interfaces on both are down.

After analyzing the logs, we found that the line-side eth port alarm ‘rxFifoUnderflow’ is set.

AAC-4206-1#test platform hardware pp active function uea_test_iomd_cmd : 0 : "phy cli eth port list 8"

Output:: PortNo: 8

Output:: ChipNo = 32

Output:: Type = 2

Output:: RegIndex = 0

Output:: RegType = 3

Output:: Connection = 0

Output:: mappedTunnelId = 0xffffffff

Output:: Options= 0x0 (None)

Output:: Interrupt mask= 0x0 (None)

Output:: Options: 0x0

Output:: Alarm status:

Output:: PCS:

Output:: rxFault = 0

Output:: hiBer = 0

Output:: notBlockLock = 0

Output:: rsInserted = 0

Output:: rxFifoOverflow = 0

Output:: rxFifoUnderflow = 1

Output:: Counters:

Output:: PCS:

Output:: nonIdleBlocks = 16238088037415

Output:: invBlocks = 0

On the system side there were no alarms.

From the code, whenever tx or rx alarms are seen (tpo225_ethPortAlarmGetRxStatus), the link is brought down (tpo_225_get_link_status); that is the root cause of the link being down.
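The link-status logic described above can be sketched as follows. This is a minimal Python illustration, not the SDK code: `parse_alarms` and `link_is_up` are hypothetical helper names, and the rule "any set alarm forces the link down" is an assumption based on the behavior of tpo225_ethPortAlarmGetRxStatus and tpo_225_get_link_status as described here.

```python
import re

# Alarm names as they appear in the "phy cli eth port list" output above.
ALARM_NAMES = {
    "rxFault", "hiBer", "notBlockLock",
    "rsInserted", "rxFifoOverflow", "rxFifoUnderflow",
}

# Matches lines such as "Output:: rxFifoUnderflow = 1".
LINE_RE = re.compile(r"^Output::\s*(\w+)\s*=\s*(\d+)\s*$")


def parse_alarms(cli_output):
    """Collect 'Output:: <alarm> = <0|1>' lines into a dict (hypothetical helper)."""
    alarms = {}
    for line in cli_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(1) in ALARM_NAMES:
            alarms[match.group(1)] = int(match.group(2))
    return alarms


def link_is_up(alarms):
    """Assumed rule: the link is reported up only when no alarm is set."""
    return not any(alarms.values())


sample = """\
Output:: rxFault = 0
Output:: hiBer = 0
Output:: notBlockLock = 0
Output:: rsInserted = 0
Output:: rxFifoOverflow = 0
Output:: rxFifoUnderflow = 1
Output:: nonIdleBlocks = 16238088037415
"""

alarms = parse_alarms(sample)
print(alarms)
print("link up:", link_is_up(alarms))
```

With the alarm values from the customer log, the single set bit (rxFifoUnderflow) is enough for this rule to report the link as down; counters such as nonIdleBlocks are ignored.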

Can you please share your inputs on the above alarm: in what scenarios can ‘rxFifoUnderflow’ occur, and what causes it?

Also, a workaround was added recently: whenever system-side alarms such as rxFault and notBlockLock are seen, a SerDes soft reset is issued (tpo_225_serdes_reset).

Do we need to apply the same workaround for line-side alarms as well, or is there a specific reason for the alarm in the customer's case?
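For discussion, the current workaround and the question about extending it could be sketched like this. It is a Python illustration under stated assumptions: `needs_serdes_reset` and `LINE_SIDE_RESET_ALARMS` are hypothetical names, and whether any line-side alarm belongs in that set is exactly the open question. Only the system-side rule (rxFault, notBlockLock) reflects the workaround as described above.

```python
# System-side alarms that trigger the existing soft-reset workaround
# (per the description of tpo_225_serdes_reset above).
SYSTEM_SIDE_RESET_ALARMS = frozenset({"rxFault", "notBlockLock"})

# Hypothetical: line-side alarms that would trigger the same workaround.
# Empty by default because this is the question being asked.
LINE_SIDE_RESET_ALARMS = frozenset()


def needs_serdes_reset(side, alarms, line_side_alarms=LINE_SIDE_RESET_ALARMS):
    """Return True if the (assumed) policy calls for a SerDes soft reset.

    side   -- "system" or "line"
    alarms -- dict of alarm name -> 0/1, as reported for that side
    """
    if side == "system":
        triggers = SYSTEM_SIDE_RESET_ALARMS
    elif side == "line":
        triggers = line_side_alarms
    else:
        raise ValueError(f"unknown side: {side!r}")
    return any(alarms.get(name, 0) for name in triggers)


line_alarms = {"rxFault": 0, "notBlockLock": 0, "rxFifoUnderflow": 1}

# Today: the customer's line-side alarm does not trigger a reset.
print(needs_serdes_reset("line", line_alarms))
# If rxFifoUnderflow were added for the line side, it would.
print(needs_serdes_reset("line", line_alarms, frozenset({"rxFifoUnderflow"})))
```

Keeping the trigger sets separate per side makes it possible to extend the workaround to the line side without changing the system-side behavior.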

Thanks,

Hima.

SengKok_L_Intel · 11-17-2020

The hyperlink that you pointed to in this forum is not accessible. To better understand this problem, the following information would be helpful.

Which FPGA is in this product: Arria 10 or Stratix 10?
Which Quartus version are you using?
Which IP is implemented in this product? Is it Low Latency 10G IP + 10GBASE-KR?
How many boards have this problem?

If rxFifoUnderflow refers to the FPGA PHY RX FIFO, it means the data may be corrupted or there was an error in transmission. This could be due to noise/jitter or the RX clock recovery failing to lock. And yes, a reset can help when the data was corrupted or an error occurred.

Regards -SK

Himabindu · 11-19-2020

Hi,

Please find my answers to your questions:

Q: Which FPGA is in this product: Arria 10 or Stratix 10?

A: Stratix V

Q: Which Quartus version are you using?

A: From our code, we are using SDK version 4.0:

TpDrv-tpo225_c1-0_13.tar.gz

TpDrv-tpo425p_c1-0_11.tar.gz

Q: Which IP is implemented in this product? Is it Low Latency 10G IP + 10GBASE-KR?

A: It is 10GBASE-KR.

Q: How many boards have this problem?

A: Currently seen on 1 board, as per the customer report.

Regarding your comment: "If rxFifoUnderflow refers to the FPGA PHY RX FIFO, it means the data may be corrupted or there was an error in transmission. This could be due to noise/jitter or the RX clock recovery failing to lock. And yes, a reset can help when the data was corrupted or an error occurred."

Should we debug further the next time the issue occurs, or is it OK to implement the RESET workaround whenever such alarms are seen, without any further debugging?

In the current scenario we have seen 'rxFifoUnderflow', but what about other scenarios? For which alarms do we need to apply this workaround?

Yes, we asked the customer to verify that the transmitting device (here, the traffic generator) is working properly, or to try a device other than the TG, but since the link has recovered they have not tried this. The issue has been observed twice. We want to make sure we do not hit it because of our code, so we are asking whether a solution or workaround is a must for it.
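On the question of debugging versus applying the reset workaround, one way to keep both options open is to record the alarm state before each reset. The sketch below is only an illustration in Python: `recover_port`, `read_alarms`, and `soft_reset` are hypothetical names standing in for the driver calls, and the retry limit of 3 is an arbitrary assumption.

```python
def recover_port(read_alarms, soft_reset, max_attempts=3, log=print):
    """Snapshot alarms before each reset, then retry up to max_attempts.

    read_alarms -- callable returning a dict of alarm name -> 0/1
    soft_reset  -- callable performing the reset (stand-in for the driver call)
    Returns (recovered, snapshots) so the pre-reset state is kept for debugging.
    """
    snapshots = []
    for attempt in range(1, max_attempts + 1):
        alarms = read_alarms()
        active = sorted(name for name, value in alarms.items() if value)
        if not active:
            return True, snapshots
        snapshots.append(active)
        log(f"attempt {attempt}: active alarms {active}, resetting")
        soft_reset()
    # Final check after the last reset.
    still_active = any(read_alarms().values())
    return not still_active, snapshots


# Simulated port: the alarm clears after the first reset.
state = {"rxFifoUnderflow": 1}


def fake_reset():
    state["rxFifoUnderflow"] = 0


recovered, history = recover_port(lambda: dict(state), fake_reset)
print(recovered, history)
```

The returned snapshots preserve which alarms were active before each reset, so the information is still available for analysis after the link recovers.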

SengKok_L_Intel · 11-19-2020

Hi,

If the traffic generator is working as expected, the PHY FIFO should not overflow or underflow. It would help to understand whether there is any other dependency on this particular board. If the issue happens intermittently, it could be due to signal integrity. It would also be good to know whether this board passed manufacturing or validation testing before delivery to the customer, and whether any later change caused the problem.

Regards -SK

SengKok_L_Intel · 11-29-2020

If further support is needed in this thread, please post a response within 15 days. After 15 days, this thread will be transitioned to community support. The community users will be able to help you with your follow-up questions.