Cyclone10GX Temperature Issue

YYang79
Beginner

Hi,

I use a Cyclone 10 GX FPGA (10CX150YF780C5G) on my PCIe card. One transceiver is used as a 10 Gbps fiber link and four other transceivers are used as PCIe Gen1 x4 lanes. There is no external memory interface. The total power consumption is about 2 W.

The problem is that when the FPGA's die temperature rises above about 55 °C, the transceiver that drives the fiber link (SFP) starts dropping words on the transmit side. I use the transceiver loopback function to monitor the data at both the remote receiver output and the local loopback receiver output. They show exactly the same fault, which suggests that something is wrong in the PCS stage on the transmitter side. The reference clock to the transceiver ATX PLL is 644.53125 MHz. The other functions in the FPGA work well.
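As a side note on the clocking figures, the short Python sketch below checks how the quoted reference clock relates to the serial line rate. It assumes the "10 Gbps" link is 10GBASE-R, i.e. 10 Gbps of payload carried with 64b/66b encoding; the post itself does not state the protocol, so treat the line rate as an assumption.

```python
from fractions import Fraction

# Assumption (not stated in the post): the link is 10GBASE-R,
# so 10 Gbps of payload is carried with 64b/66b line coding.
payload_bps = Fraction(10_000_000_000)
line_rate_bps = payload_bps * Fraction(66, 64)

# Reference clock quoted in the post: 644.53125 MHz.
refclk_hz = Fraction(644_531_250)

# Ratio between the serial line rate and the reference clock.
ratio = line_rate_bps / refclk_hz

print(f"line rate: {float(line_rate_bps) / 1e9} Gbps")
print(f"line rate / refclk: {ratio}")
```

Under that assumption the reference clock is an exact integer divisor of the line rate, which is consistent with the value given in the post.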

When I lower the temperature by adding a fan, it works normally. Does anyone know how to fix this?

Thanks a lot!

 

25 Replies
SengKok_L_Intel
Moderator

Hi,


To better understand the problem: do you mean that with the internal serial loopback enabled (no cable), the same problem is observed when the temperature rises above 55 °C, while PCIe and the other functions still work well?


How many devices have a similar problem? Could you please try to test on another channel to determine if there is any channel dependency on this particular device?


Regards -SK


YYang79
Beginner

Hi,

You are right. Using the internal loopback (from the transmitter's PCS output to the input of the receiver's PCS), we get wrong data at the transceiver's XGMII interface (mainly a missing 64-bit word in a packet) when the operating temperature rises to 55 °C. In loopback mode the data is still sent to the remote receiver over the fiber link, so we see the same loss of data at the remote receiver output as at the loopback output.

For the design, the Timing Analyzer result is clean, without any timing violations.

We have five boards in total. They fail at different temperatures: the lowest is 55 °C and the highest is 72 °C, well short of the specified 100 °C. In the board design we use only one transceiver as the 10 Gbps fiber link, so it may not be easy to switch to another transceiver in the FPGA to test other channels.

Thanks. Looking forward to your further help.

SengKok_L_Intel
Moderator

Hi,


If you can see the problem with the serial loopback enabled, you should be able to test on other channels as well, since the loopback does not use the physical channel (fiber cable). You only need to change the pin assignment.


Do you have a Signal Tap capture that shows the pass and fail conditions?


Regards -SK


YYang79
Beginner

Yes, I have.

See attached. It shows the data communication at the XGMII interface.

data_tx_rx.png

SengKok_L_Intel
Moderator

Hi,


Can you provide a simplified design that consists of only one 10G channel and replicates the issue on your hardware, so that I can better understand the settings of the transceiver and the 10G MAC IP?


Besides, please ensure your board design meets the Pin Assignment Guideline, especially for the power supply and transceiver pins.

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/dp/cyclone-10/pcg-01022.pdf


YYang79
Beginner

Hi,

The board works well below 55 °C, so I think the PCB design should be OK, and the pin assignments do not appear to have any problem so far.

For your further review, I can archive my FPGA design and email it to you instead of posting it here. Please give me an email address. Does that sound right?

Thanks a lot for your help.

SengKok_L_Intel
Moderator

Please refer to the following link (Table 1), check the GXB power supplies (e.g. Vcct_GXB, Vccr_GXB, and Vcch_GXB), and determine whether there is any difference between the pass and fail cases. The FPGA is supposed to work above 55 °C, and since you see the failure on multiple boards, I suspect a board issue.


https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/cyclone-10/c10gx-51002.pdf


Besides, you can send me a private message with the simple design so that I can do a sanity check on it. Thanks.


YYang79
Beginner

How do I send you a private message?

YYang79
Beginner

I have checked the GXB power supplies. They are all within the ranges in Table 1 and do not change with temperature.

SengKok_L_Intel
Moderator

You can click the Message icon (top right) and then click Compose to create a new message. You can find my name in the “Send to” field.

 

Regards -SK


SengKok_L_Intel
Moderator

I found that there is a setup timing violation in your design. I would suggest cleaning up the timing first and then checking whether the problem persists.


YYang79
Beginner

There is no improvement with a clean timing result.

SengKok_L_Intel
Moderator

Hi,

 

Since this problem can be replicated with internal loopback, can you please change the pin assignments as below and determine whether there is a channel dependency?

 

set_location_assignment PIN_AF25 -to "SFP_RXD0(n)"
set_location_assignment PIN_AG27 -to "SFP_TXD0(n)"
set_location_assignment PIN_AF26 -to SFP_RXD0
set_location_assignment PIN_AG28 -to SFP_TXD0

 

Besides, please refer to the attached screenshot, add the Native PHY interface signals to Signal Tap, and then compare "tx_parallel_data" with "rx_parallel_data" to determine whether there is a mismatch. The data drop may happen before this module.
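One possible way to do this comparison offline is sketched below in Python. It assumes the two buses have been exported from the capture as lists of integer words, one list per bus, with idle cycles already removed; the function name `find_dropped_words` and the sample values are hypothetical and only illustrate the idea.

```python
from difflib import SequenceMatcher


def find_dropped_words(tx_words, rx_words):
    """Return (tx_index, word) for each TX word that is absent from the RX stream.

    Only pure deletions are reported. A "replace" opcode would indicate a
    corrupted word rather than a missing one and is ignored here.
    """
    matcher = SequenceMatcher(a=tx_words, b=rx_words, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            dropped.extend((i, tx_words[i]) for i in range(i1, i2))
    return dropped


# Hypothetical capture: the third transmitted 64-bit word never arrives.
tx = [0x1111, 0x2222, 0x3333, 0x4444, 0x5555]
rx = [0x1111, 0x2222, 0x4444, 0x5555]

for index, word in find_dropped_words(tx, rx):
    print(f"TX word {index} (0x{word:016x}) is missing on the RX side")
```

If the two streams already differ at the Native PHY parallel interface, the word is being lost in or after the PHY; if they match there, the loss is upstream of it.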

 

 

 

 

YYang79
Beginner

Hi,

We always test the data at these points; we usually call them the XGMII interface. The only difference between the Native PHY parallel data and XGMII is a logical conversion of the words from big endian to little endian. The data at XGMII looks OK.
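When comparing captures taken on both sides of such a conversion, the words have to be brought into the same byte order first. The Python sketch below assumes the conversion is a plain byte reversal within each 64-bit word; the exact mapping used in the poster's design is not shown in the thread, and `swap_word_bytes` is a hypothetical helper.

```python
def swap_word_bytes(word, width_bytes=8):
    """Reverse the byte order of an unsigned word of the given width."""
    return int.from_bytes(word.to_bytes(width_bytes, "big"), "little")


# Hypothetical 64-bit word as seen on one side of the conversion.
native_phy_word = 0x0102030405060708
xgmii_word = swap_word_bytes(native_phy_word)

print(f"0x{native_phy_word:016x} -> 0x{xgmii_word:016x}")
```

Applying the same helper twice returns the original word, so it can be used in either direction.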

I will add "tx_parallel_data" and "rx_parallel_data" to my Signal Tap file to see if there is any difference.

SengKok_L_Intel
Moderator

Please do let me know if more help is needed here. Thanks.


YYang79
Beginner

Yes. We are still struggling with this thermal issue. We have checked all VCC power supplies to the Cyclone 10. They are all within the specified ranges, and they change little as the temperature increases (less than 60 mV). We have used the Toolkit to test the PMA layer of the transceiver. The result is that there is no bit error below 65 °C (our problem usually occurs at 64 °C and below).

We don't have any obvious clue for fixing it now. Any further suggestions are definitely welcome!

SengKok_L_Intel
Moderator

For the bit error rate (BER), you can probably experiment with the PMA settings, e.g. increase the VOD of the transmitter.


YYang79
Beginner

Since the Toolkit shows no bit errors, I don't think we need to adjust the PMA settings. Am I right?

I am currently working with Intel's "Low Latency Ethernet 10G MAC Intel® Cyclone® 10 GX FPGA IP Design Example", trying to port it to our board to see what happens.

Do you have any suggestions?

Anyway, thanks a lot.

SengKok_L_Intel
Moderator

Yes, it sounds good to use the Low Latency Ethernet 10G MAC IP example design. If the PMA values are not optimal, you may see a high bit error rate when you vary the temperature.


YYang79
Beginner

In the Low Latency Ethernet 10G MAC IP example design, rev. 19.1, an IOPLL is used to generate 156.25 MHz and 312.5 MHz, while in my design I used an fPLL instead, following the transceiver design guideline. That is the only difference. The example design works well on my board, so I changed my design to use the IOPLL. The result is that it now works at high temperature without error! It is at 73 °C now, whereas it used to fail at 57 °C.
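For readers unfamiliar with these two frequencies, the Python sketch below shows one way they relate to the 10 Gbps payload rate. It assumes the lower clock is 156.25 MHz driving a 64-bit datapath and the 312.5 MHz clock drives a 32-bit datapath; these clock-to-width pairings are assumptions about a typical 10G MAC arrangement, not something stated in the thread, and the sketch does not explain why the PLL type changes the thermal behavior.

```python
from fractions import Fraction

# Assumed (datapath width in bits, clock in Hz) pairs; not given in the thread.
datapaths = {
    "64-bit @ 156.25 MHz": (64, Fraction("156.25") * 10**6),
    "32-bit @ 312.5 MHz": (32, Fraction("312.5") * 10**6),
}

# Throughput is simply width multiplied by clock frequency.
throughput_bps = {name: width * clk for name, (width, clk) in datapaths.items()}

for name, bps in throughput_bps.items():
    print(f"{name}: {float(bps) / 1e9} Gbps")
```

Both pairings carry the same payload rate, which is why the two clocks must keep an exact 2:1 frequency relationship wherever data crosses between them.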

It looks like a big improvement, at least.

My question is: why is that?
