We developed a custom board containing an Arria 10 SoC (10AS048E2F29I1HG). We have connected an SSD via SATA to the pins of the FPGA. Internally, we are using an ATX PLL, a transceiver core and a third-party SATA core.
Everything works fine until the SoC reaches an internal temperature of 68°C (measured using the Intel® FPGA Temperature Sensor IP Core).
At the moment, we are trying to isolate the root cause. It could be the supply voltage, the clock or the PCB design (among other possible issues, of course).
- We measured the supply voltage for the transceiver near our DC/DC converter and did not find anything suspicious. Our next step is to measure the supply voltage close to the FPGA pin, since there are also capacitors involved.
- We checked the pll_locked signal using Signal Tap and found no issues. We also checked the transceiver reference clock with an oscilloscope, and the clock looks good.
- For the PCB design, we ran a signal integrity simulation to guard against problems of this kind.
We already figured out that changing VCCT and VCCR has a big influence on the critical temperature (the temperature at which the disparity errors start).
Short summary: 0.95 V -> ~47°C, 1.03 V -> ~68°C, 1.1 V -> ~82°C (we know that 1.1 V is out of spec).
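To put a number on that sensitivity, here is a rough sketch that computes how far the critical temperature moves per unit of supply voltage between neighbouring measurements. It uses only the three approximate values listed above. The names `points` and `pairwise_slopes` are made up for illustration, and with only three points it says nothing about whether the relationship is actually linear.

```python
# Measured (VCCT/VCCR voltage in V, temperature in °C at which the
# disparity errors start), taken from the short summary above.
# The temperatures are approximate readings.
points = [(0.95, 47.0), (1.03, 68.0), (1.10, 82.0)]


def pairwise_slopes(pts):
    """Return the slope in °C per V between each pair of neighbouring points."""
    return [(t1 - t0) / (v1 - v0) for (v0, t0), (v1, t1) in zip(pts, pts[1:])]


slopes = pairwise_slopes(points)
for (v0, _), (v1, _), s in zip(points, points[1:], slopes):
    # Divide by 1000 to express the slope per millivolt.
    print(f"{v0:.2f} V -> {v1:.2f} V: {s / 1000:.2f} °C per mV")
```

With these numbers the critical temperature shifts by roughly 0.2 to 0.26°C per millivolt, so even a few tens of millivolts of droop at the FPGA pins could move the failure point noticeably.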
We kindly ask for your support in figuring out what is going on and in debugging the interface. We hope that you can help us!
Where exactly is the error happening? The ATX PLL is working fine. So, are you seeing an issue at the XCVR output? Or is there some data corruption happening in the SATA core?
May I know the reason for changing VCCT and VCCR? The values they are being changed to correspond to Arria 10 GT devices, not SoC devices. For SoC devices, the range is from 1.0 V to 1.06 V. Please use the PDN tool to find out the dynamic requirements for these power rails.
The VCCT and VCCR voltage levels should be equivalent, as per the Pin Connection Guideline. Can you confirm that this is the case on your board?
Thanks for the reply.
The errors are happening on the serial link between the transceiver and the SSD. We get kernel error messages reporting disparity or CRC interface errors. There is also a signal within the SATA core that goes high when disparity errors occur; we observe this using Signal Tap. There is no data corruption within the SATA core. We get these errors once the temperature reaches 68°C (we use 1.03 V for VCCT and VCCR).
You are right. We had already figured out that 0.95 V was a wrong setting, so we use 1.03 V for VCCT and VCCR. SATA is a "backplane condition"; I think you referred to Table 4 in the Arria 10 device datasheet (I used version 2020.06.26). We do not use different values for VCCT and VCCR, and we never did.
We used the PDN tool during the design phase of our custom board.
We also performed the temperature test on the Arria 10 SoC Development Kit using SATA. Interestingly, we see the same SATA errors once we reach 68°C, so it seems that the occurrence of the errors has nothing to do with the actual hardware. It might be a bitstream/setup issue.
It might be useful for us to share more details about our setup, so we would like to request that this communication be handled privately.
Not sure if this issue was resolved at your end. Did you check the timing report for the highest temperature?
This thread will be transitioned to community support. If you have a new question, feel free to open a new thread to get support from Intel experts. Otherwise, the community users will continue to help you on this thread. Thank you.