Approximately 10% of our boards (6/63) have issues with Ethernet initialization (i211 chip, #WGI211ATSLJXZ).
90% of our PCBs seem to have no issues. The remaining 10% each have their own failure rate on cold resets (at room temperature: 55%, 33%, 32%, 31%, 9% and 7%).
If the initialization succeeds, Linux can be rebooted and Ethernet communication always works. If the i211 initialization fails, however, the Linux boot hangs on the next reboot. This has been the case with three different Linux kernel versions.
We have tested several Linux kernel versions and "hundreds" of software modifications, none of which helped. We have not been able to fix this in software. When we see a failure, the chip "seems" completely dead.
See attached document for more details.
Do you have any suggestions on what the root cause of this failure is?
Could you help us to debug the issue with JTAG?
Have you experienced any similar issues before (and found the solution)?
Thank you in advance!
- Yes. We have designed a board with two Ethernet ports. For one of them we use your i211 chip, connected over PCIe to our Freescale i.MX6Q CPU. I have compared our design to your schematic and layout design guidelines. I haven't found any noticeable differences, and I have already tested most of the minor ones I did find, without any luck.
- I need help to find and solve the issue. As I wrote, more than 90% of our boards have no issues whatsoever, but the remaining 10% have i211 initialization issues. Boards with issues fail 7-55% of the time during power cycles at room temperature. When we swapped the i211 chip between a "known-good" PCB and an "issue PCB", the failure followed the i211 chip. We have tried many software and hardware changes, but none have helped so far.
It has now been 12 days.
We have now swapped the i211 between known-good and issue boards twice. In both cases the issue followed the i211 chip: the good boards turned bad and the bad boards turned good.
We still haven't found the actual root cause of this issue.
On one board with a failure rate of 31%, we were able to bring this down to 8.7% by replacing the xtal. The replacement xtal has a 16pF load capacitance. When we then used 10pF load capacitors (!) instead of 27pF with this new xtal, the failure rate dropped all the way to 1%.
My only explanation is a higher voltage swing on the input. Could that be the cause?
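The cap change above can be put in numbers with the standard Pierce-oscillator relation CL = C1*C2/(C1+C2) + C_stray. A minimal sketch, assuming equal caps on both xtal pins and a hypothetical 4 pF of stray (pin plus trace) capacitance; the real stray value depends on the layout and is not given in the thread.

```python
# Effective load capacitance seen by a crystal in a Pierce oscillator:
#   CL = C1*C2 / (C1 + C2) + C_stray
# ASSUMPTION: c_stray_pf = 4 pF is an illustrative guess, not a measured value.

def effective_load_pf(c1_pf: float, c2_pf: float, c_stray_pf: float = 4.0) -> float:
    """Series combination of the two load caps plus stray capacitance."""
    return (c1_pf * c2_pf) / (c1_pf + c2_pf) + c_stray_pf


def cap_for_target_pf(c_load_pf: float, c_stray_pf: float = 4.0) -> float:
    """Equal-value cap (C1 = C2) that gives the crystal's rated load."""
    return 2.0 * (c_load_pf - c_stray_pf)


if __name__ == "__main__":
    for cap in (27.0, 10.0):
        print(f"C1 = C2 = {cap:4.1f} pF -> CL ~ {effective_load_pf(cap, cap):.1f} pF")
    print(f"16 pF xtal needs C1 = C2 ~ {cap_for_target_pf(16.0):.1f} pF")
```

Under that stray assumption, 27 pF caps load the 16 pF xtal close to its rating, while 10 pF caps load it well below it. Under-loading typically raises the oscillator's gain margin and the swing at the pins, at the cost of pulling the frequency slightly above nominal.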
1. Do the input swing values for external oscillators also apply to xtals (200mV max low and 1400mV min high)?
2. If yes to 1: do these values apply before 50 ms or after? After 50 ms the swing is greatly reduced.
On one issue board we measured the following with a 3.9pF probe:
10 ms to 50 ms: 10 mV to 1600 mV
After 50 ms: 320 mV to 1080 mV
[Scope picture: undersampled, shown for voltage levels only]
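To make the comparison explicit, the two measured windows can be checked against the external-oscillator limits quoted in the questions above. A minimal sketch, assuming those limits (low at most 200 mV, high at least 1400 mV) also apply to a crystal, which is exactly what is being asked, so the result is indicative only.

```python
# Check measured XTAL-input levels against the external-oscillator limits.
# ASSUMPTION: these limits apply to a crystal too (the open question above).

V_IL_MAX_MV = 200   # maximum allowed low level, mV
V_IH_MIN_MV = 1400  # minimum required high level, mV

# (min, max) levels in mV, measured with a 3.9 pF probe on one issue board
measurements = {
    "10-50 ms": (10, 1600),
    "after 50 ms": (320, 1080),
}


def check_swing(v_min_mv: int, v_max_mv: int) -> list[str]:
    """Return a list of violated limits (empty list means within limits)."""
    problems = []
    if v_min_mv > V_IL_MAX_MV:
        problems.append(f"low level {v_min_mv} mV > {V_IL_MAX_MV} mV")
    if v_max_mv < V_IH_MIN_MV:
        problems.append(f"high level {v_max_mv} mV < {V_IH_MIN_MV} mV")
    return problems


if __name__ == "__main__":
    for window, (lo, hi) in measurements.items():
        issues = check_swing(lo, hi)
        print(f"{window}: {'OK' if not issues else '; '.join(issues)}")
```

If the oscillator limits did apply, the start-up window would pass and the settled swing after 50 ms would violate both the low and the high limit.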
I used a spectrum analyzer with SPF=2kHz and RBW=30Hz to measure the frequencies on another issue board:
Xtal: 25 000 650 Hz
PCIe clk: 99 998 900 Hz
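Frequency measurements like these are usually judged as a parts-per-million offset from nominal. A small sketch that converts the two readings, assuming nominal values of 25 MHz for the xtal and 100 MHz for the PCIe reference clock.

```python
def ppm_offset(measured_hz: float, nominal_hz: float) -> float:
    """Frequency error in parts per million relative to nominal."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


# ASSUMPTION: nominal 25 MHz xtal and 100 MHz PCIe reference clock.
xtal = ppm_offset(25_000_650, 25_000_000)
pcie = ppm_offset(99_998_900, 100_000_000)

if __name__ == "__main__":
    print(f"Xtal:     {xtal:+.1f} ppm")
    print(f"PCIe clk: {pcie:+.1f} ppm")
```

Both offsets are small (+26 ppm and -11 ppm). PCIe reference clocks are commonly specified at ±300 ppm, so on this board the frequency itself does not look like the problem.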
I'm now replacing the xtal once again, trying one with 10pF Cload to get a higher swing at the input.
So far we have tried three different xtals, with 20pF, 18pF and 16pF load capacitance, all within spec (you recommend 16pF).
The failure rate seems to drop with lower load capacitance, but no combination we have tried removes the issue altogether.
I have been through the design guidelines, both schematic and layout. I have tried practically every small difference I found against the schematic guidelines, without any luck.
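Whether a change "lowers" an intermittent failure rate depends heavily on how many power cycles were run. A minimal sketch using the Wilson score interval for a binomial proportion, plus the number of consecutive clean cycles needed to claim a rate below some bound. The "1 failure in 100 cycles" example is hypothetical; the thread does not give cycle counts.

```python
import math


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a failure proportion (z = 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = failures / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def cycles_for_zero_failures(max_rate: float, confidence: float = 0.95) -> int:
    """Consecutive failure-free cycles needed to claim rate < max_rate."""
    if not 0.0 < max_rate < 1.0:
        raise ValueError("max_rate must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_rate))


if __name__ == "__main__":
    # Hypothetical example: 1 failure observed in 100 power cycles.
    lo, hi = wilson_interval(1, 100)
    print(f"1/100 failures -> 95% CI {lo:.1%} to {hi:.1%}")
    print(f"Clean cycles to show rate < 1%: {cycles_for_zero_failures(0.01)}")
```

With one failure in 100 cycles, the plausible true rate still spans roughly 0.2% to 5.4%, and about 299 consecutive clean cycles are needed before a rate below 1% can be claimed at 95% confidence.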
This thread can be locked since we finally found the solution.
The PCIe reference clock was out of spec. Adding 470 Ohm pull-up and 56 Ohm pull-down resistors to CLKp & CLKn solved the issue.
To anyone making a design based on the i.MX6: add pull-up and pull-down resistors on the reference clock in addition to the 0.1uF capacitors, no matter what the reference design does. They are needed to get the correct common-mode voltages and to make it work with "any" peripheral using PCIe.
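The resistor values in the fix make sense when viewed as a Thevenin equivalent: the divider sets a DC bias on the AC-coupled line and also acts as a termination. A minimal sketch, assuming that both resistors sit on each of CLKp and CLKn on the receiver side of the 0.1uF caps, and that the pull-up goes to a 3.3 V rail. None of these details are stated in the thread.

```python
# Thevenin equivalent of the pull-up/pull-down pair on one reference-clock line.
# ASSUMPTIONS: both resistors on each of CLKp/CLKn, placed after the 0.1 uF
# AC-coupling caps, with the pull-up tied to a 3.3 V rail.

R_PU = 470.0  # ohms, to the supply rail
R_PD = 56.0   # ohms, to ground
VDD = 3.3     # volts (assumed rail)


def thevenin(vdd: float, r_pu: float, r_pd: float) -> tuple[float, float]:
    """Return (bias voltage, equivalent source resistance) of a divider."""
    v_bias = vdd * r_pd / (r_pu + r_pd)
    r_eq = r_pu * r_pd / (r_pu + r_pd)
    return v_bias, r_eq


v_cm, r_term = thevenin(VDD, R_PU, R_PD)

if __name__ == "__main__":
    print(f"DC common-mode bias ~ {v_cm * 1000:.0f} mV")
    print(f"Equivalent termination ~ {r_term:.1f} ohm")
```

Under these assumptions, each line is biased to roughly 350 mV through about 50 ohms. That is close to the common-mode level of an HCSL-style PCIe reference clock, which swings between about 0 and 700 mV.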