We have a board using a pair of i210s. On the first run, all of our chips worked fine. On our second run, two of the first five boards tested have failed.
The failures are roughly the same. The i210s on the first PCIe bus work, but they identify their EEPROMs as "Protected Flash" with an ID of 0xFAFA. I have put a CRO with an SPI decoder on the lines, and I can see the flash chips identifying themselves with ID 0x2080, so I'm not sure what's going on here. The i210 is still able to read values out, so this is weird, but not a killer.
The i210s on the second PCIe bus cannot find their internal PHY. Using lanconf32, I can see that the "Raw Copper/MDIO PHY Registers" are all zero and it reports "No PHY Found", whereas the other i210s report valid data and an "Intel-M" 0x005043 PHY.
To be clear: three boards are fine, and two boards have issues with both of their i210s. On both failing boards, the fault is the same for the chip on a given PCIe bus.
I have had a look at the "General Registers (Raw)", which look pretty good:
Device Ctrl Reg 0x00000 081C0241
Ext Device Control 0x00018 001400C0
Device Status Reg 0x00008 00280780
Compared with a working system, the Device Status Reg differs in bit 10, which indicates that the PHY has been reset; this bit should be cleared by the driver. The Ext Device Control register differs in bit 29, which should be set to 1 to indicate that the driver is loaded. Most importantly, Ext Device Control bits 22 and 23 are zero, meaning the internal PHY should be in use.
When I try to load the driver (igb, Linux 4.4.10), it fails to load for the second i210. Tracing the driver's probe() function with a large number of printk()s, it all comes down to not having a valid PHY.
I have compared the physical chips; the markings on them are all the same, and all as I would expect.
I have spent a couple of days trying to track this down, and I'm at a loss as to what to try next. Any ideas are welcome.