I'm using the I210 with the 3.14.26-rt22 kernel and igb driver 5.2.15.
During a stress test, after ~10-15 hours I get a NETDEV WATCHDOG timeout.
It seems that the RX FIFO overran and the NIC can't recover from it. No further receive/transmit interrupts are asserted after the overrun, so it looks like the RX/TX unit hangs.
Bringing the network interface down and up doesn't recover from this state; only power-cycling the board (TI AM5728) does.
Attached is information retrieved after the timeout.
Thank you for the post. Is this an onboard I210 NIC or a standalone I210-T1 adapter? If it is an onboard NIC, have you checked with the board vendor?
After this timeout, did you try reloading the driver?
Thanks for your reply.
The hang problem is now understood.
I saw that after the RX FIFO overrun, the endpoint had a pending MSI-X interrupt, but the root complex (RC) didn't report any pending interrupt, so the GIC/CPU was never notified.
After manually acknowledging the pending MSI-X interrupt, interrupts started working correctly again.
The hang happened because the PCIe RC doesn't support MSI-X, only MSI.
The NIC was configured to use MSI-X by default.
After configuring the NIC to use MSI mode, the hang no longer occurs after an RX FIFO overrun.
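For anyone who wants to confirm which interrupt mode the NIC actually ended up in, here is a minimal Python sketch. It assumes the kernel exposes the PCI `msi_irqs` sysfs directory for the device, where each entry is named after an allocated IRQ and reports `msi` or `msix`. The helper names `irq_modes` and `uses_msix` and the interface name `eth0` are hypothetical, and the exact sysfs layout may differ between kernel versions, so the code accepts both a plain file per IRQ and a `<irq>/mode` file.

```python
import os


def irq_modes(msi_irqs_dir):
    """Return {irq_number: mode} for a PCI device's msi_irqs directory.

    An empty dict means the directory is absent, which usually indicates
    that no MSI/MSI-X vectors are allocated (e.g. legacy interrupts).
    """
    if not os.path.isdir(msi_irqs_dir):
        return {}
    modes = {}
    for name in os.listdir(msi_irqs_dir):
        if not name.isdigit():
            continue  # skip anything that is not an IRQ number
        path = os.path.join(msi_irqs_dir, name)
        # Assumption: some kernels expose <irq>/mode, others <irq> as a file.
        if os.path.isdir(path):
            path = os.path.join(path, "mode")
        with open(path) as f:
            modes[int(name)] = f.read().strip()
    return modes


def uses_msix(msi_irqs_dir):
    """True if at least one allocated vector is reported as MSI-X."""
    return any(mode == "msix" for mode in irq_modes(msi_irqs_dir).values())


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute the interface bound to igb.
    path = "/sys/class/net/eth0/device/msi_irqs"
    print(irq_modes(path), "MSI-X in use:", uses_msix(path))
```

On a root complex that only supports MSI, `uses_msix` returning `True` for the igb interface would point at the configuration described above. With the out-of-tree igb driver, the interrupt mode is normally selected with the `IntMode` module parameter (1 = MSI, 2 = MSI-X); whether that is how it was changed here is an assumption, since the thread does not say.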