In the past we already discussed a "PCIe bandwidth drop" issue on Haswell/Broadwell in this forum (see https://software.intel.com/en-us/comment/1910697#comment-1910697 ).
We now have to run our video capture PCIe (Gen2 x8) boards on Skylake-SP (dual Xeon). While testing we noticed a similar issue on Skylake-SP, although it is not as severe as on Haswell/Broadwell:
The video streams (PCIe slot -> Root Complex) start stuttering every few seconds or minutes. When this happens, all Tx Posted Data (PD) credits have been consumed. We already observed this situation (all PD credits consumed) on the older Ivy Bridge architecture; there, however, the system recovered quickly and the temporary bandwidth drop was easily absorbed by the FIFOs in the Tx signal path. With Skylake-SP we observe Tx channel 'blackouts' which sometimes persist much longer. In this case the PD credit counter remains at zero (or close to zero, e.g. at 4) for tens of microseconds before the system recovers. After this 'blackout period' the system recovers quickly, i.e. PD credits are freed up in larger chunks (e.g. 0 -> 96 or 0 -> 120). However, this extended period with no Tx traffic can no longer be absorbed by our FIFOs, i.e. the FIFOs overflow and the resulting data stream gets corrupted.
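To put rough numbers on the FIFO problem described above, here is a minimal sketch of the arithmetic. The only fixed fact in it is that one PCIe posted-data credit covers 4 DW (16 bytes); the stream rate and the blackout durations are hypothetical placeholders, not measurements from our board.

```python
# Back-of-the-envelope estimate: how much data piles up in the Tx FIFO
# while no posted-data credits are available.

PD_CREDIT_BYTES = 16  # one PCIe posted-data credit = 4 DW = 16 bytes


def credits_to_bytes(credits: int) -> int:
    """Payload bytes covered by a given number of PD credits."""
    return credits * PD_CREDIT_BYTES


def fifo_bytes_needed(rate_bytes_per_s: int, blackout_us: int) -> int:
    """Bytes that accumulate while nothing can be sent for blackout_us microseconds."""
    return rate_bytes_per_s * blackout_us // 1_000_000


if __name__ == "__main__":
    # Hypothetical sustained stream rate (assumption, not a measured value).
    stream_rate = 3_000_000_000  # bytes per second
    for blackout_us in (10, 50, 100):
        need = fifo_bytes_needed(stream_rate, blackout_us)
        print(f"{blackout_us:>4} us blackout -> {need} bytes must be buffered")

    # The credit jumps seen after a blackout, expressed as payload bytes.
    for credits in (96, 120):
        print(f"{credits} PD credits = {credits_to_bytes(credits)} bytes")
```

The point of the sketch is only that the required buffering grows linearly with the blackout length, so a FIFO sized for short stalls overflows once the stalls reach tens of microseconds.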
On Haswell/Broadwell the 'blackouts' lasted longer and the PD credits were often returned quite slowly, even at times when no new Tx packets had been issued. For Haswell/Broadwell, however, "allocating more LLC cache to IIO" improves the issue for our application dramatically (the "LLC for IIO" allocation can be adjusted through an MSR: IIO_LLC_WAYS (0xC8B) ).
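For anyone who wants to inspect that MSR, the following is a minimal sketch that reads it through the Linux msr driver. It assumes a Linux host with the `msr` kernel module loaded and root privileges; the helper names are made up for illustration, and the sketch only prints the raw value without assuming any particular bit layout. On a CPU that does not implement this MSR the read is expected to fail with an OSError.

```python
import os
import struct

IIO_LLC_WAYS = 0xC8B  # MSR address as quoted in the post


def decode_msr(raw: bytes) -> int:
    """Interpret the 8 bytes returned by the msr device as a little-endian u64."""
    return struct.unpack("<Q", raw)[0]


def read_msr(cpu: int, address: int) -> int:
    """Read one MSR on the given logical CPU via /dev/cpu/<cpu>/msr."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device uses the file offset as the MSR address.
        return decode_msr(os.pread(fd, 8, address))
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        value = read_msr(0, IIO_LLC_WAYS)
        print(f"IIO_LLC_WAYS on CPU 0: {value:#x}")
    except OSError as exc:
        # Missing module, missing privileges, or MSR not implemented.
        print(f"could not read MSR {IIO_LLC_WAYS:#x}: {exc}")
```

Writing a new value works the same way with `os.pwrite` on a descriptor opened for writing, but that should only be tried with a value known to be valid for the specific CPU.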
We know that the Skylake architecture has changed a lot compared to Haswell/Broadwell. But maybe the fact that IIO_LLC_WAYS improves the issue on Haswell/Broadwell gives a hint as to what we could adjust (Skylake MSRs?) to get our application running on the new Skylake-SP platform.
Any hints or ideas?
Thanks and kind regards
Are you using a PCIe analyzer to observe the credit pattern? Or is there a register from which the credits can be read out in real time?
We are investigating a PCIe performance issue on the Purley/Skylake platform, though in the opposite direction from your application. We are doing posted writes from the CPU side toward its peer through an NTB link, and we observe an 8 GB/s bottleneck in a Gen3 x16 configuration. This might be related to flow control, so we need to observe the credits as you did.
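For reference, the raw link rates involved can be sketched as below. This uses only the per-lane signalling rates and line encodings of PCIe Gen1 to Gen3 and ignores TLP/DLLP overhead, so achievable throughput is lower than these figures; the function name is made up for illustration.

```python
# Raw PCIe link bandwidth per direction, before packet overhead.

RATE_GT_PER_S = {1: 2.5, 2: 5.0, 3: 8.0}          # per-lane signalling rate
ENCODING = {1: (8, 10), 2: (8, 10), 3: (128, 130)}  # payload bits / line bits


def raw_link_gbytes_per_s(gen: int, lanes: int) -> float:
    """Raw bandwidth in GB/s for one direction of a PCIe link."""
    payload_bits, line_bits = ENCODING[gen]
    return RATE_GT_PER_S[gen] * lanes * payload_bits / line_bits / 8


if __name__ == "__main__":
    for gen, lanes in ((2, 8), (3, 16)):
        bw = raw_link_gbytes_per_s(gen, lanes)
        print(f"Gen{gen} x{lanes}: {bw:.2f} GB/s raw")
```

By this estimate 8 GB/s is roughly half of the raw Gen3 x16 rate, which is why a flow-control limit is a plausible suspect.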
Hello Friedhelm S.,
Following your posted suggestion to adjust "IIO_LLC_WAYS", I solved a similar issue on a Xeon D-1500 CPU.
Now I face the same problem on a Skylake CPU (i9-7980XE, 18-core). I would appreciate it if you could let me know whether you managed to resolve the issue on your Skylake system.
I also found this link to be useful: https://blog.exxactcorp.com/skylake-sp-iio-module/
Thank you & Regards,
Dynamic C4 Pte Ltd.