We are using a CPU plus FPGA (Arria 10) system that communicates over a PCIe Gen 2.0 x4 link. On the FPGA side there is a DDR3 module. Doing simple write tests, we get speeds that max out at 250 MB/s. Given our setup, we should be getting up to 2000 MB/s. The DDR3 is not to blame, because I get the same speeds with on-chip memory. I've played around with all sorts of settings in the PCIe Hard IP and cannot get the speeds any higher (I can make them lower, though). I am using the Avalon-MM with DMA interface in the IP. Is there a fundamental concept we are missing, or some connection on the IP? Is there something on the CPU side we are not doing? Any suggestions on why we are only at about 10% of capacity? Any suggestions or pointers will help tremendously, thank you!
The theoretical throughput for PCIe Gen2 x4 is 2 GB/s.
Per AN829, a Cyclone 10 PCIe Gen2 x4 design achieves 1.66 GB/s; the measured numbers are lower than the theoretical maximum due to DMA performance limitations and the way the throughput is measured.
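The 2 GB/s figure follows directly from the Gen2 link parameters; here is a quick sanity check in Python (assuming the standard Gen2 line rate of 5 GT/s per lane and 8b/10b encoding):

```python
# PCIe Gen2 link parameters
line_rate_gt_s = 5.0          # 5 GT/s per lane (Gen2)
encoding_efficiency = 8 / 10  # 8b/10b encoding: 8 data bits per 10 line bits
lanes = 4

# Raw data bandwidth per direction, before TLP/DLLP protocol overhead
gb_per_s = line_rate_gt_s * encoding_efficiency * lanes / 8  # bits -> bytes
print(f"Theoretical Gen2 x4 bandwidth: {gb_per_s:.1f} GB/s")
```

This is the raw link rate per direction; packet headers, flow control, and DMA behavior all eat into it, which is why AN829 reports 1.66 GB/s rather than 2 GB/s.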
Typical factors affecting throughput:
- Maximum payload size and maximum read request size negotiated on the link
- Per-TLP header and framing overhead
- RX buffer size and flow-control credit availability
- DMA descriptor fetch and burst efficiency
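A large part of the gap between theoretical and measured throughput comes from per-TLP overhead. A rough efficiency estimate in Python (the ~20-byte overhead figure is an illustrative assumption approximating the TLP header plus framing, not an exact value):

```python
def tlp_efficiency(max_payload_bytes, overhead_bytes=20):
    """Fraction of link bandwidth that carries actual payload data.

    overhead_bytes approximates the TLP header (12-16 B) plus
    DLLP/framing costs per packet; illustrative, not exact.
    """
    return max_payload_bytes / (max_payload_bytes + overhead_bytes)

link_bw_gb_s = 2.0  # theoretical Gen2 x4
for mps in (128, 256, 512):
    eff = tlp_efficiency(mps)
    print(f"MaxPayload {mps:3d} B: {eff:.1%} -> {link_bw_gb_s * eff:.2f} GB/s")
```

Note that even with a generous payload size this model lands near the AN829 figure, not near 250 MB/s, so payload overhead alone does not explain the reported slowdown.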
General Debug flow to understand link performance:
Symptom -> Host writes data to the FPGA too slowly
Root cause -> The RX buffer for posted TLPs in the Hard IP is too small
Debug -> Use an external PCIe analyzer to check whether the host has to wait for credits from the Hard IP on each transfer
Potential solution -> Change the RX buffer allocation in the Qsys GUI to High or Maximum
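To see why a small posted-TLP buffer can throttle the link this hard, consider a simple credit-stall model: if the host must drain one RX buffer's worth of data per flow-control credit round trip, the buffer size divided by the round-trip time bounds the sustained rate. The buffer sizes and 1 µs round trip below are illustrative assumptions, not measured values:

```python
def credit_limited_throughput(rx_buffer_bytes, credit_rtt_s):
    """Upper bound on sustained write throughput when the sender
    stalls waiting for posted-data flow-control credits.

    Illustrative model: at most one RX buffer's worth of data can be
    in flight per credit round trip.
    """
    return rx_buffer_bytes / credit_rtt_s

# Illustrative numbers: a small posted-data allocation drained once per
# ~1 us credit round trip caps throughput far below the 2 GB/s link rate.
for buf in (256, 2048, 16384):
    mb_s = credit_limited_throughput(buf, 1e-6) / 1e6
    print(f"{buf:5d} B buffer: {mb_s:.0f} MB/s")
```

Under these assumed numbers, a few hundred bytes of effective posted-data credit is enough to cap the link in the 250 MB/s range, which is consistent with the symptom described and with raising the RX buffer allocation as the fix.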
Regards -SK Lim (Intel)