We have a CPU + FPGA (Arria 10) system communicating over a PCIe Gen 2.0 x4 link. On the FPGA side there is a DDR3 module. With simple write tests, our speeds max out at 250 MB/s, while for this setup we should be getting up to 2000 MB/s. The DDR3 is not to blame, because I get the same speeds with on-chip memory. I've played around with all sorts of settings in the PCIe Hard IP and cannot get the speeds any higher (I can make them lower, though). I am using the Avalon-MM with DMA interface in the IP. Is there a fundamental concept we are missing, or some connection on the IP? Is there something on the CPU side we are not doing? Any suggestions on why we are only at roughly 10% of capacity? Any suggestions or pointers will help tremendously, thank you!
The theoretical throughput for PCIe Gen2 x4 is 2 GB/s.
Per AN829, the Cyclone 10 PCIe Gen2 x4 design achieves 1.66 GB/s; the measured numbers are lower than the theoretical ones due to DMA performance limitations and the way the throughput is measured.
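To see where the 2 GB/s figure comes from, here is a quick back-of-envelope check. The payload size and per-TLP overhead below are illustrative assumptions (they depend on your host's max payload setting), not values from AN829:

```python
# Back-of-envelope PCIe Gen2 x4 throughput estimate.
GT_PER_LANE = 5.0e9   # Gen2 line rate: 5 GT/s per lane
LANES = 4
ENCODING = 8 / 10     # Gen1/Gen2 use 8b/10b encoding

# Payload bandwidth after encoding overhead, in bytes/s
raw_bytes_per_s = GT_PER_LANE * LANES * ENCODING / 8
print(raw_bytes_per_s / 1e9)  # 2.0 GB/s -- the theoretical ceiling

# TLP protocol overhead: assume a 256-byte max payload and roughly
# 20 bytes of header/framing per TLP (3-DW header + LCRC + framing).
PAYLOAD = 256
OVERHEAD = 20
efficiency = PAYLOAD / (PAYLOAD + OVERHEAD)
print(raw_bytes_per_s * efficiency / 1e9)  # ~1.85 GB/s before DMA stalls
```

Any DMA setup time, credit stalls, or link Recovery cycles come out of that remaining ~1.85 GB/s, which is why sustained numbers like 1.66 GB/s are realistic and 250 MB/s points at a configuration or software problem rather than the link itself.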
Typical factors affecting throughput:
- Application logic - does not source data into the HIP fast enough, or cannot sink data from the HIP fast enough
- PCIe link stability - a link with a high bit error rate (BER) enters Recovery frequently, reducing the usable bandwidth
- Host - does not return credits to the FPGA fast enough, or returns them with long latency
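Before suspecting credits or application logic, it is worth confirming the link actually trained to Gen2 x4 — a link that trained down to x1 or Gen1 would explain most of the gap on its own. On Linux you can read the negotiated speed and width from sysfs; the path and BDF below are Linux-specific assumptions (the exact text format of `current_link_speed` varies slightly between kernel versions):

```python
import pathlib

def parse_speed(text):
    """Parse a sysfs current_link_speed string, e.g. '5.0 GT/s PCIe' -> 5.0."""
    return float(text.split()[0])

def check_link(bdf):
    """Report the negotiated link speed/width for a PCI device.

    `bdf` is the device address, e.g. '0000:01:00.0'. A healthy Gen2 x4
    link should report 5.0 GT/s and width 4; anything lower means the
    link trained down and throughput drops proportionally.
    """
    dev = pathlib.Path("/sys/bus/pci/devices") / bdf
    speed = parse_speed((dev / "current_link_speed").read_text())
    width = int((dev / "current_link_width").read_text())
    print(f"{bdf}: {speed} GT/s x{width}")
    return speed, width
```

`lspci -vv` shows the same information in its `LnkSta:` line if you prefer a one-off check.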
General debug flow to understand link performance:
- Determine the direction of the data - host to FPGA, or FPGA to host
- Determine the initiator of the transfer - host or FPGA
- Consider how the performance is measured - by hardware or by software
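The last point matters because software-side timing includes descriptor setup, system-call, and completion-notification latency, so it reads lower than a hardware counter in the FPGA — especially for small transfers. A minimal sketch of a software-side measurement, where `transfer` stands in for whatever call in your driver kicks off one blocking DMA (hypothetical — it depends on your driver interface):

```python
import time

def measure_throughput(transfer, nbytes, repeats=10):
    """Time repeated transfers and report the average rate in MB/s.

    `transfer(nbytes)` is assumed to start one DMA of `nbytes` bytes and
    block until it completes. The measured rate includes all per-transfer
    software overhead, so it is a lower bound on the link's capability.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        transfer(nbytes)
    elapsed = time.perf_counter() - start
    return repeats * nbytes / elapsed / 1e6  # MB/s
```

Comparing one large transfer against many small ones with this kind of harness quickly shows whether fixed per-transfer overhead is what is eating your bandwidth — with small transfers, that alone can account for a 10x gap.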
Symptom -> Host writes data to the FPGA too slowly
Root cause -> The Rx buffer for posted TLPs in the HIP is too small
Debug -> Use an external PCIe analyzer to check whether the host has to wait for credits from the HIP on each transfer.
Potential Solution -> Change the RX buffer allocation in the Qsys GUI to High or Maximum
Regards -SK Lim (Intel)