We have a system in which a CPU and an FPGA (Arria 10) communicate over a PCIe Gen2 x4 link. On the FPGA side there is a DDR3 module. In simple write tests our speeds max out at 250 MB/s, whereas with this setup we should be getting up to 2000 MB/s. The DDR3 is not to blame, because I get the same speeds with on-chip memory. I've played around with all sorts of settings in the PCIe Hard IP and cannot get the speeds any higher (I can only make them lower). I am using the Avalon-MM with DMA interface in the IP. Is there a fundamental concept we are missing, or some connection on the IP? Is there something on the CPU side we are not doing? Why are we at only about 12% of capacity? Any suggestions or pointers will help tremendously, thank you!
Hi,
The theoretical throughput for PCIe Gen2 x4 is 2 GB/s.
According to AN829, the Cyclone 10 PCIe Gen2 x4 achieves 1.66 GB/s. The measured numbers are lower than the theoretical ones because of DMA performance limitations and the way throughput is measured.
https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an829.pdf
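As a rough cross-check of the 2 GB/s figure, here is a minimal sketch of the arithmetic. It assumes 5 GT/s per lane with 8b/10b encoding for Gen2 and a hypothetical per-TLP overhead of 20 bytes (framing, sequence number, 3-DW header, LCRC). The function names and the 256-byte payload are illustrative assumptions, not values taken from AN829.

```python
def link_bandwidth_mb_s(gt_per_s, lanes, enc_num=8, enc_den=10):
    """Raw per-direction bandwidth in MB/s after line encoding.

    gt_per_s: transfer rate per lane in GT/s (2.5 for Gen1, 5.0 for Gen2).
    enc_num/enc_den: line-code ratio (8b/10b for Gen1 and Gen2).
    """
    bits_per_s = gt_per_s * 1e9 * enc_num / enc_den
    return bits_per_s / 8 * lanes / 1e6


def tlp_efficiency(payload_bytes, overhead_bytes=20):
    """Fraction of link bytes that carry payload.

    overhead_bytes is an assumed per-packet cost; the real value depends
    on header size (3 DW or 4 DW) and whether ECRC is enabled.
    """
    return payload_bytes / (payload_bytes + overhead_bytes)


if __name__ == "__main__":
    raw = link_bandwidth_mb_s(5.0, 4)          # Gen2 x4
    effective = raw * tlp_efficiency(256)      # assumed 256-byte max payload
    print(f"raw: {raw:.0f} MB/s, with TLP overhead: {effective:.0f} MB/s")
```

Under these assumptions the raw figure is 2000 MB/s, and packet overhead alone brings a 256-byte payload to roughly 1855 MB/s, before flow-control and DMA descriptor overheads are counted. The same formula gives 250 MB/s for a Gen1 x1 link, so it may be worth confirming the negotiated link speed and width.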
Typical factors affecting throughput:
- Application logic - does not write data to the Hard IP (HIP) fast enough, or cannot sink data from the HIP fast enough
- PCIe link stability - a high bit error rate (BER) causes the link to enter Recovery frequently, which reduces the usable bandwidth
- Host - does not return flow-control credits to the FPGA fast enough, or takes a long time to respond to the FPGA
General debug flow for understanding link performance:
- Determine the direction of the data - does it move from the host to the FPGA or vice versa?
- Determine the initiator of the transfer - does the host or the FPGA start it?
- Consider how the performance is measured - in hardware or in software?
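To make the "measured in software" case concrete, a minimal host-side timing sketch follows. The `transfer` callable is a hypothetical stand-in for whatever starts one DMA transfer and blocks until it completes; a plain memory copy is used here only so the sketch runs without hardware.

```python
import time


def measure_throughput_mb_s(transfer, nbytes, repeats=10):
    """Time `repeats` calls of `transfer` and return the average MB/s.

    transfer: zero-argument callable that moves `nbytes` and returns
              only when the data has actually been moved (assumption).
    """
    transfer()  # warm-up call so one-time setup cost is not timed
    start = time.perf_counter()
    for _ in range(repeats):
        transfer()
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("timer resolution too coarse for this transfer")
    return nbytes * repeats / elapsed / 1e6


if __name__ == "__main__":
    size = 4 * 1024 * 1024
    src = bytearray(size)
    dst = bytearray(size)

    def fake_transfer():
        # Placeholder for a real DMA write; copies host memory only.
        dst[:] = src

    print(f"{measure_throughput_mb_s(fake_transfer, size):.0f} MB/s")
```

A software measurement like this includes driver and call overhead, so small transfers tend to understate what the link itself can do. Repeating the measurement with several transfer sizes can help separate fixed per-transfer cost from the sustained rate.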
For example:
Symptom -> The host writes data to the FPGA too slowly
Root cause -> The RX buffer for posted TLPs in the HIP is too small
Debug -> Use an external PCIe analyzer to check whether the host has to wait for credits from the HIP on each transfer
Potential solution -> Change the RX buffer allocation in the Qsys GUI to High or Max
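The credit-starvation symptom in this example can be approximated with a first-order model. If the receiver advertises a fixed number of posted data credits (one data credit covers 16 bytes of payload in PCIe flow control) and each credit takes some round-trip time to be returned, the sender cannot sustain more than credits times 16 bytes per round trip. The credit counts and latency below are hypothetical values chosen for illustration, not Arria 10 settings.

```python
CREDIT_BYTES = 16  # one PCIe data credit covers 4 DW (16 bytes) of payload


def credit_limited_mb_s(data_credits, round_trip_ns, link_mb_s=2000.0):
    """Upper bound on sustained write throughput in MB/s.

    data_credits:  posted data credits advertised by the receiver (assumed).
    round_trip_ns: time for a consumed credit to be returned (assumed).
    link_mb_s:     raw link bandwidth; 2000 MB/s for Gen2 x4.
    """
    if data_credits <= 0 or round_trip_ns <= 0:
        raise ValueError("credits and latency must be positive")
    in_flight_bytes = data_credits * CREDIT_BYTES
    # bytes per nanosecond equals GB/s, so multiply by 1000 for MB/s
    credit_bound = in_flight_bytes * 1000 / round_trip_ns
    return min(link_mb_s, credit_bound)


if __name__ == "__main__":
    for credits in (16, 64, 256):
        rate = credit_limited_mb_s(credits, round_trip_ns=1000)
        print(f"{credits:4d} credits -> {rate:.0f} MB/s")
```

In this simplified model a small buffer caps throughput well below the link rate, and enlarging it raises the cap until the link itself becomes the limit. That is the effect the RX buffer allocation setting is meant to address, although real behavior also depends on header credits and how often credit updates are sent.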
Regards -SK Lim (Intel)
