vtoka

Why can't we achieve PCIe write speeds above 250 MB/s on a PCIe Gen 2 x4 interface using the Arria 10 PCIe Hard IP? We should be getting up to 2000 MB/s.

We are using a system in which a CPU and an FPGA (Arria 10) communicate over a PCIe Gen 2.0 x4 link. On the FPGA side there is a DDR3 module. In simple write tests, throughput maxes out at 250 MB/s, whereas for this setup we should be getting up to 2000 MB/s. The DDR3 is not to blame, because I get the same speeds with On-Chip Memory. I've played around with all sorts of settings in the PCIe Hard IP and cannot get the speeds any higher (some settings only make them lower). I am using the Avalon-MM with DMA interface in the IP. Is there a fundamental concept we are missing, or some connection on the IP? Is there something on the CPU side we are not doing? Any suggestions on why we are only at about 12.5% of capacity? Any suggestions or pointers will help tremendously, thank you!

SengKok_L_Intel
Moderator

Hi,

The theoretical throughput for PCIe Gen2 x4 is 2 GB/s per direction.
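
For reference, the 2 GB/s figure follows from the Gen2 signaling rate: each lane runs at 5.0 GT/s, and 8b/10b encoding leaves 4 Gb/s (500 MB/s) of data bits per lane in each direction. The Python sketch below does that arithmetic and compares it with the 250 MB/s reported in the question. The function name is illustrative only, not part of any Intel tool.

```python
def pcie_gen2_raw_bandwidth_mb_s(lanes: int) -> float:
    """Per-direction PCIe Gen2 bandwidth after 8b/10b encoding, in MB/s (10**6 bytes/s).

    Ignores packet (TLP/DLLP) overhead, so this is an upper bound.
    """
    transfers_per_s = 5_000_000_000                 # Gen2: 5.0 GT/s per lane
    data_bits_per_s = transfers_per_s * 8 // 10     # 8b/10b: 8 data bits per 10 line bits
    bytes_per_s = data_bits_per_s // 8 * lanes      # 500 MB/s per lane
    return bytes_per_s / 1_000_000


if __name__ == "__main__":
    theoretical = pcie_gen2_raw_bandwidth_mb_s(4)
    measured = 250.0  # MB/s, the figure reported in the question
    print(f"theoretical x4: {theoretical:.0f} MB/s")      # theoretical x4: 2000 MB/s
    print(f"measured: {measured:.0f} MB/s = {measured / theoretical:.1%} of theoretical")
```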

According to AN829, the Cyclone 10 PCIe Gen2 x4 design achieves 1.66 GB/s. The measured numbers are lower than the theoretical numbers because of DMA performance limitations and the way the throughput is measured.

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an829.pdf
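
Part of any gap below 2 GB/s is packet overhead, which exists even with a perfect DMA engine: on a Gen1/Gen2 link every memory-write TLP carries 20 bytes of framing, sequence number, 3DW header and LCRC (24 bytes with a 4DW header for 64-bit addresses). The rough Python estimate below assumes back-to-back writes at a fixed payload size and ignores DLLPs (ACK/NAK, flow-control updates), ECRC and idle time. The function name is hypothetical.

```python
def tlp_efficiency(payload_bytes: int, header_bytes: int = 12) -> float:
    """Fraction of link bandwidth carrying payload for back-to-back memory-write TLPs.

    Per-TLP overhead on a Gen1/Gen2 (8b/10b) link:
    STP framing (1) + sequence number (2) + header (12 for 3DW, 16 for 4DW)
    + LCRC (4) + END framing (1). DLLPs, ECRC and idle symbols are ignored.
    """
    overhead = 1 + 2 + header_bytes + 4 + 1
    return payload_bytes / (payload_bytes + overhead)


if __name__ == "__main__":
    link_mb_s = 2000.0  # Gen2 x4, per direction, after 8b/10b
    for payload in (128, 256):
        for header in (12, 16):
            eff = tlp_efficiency(payload, header)
            print(f"payload {payload:3d} B, {header} B header: "
                  f"{eff:.1%} -> ~{link_mb_s * eff:.0f} MB/s")
```

With a 128-byte payload and 3DW headers this model already tops out near 1.73 GB/s before any DLLP traffic, so the payload size the link actually negotiates (an assumption here; check the device's Max Payload Size) matters as much as the DMA engine.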

Typical factors affecting throughput:

  1. Application logic: does not write data to the HIP (Hard IP) fast enough, or cannot sink data from the HIP fast enough
  2. PCIe link stability: the link has a high bit error rate (BER), which causes it to enter the Recovery state frequently, reducing the bandwidth of the link
  3. Host: does not return flow-control credits to the FPGA fast enough, or responds to the FPGA with long latency

General debug flow for understanding link performance:

  1. Determine the direction of the data: does it move from the host to the FPGA, or vice versa?
  2. Determine the initiator of the transfer: does the host or the FPGA initiate it?
  3. Consider how the performance is measured: by hardware or by software?
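
For the software side of item 3, here is a minimal host-side timing sketch in Python. It times CPU writes into any writable buffer, so it measures host-initiated (PIO) transfers, not the DMA engine. The function name, chunk size and wrap-around behavior are illustrative assumptions, not part of the Intel reference design.

```python
import time


def measure_write_throughput(buf, total_bytes: int, chunk_size: int = 4096):
    """Time CPU writes of total_bytes into buf and return (bytes, seconds, MB/s).

    buf may be any writable buffer, e.g. a bytearray for a dry run or an mmap
    of an FPGA BAR. Writes wrap to offset 0 when the end of buf is reached.
    """
    pattern = bytes(range(256)) * (chunk_size // 256 + 1)
    payload = pattern[:chunk_size]
    written = 0
    offset = 0
    with memoryview(buf) as view:  # released on exit so an mmap can be closed later
        if chunk_size > len(view):
            raise ValueError("chunk_size is larger than the target buffer")
        start = time.perf_counter()
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            if offset + n > len(view):
                offset = 0
            view[offset:offset + n] = payload[:n]
            offset += n
            written += n
        elapsed = time.perf_counter() - start
    return written, elapsed, written / max(elapsed, 1e-9) / 1e6


if __name__ == "__main__":
    # Dry run against ordinary RAM: measures memcpy plus Python overhead only.
    n, secs, mb_s = measure_write_throughput(bytearray(1 << 20), 64 << 20)
    print(f"{n} bytes in {secs:.3f} s: {mb_s:.0f} MB/s")
```

To target the FPGA on Linux, one option is to open the BAR's sysfs file, e.g. `/sys/bus/pci/devices/<BDF>/resource0` (root required; `<BDF>` is a placeholder for the device address), with `os.open(path, os.O_RDWR | os.O_SYNC)` and map it with `mmap.mmap(fd, bar_size)`. The CPU typically issues such writes as small TLPs and Python adds interpreter overhead, so treat the result as a lower bound rather than a link limit.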

For example:

Symptom -> Host writes data to the FPGA too slowly

Root cause -> The RX buffer for posted TLPs in the HIP is too small

Debug -> Use an external PCIe analyzer to check whether the host has to wait for credits from the HIP before each transfer.

Potential solution -> Change the RX buffer allocation in the Qsys GUI to High or Max
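
Why a small posted RX buffer caps host-to-FPGA write throughput can be seen with a simplified Little's-law model: the host can only have as many posted-write bytes in flight as the HIP has advertised credits for, and those credits return only after a round trip. One PCIe data credit covers 16 bytes. The credit counts and latency below are hypothetical placeholders, not Arria 10 values; real numbers come from the HIP configuration and an analyzer trace.

```python
def credit_limited_throughput_mb_s(posted_data_credits: int,
                                   credit_return_latency_ns: float,
                                   link_rate_mb_s: float = 2000.0) -> float:
    """Upper bound on sustained posted-write throughput under flow control.

    Simplified model: at most posted_data_credits * 16 bytes can be in flight,
    and each credit is returned credit_return_latency_ns after it is consumed.
    Header credits, DLLP overhead and TLP framing are ignored.
    """
    bytes_in_flight = posted_data_credits * 16  # 1 data credit = 16 bytes (4 DW)
    # 1 byte/ns = 10**9 bytes/s = 1000 MB/s
    credit_bound_mb_s = bytes_in_flight * 1000 / credit_return_latency_ns
    return min(link_rate_mb_s, credit_bound_mb_s)


if __name__ == "__main__":
    latency_ns = 1000.0  # hypothetical credit round-trip latency
    for credits in (32, 64, 128, 256):  # hypothetical buffer sizes
        bound = credit_limited_throughput_mb_s(credits, latency_ns)
        print(f"{credits:4d} credits ({credits * 16:5d} B): <= {bound:.0f} MB/s")
```

In this model, doubling the buffer doubles the bound until the link rate takes over, which is the reasoning behind moving the RX buffer allocation toward High or Max.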

Regards,
SK Lim (Intel)