I am using the Arria V GX FPGA Starter Kit connected to a PC via PCIe. On the kit, I implemented my own DMA read/write engine using Avalon-MM pipelined transfers. The DMA reads data from one region of the PC's memory and writes it to another region of the PC's memory over PCIe. The IP I used is the Avalon-MM Arria V Hard IP for PCI Express, configured as Gen1 x8 with a 32-bit Avalon-MM address width. The PC software is written in C++ in Visual Studio and uses Jungo WinDriver 12.0.0.

The project works, but the transfer speed, especially the read speed, is too slow. I have used this DMA in many other projects, so I don't think the DMA itself is the problem. I checked the project with SignalTap and found the following:

+ (Figure 1) There are always more than 100 clocks between the start of a DMA read (the first time the 'read' signal is asserted) and the first returned data (the first time the 'readdatavalid' signal is asserted).
+ (Figure 2) After that, there are always about 20 to 50 idle clocks between two consecutive returned data, which is too slow.

My design needs to read data from the PC with two constraints: (1) very little data at a time (about 5 to 10 data words per session); (2) random access (which is why I didn't use burst transfers). But every time a new transfer session starts, more than 100 clocks are wasted at the beginning, and I don't know why.

To conclude, an Avalon-MM pipelined read costs about 200 clocks just to read 5 data words from the PC's memory via PCIe.

My questions are:
(1) Why are so many clocks wasted in a pipelined read transfer via PCIe?
(2) Is there anything else I can do to speed up the transfer rate?
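To put rough numbers on the SignalTap observations, here is a minimal Python sketch of the arithmetic. The clock counts come from the measurements above (a first-data latency of about 100 clocks and a gap picked from the observed 20 to 50 clock range). The 125 MHz application clock and the 4-byte word size are assumptions for illustration only, since the post does not state them, and the function names are hypothetical.

```python
# Rough model of one pipelined read session as seen in SignalTap.
# ASSUMPTIONS (not stated in the post): 125 MHz clock, 4-byte data words.
ASSUMED_CLK_HZ = 125e6
ASSUMED_WORD_BYTES = 4


def read_session_clocks(n_words, first_latency=100, gap=25):
    """Clocks from the first 'read' assertion to the last 'readdatavalid'.

    first_latency: clocks until the first returned data (observed: over 100).
    gap: clocks between consecutive returned data (observed: about 20 to 50).
    """
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    return first_latency + (n_words - 1) * gap


def effective_throughput_mb_s(n_words, first_latency=100, gap=25,
                              clk_hz=ASSUMED_CLK_HZ,
                              word_bytes=ASSUMED_WORD_BYTES):
    """Payload bytes per second for one session, in MB/s (1 MB = 1e6 bytes)."""
    clocks = read_session_clocks(n_words, first_latency, gap)
    seconds = clocks / clk_hz
    return n_words * word_bytes / seconds / 1e6


if __name__ == "__main__":
    for n in (5, 10):
        clocks = read_session_clocks(n)
        rate = effective_throughput_mb_s(n)
        print(f"{n} words: {clocks} clocks, {rate:.1f} MB/s")
```

With a 25-clock gap, 5 words take 100 + 4 × 25 = 200 clocks, which matches the roughly 200 clocks seen in SignalTap. Under the assumed clock and word size this works out to about 12.5 MB/s of payload, and the fixed start-up latency accounts for half of each session.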