I have been working on the chaining DMA example project for PCIe provided by Altera. I am a bit confused about what performance I should expect, though. When sending 16.384 GBytes of data from a C program (or reading the same amount), the program runs for about 17 seconds, which gives a bit rate of about 7.71 Gbps. I am using Gen2, 64-bit, x4 lanes. Gen2 is quoted at 5 Gbps per lane, but because of the 8b/10b encoding the usable rate is actually about 4 Gbps. Since I am using 4 lanes, I should expect a speed of 16 Gbps; is this correct? That is twice what I am getting...

I am on Linux and have based my driver on the altpciechdma driver by Woestenberg and Heppermann. At some point it reports that it is using 32-bit DMA addressing instead of 64-bit; could this be the reason I am two times slower than I should be? Looking at the performance results in an456.pdf, there seems to be no difference between Gen2 x4 64-bit and Gen2 x4 128-bit (I'm not sure what those bits refer to, though; is it the same as the DMA mask?).

Finally, when they say 5.0 Gbps, is that one-way or two-way? I.e., should sending 100 MB and receiving 100 MB simultaneously take as long as sending OR receiving 200 MB? Thank you!
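To make the numbers explicit, here is the back-of-the-envelope calculation I'm doing (plain C, nothing Altera-specific, just the figures quoted above):

```c
#include <stdio.h>

int main(void)
{
    /* Measured: 16.384 GBytes transferred in ~17 seconds */
    double measured_gbps = 16.384e9 * 8 / 17.0 / 1e9;   /* ~7.71 */

    /* Theoretical: Gen2 = 5 Gbps/lane on the wire, 8b/10b leaves 80%,
     * times 4 lanes */
    double theoretical_gbps = 5.0 * 0.8 * 4;             /* 16.00 */

    printf("measured:    %.2f Gbps\n", measured_gbps);
    printf("theoretical: %.2f Gbps\n", theoretical_gbps);
    return 0;
}
```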
--- Quote Start --- Since I am using 4 lanes I should expect a speed of 16Gbps, is this correct? This is twice what I have... --- Quote End --- PCIe DMA performance is restricted by both PCIe and PC memory throughput. You seem to assume that PC memory speed won't play a role, which is very unrealistic in my opinion.
What do you mean by PC memory speed? I looked at the amount of RAM used during the DMAs; there's plenty of free memory, so it doesn't seem like a bottleneck. I'm sending the data to the driver from a C program using the fwrite and fread functions (to write to and read from the driver's device file); I don't think they would slow the system down, and I'm not sure where the problem could be. Also, I'm still not sure whether the 5.0 Gbps is one-way or two-way. If it's two-way, then the problem might just be that I'm doing a DMA write followed by a read, followed by a write, and so on, instead of launching a DMA read and a DMA write at the same time (if that's even possible).
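For reference, this is essentially what my user-space test does; the device node name here is just a placeholder for whatever node your driver actually registers:

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (4 * 1024 * 1024)   /* 4 MB per call, arbitrary */

int main(void)
{
    /* "/dev/altpciechdma" is a placeholder -- use whatever node
     * your driver creates. */
    FILE *dev = fopen("/dev/altpciechdma", "r+b");
    if (!dev) { perror("fopen"); return 1; }

    char *buf = malloc(CHUNK);
    if (!buf) { fclose(dev); return 1; }

    /* DMA write (host -> FPGA): the driver's write() handler builds
     * the descriptor table and starts the transfer. */
    if (fwrite(buf, 1, CHUNK, dev) != CHUNK)
        perror("fwrite");
    fflush(dev);   /* required between output and input on an update stream */

    /* DMA read (FPGA -> host) */
    if (fread(buf, 1, CHUNK, dev) != CHUNK)
        perror("fread");

    free(buf);
    fclose(dev);
    return 0;
}
```

One thing I should probably double-check on my side: fwrite/fread go through stdio's internal buffer, which costs an extra memcpy per chunk; read()/write() on the raw file descriptor would avoid that.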
I can't speak exactly to your scenario, but my situation is as follows.

System setup:
Motherboard: Asus AT5IONT-I
OS: Windows 7
FPGA: EP4CGX15BF14C7 (Gen1 x1 lane)
Design: PCI Express to External Memory Reference Design

My transfer speeds were (16 kB transfer sizes):
Theoretical limit: 250 MB/s
Actual (FPGA -> computer): 198 MB/s
Actual (computer -> FPGA): 120 MB/s

I hope that provides a more concrete basis for comparison.
Thank you for the responses! I understand the whole thing a bit better now. By tweaking some parameters, I actually get a speed close to what you reported (multiplied by 8, since I am using 4 lanes and Gen2). There is still one problem, though: I can get this speed when doing either a DMA read or a DMA write, but I don't understand how to do both at the same time.

I am using the chaining DMA example, and thus need to fill the descriptor table. I fill in the descriptors (endpoint address, root complex address, length of the data), then write the number of descriptors into the write header to launch a write, or into the read header to launch a read. From what I understand, both the DMA write and the DMA read modules share the same descriptor table, so how can I trigger them at the same time? How do they know which descriptor belongs to which module? Or is a local copy of the descriptor table generated after launching one of them, so that I can overwrite previous descriptors even before the operation is over?
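For context, here is roughly how I'm filling the table. The layout follows my reading of altpciechdma.c (four dwords per descriptor, with a four-dword header at the start of the table); the field names and the exact bit usage of w0 here are approximate, so check against your own copy of the driver:

```c
#include <linux/types.h>
#include <linux/kernel.h>

/* One chaining-DMA descriptor: four dwords. The exact split between
 * control bits and transfer length in w0 is per the reference design
 * documentation; treat this sketch as approximate. */
struct ape_chdma_desc {
    __le32 w0;        /* control bits + transfer length (in dwords) */
    __le32 ep_addr;   /* address in endpoint (FPGA) memory */
    __le32 rc_addr_h; /* root complex bus address, upper 32 bits */
    __le32 rc_addr_l; /* root complex bus address, lower 32 bits */
} __attribute__((packed));

/* The table starts with a four-dword header that the endpoint uses
 * for status write-back, followed by up to 255 descriptors. */
struct ape_chdma_table {
    __le32 w0, w1, w2, w3;
    struct ape_chdma_desc desc[255];
} __attribute__((packed));

static void fill_desc(struct ape_chdma_desc *d, u32 len_dwords,
                      u32 ep_addr, u64 rc_bus_addr)
{
    d->w0        = cpu_to_le32(len_dwords);  /* control bits omitted here */
    d->ep_addr   = cpu_to_le32(ep_addr);
    d->rc_addr_h = cpu_to_le32(upper_32_bits(rc_bus_addr));
    d->rc_addr_l = cpu_to_le32(lower_32_bits(rc_bus_addr));
}
```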
OK, I think I finally got it. As the documentation clearly says, there are two descriptor tables, not one. I got confused because the driver I was basing my work on created only one descriptor table in RC memory and used it for both DMA reads and DMA writes; I didn't realize I could just instantiate a second one.
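In case it helps anyone else: the fix was simply to allocate a second table, along these lines (a sketch only; pci_alloc_consistent() is the API that driver used, dma_alloc_coherent() being the modern equivalent):

```c
#include <linux/pci.h>

/* Allocate one descriptor table per direction in coherent memory, so
 * the endpoint can fetch write descriptors while it is still working
 * through read descriptors. Uses the ape_chdma_table sketch above. */
static int alloc_desc_tables(struct pci_dev *pdev,
                             struct ape_chdma_table **wr, dma_addr_t *wr_bus,
                             struct ape_chdma_table **rd, dma_addr_t *rd_bus)
{
    *wr = pci_alloc_consistent(pdev, sizeof(**wr), wr_bus);
    if (!*wr)
        return -ENOMEM;

    *rd = pci_alloc_consistent(pdev, sizeof(**rd), rd_bus);
    if (!*rd) {
        pci_free_consistent(pdev, sizeof(**wr), *wr, *wr_bus);
        return -ENOMEM;
    }
    return 0;
}
```

Each table's bus address then gets programmed into its own set of DMA control registers (the write header for one, the read header for the other), so the two engines never touch each other's descriptors.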