Valued Contributor III
1,036 Views

Question about PCIe speed

I have been working on the chaining DMA example project for PCIe provided by Altera. 

 

I am a bit confused about what performance I should expect though. 

 

When sending 16.384 GBytes of data from a C program (or reading the same amount of data), the program runs for about 17 seconds, which gives a bit rate of 7.71 Gbps. 

 

I am using Gen2 64-bit x4 lanes. 

Gen2 is quoted at 5 Gbps per lane, but because of the 8b/10b encoding the usable rate is about 4 Gbps. Since I am using 4 lanes, I should expect a speed of 16 Gbps, is this correct? 

This is twice what I have... 
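The arithmetic in this post can be written out as a short sketch. It treats the 5 Gbps figure as a per-lane, per-direction line rate (PCIe links are full duplex, and that is how the rate is normally quoted) and ignores packet headers and other protocol overhead, so the "ceiling" it prints is optimistic. The function names are made up for illustration.

```python
# Back-of-the-envelope numbers taken from the post; nothing is measured here.
LANES = 4
RAW_GBPS_PER_LANE = 5.0        # Gen2 line rate per lane (assumed per direction)
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 payload bits per 10 line bits


def link_ceiling_gbps(lanes, raw_gbps_per_lane, efficiency):
    """Upper bound on data rate after line encoding, before protocol overhead."""
    return lanes * raw_gbps_per_lane * efficiency


def measured_gbps(num_bytes, seconds):
    """Convert a byte count and a wall-clock time into Gbit/s."""
    return num_bytes * 8 / seconds / 1e9


expected = link_ceiling_gbps(LANES, RAW_GBPS_PER_LANE, ENCODING_EFFICIENCY)
measured = measured_gbps(16.384e9, 17.0)

print(f"ceiling after 8b/10b: {expected:.2f} Gbps")
print(f"measured:             {measured:.2f} Gbps")
print(f"fraction of ceiling:  {measured / expected:.0%}")
```

So the measured rate is a little under half of the encoded line rate, which matches the "twice what I have" observation.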

 

I am on Linux and I have based the driver on the altpciechdma driver by Woestenberg and Heppermann. At some point it says that it's using 32-bit DMA addressing instead of 64-bit. Could this be the reason I am two times slower than I should be? 

When looking at the performance results in an456.pdf, there seems to be no difference between Gen2 x4 64-bit and Gen2 x4 128-bit (I'm not sure what those bits refer to, though; is it the same as the DMA mask?). 

 

Finally, when they say 5.0Gbps, is it one way or two-way? i.e., should sending 100MB and receiving 100MB simultaneously take as long as sending OR receiving 200MB?  

 

Thank you!
0 Kudos
6 Replies
Highlighted
Valued Contributor III
2 Views

 

--- Quote Start ---  

Since I am using 4 lanes I should expect a speed of 16Gbps, is this correct? 

This is twice what I have... 

--- Quote End ---  

 

PCIe DMA performance is limited by both PCIe throughput and PC memory throughput. You seem to assume that PC memory speed won't play a role, which is very unrealistic in my opinion.
0 Kudos
Highlighted
Valued Contributor III
2 Views

What do you mean by PC memory speed? 

I looked at the amount of RAM used during the DMAs. There's plenty of free memory, so it doesn't seem like a bottleneck. 

I'm sending the data to the driver from a C program using the fwrite and fread functions (to read from and write to the driver's device file). I don't think they would slow the system down. 
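One way to check whether the user-space write loop itself is a limiting factor is to time the same loop against a target that involves no PCIe traffic at all. The sketch below is only an illustration of that idea: `timed_write_gbps` is a hypothetical helper, the in-memory `io.BytesIO` sink stands in for the driver's device file, and the chunk size is an arbitrary choice.

```python
import io
import time


def timed_write_gbps(stream, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to a buffered binary stream; return Gbit/s."""
    chunk = bytes(chunk_size)          # reusable zero-filled buffer
    remaining = total_bytes
    start = time.perf_counter()
    while remaining > 0:
        n = min(chunk_size, remaining)
        stream.write(chunk[:n])        # buffered streams write the whole chunk
        remaining -= n
    stream.flush()
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return total_bytes * 8 / elapsed / 1e9


# Stand-in sink: measures only the cost of the copy loop in this process.
sink = io.BytesIO()
rate = timed_write_gbps(sink, 64 * 1024 * 1024)
print(f"in-memory baseline: {rate:.2f} Gbps")
```

If the same loop pointed at the device file runs much slower than the in-memory baseline, the difference is being spent in the driver, the memory system, or the link rather than in the copy loop.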

 

I'm not sure where the problem could be. 

 

Also, I'm still not sure whether the 5.0 Gbps is one-way or two-way. If it's two-way, then the problem might just be that I'm doing a DMA write followed by a read, followed by a write, and so on, instead of launching both a DMA read and a DMA write at the same time (if that's possible).
0 Kudos
Highlighted
Valued Contributor III
2 Views

5.0 Gbps is the two-way speed. Do you perform the reads and writes between the MCU and the FPGA?

0 Kudos
Highlighted
Valued Contributor III
2 Views

I can't speak exactly to your scenario, but my situation is as follows. System setup: 

Motherboard: Asus AT5IONT-I 

OS: Windows 7 

FPGA: EP4CGX15BF14C7 (gen 1.0 x1 lane) 

Design: PCI Express to External Memory Reference Design 

 

My transfer speeds were (16kB transfer sizes): 

Theoretical limit: 250MB/s 

Actual (FPGA->computer): 198MB/s 

Actual (computer->FPGA): 120MB/s 

 

I hope that provides a more concrete basis for comparison.
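For comparison with other setups, the figures above can be turned into link efficiency (measured rate divided by the theoretical limit). This is just the arithmetic on the numbers quoted in this post; the variable names are illustrative.

```python
THEORETICAL_MB_S = 250.0  # Gen1 x1 limit quoted above

measured_mb_s = {
    "FPGA->computer": 198.0,
    "computer->FPGA": 120.0,
}

# Fraction of the theoretical limit actually achieved in each direction.
efficiency = {
    direction: rate / THEORETICAL_MB_S
    for direction, rate in measured_mb_s.items()
}

for direction, fraction in efficiency.items():
    print(f"{direction}: {fraction:.1%} of the theoretical limit")
```

That works out to roughly 79% in one direction and 48% in the other.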
0 Kudos
Highlighted
Valued Contributor III
2 Views

Thank you for the responses! 

 

I understand the whole thing a bit better now. 

 

By tweaking some parameters, I actually get a speed close to what you reported (multiplied by 8, since I am using 4 lanes and Gen2). 
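To make the "multiplied by 8" comparison concrete, here is a small sketch that scales the Gen1 x1 figures reported earlier in the thread to Gen2 x4 and converts them to Gbps. It assumes MB means 10^6 bytes and that throughput scales linearly with lane count and line rate, which is an idealization.

```python
GEN2_VS_GEN1 = 2   # Gen2 doubles the per-lane line rate of Gen1
LANES = 4          # x4 link versus the x1 link in the earlier reply
SCALE = GEN2_VS_GEN1 * LANES


def scaled_gbps(mb_per_s, scale):
    """Scale a MB/s figure and convert it to Gbit/s (1 MB = 10**6 bytes)."""
    return mb_per_s * scale * 8 / 1000


fpga_to_pc = scaled_gbps(198.0, SCALE)   # FPGA->computer figure from the thread
pc_to_fpga = scaled_gbps(120.0, SCALE)   # computer->FPGA figure from the thread

print(f"scaled FPGA->computer: {fpga_to_pc:.3f} Gbps")
print(f"scaled computer->FPGA: {pc_to_fpga:.3f} Gbps")
```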

 

There still is one problem though: 

I can get this speed when doing either a DMA read, or a DMA write. But I don't understand how to do both at the same time. 

 

I am using the chaining DMA example, and thus need to fill the descriptor table. 

I fill up the descriptors (endpoint address, root complex address, length of the data), then write the number of descriptors into the write header to launch a write, or the read header to launch a read. 

 

From what I understand, both the DMA write and the DMA read modules share the same descriptor table, so how can I trigger them at the same time? How do they know which descriptor is for which module? Or is a local copy of the descriptor table generated after launching one of them, so that I can overwrite previous descriptors even before the operation is over?
0 Kudos
Highlighted
Valued Contributor III
2 Views

OK I think I finally got it. 

 

As the documentation clearly says, there are two descriptor tables, not one. I got confused because the driver I was basing my work on creates only one descriptor table in RC memory and uses it for both DMA reads and DMA writes. I didn't realize I could just instantiate a second one.
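For anyone who hits the same confusion, the idea can be modelled roughly as below. This is only a host-side illustration, not the actual Altera descriptor layout or driver code: the class names, the addresses, and the assumption that lengths are in bytes are all made up. The only parts taken from this thread are the three descriptor fields (endpoint address, root complex address, length) and the fact that the number of descriptors is what gets written to each module's header.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    endpoint_address: int       # address on the FPGA side
    root_complex_address: int   # address in host (RC) memory
    length: int                 # transfer length (bytes assumed here)


@dataclass
class DescriptorTable:
    """One table per DMA module, so reads and writes never share entries."""
    name: str
    descriptors: list = field(default_factory=list)

    def add(self, endpoint_address, root_complex_address, length):
        self.descriptors.append(
            Descriptor(endpoint_address, root_complex_address, length)
        )

    def header_count(self):
        """Value that would be written to the module's header to launch it."""
        return len(self.descriptors)


# Two independent tables (all addresses below are hypothetical).
read_table = DescriptorTable("dma_read")
write_table = DescriptorTable("dma_write")

read_table.add(0x0000, 0x1000_0000, 4096)
read_table.add(0x1000, 0x1000_1000, 4096)
write_table.add(0x8000, 0x2000_0000, 8192)

for table in (read_table, write_table):
    print(f"{table.name}: {table.header_count()} descriptor(s)")
```

Because each module has its own table, filling one does not overwrite descriptors that the other module may still be working through.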
0 Kudos