I have been working on the chaining DMA example project for PCIe provided by Altera.
I am a bit confused about what performance I should expect, though. When sending 16.384 GBytes of data from a C program (or reading the same amount), the run takes about 17 seconds, which works out to a bit rate of 7.71 Gbps. I am using Gen2, 64-bit, x4 lanes. Gen2 is quoted at 5 Gbps per lane, but because of the 8b/10b encoding the usable rate is about 4 Gbps. Since I am using 4 lanes, should I expect around 16 Gbps? That is twice what I measure...

I am on Linux and have based my driver on the altpciechdma driver by Woestenberg and Heppermann. At some point it reports that it is using 32-bit DMA addressing instead of 64-bit; could that be why I am running at half the expected speed? Looking at the performance results in an456.pdf, there seems to be no difference between Gen2 x4 64-bit and Gen2 x4 128-bit (I am not sure what those bit widths refer to; is it the same as the DMA mask?). Finally, when they say 5.0 Gbps, is that one-way or two-way? That is, should sending 100 MB and receiving 100 MB simultaneously take as long as sending OR receiving 200 MB? Thank you!
6 Replies
--- Quote Start --- Since I am using 4 lanes I should expect a speed of 16Gbps, is this correct? This is twice what I have... --- Quote End --- PCIe DMA performance is limited by both the PCIe link and PC memory throughput. You seem to assume that PC memory speed won't play a role, which is very unrealistic in my opinion.
What do you mean by PC memory speed?
I looked at the amount of RAM used during the DMA transfers; there is plenty of free memory, so that does not seem to be the bottleneck. I am sending the data to the driver from a C program using the fwrite and fread functions (to write to and read from the driver's device file), and I don't think those would slow the system down. I am not sure where the problem could be.

Also, I am still not sure whether the 5.0 Gbps is one-way or two-way. If it is two-way, then the problem might simply be that I am doing a DMA write followed by a read, followed by a write, and so on, instead of launching a DMA read and a DMA write at the same time (if that is possible).
The 5.0 Gbps figure is per direction: a PCIe lane is full duplex, so it can carry that rate both ways at the same time. Do you perform the read and write between an MCU and the FPGA?
I can't speak exactly to your scenario, but my situation is as follows. System setup:
Motherboard: Asus AT5IONT-I
OS: Windows 7
FPGA: EP4CGX15BF14C7 (Gen1 x1 lane)
Design: PCI Express to External Memory Reference Design

My transfer speeds (16 kB transfer sizes) were:

- Theoretical limit: 250 MB/s
- Actual (FPGA -> computer): 198 MB/s
- Actual (computer -> FPGA): 120 MB/s

I hope that provides a more concrete basis for comparison.
Thank you for the responses!
I understand the whole thing a bit better now. By tweaking some parameters I now get a speed close to what you reported (multiplied by 8, since I am using Gen2 and 4 lanes instead of Gen1 x1). There is still one problem, though: I can reach this speed when doing either a DMA read or a DMA write, but I don't understand how to do both at the same time.

I am using the chaining DMA example, so I need to fill in the descriptor table. I fill up the descriptors (endpoint address, root complex address, length of the data), then write the number of descriptors into the write header to launch a write, or into the read header to launch a read. From what I understand, the DMA write and DMA read modules share the same descriptor table, so how can I trigger them at the same time? How do they know which descriptor belongs to which module? Or is a local copy of the descriptor table made after launching one of them, so that I can overwrite previous descriptors even before the operation has finished?
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
OK I think I finally got it.
As the documentation clearly states, there are two descriptor tables, not one. I got confused because the driver I based my work on created only one descriptor table in RC memory and used it for both DMA reads and DMA writes; I hadn't realized I could simply instantiate a second one.