[EDIT] SOLVED
Hello, I'm trying to develop an accelerator for a computer vision algorithm. The host program runs on the Intel Atom on the DE2i-150 board and sends images to the FPGA for processing. The problem is that I can't get more than 3 to 10MB/s of throughput, which is quite low, since PCI Express should be able to deliver about 250MB/s and the SDRAM should be even faster. My Qsys system is pretty simple; you can see it in the attachment. I would appreciate it if someone could help me with this. Is this a common issue? Can I get more speed, something like 90MB/s or so? If so, how could I accomplish that? Many thanks.
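For anyone reading along, here is a rough back-of-the-envelope sketch of the numbers in this post. It assumes the 250MB/s figure comes from a single PCIe Gen1 lane (2.5 GT/s with 8b/10b encoding), which the post does not state explicitly, and it uses a hypothetical 640x480 8-bit grayscale frame purely for illustration. Real throughput will be lower than the lane figure because of packet overhead.

```python
# Back-of-the-envelope numbers only; nothing here talks to hardware.

def pcie_lane_bandwidth_mb_s(gigatransfers_per_s=2.5, payload_bits=8, line_bits=10):
    """Payload bandwidth of one PCIe lane in MB/s (1 MB = 10**6 bytes).

    Defaults assume a Gen1 lane: 2.5 GT/s with 8b/10b line encoding.
    """
    raw_bits_per_s = gigatransfers_per_s * 1e9
    payload_bits_per_s = raw_bits_per_s * payload_bits / line_bits
    return payload_bits_per_s / 8 / 1e6


def frames_per_second(throughput_mb_s, frame_bytes):
    """How many frames of frame_bytes fit through a link per second."""
    return throughput_mb_s * 1e6 / frame_bytes


# Hypothetical frame size: 640x480 pixels, 1 byte per pixel.
FRAME_BYTES = 640 * 480

print("lane bandwidth MB/s:", pcie_lane_bandwidth_mb_s())
for rate in (3, 10, 90):
    print(rate, "MB/s ->", round(frames_per_second(rate, FRAME_BYTES), 1), "frames/s")
```

The point of the comparison is just that the measured 3 to 10MB/s sits far below what the link itself allows, so the bottleneck is likely in how the transfers are issued rather than in the lane or the SDRAM.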
Hello, I don't know why the forum keeps resizing my image. And if I post a link, a moderator must approve it, and I think that will take a very long time. If you don't mind, I've uploaded the image to the cloud; I hope there is no problem posting it here like this: ht tp://1drv .ms/1OVVaw3 Just copy and paste it and delete the spaces.
At a quick glance, it looks like the dma and sdram are on different clocks. Maybe it would be best to place a dual-clock FIFO between the SGDMA and the SDRAM.
Hello, thanks for your answer. Yes, in fact they have different clocks. I tried looking into the solution you proposed, but the dual-clock FIFO uses Avalon-ST, so the input is a streaming sink and the output is a streaming source. I'm still a newbie at this, so I'm not sure how I should connect it between my SGDMA and the SDRAM. Can you give me some tips? I've now uploaded a better screenshot and my Qsys diagram, in case you or anybody else would like to take a closer look. I don't intend to delete this in the future, so if we solve this, I will upload the corrected version for other people to download. (Just paste and delete the spaces again if you're willing to take a look at this.) ht tp://1dr v.ms/1E91SZd
Well, what you can do is change your SGDMA to memory-to-stream, then use the DC FIFO, then use another DMA to do stream-to-memory. What would make this easier is using the Modular SGDMA, which breaks out the descriptor control, write control, and read control. I think you will be able to get away with one descriptor controller, versus the two you would need if you use Altera's SGDMA.
Modular SGDMA should be included in Qsys now. Otherwise, here is the link: http://www.alterawiki.com/wiki/modular_sgdma
In the diagram at that link, the Read Master and Write Master are connected through Avalon-ST. Place the DCFIFO in that path.
Put the DMA descriptors into on-FPGA memory.
Make sure the DMA controller is doing Avalon burst transfers (probably 128 bytes, preferably 64 bits wide) into the PCIe txs port. You probably don't want burst transfers into the SDRAM, just pipelined ones. Beware of large FIFOs in the DMA controller; you don't need them.
The PCIe txs block seems to complete write transfers quickly. I'm seeing reads take 128 clocks (of the 62.5MHz app clock) plus a few clocks for the transfer size. The same is true of host-initiated transfers: writes are 'posted' and happen more or less back to back, but there is a 128-clock delay between reads.
The only way to speed up DMA reads from host memory (once you are generating Avalon bursts and thus long PCIe TLPs) is to generate concurrent read requests from multiple Avalon masters.
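To put rough numbers on the read latency described above, here is a small model. It is only a sketch: the 128-clock latency and 62.5MHz clock come from the post, but the 64-bit data path (8 bytes per clock), the idea that concurrent masters scale linearly, and the 250MB/s cap are simplifying assumptions of mine, not measured behaviour.

```python
APP_CLOCK_HZ = 62.5e6        # app clock quoted in the post
READ_LATENCY_CLOCKS = 128    # per-read latency quoted in the post
BUS_WIDTH_BYTES = 8          # assumption: 64-bit wide data path


def read_throughput_mb_s(burst_bytes, concurrent_masters=1, link_limit_mb_s=250.0):
    """Estimated DMA read throughput from host memory in MB/s.

    Each read costs the fixed latency plus one clock per bus word.
    Concurrent masters are assumed to overlap perfectly (idealised),
    and the result is capped at an assumed link limit.
    """
    transfer_clocks = -(-burst_bytes // BUS_WIDTH_BYTES)  # ceiling division
    clocks_per_read = READ_LATENCY_CLOCKS + transfer_clocks
    seconds_per_read = clocks_per_read / APP_CLOCK_HZ
    per_master_mb_s = burst_bytes / seconds_per_read / 1e6
    return min(per_master_mb_s * concurrent_masters, link_limit_mb_s)


for burst in (4, 32, 128):
    print(burst, "byte reads:", round(read_throughput_mb_s(burst), 2), "MB/s")
print("128 byte reads, 2 masters:", round(read_throughput_mb_s(128, 2), 2), "MB/s")
```

Under these assumptions, single-word reads land at only a couple of MB/s, which is in the same ballpark as the throughput reported at the top of the thread, while 128-byte bursts amortise the fixed latency over far more data.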
krasner,
I've tried doing what you said, using mSGDMA. I think I did something veeery wrong, because my board actually stopped working until I reset the BIOS (I even thought I had lost it :( ). Before putting in the FIFOs, I built the basic scheme, using the dispatcher and the read/write masters. Here is the Qsys scheme I made for this: ht tp://1dr v.ms/1IpfnYH Notice that I've removed the old PLL, and now I'm using the pcie_core_clk, which provides 125MHz, for the DMA and SDRAM. I did that because TimeQuest said it was not possible to reach the 150MHz I wanted. I've also uploaded the new Qsys system to the same folder as before in case someone wants to take a look: ht tp://1dr v.ms/1E91SZd
dsl,
Where should I put the DMA descriptors? I'm very new, so sorry if that's a stupid question. Note that I'm not interested in using the on-chip memory, only in speeding up the SDRAM. Also, I've tried using bursts, but I don't understand how to use them. Every time I activate burst transfers on the DMA, my host application stops being able to communicate with the FPGA.
Just to clarify, I'm using the Jungo Windows driver provided with the board. It works fine with the DMA scheme I've been using, but I'm getting the slow transfers I've been talking about. Is there any chance that the problem is actually in the driver? I hope you guys still have some patience to help me. Many thanks.
I think this example is very close to what you want: http://www.alterawiki.com/wiki/pci_express_in_qsys_example_designs
I'm wondering if you set up the dma_read_master and dma_write_master options correctly. See if this example works for you.
I came across this example when searching for information about mSGDMA. I basically downloaded the Qsys system provided on that page and adapted it a bit before trying to add the DC FIFO. There were indeed some settings I didn't know how to set up. For example:
- Descriptor FIFO Depth on the mSGDMA
- Length Width on the read master and write master
- Transfer Types under Memory Access Options on the read master and write master
Yes, that was my first try. After that I tried to rewrite everything, disabling some things like bursts, and it kept happening. I'm starting to give up, but it doesn't make any sense, since PCIe should be fast enough even if badly configured, I think. And the same applies to the SDRAM. That's why I'm still trying.
Don't give up. dsl was talking about descriptors. Do you understand what they do? Maybe that is your bottleneck. If you aren't setting them properly then the dma won't process continuously.
Actually, I don't understand much about this stuff. I'm starting out because of the Intel Embedded Systems Competition (http://sbesc.lisha.ufsc.br/sbesc2015/intel+embedded+systems+competition). I've done a LOT of research, but the documentation seems to be sparse when it comes to these IP modules (PCIe, DMA, and many others). The best documentation I've found was the "Video and Image Processing Suite User Guide", which is pretty good and covers the whole Avalon-ST side but takes a long time to get to the point. So any help with this would be welcome. Long story short, I don't know how descriptors work. :(
One more thing: this PCIe communication is a huge problem here; everybody I know who is competing has this problem. Solving it would be a big factor for us, and it seems that I'm the one who is digging deepest into it.
A descriptor is a set of instructions that the master (in your case the PC) gives the DMA, describing where to access data and how much data to transfer. In a memory-to-memory application like yours, the descriptor should contain the address of the source data (PCIe) and of the destination for the data (SDRAM), or vice versa, and the number of bytes to transfer in one pass or burst. With the mSGDMA you can write many descriptors into the descriptor buffer (see the dispatcher) and command the dispatcher to chain them, so once one descriptor is complete, the DMA automatically begins working on the next one. This way you can transfer data continuously.
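To make the idea concrete, here is a minimal sketch of splitting one image transfer into a chain of descriptors. This is only a model in Python: the `Descriptor` fields, the addresses, and the per-descriptor size limit are hypothetical and do not reflect the actual mSGDMA register layout or the driver API, so check the dispatcher documentation for the real format.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Hypothetical descriptor: where to read, where to write, how much."""
    read_address: int   # source address (e.g. host memory seen through PCIe)
    write_address: int  # destination address (e.g. SDRAM)
    length: int         # number of bytes to move for this descriptor


def build_descriptor_chain(src_addr, dst_addr, total_bytes, max_bytes_per_descriptor):
    """Split one large transfer into back-to-back descriptors.

    Queuing the whole list up front lets the DMA start the next
    descriptor as soon as the previous one completes.
    """
    if max_bytes_per_descriptor <= 0:
        raise ValueError("max_bytes_per_descriptor must be positive")
    descriptors = []
    offset = 0
    while offset < total_bytes:
        length = min(max_bytes_per_descriptor, total_bytes - offset)
        descriptors.append(Descriptor(src_addr + offset, dst_addr + offset, length))
        offset += length
    return descriptors


# Hypothetical example: a 640x480 8-bit frame, at most 64 KiB per descriptor.
chain = build_descriptor_chain(0x10000000, 0x00000000, 640 * 480, 64 * 1024)
for d in chain:
    print(hex(d.read_address), "->", hex(d.write_address), d.length, "bytes")
```

If the host writes one descriptor, waits for it to finish, and only then writes the next, the DMA sits idle between transfers; queuing the whole chain first is what keeps it busy.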
BTW, what board are you using? I assume it's the Cyclone IV GX FPGA development board. This board has DDR2 SDRAM. If so, use this example: https://www.altera.com/support/support-resources/software/download/refdesigns/ip/interface/dnl-pciex...
Go to the link named "Qsys PCI Express to External Memory reference design for Cyclone IV GX". This is probably exactly what you want.