I assume you are talking about the DMA that shows up in Qsys. That DMA can only move data from memory to memory and only buffers a single descriptor, which means that after it finishes a transfer the host has to send it a new descriptor before it will move any more data. So there is a lot of stop-and-go with non-SG DMAs.

The mSGDMA, which stands for modular scatter-gather DMA, can buffer multiple descriptors and handles ST-->MM, MM-->ST, and MM-->MM transfers. So the biggest difference is that the host doesn't have to wait for the mSGDMA to finish a transfer before telling it what to do next: it queues up descriptors and moves on to the next transfer as soon as the current one completes. In terms of hardware resources the mSGDMA is larger, but it's also around 10 years newer, so it supports many more features than the old DMA. If you explain what you want a DMA for, I can recommend which one to use.
Can your hardware handle the data coming in as a stream? If so, you might not need to temporarily copy it to SDRAM at all: just have the DMA pull it out and stream it to your hardware using Avalon-ST. Then you can have another DMA take the data off Avalon-ST and write it back to memory.
The old DMA is quite limited, especially when it comes to bursting, so if throughput is important you may want to go with the mSGDMA for that reason alone. If you leave out the descriptor-fetching engine, they both have similar front ends: a host just writes a few registers and kicks a GO bit to start the data movement. The burst issue with the old DMA, which I don't think ever got fixed, was that once you enabled bursting you were limited to transferring only a single burst of data. I bring this up because you posted this thread in the SoC section, so I'm assuming you are DMA'ing data from the HPS memory space, which has an AXI interconnect that is tuned for bursts rather than single-beat transactions.

Where does the big image originate? In other words, is it already in main memory, or is it stored in a flash file system? Also, are you really talking about an HPS system, or is this perhaps a Nios II system?
I am really talking about an HPS system. I have Linaro Linux with my C application. The application is a server that receives images from a client over Ethernet. The images are not stored in a flash file system; they are in main memory, ready to be DMA'd to the FPGA. (After performing the image processing I will want to DMA the data back to the HPS.)

I attach a file from Qsys. I would be grateful if you could take a look at it and tell me if the DMA will work. This is my first HPS-FPGA project and I'm a little lost in this topic.
--- Quote Start --- I am really talking about an HPS system. I have Linaro Linux with my C application. The application is a server that receives images from a client over Ethernet. The images are not stored in a flash file system; they are in main memory, ready to be DMA'd to the FPGA. (After performing the image processing I will want to DMA the data back to the HPS.) I attach a file from Qsys. I would be grateful if you could take a look at it and tell me if the DMA will work. This is my first HPS-FPGA project and I'm a little lost in this topic. --- Quote End ---

I don't see a problem with this at all: the HPS receives an image, dumps it into a dedicated memory space, and then you can trigger a DMA to push the data to hardware for processing, even using backpressure if you want. Then a second DMA can pull the results back into RAM. I'm using the mSGDMA now and having no issues with it, as long as you're not using park mode or a Quartus release earlier than 15.1.1. Do you need the entire image in the FPGA, or just single lines? There are ways to optimize this, and you can get tons of throughput if the design is planned well. I haven't opened the file yet because the forums aren't letting me, but I'll take a look later.
For some reason I can't get at the file either. But I agree with derim: if you break the image processing down into small portions of the frame, you might be able to get a lot of speedup.

The reason I asked earlier whether you can process the data as a stream is that if you instantiate the blocks inside the mSGDMA (dispatcher, read master, and write master) individually, you can perform a memory-to-memory transfer with your hardware accelerator sitting between the read and write masters, as long as the number of bytes into the block equals the number of bytes coming out. If there isn't a balance between the amount of input and output data, you would need to use two DMAs so that you can control them independently. This is what I mean by a transform + transfer type of operation:

HPS SDRAM --> DMA read master --> your video transform logic --> DMA write master --> HPS SDRAM

With a topology like that, the operation just looks like a memory movement to the rest of the system. It's also self-scheduling, because the DMA won't write the results back to memory until they have been processed, which makes scheduling much simpler.
It is strange that you cannot open the file; I have no problem with it. I attach screenshots from Qsys instead. How should I set the parameters of the mSGDMA? The SDRAM has 64 MB.

I understand your solution, and it is fine, but some of my calculations are quite specific and cannot be done on a stream: I need data from different pieces of the image. So what I want to do is transfer data from HPS memory to the FPGA RAM, run the FPGA modules, and send the data back to HPS memory. For that I need two mSGDMA modules; that much is clear to me. I propose that we focus on the first stage, sending data to the FPGA. I need technical support here: if the configuration in Qsys is OK, then I need to write code in my C app, and that is the biggest problem. How do I run the DMA from C code?
This example is probably worth looking at. It's a bare-metal program, but it gives you an idea of what has to happen at a low level to communicate with the DMA core: https://www.altera.com/support/support-resources/design-examples/soc/fpga-to-hps-bridges-design-exam...

Those DMAs are performing ST-to-MM and MM-to-ST transfers, which is why each descriptor doesn't have both a source and a destination location.
Hi Tomek,

I'm doing a very similar thing to what you are doing. My recommendation is definitely the mSGDMA. The source DMA is configured as memory-mapped to streaming, the sink DMA as streaming to memory-mapped, and in between you implement your processing functions. I use standard Avalon-ST packets to mark image start and stop.

I also use reserved memory, which I define in the device tree file and use for the DMA transfers. This is because the DMA needs physical addresses, which are hard to get in Linux without implementing kernel modules. If you have reserved memory, you can mmap it, access it from the HPS ARM processor via a virtual address, and put the physical address in the DMA descriptors. Enable burst mode for higher performance.

I'm not using the IRQ-based response channel in the DMA; instead I count the symbols coming into the sink DMA and read the count out through a memory-mapped PIO component. I have also compiled/ported some of the Nios II software layer code for creating the mSGDMA descriptors and scheduling synchronous transfers.
An important question... I based my design on the DE1_SOC_Linux_FB demonstration. Should I generate and compile a new preloader now that I have added the two mSGDMAs and the SDRAM controller?
As a general rule of thumb, any time the Qsys system changes I would get into the habit of regenerating the preloader. I don't think it'll make a difference here, since the preloader doesn't know anything about FPGA-side DMAs, but you don't want to risk a hardware/software mismatch by getting into the habit of reusing the same preloader across multiple generated Qsys systems.