Issue using SGDMA to frame buffer

Altera_Forum · ‎01-09-2010

Hi,

I am buffering frames using an SGDMA. I was wondering if anyone has come across the following problem:

After transferring successfully to the DDR2 with several descriptors, the ready signal from the SGDMA drops for good, stalling the frame transfer.

But when using a much smaller frame, say, 100 x 64 pixels instead of 800 x 480, the stalling problem never happens.

Things I have ruled out:

- All timing has been met.

- The descriptors were set up successfully.

- The DDR2 memory passes a read and write test for the entire address range.

Thanks!

Altera_Forum · ‎01-12-2010

I should follow up on my post:

The main reason for the errors was that I was not addressing the descriptor memory properly.

Thanks.

Altera_Forum · ‎01-13-2010

I could see the SGDMA having problems with small frames where the transfers outrun the descriptor. If you want a much easier way to do this, check this out: http://www.nioswiki.com/exampledesigns/modular_sgdma_video_frame_buffer

I had frame buffering in mind when I designed that SGDMA, so I made it dirt simple to use for video applications. The example above is based on this design example: http://www.altera.com/support/examples/nios2/exm-modular-scatter-gather-dma.html

Altera_Forum · ‎01-18-2010

Hi, thanks for the extra pointers.

I'm actually using two SGDMAs for writing and reading to/from a frame buffer. It works great now that I have a good handle on it. It's nice to be able to set up the descriptors while the last transfer is still taking place. It's also great that the descriptor memory doesn't have to be very large.

Altera_Forum · ‎01-18-2010

Do you mean the SGDMA on the Quartus II DVD or the Modular SGDMA I pointed you to? If you are talking about the one on the DVD, remember to update the end of the descriptor chain appropriately; otherwise you may run into a race condition if the SGDMA catches up with the descriptor you are modifying. By "appropriately" I mean updating the owned_by_hardware bit in the correct order: always make sure your list is terminated with a descriptor that has the owned_by_hardware bit low.
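A minimal sketch of that update order, written as a toy software model in Python rather than real driver code. The `Descriptor` class, its field names other than owned_by_hardware, and both functions are hypothetical and only illustrate the sequencing: add the new terminator first, fill in the old terminator while software still owns it, and set owned_by_hardware last.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Toy model of one chained descriptor (illustrative fields only)."""
    write_addr: int = 0
    length: int = 0
    next_index: int = 0
    owned_by_hardware: bool = False


def append_transfer(chain, tail, write_addr, length):
    """Turn the current terminator into a real transfer; return the new tail index."""
    new_tail = len(chain)
    # 1. Add the new terminator first, with owned_by_hardware low.
    chain.append(Descriptor(owned_by_hardware=False))
    # 2. Fill in the old terminator while software still owns it.
    d = chain[tail]
    d.write_addr = write_addr
    d.length = length
    d.next_index = new_tail
    # 3. Hand it to hardware last, so the DMA never sees a half-written entry.
    d.owned_by_hardware = True
    return new_tail


def hardware_walk(chain, start):
    """Model of the DMA: process descriptors until one is not owned by hardware."""
    done = []
    i = start
    while chain[i].owned_by_hardware:
        done.append((chain[i].write_addr, chain[i].length))
        chain[i].owned_by_hardware = False  # assumed: bit is cleared on completion
        i = chain[i].next_index
    return done, i
```

Whatever point the modelled DMA reaches, it always stops on a descriptor with the bit low, which is the property the chain needs.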

Altera_Forum · ‎09-25-2015

Dear BadOmen,

I need to develop a frame writer block that takes streaming video (AV-ST) coming in from a Clocked Video Input block and writes it to DDR3 memory. I found two IP core templates:

- mSGDMA

- Avalon-MM Burst Master template

My question is: which one would be a good starting point for me to use? Note that all logic must be self-contained, meaning no support from a host (such as NIOS) can be required during the frame writing process. Hence, I'm wondering whether the mSGDMA is overkill for this application.

Appreciate your feedback!

Altera_Forum · ‎09-25-2015

I would go with the mSGDMA in ST to MM mode. I assume each incoming frame gets buffered into a different memory location, so you can use one descriptor per frame. Make sure to use it in Qsys, since the DMA uses information from Qsys to figure out how wide the master address bus should be.

Since you won't have a host, I recommend sizing the descriptor connection of whatever is going to provide the descriptors to the DMA to be 128 bits wide for standard descriptors and 256 bits wide for enhanced descriptors. This lets you commit a descriptor into the DMA using either a single MM write or a single ST beat. The documentation shows the format of the descriptor.
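As a rough illustration of committing a descriptor in one 128-bit beat, here is a small Python sketch that packs four 32-bit fields into a single word. The field order (read address, write address, length, control, lowest word first) and the position of the "go" bit are assumptions made for the example and should be checked against the descriptor format in the documentation; `pack_standard_descriptor` is a hypothetical helper, not part of any Altera tool.

```python
# Assumed layout, lowest 32-bit word first:
#   word 0: read address, word 1: write address,
#   word 2: length in bytes, word 3: control
CONTROL_GO = 1 << 31  # assumed position of the "go" bit in the control word


def pack_standard_descriptor(read_addr, write_addr, length, control=CONTROL_GO):
    """Pack four 32-bit fields into one 128-bit integer (one MM write or ST beat)."""
    value = 0
    for i, word in enumerate((read_addr, write_addr, length, control)):
        if not 0 <= word < (1 << 32):
            raise ValueError(f"field {i} does not fit in 32 bits")
        value |= word << (32 * i)
    return value


# Example: one descriptor per frame for an ST to MM transfer.
# The read address is unused in this direction, so it is left at zero.
frame0 = pack_standard_descriptor(0, 0x10000000, 1536000)
```

In hardware the same packing would be a simple concatenation of the four fields driven onto the descriptor port.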

Also, as a heads-up, a few people have told me about a bug where the descriptor FIFOs in the DMA sometimes don't work. I'm certain it's a synthesis bug, so make sure you set the FIFO depth to at least 512 to keep Quartus from synthesizing those FIFOs out of MLABs. Alternatively, you can hack the RTL or set a Quartus assignment to force the FIFO into M20Ks. The bug causes the descriptor data to be turned into zeros, which prevents the DMA from working correctly.

Altera_Forum · ‎09-25-2015

Thanks BadOmen for the speedy response. Some follow-up questions:

1. Where can I find the latest User Guide for the mSGDMA? I noticed a dedicated section in the Embedded Peripherals IP User Guide, but I also saw an AlteraWiki page with an mSGDMA User Guide.

2. In the mSGDMA Qsys GUI, is there any correlation between the "Maximum Transfer Length" and "Data Path FIFO Depth" settings?

3. In terms of my implementation, here is what I'm envisioning. My DDR3 has a 128-bit AV-MM interface in Qsys, so I'll set up the mSGDMA with a 128-bit interface as well. I'll strip all the packet-related data from the AV-ST stream of the Clocked Video Input block. I'll pack multiple raw pixel values (24-bit RGB) into 128 bits, which will then be sent to the mSGDMA on the AV-ST port to be written to the DDR3 at a location defined by the descriptor. Am I on the right track here?

4. For the AV-ST interface on the mSGDMA, what does the 'READY' signal behaviour depend on? Is it dependent on whether the DATA FIFO is full or does it indicate when the data has been successfully written to the DDR3 memory? Also, for my application, should I enable Packet Support in the mSGDMA GUI?

Thanks in advance!

Altera_Forum · ‎09-25-2015

1) I would start with the Embedded Peripherals IP User Guide because it documents the mSGDMA "macro" component. The documentation up on the wiki covers each core inside that macro component, and you might find some details missing from the official document, so you may want to cross-reference between them.

2) There isn't any dependency between those two values. The maximum transfer length limits how big a transfer can be performed with a single descriptor (useful to scale back when Fmax becomes a problem). The data path FIFO depth is typically set to be larger than 4x the maximum burst size, or deep enough to hide the read latency of the memory (you are writing to memory, so that shouldn't matter).

3) Although packing the data improves efficiency, think about whatever is going to use this data after it's buffered in SDRAM. If dealing with packed data is going to be cumbersome for whatever consumes it afterwards, and you have enough memory bandwidth, then I would just stuff every fourth byte with zeros to keep each pixel aligned to a 4-byte location.
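To make the trade-off concrete, here is a small Python sketch of the zero-stuffing idea and the memory cost it implies. The function names are hypothetical, and 800 x 480 is used purely as an example resolution.

```python
def pad_rgb24_to_32(pixels: bytes) -> bytes:
    """Append one zero byte after every 3-byte RGB pixel (4-byte aligned pixels)."""
    if len(pixels) % 3 != 0:
        raise ValueError("input must be a whole number of 24-bit pixels")
    out = bytearray()
    for i in range(0, len(pixels), 3):
        out += pixels[i:i + 3]
        out.append(0)  # stuffed byte keeps the next pixel on a 4-byte boundary
    return bytes(out)


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one frame in memory."""
    return width * height * bytes_per_pixel


packed = frame_bytes(800, 480, 3)   # tightly packed 24-bit pixels
padded = frame_bytes(800, 480, 4)   # one pixel per 32-bit word
```

Padding costs one third more memory and bandwidth than tight packing, but a 128-bit beat then carries exactly four whole pixels, so no pixel straddles a word boundary.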

4) Ready in the write master is mainly hooked up to the internal FIFO 'not full' signal. So when ready is high and you assert valid, the data is buffered into the write master and will eventually get written to memory. Packet support is only needed if you want to bound the transfer using start of packet and end of packet. Since you already know the video resolution, and therefore how much data arrives per frame, you probably don't need it.

Altera_Forum · ‎09-30-2015

Hi BadOmen,

- Can I use "Full Word Accesses Only" based on my use case described in the previous post? When using "Full Word Accesses Only", is the only constraint that our read/write addresses placed inside the descriptor must be a multiple of 4? My hope is to minimize resource utilization as much as possible.

- Would you recommend the use of the "Response Port" or leave it disabled?

- When I'm using AV-ST to AV-MM mode, let's say I fill the descriptor FIFO with 4 write addresses. Is mSGDMA expecting the corresponding 4 sets of data available right away on the AV-ST port or does it wait until it sees data on the AV-ST for each transfer? I guess I'm wondering what happens if due to upstream error, no data shows up on the AV-ST port.

- In the mSGDMA doc, a feature called "Park Read" was mentioned for frame buffering. Can you explain the concept here?

Thank you!

Altera_Forum · ‎09-30-2015

Yes, full word access only seems appropriate since you are moving a pixel or multiple pixels around per clock cycle. If the master is 32-bit then yes, the data needs 4-byte alignment, but if you use a different width then the alignment changes (data width / 8 is the alignment constraint, in bytes).
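The alignment rule is easy to check before a descriptor is committed. A minimal sketch in Python, with hypothetical helper names:

```python
def required_alignment(data_width_bits: int) -> int:
    """Byte alignment for full-word accesses: data width / 8."""
    if data_width_bits <= 0 or data_width_bits % 8 != 0:
        raise ValueError("data width must be a positive multiple of 8 bits")
    return data_width_bits // 8


def is_aligned(address: int, data_width_bits: int) -> bool:
    """True if the address is usable with a master of the given width."""
    return address % required_alignment(data_width_bits) == 0
```

For the 128-bit master discussed in this thread, addresses must therefore be multiples of 16, so an address such as 0x10000004 that is fine for a 32-bit master would not be acceptable.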

You can probably leave the response port disabled. It's really meant for ST-->MM transfers where the transfer is terminated by EOP entering the sink of the write master. Since you are working with video frames and already know the frame size ahead of time, I don't see the need to use SOP/EOP packet support.

If you place 4 descriptors into the DMA, it will issue the writes when the data arrives at the ST sink port. When the first descriptor completes (first frame), the write master should be ready to start accepting the next frame of data, since you already told it what to do with it in the 2nd descriptor. If it hasn't buffered any streaming data, it won't write to memory and will sit idle waiting for it. If the memory backpressures due to traffic congestion, the write master will continue accepting data from the ST sink port, since there is a FIFO inside it. If the memory backpressures to the point where the write master FIFO fills, then it will backpressure the ST sink port. That is typically bad in video applications because you will lose pixels unless you have another buffer between the DMA and the video IP. So make sure to size the data FIFO accordingly, and set the memory arbitration share high for the DMA so that it gets more shares of the bandwidth than other masters trying to access the same memory.
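The effect of FIFO depth on backpressure can be illustrated with a very simplified cycle model in Python. It assumes one word arrives from the video source every cycle and the memory accepts at most one word per cycle when it is not stalled; `simulate_write_master` is a hypothetical function and does not model bursts or real arbitration.

```python
def simulate_write_master(fifo_depth, mem_ready_pattern):
    """Count cycles in which the ST sink would have to deassert ready.

    mem_ready_pattern: iterable of booleans, True when the memory
    accepts a word in that cycle.
    """
    level = 0          # words currently held in the data FIFO
    stalled = 0        # cycles where the FIFO was full (ready low)
    for mem_ready in mem_ready_pattern:
        if mem_ready and level > 0:
            level -= 1             # memory drains one word
        if level < fifo_depth:
            level += 1             # FIFO not full: accept the incoming word
        else:
            stalled += 1           # FIFO full: source is backpressured
    return stalled


# Memory stalled for 40 cycles, then available for 40 cycles.
pattern = [False] * 40 + [True] * 40
deep = simulate_write_master(64, pattern)     # FIFO absorbs the whole stall
shallow = simulate_write_master(32, pattern)  # FIFO fills before the stall ends
```

In this model the 64-deep FIFO never backpressures, while the 32-deep FIFO would stall the source for the last 8 cycles of the memory stall, which for an unbuffered video source means lost pixels.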

When a descriptor has the parked read or write bits enabled, the master will keep re-using the same descriptor if there are no other descriptors left to operate on. Write parking doesn't really have any practical use that I can think of, but read parking is useful when you use the mSGDMA for sending frame buffers out to the display. For example, if you have only drawn one frame and are drawing the next one, you can send the first frame to the DMA with the parked reads bit enabled, which will cause it to keep redisplaying the same frame. Then, when you are done drawing the next frame, you send that descriptor and the DMA will transition to it. This prevents screen tearing, because you don't want to be drawing to the same frame as the one that's being displayed, but whatever is drawing the frame might not be able to keep up with the display rate. The parked read feature lets you implement double buffering without having to constantly re-send the same descriptor to redisplay the same frame.
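The parked read behaviour can be pictured with a short Python model. It assumes every queued descriptor has the parked reads bit set and that the DMA moves to the next descriptor at a frame boundary; `display_sequence` is a hypothetical function, not an mSGDMA API.

```python
from collections import deque


def display_sequence(events):
    """Return the frame shown at each display refresh.

    events: one entry per refresh, either None or the name of a frame
    whose descriptor was queued just before that refresh.
    """
    queue = deque()
    current = None
    shown = []
    for new_frame in events:
        if new_frame is not None:
            queue.append(new_frame)
        if queue:
            current = queue.popleft()  # a new descriptor is waiting: move on
        # otherwise the DMA stays parked on the current descriptor
        shown.append(current)
    return shown


# Frame "A" is drawn once, "B" takes three refresh periods to finish.
frames = display_sequence(["A", None, None, "B", None])
```

Frame "A" is repeated until "B" has been queued, which is the double-buffering behaviour described above without software re-sending the descriptor for "A".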

Altera_Forum · ‎10-01-2015

Hi BadOmen,

If I have multiple AV-ST video streams that I want to write to memory (each has its own frame memory space), do I instantiate multiple copies of the mSGDMA (i.e. one per AV-ST video stream), or is there a way to cut down on the resources? Any suggestions would be appreciated!

Thanks.

Altera_Forum · ‎10-01-2015

Since it's a single-channel DMA, the easiest way would be to have multiple DMAs, assuming all the video feeds are concurrent. If you only need to write one video frame at a time to memory, then a single DMA should be sufficient, along with a streaming mux that selects the appropriate input.

If the video streams are concurrent and you want to use a single DMA, then you would need to break each frame down into smaller transfers and constantly feed descriptors into the DMA, cycling through the video inputs a portion of a frame at a time. Of course, you'll need a large enough buffer for each video input to hold the incoming data while the other video inputs are being written out to memory.
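A sketch of how such a descriptor sequence could be generated, in Python for illustration. The function name, the chunk size, and the base addresses are all hypothetical; in a hostless design the same sequencing would live in a small state machine that also drives the streaming mux.

```python
def interleaved_descriptors(frame_bases, frame_bytes, chunk_bytes):
    """List (input_index, write_address, length) tuples for one frame period.

    Each frame is split into chunks, and the inputs are visited round-robin
    so every input is serviced once per chunk.
    """
    descriptors = []
    offset = 0
    while offset < frame_bytes:
        length = min(chunk_bytes, frame_bytes - offset)  # last chunk may be short
        for index, base in enumerate(frame_bases):
            descriptors.append((index, base + offset, length))
        offset += length
    return descriptors


# Two inputs, each with its own frame buffer, written 1024 bytes at a time.
descs = interleaved_descriptors([0x00000000, 0x00200000], 4096, 1024)
```

Smaller chunks reduce the buffering each input needs while it waits for its turn, at the cost of feeding descriptors to the DMA more often.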

Altera_Forum · ‎10-27-2015

Hi BadOmen,

I understand that the mSGDMA was designed to work with one clock domain. For the case where I want to write AV-ST data to an AV-MM location, assume the AV-ST is running at CLK1 speed while the AV-MM (memory interface) is running at a different CLK2 speed. If I want to have different clocks for the two interfaces, what is the best approach?

- Add an AV-ST dual-clock FIFO before the mSGDMA block to move the AV-ST stream into the AV-MM clock domain, and then run the mSGDMA in the AV-MM clock domain; OR

- Change the SCFIFO instantiated inside the "write master" block of mSGDMA to a DCFIFO; OR

- Another approach???

Appreciate your feedback!

Altera_Forum · ‎10-27-2015

The first approach would be best, even though it increases the latency through the DMA and uses more memory resources. I looked into using a DCFIFO a while back, but that changes the latency of the FIFO used, full, and empty signals, which drive some of the control logic that the DMA uses to backpressure on the MM side. So the change is doable, but it would take some work to redo the skid logic inside, and some information from the ST port, such as sop/eop/empty, would also need to cross clock domains. If you select a shallow FIFO and target a device with MLABs, then you'll probably get a clock crossing FIFO that doesn't use memory blocks.

Altera_Forum · ‎11-17-2016

If you want to understand the mSGDMA controller and a video frame buffer for HDMI, the files I put on the Altera wiki a few weeks ago may help you.

http://www.alterawiki.com/wiki/cyclone5_starterkit_hdmi