Data stream to PCIe: regular or scatter gather DMA controller?

Altera_Forum · ‎06-02-2012

For a simple data flow (FIFO output to DMA controller to PCIe TXS port) is there any advantage to using the scatter/gather DMA controller rather than the regular DMA controller?

There is only one source and no addressing (a FIFO) on the input side of the DMA read master port, and all transactions are the same length.

Altera_Forum · ‎06-04-2012

The main advantages of the scatter/gather DMA are that you can chain several DMA commands with no software intervention, and that the DMA core can be connected to Avalon Stream sources and sinks, so it can be interfaced with FIFOs and components more easily (IIRC the regular DMA can only do memory to memory transfers).
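To illustrate what "chaining several DMA commands with no software intervention" means, here is a rough behavioural sketch in Python. It is not the SGDMA register layout: the `Descriptor` fields, the `owned_by_hw` flag and `run_chain` are hypothetical names chosen for the example. The idea is only that software builds a linked list once and the engine then walks it by itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """Hypothetical descriptor; fields are illustrative, not a real register map."""
    src: int
    dst: int
    length: int
    next: Optional["Descriptor"] = None
    owned_by_hw: bool = True  # cleared by the engine once the transfer is done


def run_chain(first: Optional[Descriptor], memory: bytearray) -> int:
    """Walk the chain the way a scatter/gather engine would.

    The CPU is not involved between descriptors: the engine follows the
    `next` pointer until the list ends or it reaches a descriptor it does
    not own. Returns the number of descriptors completed.
    """
    completed = 0
    desc = first
    while desc is not None and desc.owned_by_hw:
        memory[desc.dst:desc.dst + desc.length] = \
            memory[desc.src:desc.src + desc.length]
        desc.owned_by_hw = False
        completed += 1
        desc = desc.next
    return completed


# Two chained copies, set up once by "software".
memory = bytearray(range(64))
d2 = Descriptor(src=8, dst=48, length=8)
d1 = Descriptor(src=0, dst=32, length=8, next=d2)
completed = run_chain(d1, memory)
```

With the regular DMA, software would instead have to reprogram the address and length registers after each of the two copies.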

If you don't need those two features then the regular DMA is probably simpler to use.

There is also a modular sgdma (http://www.alterawiki.com/wiki/modular_sgdma) component available on the wiki that I think is easier to use than the SGDMA. But I haven't tried it.

Altera_Forum · ‎06-04-2012

I don't understand what it means when you say that the regular dma can only do memory to memory transfers and one of the others is required for using fifos. Is that just more efficient or am I missing something fundamental in the interface fabric?

For a simple case, which is what I'm trying to get working now, can I not connect a fifo to exported signals from the regular dma read master as follows:

read_n signal from the dma port to the fifo read clock

fifo data out to dma read data in

dma read wait request always low

dma read data valid always high

Altera_Forum · ‎06-05-2012

The regular DMA uses memory mapped interfaces, which are designed to be connected to memory components, or at least components with an area mapped as a memory. Of course you can choose to emulate a memory component by ignoring the address and connect some signals as you are trying to do, but there are several problems:

[list][*]as a general design rule, you shouldn't use a generated signal as a clock in an FPGA. You can have glitches on that signal, that can cause unwanted clock cycles at unpredictable times (and usually not when you want them ;) ) and timing problems

[*]if the DMA wants to read on several cycles in a row it will just keep the read_n signal low. So if you use it as a clock it won't work. Instead, use the same clock the DMA is using as the fifo read clock, and use an inverted version of read_n for the read request.

[*]by keeping the wait request always low you won't prevent the DMA from trying to read your fifo when it is empty. You should probably use this signal to stop the DMA when the FIFO is empty instead

[*]keeping the data valid signal high violates the specification (http://www.altera.com/literature/manual/mnl_avalon_spec.pdf) which says "A slave with readdatavalid must assert this signal for one cycle for each read access it has received. There must be at least one cycle of latency between acceptance of the read and assertion of readdatavalid." It may or may not work with the DMA controller (I don't know its internals) and even if it does there is no guarantee it will work with future versions. It is better to stick to the specification and use for example a delayed version of the read request.

[*]as you don't decode the address you have no way of knowing where the controller is in its transaction. Depending on the type of data you need to transfer it may not be a problem, but in some cases it can be interesting to signal the beginning and the end of a transaction, for example because you are sending packets with a specific format, or you need to keep track of timing. You could read the address to detect when you are at the beginning of a transaction, but you have no way to tell the DMA controller that the transaction is finished.[/list]

The Avalon Stream interface is better suited to those kinds of transfers IMHO, and it is very easy to use.
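The handshake points in the list above can be sketched as a small cycle-level model. This is Python, not HDL, and only a sketch under assumptions: `FifoReadSlave` and `run_master` are made-up names, `read` is the inverted `read_n`, the address is ignored, and the "master" is a trivial stand-in rather than the real DMA controller. It shows waitrequest being driven from FIFO empty and readdatavalid being a one-cycle-delayed version of the accepted read.

```python
from collections import deque


class FifoReadSlave:
    """Hypothetical wrapper that makes a FIFO look like a memory-mapped read slave."""

    def __init__(self, words=()):
        self.fifo = deque(words)
        self.readdata = 0
        self.readdatavalid = False  # registered output

    @property
    def waitrequest(self):
        # Stall the master whenever there is nothing to read.
        return len(self.fifo) == 0

    def clock(self, read):
        """Advance one clock edge. Returns True if the read was accepted."""
        accepted = bool(read) and not self.waitrequest
        if accepted:
            self.readdata = self.fifo.popleft()
        # Valid for exactly one cycle, one cycle after each accepted read.
        self.readdatavalid = accepted
        return accepted


def run_master(slave, wanted, refill=None, max_cycles=32):
    """Trivial stand-in for a read master that wants `wanted` words."""
    received = []
    issued = 0
    for cycle in range(max_cycles):
        if refill and cycle in refill:
            slave.fifo.extend(refill[cycle])  # producer writes more data
        read = issued < wanted                # keep read asserted until done
        if slave.clock(read):
            issued += 1
        # State after the edge is what the master samples in the next cycle.
        if slave.readdatavalid:
            received.append(slave.readdata)
        if len(received) == wanted:
            break
    return received


slave = FifoReadSlave([10, 11])
received = run_master(slave, wanted=4, refill={5: [12, 13]})
```

In this run the FIFO is empty during cycles 2 to 4, so the read is simply held off by waitrequest and no invalid word is ever returned.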

Altera_Forum · ‎06-05-2012

Thanks for the well considered and detailed reply

--- Quote Start ---

The regular DMA uses memory mapped interfaces, which are designed to be connected to memory components, or at least components with an area mapped as a memory. Of course you can choose to emulate a memory component by ignoring the address and connect some signals as you are trying to do, but there are several problems:

[list][*]as a general design rule, you shouldn't use a generated signal as a clock in an FPGA. You can have glitches on that signal, that can cause unwanted clock cycles at unpredictable times (and usually not when you want them ;) ) and timing problems

**That's correct, and I was unclear. I'm connecting the read_n signal from the dma read master port to the 'rdreq' port on the DCFIFO megafunction. My understanding is that this is a synchronous fifo and that the input functions as a read clock enable.

[*]if the DMA wants to read on several cycles in a row it will just keep the read_n signal low. So if you use it as a clock it won't work. Instead, use the same clock the DMA is using as the fifo read clock, and use an inverted version of read_n for the read request.

** Again, I was unclear: This is a fully synchronous system: DMA and read side of fifo have the same clock. And the inverted read_n goes to the 'rdreq' port.

[*]by keeping the wait request always low you won't prevent the DMA from trying to read your fifo when it is empty. You should probably use this signal to stop the DMA when the FIFO is empty instead.

** My state machine does that, but since I was only getting a single 'read_n' from the dma (or at least only one followed by at least 80us of inactivity according to both signaltap and oscilloscope probing) despite a length register programmed to 128, I was trying to eliminate that as a cause. So I hold it inactive as a test.

[*]keeping the data valid signal high violates the specification (http://www.altera.com/literature/manual/mnl_avalon_spec.pdf) which says "A slave with readdatavalid must assert this signal for one cycle for each read access it has received. There must be at least one cycle of latency between acceptance of the read and assertion of readdatavalid." It may or may not work with the DMA controller (I don't know its internals) and even if it does there is no guarantee it will work with future versions. It is better to stick to the specification and use for example a delayed version of the read request.

**Good quote (Avalon spec table 3-1, page 3-4.) I will try the explicit de-assertion. This could be better written, but I agree that it implies de-assertion during the 'latency'. Looking at the dma controller code, this input doesn't seem to lead into any synchronous circuitry.

[*]as you don't decode the address you have no way of knowing where the controller is in its transaction. Depending on the type of data you need to transfer it may not be a problem, but in some cases it can be interesting to signal the beginning and the end of a transaction, for example because you are sending packets with a specific format, or you need to keep track of timing. You could read the address to detect when you are at the beginning of a transaction, but you have no way to tell the DMA controller that the transaction is finished.[/list]

** I'm programming the dma length register to equal the fifo depth and using fifo full to start the transfer and fifo empty to end it (using wait request to hold the next transfer until the fifo is filled again). Simple, I thought, but not simple enough, apparently.

The Avalon Stream interface is better suited to those kinds of transfers IMHO, and it is very easy to use.

--- Quote End ---

** Maybe so, but the Qsys flow doesn't support a streaming interface to the pcie. It seems like a lot of work to put the pci express block outside of qsys and take care of flow control (credit processing must be done by hand) just to do dma, especially when my transactions are always the same length.
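For reference, the full/empty gating described in this post (length register equal to the fifo depth, fifo full releases the transfer, fifo empty ends it) can be modelled roughly as follows. This is a Python sketch, not my actual state machine: `BurstGate` and `simulate` are hypothetical names, and the producer is assumed to pause while the fifo drains, which is a simplification.

```python
class BurstGate:
    """Two-state controller: hold waitrequest until the FIFO is full,
    then release it until the FIFO is empty again."""

    FILLING = "FILLING"
    DRAINING = "DRAINING"

    def __init__(self):
        self.state = self.FILLING

    def step(self, full, empty):
        """Evaluate one cycle; returns the waitrequest level for that cycle."""
        if self.state == self.FILLING and full:
            self.state = self.DRAINING
        elif self.state == self.DRAINING and empty:
            self.state = self.FILLING
        return self.state == self.FILLING


def simulate(depth, cycles):
    """Track the FIFO fill level; returns the waitrequest trace."""
    gate = BurstGate()
    level = 0
    trace = []
    for _ in range(cycles):
        wait = gate.step(full=(level == depth), empty=(level == 0))
        trace.append(wait)
        if wait:
            level += 1   # producer writes one word (assumed paused otherwise)
        else:
            level -= 1   # DMA reads one word
    return trace


trace = simulate(depth=4, cycles=9)
```

Each run of cycles with waitrequest low is exactly `depth` words long, which is why the length register is programmed to the fifo depth.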

Altera_Forum · ‎06-06-2012

Well in this case I think it should work. As long as you know the limits and can cope with them in your case I see no reason to move to another DMA.

Altera_Forum · ‎06-12-2012

--- Quote Start ---

** Maybe so, but the Qsys flow doesn't support a streaming interface to the pcie. It seems like a lot of work to put the pci express block outside of qsys and take care of flow control (credit processing must be done by hand) just to do dma, especially when my transactions are always the same length.

--- Quote End ---

I'd be careful which version of Quartus you're using.

I have a design which originated driving DMA on a PCI interface.

I migrated this to drive DMA on the Cyclone IV hard PCIe core with Quartus 11.0. This worked, except interrupts were broken in 11.0.

When 11.1 was released the interrupts worked, but I have been getting holes in the data that's been DMA'd to the host memory. If I generate my project's Qsys system and then build the FPGA in 11.0, DMA works. If I do the same, with the same project, in 11.1, DMA is broken.

I've been discussing this off-line with Badomen but we weren't able to come up with a solution.

I was just about to submit an SR about this when I noticed earlier that Quartus 12.0 has been released. This appears to fix the problem.

So, with 11.0 the interrupts were broken and with 11.1sp2 the DMA engine was broken. This doesn't give great confidence in Altera's quality control.

:mad:

So far 12.0 looks OK.

Nial

Altera_Forum · ‎06-14-2012

I've verified, by interrogating the control registers with a Nios and using a protocol analyzer, that the 'DMA controller' IP does not work in a Quartus/PCIe system. The transaction length register counts down to 0 and then ... keeps on going! The packets never stop.

I've since discovered that chapter 1 'Introduction' of the Embedded Peripherals IP User Guide says that this core is not 'supported by QSYS'. This is not mentioned in the chapter describing the DMA controller itself. This should be translated to mean: It doesn't work. I hope this post can prevent someone else from wasting time with this.

Altera_Forum · ‎02-27-2013

--- Quote Start ---

** Maybe so, but the Qsys flow doesn't support a streaming interface to the pcie. It seems like a lot of work to put the pci express block outside of qsys and take care of flow control (credit processing must be done by hand) just to do dma, especially when my transactions are always the same length.

--- Quote End ---

Where is it described how to put the PCI Express block outside of Qsys, how to take care of flow control, and how credit processing is to be done by hand?

--- Quote Start ---

I've since discovered that chapter 1 'Introduction' of the Embedded Peripherals IP User Guide says that this core is not 'supported by QSYS'. This is not mentioned in the chapter describing the DMA controller itself. This should be translated to mean: It doesn't work. I hope this post can prevent someone else from wasting time with this.

--- Quote End ---

That is strange, because the DMA controller IP core is used in Chapter 17 (Qsys Example) of the IP Compiler for PCI Express User Guide (May 2011).

Altera_Forum · ‎02-28-2013

Adding comments to every random PCIe thread won't help solve your problem.

Altera_Forum · ‎02-28-2013

Not to random PCIe threads, but to those that interest me and to those that are mostly unanswered.