Re: Cyclone V SX SoC - DMA Controller Peripheral Request Interface

Altera_Forum · ‎10-09-2014

Hi guys,

I am trying to find out how the DMA peripheral request interface could be used.

Neither the HPS component Interface description (cv_54028) nor the DMA Controller description (http://www.altera.com/literature/hb/cyclone-v/cv_54016.pdf) explains how the FPGA peripheral request interface must be handled.

My specific question is: how is the burst length determined for peripheral requests coming from FPGA logic?

According to the DMA Controller description, both Peripheral Length Management and DMA Controlled Length Management are possible.

I need Peripheral Length Management, but the FPGA interface does not have the same signals as the ones in the DMA Controller interface description.

I hope someone has experience with this. Thanks a lot for your support!

(From HPS component Interface description)

--- Quote Start ---

peripheral signal interfaces

The DMA controller interface allows soft IP in the FPGA fabric to communicate with the DMA controller in the HPS. You can configure up to eight separate interface channels.

• f2h_dma_req0—FPGA DMA controller peripheral request interface 0

Each of the DMA peripheral request interface contains the following three signals:

• f2h_dma_req—This signal is used to request burst transfer using the DMA

• f2h_dma_single—This signal is used to request single word transfer using the DMA

• f2h_dma_ack—This signal indicates the DMA acknowledgment upon requests from the FPGA

For more information, refer to the DMA Controller chapter in the Cyclone V Device Handbook, Volume 3.

--- Quote End ---

Altera_Forum · ‎10-13-2014

The burst length is basically an arbitrary number that both the DMA and the peripheral must agree upon. So it doesn't have anything to do with burst transactions at a bus level but rather an agreed upon number of transfers.

So when the peripheral requests a burst the DMA has to know what the burst size is ahead of time. Typically you make the burst size programmable in the peripheral and you pass this size to the peripheral and the DMA when setting up the DMA transfers.

The reason why this is important: let's say the peripheral is a FIFO and the DMA writes into it. If the DMA and peripheral do not agree on the same burst length, then the DMA could potentially read/write too much data, causing an under/overflow to occur.

I have attached a file that contains more details in the comments about how to drive these request signals and how to react to the acknowledge back from the DMA.

Altera_Forum · ‎10-13-2014

Thanks for your reply.

So this flow control FIFO example and the fpga_dma example driver from Altera just use DMA Controlled Length Management? The driver is hard to understand :/.

I thought I could pass the length information through the DMA request interface, i.e. the FPGA logic pushes a request with the length information inside.

So either the DMA Controller has to read the register of my FPGA logic every time I want to use a peripheral request with a different packet size (DMA Controlled Length Management), OR the length information can be attached to the request (Peripheral Length Management).

In my case I receive packets of different sizes via Avalon Streaming and want to pass the length information with the peripheral request, rather than have the DMA Controller read the length register of my FPGA logic every time. I read the Altera documentation as saying that both scenarios are possible, but the FPGA peripheral request interface is different from the interface inside the HPS logic.

I hope my English is understandable ;) . Thanks a lot badOmen

If I understand your post correctly, the length has to be agreed on before the peripheral request occurs?

Altera_Forum · ‎10-14-2014

Actually I should have mentioned that the RTL I attached is in fact what that FPGA DMA example driver is for. Peripherals can make two types of transfer requests: bursts and singles. Burst requests are for transferring multiple words of data per request, and single requests are for transferring only one word of data per request (by word I mean whatever data width the DMA is set up to transfer).

In the example FPGA DMA driver what should be happening is that the DMA channel is programmed with a burst length and the same burst size is programmed into the FIFO logic in the FPGA. This burst size can be changed, but the important thing is that both the peripheral and the DMA channel need to be using the same burst length. For example, if the FIFO depth is 128 and the burst size for the peripheral is 4, that means when the DMA writes to the FIFO in response to a burst request it's always being told there are at least 4 words worth of space in the FIFO. Now if the DMA was programmed for bursts of 8 words but the FIFO is programmed for bursts of 4, then if the FIFO already has 124 words of data buffered it will request a burst of 4, and the DMA will write 8 words and probably overflow the FIFO.
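To make the mismatch concrete, here is a small Python model of a single burst request. It is only a sketch of the arithmetic in the example above, not the behaviour of flow_control_fifo.v or the DMA-330; the function name and its parameters are made up for illustration.

```python
def dma_write_on_burst_request(fifo_depth, fill_level, peripheral_burst, dma_burst):
    """Model one burst request from a FIFO peripheral to the DMA.

    The peripheral only requests a burst when it has room for
    `peripheral_burst` words; the DMA then writes `dma_burst` words.
    Returns (new_fill_level, overflowed_words).
    """
    free_space = fifo_depth - fill_level
    if free_space < peripheral_burst:
        # Not enough room: the peripheral does not request a burst at all.
        return fill_level, 0
    written = min(dma_burst, free_space)  # words that actually fit
    overflowed = dma_burst - written      # words that have nowhere to go
    return fill_level + written, overflowed


# Matching burst sizes: 124 + 4 fills the FIFO exactly.
print(dma_write_on_burst_request(128, 124, 4, 4))  # (128, 0)
# Mismatch from the example: the DMA pushes 8 words into 4 free slots.
print(dma_write_on_burst_request(128, 124, 4, 8))  # (128, 4)
```

With matching sizes the request can never overflow, because the request itself guarantees the space.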

Normally I recommend that if data is being moved between the HPS and FPGA, users put a soft DMA engine into the FPGA fabric. This makes variable-length ST-to-MM transfers much easier, because it's tricky to set up a DMA-330 channel when you don't know how much incoming data is arriving. That might not make sense until you read the DMA chapter and see what the channel microcode looks like. Also, if a lot of data bandwidth is needed, using soft DMA engines in the FPGA has the additional benefit of more HPS SDRAM bandwidth. The DMA accesses SDRAM through a 32-bit connection, so at 400 MHz you are talking at most 12.8 Gbps of bandwidth. The FPGA has a 256-bit aggregate connection to the SDRAM, so running at only 100 MHz gives you 25.6 Gbps of bandwidth from the FPGA into HPS SDRAM. If using a soft DMA sounds feasible to you, I would recommend the modular SGDMA, which is available in Qsys version 14.0. Unfortunately it doesn't have a driver, but you can get a Nios II bare-metal driver and documentation from the Altera wiki (http://www.alterawiki.com/wiki/modular_sgdma). Porting it over to run on the ARM is fairly trivial, I'm told.
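The two bandwidth figures are just bus width times clock rate. A quick Python check of that arithmetic (the helper name is made up, and these are theoretical peaks that ignore any protocol or arbitration overhead):

```python
def peak_bandwidth_gbps(bus_width_bits, clock_hz):
    """Theoretical peak bandwidth in Gbps: one full-width beat per clock."""
    return bus_width_bits * clock_hz / 1e9


print(peak_bandwidth_gbps(32, 400e6))   # HPS DMA path: 12.8
print(peak_bandwidth_gbps(256, 100e6))  # FPGA-to-SDRAM path: 25.6
```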

Altera_Forum · ‎01-10-2015

Dear BadOmen

I am using Altera's Cyclone V with Linux kernel 3.8.

I am working on a project in which I have to make a peripheral where I write to a FIFO, the FIFO is read and some logic is applied to the data, and the results are stored in another FIFO. The goal is to transfer data between the HPS and the peripheral as fast as possible. To achieve this I want to use DMA transfers between the HPS and the peripheral using the hard DMA-330 IP in the HPS.

I am new to linux device drivers and I need help writing the device driver for this peripheral.

I saw your flow_control_fifo.v peripheral code and the fpga-dma.c driver code. Could you please guide me to have this working with my requirements? I understand the verilog code but I need help with driver.

Sincerely

Ankit

Altera_Forum · ‎01-12-2015

Dear Ankit,

If you need help with the device driver, write to the mailing list at rocketboards.org and look into the archive. I am also working on a similar project. If you want to use "Master" transfers (the DMA-330 initiates the transfer), there is no example driver. For "Slave" transfers (the peripheral has to request transfers) you will find an example in the "linux-socfpga" repository: "fpga-dma.c" under "linux-socfpga\arch\arm\mach-socfpga". These are just some hints; I am not finished with my driver either.

Kr,

Florian

Altera_Forum · ‎01-12-2015

I'm more on the hardware side of the fence so I recommend the mailing list at rocketboards as well.

One thing to know is that if you want to maximize the throughput of moving data from SDRAM in the HPS to the FPGA the fastest method is to have the FPGA pull the data out of the SDRAM directly using the FPGA-to-SDRAM ports. You can have a 256-bit wide direct interface into the SDRAM from the FPGA which will dwarf the throughput of having the HPS DMA pushing the data out of the HPS-to-FPGA bridge. A DMA capable of memory-mapped to streaming transfers is all you need to make this possible and it should be a lot easier to program for as well.

Altera_Forum · ‎01-12-2015

Dear BadOmen

I am referring to your flow_control_fifo.v file. I am having trouble instantiating and integrating that component in Qsys. As per my understanding, the data port and the tx + rx ports are connected to the DMAC; the csr port is connected to the HPS. When I create a new component in Qsys, the data and csr ports from the Verilog file are automatically detected as interfaces with correct signal types. However, I am not sure what to do with the tx and rx ports of the Verilog file. Which interface would they be a part of, and what would the signal type be? Should I make them conduit_end and export the signals? In that case, how would I connect them to the DMAC in the SoCKit top-level file?

In short, would you have an example project with this fifo implemented as a peripheral?

With regard to your reply above, again, can you please share a design project so that I can learn from it and modify it as per my needs?

Sincerely

Ankit

Altera_Forum · ‎01-13-2015

Dear BadOmen

I just came across this sample project on Rocketboards:

SampleDmaQuartusProjectForFpgaDmaCInTheKernel313

This project uses the same flow control fifo. Is this what I was looking for? What is happening here?

How can I recreate the steps of this project on my own? There doesn't seem to be any README file here.

Sincerely

Ankit

Altera_Forum · ‎01-13-2015

I didn't realize the hardware was posted but I took a look and it contains the necessary verilog and .tcl file under /ip/flow_control_fifo. If you try to use that FIFO in your own design just move the IP directory to your own hardware project and the flow control FIFO will show up the next time you open Qsys. That design is just an old version of the golden hardware reference design with the FIFO added to the system and some wiring of the DMA flow control signals at the top level.

Altera_Forum · ‎01-13-2015

Dear BadOmen

A question about the design: In the component editor of the Loopback FIFO (our flow control fifo), for tx_single/burst/ack and rx_single/burst/ack signals I see that although the signal types are export, the interfaces are tx_pri and rx_pri. How do I get this? What are these *pri interfaces?

In my design there is a catch. I have to create a peripheral in which there are 2 FIFOs and some processing logic in between the 2. As per my understanding I will have to use FIFOs that have an Avalon Slave MM write interface and Avalon Slave ST source interface for the input FIFO, and an Avalon Slave ST sink interface and Avalon Slave MM read interface for the output FIFO. I know that I can instantiate these FIFOs directly from the IP catalog, and when I generate the design, the Verilog files will be automatically created. My question is: how do I add the wrapping flow control logic that you have used in the flow control FIFO design example?

Thank you for being so patient with me and helping me with this design project. I go to school and I lack the wisdom that you and all the other people have on this forum.

Sincerely

Ankit

Altera_Forum · ‎01-13-2015

Those are the DMA "peripheral request interface" signals. The RX group is for flow control of receive channel of the FIFO (writing data to the FIFO) and the TX group is for the flow control of the transmit channel of the FIFO (reading data from the FIFO).

tx_single represents the FIFO not full status

rx_single represents the FIFO not empty status

tx_burst represents the FIFO having a fill level equal to or exceeding the burst size

rx_burst represents the FIFO having a fill level low enough that it can handle another burst of data written into it

rx_ack is a signal that pulses every time the DMA either issues a flush or acknowledges a peripheral transfer

tx_ack .... same thing for the TX channel

When I say "burst" in this context I'm talking about a predefined block of data and not memory-mapped bursts. The burst size programmed into the peripheral needs to match the burst size programmed into the DMA channel thread. These PRI interfaces let the peripherals communicate to the DMA letting it know when it's safe to transfer data. I recommend reading the DMA chapter of the technical reference manual as well as the comments in the custom FIFO IP to learn more about peripheral transfers and how the handshake works. This group of single, burst, and acknowledge signals is defined by Synopsys and used within the HPS block for all the Synopsys IP that communicates with the DMA-330 core.

Altera_Forum · ‎01-18-2015

Dear BadOmen

As per my understanding of the verilog code, I think that your explanation of the RX and TX interfaces is mixed up. It should be the other way round. Am I correct?

--- Quote Start ---

Those are the DMA "peripheral request interface" signals. The RX group is for flow control of receive channel of the FIFO (writing data to the FIFO) and the TX group is for the flow control of the transmit channel of the FIFO (reading data from the FIFO).

--- Quote End ---

I think the RX and TX here are in terms of the DMAC. It uses the TX channel to write data to the FIFO and RX channel to read the data from the FIFO.

--- Quote Start ---

tx_single represents the FIFO not full status

rx_single represents the FIFO not empty status

tx_burst represents the FIFO having a fill level equal to or exceeding the burst size

rx_burst represents the FIFO having a fill level low enough that it can handle another burst of data written into it

--- Quote End ---

Again here the single and burst signals contradict each other. Only looking at the single signals, FIFO not being full implies that tx channel is used for writing into the FIFO and not empty implies that rx channel is used for reading from the FIFO.

Sincerely

Ankit

Altera_Forum · ‎01-18-2015

Dear fberndl

I am new to device drivers and I am having a tough time understanding the fpga-dma.c device driver.

Can you please guide me on how to write the application program using this device driver? If anyone can share some sample code already written using this driver, it would be of great help.

I have slightly modified the hardware this driver is intended for. Instead of the device being a loopback FIFO (data written to a FIFO and then read back from the same FIFO), I have created two separate FIFOs. Data is written into one, passed to a second FIFO internally in hardware, and then read back from that second FIFO.

Can you please guide me on the changes required in this driver for that? Also I think that the application code should have 2 separate threads, one for writing and one for reading, else we wouldn’t be using the device to its full potential (i.e. the max data transfer speed achievable). Am I correct?

Sincerely

Ankit

PS: As per your advice I have also replied to rocketboards mail thread concerning this driver. Haven't heard back though. I was hoping that you could help me.

Altera_Forum · ‎01-19-2015

Oops you are correct, I had RX and TX backwards. The RX and TX are from the perspective of the DMA and not the FIFO, so TX is when you write data into the FIFO and RX is when you read data from it.
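With that orientation fixed, the four request levels can be summarised in a few lines of Python. This is only my reading of the correction carried through to the burst signals as well (TX side looks at free space, RX side looks at fill level); it is a model for discussion, not a translation of flow_control_fifo.v, and the function name and arguments are invented here. Check it against the comments in the Verilog.

```python
def request_signals(fill_level, depth, burst_size):
    """Request levels as seen by the DMA (TX = DMA writes the FIFO,
    RX = DMA reads the FIFO). Assumed mapping, see the note above."""
    free_space = depth - fill_level
    return {
        "tx_single": free_space >= 1,          # FIFO not full: one word can be written
        "tx_burst": free_space >= burst_size,  # room for a whole burst to be written
        "rx_single": fill_level >= 1,          # FIFO not empty: one word can be read
        "rx_burst": fill_level >= burst_size,  # a whole burst is available to read
    }


# FIFO of depth 128 holding 126 words, burst size 4: only 2 free slots,
# so a burst write is not safe but a single write is.
print(request_signals(126, 128, 4))
```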

Altera_Forum · ‎01-20-2015

Dear ankpradh,

How does the user interact with your driver, i.e. what is the high-level application?

I think the DMA driver, or some other example driver, uses the debugfs interface. With this debugfs interface you can start a write to or a read from the FIFO via the command line.

So what happens if you trigger something from debugfs?

0.) The debugfs function is called

1.) The register of the FIFO is read (length information)

2.) The DMAC is configured for this DMA peripheral request

3.) A "start" bit is set in the "CSR" (or a similarly named register)

4.) The FIFO control logic sets the DMA peripheral request interface signals (tx_single, tx_burst, rx_single, rx_burst)

5.) The DMAC does the configured transfers

So whether you need two threads depends on which application you have. Again: what is the goal of this project?

Altera_Forum · ‎02-08-2018

In case anyone is getting stuck on the Cyclone V DMA peripheral request with DMAWFP (f2h_dma_req), here is the clue.

First, look up the per2modrst register of the Reset Manager in the HPS documentation on the Altera homepage.

Second, you need to make sure the DMA channels are taken out of reset. All 8 channels are held in reset and remain in reset after POR. For example, if you need channels 0, 1 and 2, you have to do this at the u-boot command line:

>mm.b 0xFFD05018

ffd05018: ff ? f8

ffd05019: 00 ? q

>

Finally your DMA330 should be able to respond to the requests from FPGA.
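For other channel combinations, the byte to write can be worked out with a few lines of Python. This assumes, based only on the example above (0xf8 for channels 0, 1, 2), that bit n of the low byte of per2modrst holds request interface n in reset while it is 1; the helper name is made up, so check the register description before relying on it.

```python
PER2MODRST_ADDR = 0xFFD05018  # address used in the u-boot session above


def per2modrst_value(channels, current=0xFF):
    """Return the byte to write so the given DMA request channels leave reset.

    Assumption: bit n = 1 keeps channel n in reset, bit n = 0 releases it.
    `current` defaults to 0xFF, the value shown after power-on reset.
    """
    value = current
    for ch in channels:
        if not 0 <= ch <= 7:
            raise ValueError("channel must be in the range 0..7")
        value &= ~(1 << ch) & 0xFF  # clear the reset bit for this channel
    return value


print(hex(per2modrst_value([0, 1, 2])))  # 0xf8, as in the session above
```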

good luck.