Altera_Forum
Honored Contributor I
8,285 Views

mSGDMA, Qsys, and Linux Integration

So I've been trying to get the Modular SG-DMA working in a Qsys project tied with Linux. I can get it to show up in the device tree, be recognized by Linux in /sys, and it looks like the memory mapping works. 

 

Below is a clip from the msgdma in my device tree: 

 

```dts
msgdma_0: msgdma@0x100000020 {
	compatible = "altr,msgdma-14.1", "altr,msgdma-1.0";
	reg = <0x00000001 0x00000020 0x00000020>,
		<0x00000001 0x00000010 0x00000010>;
	reg-names = "csr", "descriptor_slave";
	interrupt-parent = <&hps_0_arm_gic_0>;
	interrupts = <0 41 4>;
	clocks = <&clk_0>;
}; //end msgdma@0x100000020 (msgdma_0)
```

 

Now the interesting part of all this is that there is no driver for altr,msgdma-1.0 in the 3.10-ltsi kernel. Direct memory mapping will probably work for writing to the descriptor port, but has anyone had any luck writing mSGDMA descriptors from the HPS? People seem to have used the Altera DMA Controller without problems, but the mSGDMA offers a wider bus width and more configuration options, which could be highly beneficial for an FPGA-to-HPS bridge.

 

Thanks!
27 Replies
Altera_Forum
Honored Contributor I
406 Views

Hi Derim. I am not sure whether this example design from Rocketboards (http://rocketboards.org/foswiki/view/documentation/datamoverdesignexample) has the info that you need, but it is worth a look :)

Altera_Forum
Honored Contributor I
406 Views

 


 

 

Thanks-- I think I'm going to debug this myself a bit more, as I suspect there's either an endianness or a timing issue with the HPS driving the mSGDMA directly when mapping /dev/mem. It would be great to know if anyone else has tried it, as the examples all rely on a Nios core, which should be redundant for an HPS-based design. Depending, of course, on one's application.
Altera_Forum
Honored Contributor I
406 Views

I do control mSGDMA with lightweight AXI 

In QSys: 

CSR, Descriptor, Response connected to lw_axi_master 

mm_write connected to f2h_axi_slave 

In Linux: 

open /dev/mem 

mmap to 0xFF200000 with offset of registers addresses 

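```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>

#pragma pack(push, 1)
/**
 * @brief mSGDMA control and status register
 */
typedef struct {
    uint32_t status;
    uint32_t control;
    uint16_t rd_fill_level;
    uint16_t wr_fill_level;
    uint16_t resp_fill_level;
    uint16_t reserved_0;
    uint16_t rd_sequence_number;
    uint16_t wr_sequence_number;
    uint32_t reserved_1;
    uint32_t reserved_2;
    uint32_t reserved_3;
} t_mSGDMA_CSR;
#pragma pack(pop)

/* volatile so the compiler does not cache or drop register accesses */
volatile t_mSGDMA_CSR *regs_CSR;
int fd_FPGA_ctrl_regs;
unsigned int *mem_FPGA_ctrl_regs;

/* MAP_PIO_SIZE and mSGDMA_CSR_offset are defined elsewhere in the project */
fd_FPGA_ctrl_regs = open("/dev/mem", O_RDWR | O_SYNC);
mem_FPGA_ctrl_regs = (unsigned int *)mmap(NULL, MAP_PIO_SIZE, PROT_READ | PROT_WRITE,
                                          MAP_SHARED, fd_FPGA_ctrl_regs, 0xFF200000);
/* the pointer is in 32-bit words, so convert the byte offset to a word offset */
regs_CSR = (volatile t_mSGDMA_CSR *)(mem_FPGA_ctrl_regs + mSGDMA_CSR_offset / sizeof(uint32_t));
/* descriptor, response is the same */
/* and registers are ready to use */
```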
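Once the CSR is mapped, a few helpers make the status and control words easier to work with. This is a minimal sketch, not the poster's code: the bit positions (busy = bit 0, resetting = bit 6, IRQ = bit 9 in status; reset = bit 1 in control) and the names `msgdma_csr_head`, `msgdma_busy`, `msgdma_reset`, and `msgdma_clear_irq` are assumptions that should be checked against the register map for the IP version in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions; verify against the mSGDMA register map. */
#define CSR_STATUS_BUSY       (1u << 0)
#define CSR_STATUS_RESETTING  (1u << 6)
#define CSR_STATUS_IRQ        (1u << 9)
#define CSR_CONTROL_RESET     (1u << 1)

/* Only the first two CSR words: status at 0x0, control at 0x4. */
typedef struct {
    volatile uint32_t status;
    volatile uint32_t control;
} msgdma_csr_head;

/* True while the dispatcher reports that it is busy. */
static inline bool msgdma_busy(const msgdma_csr_head *csr)
{
    assert(csr != NULL);
    return (csr->status & CSR_STATUS_BUSY) != 0;
}

/* Request a dispatcher reset, then poll the resetting bit a bounded
 * number of times. Returns 0 once the bit reads 0, -1 on timeout. */
static inline int msgdma_reset(msgdma_csr_head *csr, unsigned max_polls)
{
    assert(csr != NULL);
    csr->control = CSR_CONTROL_RESET;
    while (max_polls--) {
        if ((csr->status & CSR_STATUS_RESETTING) == 0)
            return 0;
    }
    return -1;
}

/* Acknowledge the interrupt (assumed to be write-one-to-clear). */
static inline void msgdma_clear_irq(msgdma_csr_head *csr)
{
    assert(csr != NULL);
    csr->status = CSR_STATUS_IRQ;
}
```

With the mapping above, `msgdma_reset((msgdma_csr_head *)regs_CSR, 1000)` would be a typical call; the bounded poll means a stuck reset returns an error instead of hanging the process.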
Altera_Forum
Honored Contributor I
406 Views

 


 

 

 

Great. See, that makes sense. f2h_axi_slave, or f2h_sdram? Where are you pushing data?
Altera_Forum
Honored Contributor I
406 Views

 


 

I use f2h_axi_slave to put data from mSGDMA to SDRAM
Altera_Forum
Honored Contributor I
406 Views

 


 

 

See, that's fascinating. So instead of using the SDRAM AXI you're using the f2h_axi_slave. I'll give it a try.
Altera_Forum
Honored Contributor I
406 Views

So the mSGDMA is working great, but now it almost looks like I'm seeing endianness issues when reading the data back into the HPS. Has anyone seen this? In SignalTap the signals are definitely correct, but when reading the values back in the HPS, it looks like the endianness is flipped.

 

As the Nios and HPS are little-endian, does the mSGDMA do any sort of internal conversion to handle this?
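One way to narrow down a symptom like this is to stream a known test pattern from the FPGA and classify what arrives in memory. The sketch below is only a diagnostic under that assumption; it does not say where a swap would come from. The names `swap_bytes32`, `classify_beat`, and the `word_order` enum are hypothetical, and one 128-bit beat is treated as four 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum word_order {
    ORDER_MATCH,           /* data arrived exactly as expected */
    ORDER_BYTE_SWAPPED,    /* bytes reversed inside each 32-bit word */
    ORDER_WORDS_REVERSED,  /* 32-bit words reversed inside the 128-bit beat */
    ORDER_UNKNOWN
};

/* Reverse the four bytes of a 32-bit word. */
static inline uint32_t swap_bytes32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Compare one received 128-bit beat against the expected pattern. */
static inline enum word_order classify_beat(const uint32_t got[4],
                                            const uint32_t want[4])
{
    int match = 1, swapped = 1, reversed = 1;

    assert(got != NULL && want != NULL);
    for (size_t i = 0; i < 4; i++) {
        if (got[i] != want[i])
            match = 0;
        if (got[i] != swap_bytes32(want[i]))
            swapped = 0;
        if (got[i] != want[3 - i])
            reversed = 0;
    }
    if (match)
        return ORDER_MATCH;
    if (swapped)
        return ORDER_BYTE_SWAPPED;
    if (reversed)
        return ORDER_WORDS_REVERSED;
    return ORDER_UNKNOWN;
}
```

A pattern in which every byte is distinct (for example 0x00010203, 0x04050607, ...) is the most useful input, because a repeating pattern can hide a swap.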
Altera_Forum
Honored Contributor I
406 Views

 


 

 

And on a related note, how does the prefetcher handle fetching descriptors when the descriptor interface width is less than the full descriptor width? Does it fetch from LSB to MSB, or something else? If it fetches LSB to MSB, this will cause problems as the "GO" bit is in the MSB, which will then trigger a start without setting any of the interrupt enables.
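The prefetcher's fetch order is not answered here, but for the simpler case where the HPS writes a descriptor straight into the descriptor slave port, the same concern can be handled in software by writing the control word last, with the interrupt mask and GO set in a single store. This is a minimal sketch assuming the standard 128-bit descriptor layout (read address, write address, length, control) and assumed bit positions (transfer-complete IRQ mask = bit 14, GO = bit 31); `msgdma_std_descriptor` and `msgdma_push_descriptor` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed control-field bits of the standard descriptor format. */
#define DESC_CTRL_TRANSFER_COMPLETE_IRQ (1u << 14)
#define DESC_CTRL_GO                    (1u << 31)

/* Standard descriptor as seen through the descriptor slave port. */
typedef struct {
    volatile uint32_t read_address;
    volatile uint32_t write_address;
    volatile uint32_t length;
    volatile uint32_t control;
} msgdma_std_descriptor;

/* Fill the address and length words first, then commit with one write
 * of the control word that carries both the caller's flags and GO. */
static inline void msgdma_push_descriptor(msgdma_std_descriptor *slot,
                                          uint32_t src, uint32_t dst,
                                          uint32_t len_bytes, uint32_t flags)
{
    assert(slot != NULL);
    slot->read_address  = src;
    slot->write_address = dst;
    slot->length        = len_bytes;
    slot->control       = flags | DESC_CTRL_GO;  /* written last */
}
```

Because the interrupt enable and GO land in the same 32-bit write, there is no window in which the transfer is started without its interrupt mask. The `volatile` members keep the compiler from reordering or merging the four stores.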
Altera_Forum
Honored Contributor I
406 Views

 


 

 

 

Up to the point now where the SDR ports are out of reset, and I am seeing data being streamed from the DMA into RAM. The data ordering is odd, but I'm trying to do 128-bit transactions. I don't see why this would be an issue, but I'm looking into it.

EDIT: Spoke too soon. I'm able to get it to work once in a while, but not consistently as expected. My procedure:

 

> Write updated descriptors in to descriptor memory 

> Check for prefetcher to be stopped 

> Trigger "run" and "global_int" bits of prefetcher 

> Wait for run to end 

> Wait for interrupt 

 

It seems to sometimes work, sometimes not. Quite frustrating. The first run in a set seems to be missed, and unless I do things in a very specific way, no luck. It also seems to not be clearing the interrupt on the last run or clearing the interrupt takes longer than it should.
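The procedure above can be sketched in C as follows. This is only an illustration under stated assumptions: the prefetcher CSR is taken to have its control word at offset 0x0 and the next-descriptor pointer at 0x4/0x8, with Run at bit 0 and the global interrupt enable at bit 3. Those offsets and bits, and the names `prefetcher_wait_stopped` and `prefetcher_start`, are not from the thread and need checking against the prefetcher register map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed prefetcher CSR word indices and control bits. */
enum { PF_CONTROL = 0, PF_NEXT_DESC_LO = 1, PF_NEXT_DESC_HI = 2 };
#define PF_CTRL_RUN        (1u << 0)
#define PF_CTRL_GLOBAL_IRQ (1u << 3)

/* Poll until the Run bit reads 0. Returns 0 when stopped, -1 on timeout. */
static inline int prefetcher_wait_stopped(volatile uint32_t *pf,
                                          unsigned max_polls)
{
    assert(pf != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if ((pf[PF_CONTROL] & PF_CTRL_RUN) == 0)
            return 0;
    }
    return -1;
}

/* Descriptors must already be written to descriptor memory. Checks that
 * the prefetcher is stopped, points it at the first descriptor, then sets
 * Run and the global interrupt enable in one write. */
static inline int prefetcher_start(volatile uint32_t *pf,
                                   uint32_t first_desc_phys,
                                   unsigned max_polls)
{
    if (prefetcher_wait_stopped(pf, max_polls) != 0)
        return -1;
    pf[PF_NEXT_DESC_LO] = first_desc_phys;
    pf[PF_NEXT_DESC_HI] = 0;
    pf[PF_CONTROL] = PF_CTRL_RUN | PF_CTRL_GLOBAL_IRQ;
    return 0;
}
```

After `prefetcher_start` returns 0, the remaining steps are a second call to `prefetcher_wait_stopped` and then waiting for the interrupt. Bounding the polls makes an intermittent failure show up as a return code rather than a hang.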
Altera_Forum
Honored Contributor I
406 Views

 


 

 

For anyone else having issues: I was running into a problem where the Qsys interconnect (or something on the descriptor-write side) interfered with triggering the DMA, leading the DMA to act very oddly.

Even so, I need to add a delay (under 100 ms) after initiating and completing the first mSGDMA run before I can start using the DMA correctly. I'm wondering whether there is some initialization procedure, not covered by a reset, that I am not accounting for?
Altera_Forum
Honored Contributor I
406 Views

 


 

 

Has anyone been able to use the mSGDMA in park mode for an ST-to-MM transfer, and then reset the mSGDMA? I'm finding that I can't reset the DMA once I have started park mode, with the CSR getting stuck at a value of 0x5 (reset, run). 

 

In addition, it looks like the mSGDMA only writes the Owned-by-HW bit during descriptor write back, meaning that having a linked list set up to point back at itself will fill the descriptor FIFO on the initial fetch. I might be wrong about this as well, but the documentation is very unclear. 
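For reference, a descriptor list that links back to its start can be laid out as below. This is a sketch under assumptions and does not confirm the write-back behavior described above: the 32-byte field order and the Owned-by-HW position (bit 30 of the control word) are my reading of the prefetcher's standard descriptor format and should be verified, and `pf_descriptor` and `pf_build_ring` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PF_DESC_OWNED_BY_HW (1u << 30)  /* assumed bit position */

/* Assumed layout of one standard-format prefetcher descriptor. */
typedef struct {
    uint32_t read_address;
    uint32_t write_address;
    uint32_t length;
    uint32_t next_descriptor;
    uint32_t actual_bytes;
    uint32_t status;
    uint32_t reserved;
    uint32_t control;
} pf_descriptor;

/* Link n descriptors, stored at physical address ring_phys, into a ring.
 * Descriptor i writes chunk bytes to buf_phys + i * chunk, and the last
 * descriptor points back at the first. */
static inline void pf_build_ring(pf_descriptor *ring, size_t n,
                                 uint32_t ring_phys, uint32_t buf_phys,
                                 uint32_t chunk, uint32_t control)
{
    assert(ring != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        size_t next = (i + 1) % n;

        ring[i].read_address    = 0;  /* unused for ST-to-MM */
        ring[i].write_address   = buf_phys + (uint32_t)i * chunk;
        ring[i].length          = chunk;
        ring[i].next_descriptor = ring_phys +
                                  (uint32_t)(next * sizeof(pf_descriptor));
        ring[i].actual_bytes    = 0;
        ring[i].status          = 0;
        ring[i].reserved        = 0;
        ring[i].control         = control | PF_DESC_OWNED_BY_HW;
    }
}
```

With n = 1 this produces the self-pointing descriptor discussed above; using several descriptors gives software a chance to inspect and re-arm each one after write-back.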

 

BadOmen?
Altera_Forum
Honored Contributor I
406 Views

derim - 

 

I want to thank you for continuing to post your mSGDMA progress here even though you're getting very little if any help or feedback. I'm in the process of folding the mSGDMAs into our A10 SoC dev kit design to DMA between HPS memory and FPGA memory. I'll share any problems and/or solutions we find. 

 

Bob
Altera_Forum
Honored Contributor I
406 Views

 


 

 

As we're running on custom hardware, I've also contacted my FAE and Altera directly through support. It's very frustrating as nothing seems to be performing in a deterministic/expected manner. 

 

On a related note, it is fascinating, but I can't figure out how to get an mSGDMA out of park mode. I'm obviously doing something in the wrong order, but nothing in the documentation is providing strict guidance. 

 

It also is odd that the first capture that I do fails, but the second always works. I'm thinking there is something distinctly wrong here. It might make sense to switch to the Altera scatter-gather from the mSGDMA if this continues. Considering the mSGDMA is used in all the Altera components (TSE), something else is probably going on.
Altera_Forum
Honored Contributor I
406 Views

We borrowed a Cyclone V dev kit from our local Altera account manager and ran this reference design on that board successfully: 

 

https://releases.rocketboards.org/release/2014.05/tse-ed/hw 

 

As you mentioned, this design uses mSGDMAs between HPS memory and the TSE. Since the Linux drivers were happy with the configuration of those mSGDMAs that became my template for our Arria 10 SoC design. Maybe you can figure something out by inspection. 

 

We should be testing our design next week and I'll let you know what we see. 

 

Bob
Altera_Forum
Honored Contributor I
406 Views

Bob, 

 

I don't think I mentioned it, but my mSGDMA hooked up to SDRAM works nearly flawlessly, except for an issue where the first 2046 bytes out are zeroes. After that it works great, I believe, but I am not sure whether I'm seeing discontinuities at the boundary when starting a run. This is the data path FIFO depth, so it's highly suspicious behavior. I'm also not using packetized data.

The ones that have trouble are tied to on-chip memory tied to the fast bus.

 

https://github.com/altera-opensource/linux-socfpga/blob/a24e3d414e59ac76566dedcad1ed1d319a93ec14/dri... 

 

Thanks for the thought-- but no luck on this exactly. The only difference might be their clearing of the status register before resetting the mSGDMA, but they are also not using a prefetcher. The DMAs that I'm working with need to be able to go into park mode for continuous data transfer/interrupts, or at least a good facsimile thereof.

 

I'll definitely try to clear the status bits first and after, though. And switch to unaligned accesses per the rocketboards recommendations. 

 

Cheers, 

Josh
Altera_Forum
Honored Contributor I
406 Views

 

--- Quote Start ---  

I do control mSGDMA with lightweight AXI 

In QSys: 

CSR, Descriptor, Response connected to lw_axi_master 

mm_write connected to f2h_axi_slave 

In Linux: 

open /dev/mem 

mmap to 0xFF200000 with offset of registers addresses 


--- Quote End ---  

 

 

What value is MAP_PIO_SIZE?
Altera_Forum
Honored Contributor I
406 Views

 


 

 

Here's where things stand now: 

 

- I can use them fine, but I am unable to cancel/reset an mSGDMA when using park mode. I think, to get around this, those mSGDMA modules will be started on startup and not touched after. 

- Altera informed me that there was a synthesis bug that sometimes showed up related to descriptor writes in Quartus before 15.1.1, so I might have been seeing issues with that. Using the pre-fetcher works well for me, so I might try a no-prefetcher design again later. 

- The mSGDMA module has a FIFO that it uses for the data path. This FIFO has a READY output that ONLY signals that the FIFO is not full. This means that for an Avalon-ST to Avalon-MM DMA transaction the design must be very careful NOT to fill the FIFO while the DMA is not running, or you will see corrupted data. The fix is to have a module that sends only the number of valids that should go into each DMA transaction.

 

Example: Start a single-shot of the DMA for n transactions, then start the FPGA-based streamer to send n transactions. 

Example 2: Turn on DMA for park mode, then turn on transaction streamer. Stopping park should involve deasserting the bit, waiting for the Run to end, and then turning off the streamer before resetting the DMA. I have not been able to get this to work yet and instead the DMA gets stuck in reset. 

 

It would be really nice if the DMA module "ready" was actually a "DMA ready to receive," but it is not.
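The "send only n valids" idea amounts to a counter that is armed with the number of beats in the transfer and blocks the stream once it reaches zero. The real gate would live in the FPGA fabric; the C below is just a behavioral model to make the arithmetic explicit. The names `beats_for_transfer`, `stream_gate`, `gate_arm`, and `gate_allow` are hypothetical, and the data width is passed in bytes (16 for a 128-bit path).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of stream beats needed to carry length_bytes, rounded up. */
static inline uint32_t beats_for_transfer(uint32_t length_bytes,
                                          uint32_t width_bytes)
{
    assert(width_bytes != 0);
    return length_bytes / width_bytes +
           (length_bytes % width_bytes != 0 ? 1u : 0u);
}

/* Behavioral model of a gate between the data source and the DMA sink. */
typedef struct {
    uint32_t remaining;  /* beats still allowed for the current transfer */
} stream_gate;

/* Arm the gate when the matching DMA transfer is started. */
static inline void gate_arm(stream_gate *g, uint32_t length_bytes,
                            uint32_t width_bytes)
{
    assert(g != NULL);
    g->remaining = beats_for_transfer(length_bytes, width_bytes);
}

/* Returns 1 if one more valid beat may be sent, 0 once the budget is spent. */
static inline int gate_allow(stream_gate *g)
{
    assert(g != NULL);
    if (g->remaining == 0)
        return 0;
    g->remaining--;
    return 1;
}
```

For a 4096-byte transfer on a 128-bit path the gate passes 256 beats and then holds valid low until it is armed again, so the FIFO cannot fill while the DMA is idle.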
Altera_Forum
Honored Contributor I
406 Views

 

--- Quote Start ---  

What value is MAP_PIO_SIZE? 

--- Quote End ---  

 

 

Also, in reply to this one-- I updated all my memory mappings and sizes recently. I wouldn't call it MAP_PIO_SIZE, but instead LW_REGS_SIZE, which is 2 MB. Check out pages 1-19 and 7-6 of the HPS Technical Reference Manual. The LW FPGA slaves run from 0xFF200000 to 0xFF400000, a span of 0x200000. This gets mapped to virtual memory in Linux. There is also the fast bridge, which starts at 0xC0000000 and is 960 MB long, running to 0xFC000000 (0x3C000000 in size).

 

The graphic on 7-6 saved me a lot of time.
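The two windows described above can be written down as constants with a small bounds check, which helps catch a register offset that falls outside the mapped span. The base and span values are the ones given in this post; the macro and function names (`LW_H2F_BASE`, `fits_in_window`, `lw_phys`, and so on) are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the post above. */
#define LW_H2F_BASE 0xFF200000u  /* lightweight HPS-to-FPGA bridge */
#define LW_H2F_SPAN 0x00200000u  /* 2 MB */
#define H2F_BASE    0xC0000000u  /* "fast" HPS-to-FPGA bridge */
#define H2F_SPAN    0x3C000000u  /* 960 MB */

/* Returns 1 if [offset, offset + len) lies entirely inside a window. */
static inline int fits_in_window(uint32_t span, uint32_t offset, uint32_t len)
{
    return offset <= span && len <= span - offset;
}

/* Physical address of a slave at the given lightweight-bridge offset. */
static inline uint32_t lw_phys(uint32_t offset)
{
    assert(offset < LW_H2F_SPAN);
    return LW_H2F_BASE + offset;
}
```

Passing `LW_H2F_SPAN` as the mmap length and `LW_H2F_BASE` as the mmap offset then covers every slave on the lightweight bridge in one mapping.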
Altera_Forum
Honored Contributor I
406 Views

 


 

 

Everything works, except I can't turn off park mode on the mSGDMA module without freezing Linux. I think that the DMAs really would benefit from a generic UIO kernel driver being available to handle things such as resets, etc. Waiting to hear back from Altera about the best method to reset a module in park mode.
Altera_Forum
Honored Contributor I
152 Views

 

--- Quote Start ---  

What value is MAP_PIO_SIZE? 

--- Quote End ---  

 

 

#define MAP_PIO_SIZE (0x00100000)

In my case I dedicate 1 MB at the top of memory to the FPGA. To reserve it, I start the kernel with the mem option:

setenv net_nfs run netload nfsargs addip addargs \; bridge enable \; bootm ${kernel_addr_r} - ${fdt_addr} mem=0x3FF00000

where 0x3FF00000 is the RAM left to the kernel: the top of memory minus the 1 MB dedicated to the FPGA. Note that 0x3FF00000 works out to 1 GB - 1 MB; with 2 GB of RAM installed on the board, 2 GB - 1 MB would be 0x7FF00000.
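The arithmetic behind the mem= value can be checked with a small helper. This is a sketch assuming RAM starts at physical address 0, so the value passed to mem= is also the physical start of the reserved region; `kernel_mem_limit` and `MIB` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024u * 1024u)

/* RAM left to the kernel after reserving reserved_bytes at the top. */
static inline uint32_t kernel_mem_limit(uint32_t ram_bytes,
                                        uint32_t reserved_bytes)
{
    assert(reserved_bytes <= ram_bytes);
    return ram_bytes - reserved_bytes;
}
```

`kernel_mem_limit(1024u * MIB, 1u * MIB)` gives 0x3FF00000, while `kernel_mem_limit(2048u * MIB, 1u * MIB)` gives 0x7FF00000. The same number is what would be passed as the mmap offset on /dev/mem, with MAP_PIO_SIZE as the length, to reach the reserved buffer from user space.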