
SGDMA debug

Altera_Forum
Honored Contributor II
4,096 Views

Hello, 

I am trying to learn how to use the SGDMA to transfer data to memory. As a first step, I have connected the SGDMA TX directly to the SGDMA RX to check the results. You can find the source code ->here<- (http://www.codeupload.com/4051) (I edited code found on the forum, so thanks to whoever provided it). 

To check what is written to memory, I've connected the SGDMA modules to on-chip memory (size: 32768 bytes) and enabled the memory content editor in the on-chip memory options. You can see my connections ->here<- (http://i51.tinypic.com/1zz6fj5.png). 

The problem is that this DMA design doesn't work: the data I get is corrupted. 

I've tried disconnecting cpu.instruction_master and cpu.data_master from the on-chip memory. The memory start address then moves to 0x00, but the results are the same. 

 

If I connect the DMA to the DDR memory, the data passes correctly. The screenshot is ->here<- (http://i53.tinypic.com/2heak61.png). 

 

Here is what I don't understand: 

whether I try to read/write the on-chip memory or the DDR memory, the core always tries to access address 0x800xxxxx. Obviously that works when it is connected to the DDR RAM, but not when it is connected to the on-chip RAM. Why is this happening? Is it a bug in the Nios memory bus? How can I check what is written to the memory? If I use only on-chip memory for the whole system and the DMA, it also works fine, but I want a clear view of the DMA data without the Nios code and data in the same memory. 

 

Thanks.
55 Replies
Altera_Forum

I don't see anything wrong so I recommend simulating this system so that you can watch the read and write masters to see what they are doing. You should only have to simulate around 300us worth of time. 

 

Also when you compile the code how much room is being reported for the stack and heap? You said your memory is 32kB so I'm thinking you could be running out of memory. What you could try doing is putting just the heap into the on-chip memory and all the other code sections in SDRAM and see if the problem goes away (that would suggest that in your test case the stack and heap collided).
Altera_Forum

 

--- Quote Start ---  

I don't see anything wrong so I recommend simulating this system so that you can watch the read and write masters to see what they are doing. You should only have to simulate around 300us worth of time. 

--- Quote End ---  

 

Oh.. Will take some time, but I will try :) 

 

 

--- Quote Start ---  

 

Also when you compile the code how much room is being reported for the stack and heap? You said your memory is 32kB so I'm thinking you could be running out of memory. What you could try doing is putting just the heap into the on-chip memory and all the other code sections in SDRAM and see if the problem goes away (that would suggest that in your test case the stack and heap collided). 

--- Quote End ---  

 

 

The reported size is also about 32kB, but I've left the system vectors @ DDR RAM, only SGDMA is connected to the onchip-mem, so I suppose the whole system runs from DDR memory, but DMA stuff should go through onchip-mem? I've tried to connect only descriptors to onchip-mem, but that did not help either. How could I check the memory contents without simulation? Or maybe I should use fixed addresses and provide the address for SGDMA?
Altera_Forum

The code that you based your design on allocates memory buffers and SGDMA descriptors in the heap. So not only does the processor need to be connected to the same memory as the SGDMA, but you also need to make sure you place the heap section in the memory where you want the transfers and descriptors to be placed. 

 

So if you hook up the Nios II data master to the on-chip memory, place the heap section into the on-chip memory then the transfer should work fine. You would then place .text, .rodata, .rwdata, stack in the DDR SDRAM. 

 

Alternatively you could hack up the code, place the data buffers and descriptors anywhere you want, and remove the malloc() calls that are doing this today (malloc() allocates memory from the heap). 

 

So I would try the system/software changes first before running the simulation, I think this is just a matter of making sure the CPU and SGDMA have the right visibility to the shared memory.
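
A minimal sketch of the malloc()-free alternative described above, assuming GCC-style attributes. The `DMA_SECTION` macro, the buffer sizes, and the 32-byte descriptor size and alignment are assumptions for illustration; on the target, `DMA_SECTION` would expand to a section attribute naming whichever memory the SGDMA masters are connected to (the section name depends on the system and is not given in this thread).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On target this would be something like
 * __attribute__((section(".onchip_memory"))); the section name is an
 * assumption. It is left empty here so the sketch builds anywhere. */
#ifndef DMA_SECTION
#define DMA_SECTION
#endif

#define DESC_SIZE  32u  /* assumed descriptor size and alignment, in bytes */
#define NUM_DESC   3u   /* hypothetical: two working descriptors + terminator */
#define BUF_WORDS  64u  /* hypothetical buffer length in 32-bit words */

/* Statically allocated, aligned storage takes the place of the malloc() calls. */
static uint8_t  desc_pool[NUM_DESC * DESC_SIZE] DMA_SECTION __attribute__((aligned(32)));
static uint32_t tx_buffer[BUF_WORDS] DMA_SECTION __attribute__((aligned(32)));
static uint32_t rx_buffer[BUF_WORDS] DMA_SECTION __attribute__((aligned(32)));

/* Address of descriptor slot i inside the static pool. */
void *desc_slot(size_t i) { return &desc_pool[i * DESC_SIZE]; }

uint32_t *dma_tx_buffer(void) { return tx_buffer; }
uint32_t *dma_rx_buffer(void) { return rx_buffer; }

/* True when p sits on a multiple of `a` bytes. */
int is_aligned(const void *p, uintptr_t a) { return ((uintptr_t)p % a) == 0; }
```

The pointers returned by `desc_slot()` and the buffer accessors would then be handed to the descriptor set-up code wherever the original example used the malloc() results.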
Altera_Forum

Oh, I will have to find out how to prepare memory without malloc() then :) Well, time to learn C anyway.  

 

Thanks for Your help.
Altera_Forum

Are you also allowing for any cpu cache? 

The DMA won't snoop the cache.
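
A rough sketch of what allowing for the cache can look like, assuming the Nios II HAL call `alt_dcache_flush()` from `sys/alt_cache.h` and a 32-byte data cache line. The `__NIOS2__` guard, the host stand-in, and the helper names `lines_touched()` and `publish_to_dma()` are hypothetical and exist only so the sketch builds off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __NIOS2__
#include <sys/alt_cache.h>   /* HAL: alt_dcache_flush() */
#else
/* Host stand-in so the sketch builds off-target; it only counts calls. */
static unsigned flush_calls;
static void alt_dcache_flush(void *start, unsigned len)
{
    (void)start;
    (void)len;
    flush_calls++;
}
#endif

#define DCACHE_LINE 32u   /* assumed data cache line size in bytes */

/* Number of cache lines a buffer touches. A buffer that is not line-aligned
 * shares its first or last line with neighbouring data. */
size_t lines_touched(uintptr_t addr, size_t len)
{
    if (len == 0) {
        return 0;
    }
    uintptr_t first = addr / DCACHE_LINE;
    uintptr_t last  = (addr + len - 1) / DCACHE_LINE;
    return (size_t)(last - first + 1);
}

/* Push CPU-written data out to memory before the DMA is started, since the
 * DMA reads memory directly and never sees what is still in the cache. */
void publish_to_dma(void *buf, size_t len)
{
    alt_dcache_flush(buf, (unsigned)len);
}
```

Descriptors written by the CPU need the same treatment as the data buffers. Reading results back with the cache-bypassing IORD macros is another way to avoid stale data on the receive side.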
Altera_Forum

 

--- Quote Start ---  

Are you also allowing for any cpu cache? 

The DMA won't snoop the cache. 

--- Quote End ---  

 

The system is left as default when added, so yes: Instruction cache = 4kbytes, Data cache = 2kbytes, data cache line size=32bytes. 

 

 


 

 

OK, so I've placed the heap in the on-chip memory and it works now: I can see the contents in the memory content editor. The contents look a bit strange, but I haven't investigated that yet. 

I've found something more interesting (->output log here<- (http://www.codeupload.com/4059)). 

I create random 32-bit pattern data using rand()&0xFFFFFFFF. This produces correct random data, as seen in log lines 6 to 70. 

But when I read the data back, each value is read four times in a row: lines 117-120, 121-124, 125-128, etc. 

Why is this happening? Is the memory address increment step wrong?
Altera_Forum

Assuming the code hasn't changed much there is a flush after the buffers are populated.  

 

The code previously populated the buffer 8 bits at a time (I think) so when you modified the code did you make sure to take that into account when populating 32 bit data?
Altera_Forum

Well, I've changed the source to populate 32-bit data, i.e. all pointers now use 32-bit types (check create_test_data()), but I am not sure I am doing it correctly. In the validation part, I am reading the data using IORD_32DIRECT() instead of IORD_8DIRECT(). 

 

You can check the current source ->here<- (http://www.codeupload.com/4061). 

 

Test data creation is in lines 131-155, and I read the memory in lines 205-206. Maybe I need to use a larger offset step when reading with IORD_32DIRECT() instead of IORD_8DIRECT()? 

 

----  

EDIT: 

Right, so I suppose the problem was that the loop still stepped one byte at a time, as it did for the 8-bit reads/writes, instead of advancing 4 bytes per iteration to match the 32-bit data and IORD_32DIRECT(). 

---- 
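
The stride fix from the EDIT above can be sketched as a host-side model. `read32_model()` is a hypothetical stand-in for a 32-bit direct read whose offset is given in bytes; dropping the low two offset bits is an assumption about how a misaligned word read behaves on the target, chosen because it would reproduce the "same value four times" symptom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host model of a 32-bit read at a byte offset. The low two offset bits are
 * ignored (assumed target behaviour, not stated in the thread). */
uint32_t read32_model(const void *base, size_t byte_offset)
{
    uint32_t word;
    memcpy(&word,
           (const uint8_t *)base + (byte_offset & ~(size_t)3),
           sizeof word);
    return word;
}

/* Compare two buffers one 32-bit word at a time. The offset advances by
 * 4 bytes per iteration; stepping by 1 would revisit each word four times. */
size_t count_mismatches(const void *tx, const void *rx, size_t len_bytes)
{
    size_t bad = 0;
    for (size_t off = 0; off + 4 <= len_bytes; off += 4) {
        if (read32_model(tx, off) != read32_model(rx, off)) {
            bad++;
        }
    }
    return bad;
}
```

With a byte step, offsets 0 to 3 all land in the first word in this model, which matches the repeated values reported in the log.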

 

Now my task is to get 32-bit packets from my packet source (I have an Avalon-ST component that takes packet data from logic and puts it on Avalon-ST) into the Nios. I suppose I should use the SGDMA in stream-to-memory mode and then read the data in the Nios using IORD_32DIRECT(). I will fill the TSE buffer in the function ip_send() with that data and send it over Ethernet. 

If I understand correctly, I need to do a sync transfer in the SGDMA, because my component generates packet start and packet stop signals. Is there an example of this, where only an Avalon-ST source and the SGDMA RX are used? 

 

P.S. The data is video stream, I have to use something faster than simple DMA + I want to learn to use SGDMA, since I will have to receive the same data on the other end and parse it back to logic.
Altera_Forum

Besides Ethernet I don't know of a good ST-->MM SGDMA example. I have one for MM-->ST, but that uses my own SGDMA, which is up on the alterawiki. If you want to see that video design, search for "Modular SGDMA" and a link to a video design should appear in the results. The software API for the mSGDMA is quite different from the one you are using (mine is much simpler since it doesn't rely on descriptor fetching).

Altera_Forum

Hm, which example exactly is MM->ST? I've found the mSGDMA example on the Wiki, but that one was ST->MM->ST with timing measurement. Too bad there are no examples for the plain SGDMA :| The descriptor allocation and set-up is quite hard to understand and get right.

Altera_Forum

This is the only one I know of: http://www.alterawiki.com/wiki/modular_sgdma_video_frame_buffer 

 

That's my mSGDMA though. The mSGDMA design example is configured for MM-->MM but it supports MM-->ST and ST-->MM as well just like the SGDMA. 

 

Complicated descriptor APIs are the price you pay for having the ability to have the SGDMA pre-fetch descriptors. Most don't need that pre-fetching, so I excluded it from my implementation and made it a feature on my todo list instead. If you do switch DMAs, I recommend sticking to the SGDMA installed with Quartus for Ethernet stuff, since I haven't heard of anyone bolting my hardware up to the NicheStack.... yet.
Altera_Forum

Yeah, I've seen here in the forums that the mSGDMA is your creation. Great job! I think I will have to dig through the InterNiche source or other existing code, since it is really hard to understand how to make this run from scratch.

Altera_Forum

OK, I am moving forward. 

I've written code that successfully takes data from Avalon-ST and writes it to memory. The code is available ->here<- (http://www.codeupload.com/4087). 

When I check the memory contents, I see the data written from SOP (Start Of Packet) to EOP (End Of Packet). The SGDMA then generates an interrupt, and the status says that EOP was found, the descriptor completed and the chain completed. 

To continue the operation, I thought I had to call do_async_transfer again in the SGDMA transfer-complete interrupt routine, but that's not working. I suppose I need to set up the descriptors again? 

 

How can I make the stream-to-memory transfer run continuously? Right now it runs once and that's it.
Altera_Forum

The easiest way would be to have a second descriptor chain ready to go so that when the interrupt fires from the completion of the first chain you can quickly write the start of the second chain. Unfortunately this means you will have a little bit of dead time between the chains. 

 

Another way to do this is to start building up the second chain which is linked to the first one but make sure the owned by hardware bit isn't set on the beginning of the second chain. Then it becomes a matter of flipping that bit and starting the SGDMA back up. Again this will lead to some dead time between descriptors. Last but not least you could use a polling approach to figure out where the SGDMA is in the chain. Using this same owned by hardware bit flipping approach if you can create the second descriptor chain and flip that bit fast enough then there will be no dead time between the descriptor chains. That said you have to worry about race conditions since you'll have two masters (CPU and SGDMA) accessing the same memory location. 

 

This trickiness is why I made it possible to fire an interrupt on the completion of any descriptor in my implementation.
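
The first option above (a second chain kept ready for the completion interrupt) can be sketched as simple bookkeeping. Everything here is hypothetical: `pingpong_state` and `pingpong_swap()` only model which chain the interrupt handler should start next; actually starting the SGDMA is left to whatever call the existing code already uses.

```c
#include <assert.h>

/* Hypothetical bookkeeping for two pre-built descriptor chains. */
typedef struct {
    int running;         /* index (0 or 1) of the chain the SGDMA is working on */
    unsigned completed;  /* total number of chains finished so far */
} pingpong_state;

/* Model of the completion interrupt: switch to the idle chain so it can be
 * started right away, and return the index of the chain that just finished
 * so the caller can drain its data and prepare it for reuse. */
int pingpong_swap(pingpong_state *s)
{
    int finished = s->running;
    s->running = 1 - finished;
    s->completed++;
    return finished;
}
```

Starting the idle chain first and handling the finished one afterwards keeps the dead time between chains as short as the interrupt latency allows.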
Altera_Forum

Take a look at one of the examples on the NEEK board (like the C2H Mandelbrot design). The video support files use the park mode to ensure the owned by hardware bits in each descriptor remain set so that you can loop a descriptor chain back on itself. This might do what you are looking for.... again watch out for race conditions when doing this.

Altera_Forum

 

--- Quote Start ---  

The easiest way would be to have a second descriptor chain ready to go so that when the interrupt fires from the completion of the first chain you can quickly write the start of the second chain. Unfortunately this means you will have a little bit of dead time between the chains. 

 

Another way to do this is to start building up the second chain which is linked to the first one but make sure the owned by hardware bit isn't set on the beginning of the second chain. Then it becomes a matter of flipping that bit and starting the SGDMA back up. Again this will lead to some dead time between descriptors. Last but not least you could use a polling approach to figure out where the SGDMA is in the chain. Using this same owned by hardware bit flipping approach if you can create the second descriptor chain and flip that bit fast enough then there will be no dead time between the descriptor chains. That said you have to worry about race conditions since you'll have two masters (CPU and SGDMA) accessing the same memory location. 

 

This trickiness is why I made it possible to fire an interrupt on the completion of any descriptor in my implementation. 

--- Quote End ---  

 

 

Hm, well it gives me status result of 0x8e, which says that the chain is complete. Is there a method to start the same chain again? I suppose I have two descriptors prepared in my current chain. 

I can wait a small period of time since I have to receive a packet and place it to InterNiche packet buffer, which is a small struct with some additional variables.
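
For reference, a small sketch that decodes a status value such as 0x8e. The bit positions are assumptions that should be checked against altera_avalon_sgdma_regs.h; they do agree with the earlier description of the status (EOP found, descriptor completed, chain completed). The macro and type names are hypothetical, and the meaning of bit 7 is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SGDMA status register bit positions (verify against the header). */
#define ST_ERROR       0x01u
#define ST_EOP         0x02u
#define ST_DESC_DONE   0x04u
#define ST_CHAIN_DONE  0x08u
#define ST_BUSY        0x10u

typedef struct {
    int error;
    int eop;
    int desc_done;
    int chain_done;
    int busy;
} sgdma_status;

/* Split a raw status word into individual flags. */
sgdma_status decode_status(uint32_t raw)
{
    sgdma_status s;
    s.error      = (raw & ST_ERROR) != 0;
    s.eop        = (raw & ST_EOP) != 0;
    s.desc_done  = (raw & ST_DESC_DONE) != 0;
    s.chain_done = (raw & ST_CHAIN_DONE) != 0;
    s.busy       = (raw & ST_BUSY) != 0;
    return s;
}
```

Under these assumptions, 0x8e decodes to EOP, descriptor completed and chain completed, with neither the error nor the busy flag set.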
Altera_Forum

In order to re-use the same descriptor chain you'll need to go through the chain and set the owned by hardware bit back on (SGDMA turns those off to let you know it has worked on them). Then you will start up the SGDMA the same way you did the last time.

Altera_Forum

So basically I have to do all the allocation and descriptor set-up again? Anyway, let's clarify this: to keep the SGDMA working, I have to do all the set-up in the SGDMA completion interrupt, so that it ends up in the same interrupt again, where I set it up again, and so on forever...?

Altera_Forum

Pretty much. The reason the SGDMA didn't work the second time was that the owned by hardware bits were low. So you can reuse those descriptors by just flipping those bits back to high (as they were when the descriptors were first populated in RAM).
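
A minimal sketch of that bit flip, using a simplified stand-in for the descriptor. `model_desc`, `OWNED_BY_HW` and `rearm_chain()` are hypothetical names; the assumption that the owned by hardware flag is bit 7 of the descriptor's control byte should be checked against altera_avalon_sgdma_descriptor.h. Clearing the status byte is an extra step added here, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OWNED_BY_HW 0x80u  /* assumed: bit 7 of the descriptor control byte */

/* Simplified stand-in for a descriptor; the real struct has more fields. */
typedef struct model_desc {
    struct model_desc *next;
    uint8_t status;
    uint8_t control;
} model_desc;

/* Hand the first `count` descriptors of a chain back to the hardware.
 * The terminating descriptor after them is left unowned so the SGDMA
 * stops there. Returns the number of descriptors re-armed. */
size_t rearm_chain(model_desc *first, size_t count)
{
    size_t n = 0;
    for (model_desc *d = first; d != NULL && n < count; d = d->next, n++) {
        d->status = 0;              /* forget the previous result */
        d->control |= OWNED_BY_HW;  /* give the descriptor back to the SGDMA */
    }
    return n;
}
```

On the target, the updated descriptors also have to reach memory (cache flush or uncached access) before the SGDMA is started again the same way as the first time.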

Altera_Forum

Ok, thanks, I will try that out.
