Valued Contributor III
2,520 Views

Stream to Memory sgdma problem

I am having two problems with the sg-dma in my system. My first issue is that the sg-dma ready signal goes low after the first element of streaming data, then returns high on the third. This causes the second element to be skipped. I am streaming data at the same rate as the sg-dma clock. 

 

My second issue is that I cannot re-use the descriptors. I try to set the owned-by-hardware bit to 1 with the following command: DMA_desc[0].control=128; 

The sg-dma ready signal goes high when I call the do_async_transfer() function, but it toggles low when it gets streaming data. No data is transferred to memory. 
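For reference, the value 128 written above is simply bit 7 set and every other control bit cleared. A minimal sketch of the bit arithmetic follows; the struct and macro names are local to this example (not the HAL's), and it assumes, as the post implies, that bit 7 is the owned-by-hardware flag. It only illustrates the masking; a later reply in this thread reports that rebuilding the descriptor was what actually worked.

```c
#include <stdint.h>

/* Local name for this sketch: bit 7 of the control byte (1 << 7 == 128). */
#define DEMO_OWNED_BY_HW_MSK (1u << 7)

/* Hypothetical stand-in for the control field of a descriptor. */
typedef struct {
    uint8_t control;
} demo_desc;

/* Hand the descriptor to the hardware without disturbing the other control bits. */
void demo_give_to_hw(demo_desc *d)
{
    d->control = (uint8_t)(d->control | DEMO_OWNED_BY_HW_MSK);
}

/* Returns 1 while the hardware still owns the descriptor, 0 otherwise. */
int demo_owned_by_hw(const demo_desc *d)
{
    return (d->control & DEMO_OWNED_BY_HW_MSK) != 0;
}
```

Writing `control = 128` has the same effect on bit 7 but also zeroes the low bits, which may or may not be what you want depending on how the descriptor was constructed.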

 

Can anyone help?? 

 

Thanks.
0 Kudos
22 Replies
Highlighted
Valued Contributor III
31 Views

 

--- Quote Start ---  

I am having two problems with the sg-dma in my system. My first issue is that the sg-dma ready signal goes low after the first element of streaming data, then returns high on the third. This causes the second element to be skipped. I am streaming data at the same rate as the sg-dma clock. 

 

--- Quote End ---  

 

 

It sounds like the components that are the source of the stream aren't prepared to accept the backpressure from the SGDMA. It's only going to get worse if the SGDMA encounters any contention or other delays (e.g. SDRAM refresh) when it's trying to burst to the memory.  

 

One workaround is you can place an Avalon-ST FIFO right before the SGDMA; the FIFO won't deassert 'ready' until it is nearly full.
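To make the backpressure point concrete, here is a small software model (not HDL and not SGDMA code) of a stream source feeding a sink whose 'ready' drops for one cycle. The function names and the one-element-per-cycle model are assumptions made for illustration only.

```c
#include <stddef.h>

/* Source that pushes one element per cycle and never looks at 'ready'.
 * Elements presented while ready[c] == 0 are lost. Returns elements delivered. */
size_t stream_ignoring_ready(const int *ready, size_t cycles,
                             const int *src, size_t n, int *out)
{
    size_t sent = 0, got = 0;
    for (size_t c = 0; c < cycles && sent < n; c++) {
        if (ready[c])
            out[got++] = src[sent];
        sent++; /* the source advances whether or not the sink took the data */
    }
    return got;
}

/* Source that holds its current element until the sink is ready. */
size_t stream_honoring_ready(const int *ready, size_t cycles,
                             const int *src, size_t n, int *out)
{
    size_t sent = 0;
    for (size_t c = 0; c < cycles && sent < n; c++) {
        if (ready[c]) {
            out[sent] = src[sent];
            sent++;
        }
    }
    return sent;
}
```

With a ready pattern of 1, 0, 1, 1 the first function drops the second element, which matches the symptom described in the question, while the second delivers all of them one cycle later. A FIFO in front of the SGDMA gives the source the same tolerance without changing the source itself.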
0 Kudos
Highlighted
Valued Contributor III
31 Views

The solution to the ready signal going low after the first piece of data was to put the sgdma in burst mode. 

 

The solution to the second problem was to build the descriptors again. 

 

My new problem is that the descriptors are not built in a timely fashion for my system. I would like to use the PARK option in the control register map, so I can re-use the descriptor without rebuilding it. I have not had any success programming this configuration. Does anyone have some CODE that can show a working system that re-uses the descriptors? 

 

Thank you in advance for your help!
0 Kudos
Highlighted
Valued Contributor III
31 Views

Using the park mode is tricky because if you are not careful you may end up with a race condition when switching to another descriptor list. I don't have an example of it but if you look at the example designs for the NEEK that use the LCD you will see the video framebuffering getting parked so that the same frame can be re-used. Essentially the way it works is you create a linked list that loops back on itself and by setting the park bit the SGDMA doesn't flip the owned by hardware bit low when it consumes the descriptors. Due to that and the descriptor linked list loopback the SGDMA hits the end of the list, loops back to the start, and reuses the descriptors. 

 

If you want to see a simpler implementation take a look at the DMA engine that I posted to the wiki: http://www.alterawiki.com/wiki/modular_sgdma Parking is as simple as turning on a bit in a descriptor so that the engine keeps using it until you provide a new descriptor. You also don't need to chain a bunch of descriptors together to get a large transfer size; this DMA engine can move up to 4GB if you set up the hardware accordingly. An example of how to use the feature for video buffering purposes is shown here: http://www.alterawiki.com/wiki/modular_sgdma_video_frame_buffer
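The looped-list behaviour described above can be sketched as a tiny software model. This is not driver code: the struct, the 0x80 mask name, and the step limit are assumptions made for this example, and the real engine does considerably more per descriptor.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_OWNED_BY_HW 0x80u /* assumed position of the owned-by-hardware bit */

/* Simplified descriptor: only the fields needed to show the looping behaviour. */
typedef struct model_desc {
    struct model_desc *next;
    uint8_t control;
    unsigned times_used;
} model_desc;

/* Walk the chain the way the text describes: process descriptors while they are
 * owned by hardware; clear the ownership bit afterwards unless 'park' is set.
 * Stops after max_steps so a parked loop terminates in this model. */
size_t model_run(model_desc *d, int park, size_t max_steps)
{
    size_t done = 0;
    while (done < max_steps && d != NULL && (d->control & MODEL_OWNED_BY_HW)) {
        d->times_used++;
        if (!park)
            d->control = (uint8_t)(d->control & ~MODEL_OWNED_BY_HW);
        d = d->next;
        done++;
    }
    return done;
}
```

With two descriptors linked in a loop, an un-parked run stops after two descriptors because both ownership bits have been cleared, whereas a parked run keeps cycling through the same two descriptors until something stops it.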
0 Kudos
Highlighted
Valued Contributor III
31 Views

I have some working code that continuously takes in serial data to on-chip RAM from two stream-to-memory sgdma's, and puts the serial data in a circular buffer in DDR memory.  

 

The code puts the two stream-to-memory sgdma's in park mode, and each descriptor points back to itself for continuous operation. A callback is registered for the completion of each stream-to-memory descriptor to start the data transfer to the DDR3 circular buffer. The DDR3 transfer is done with a memory-to-memory sgdma with a single descriptor chain. The descriptor chain is rebuilt after each transfer. 

 

Comments are more than welcome... 

 

 

// CWstream1_DMA SGDMA callback function
// * After each descriptor is processed (1600 bytes), the data is transferred to DDR3 memory in a circular buffer
void CWstream1_callback_function(void * context)
{
    // tx tells the circular buffer which sgdma was processed
    tx=0;
    alt_avalon_sgdma_do_async_transfer(DDR3_DMA, &DDR3_DMA_desc[0]);
}

// CWstream2_DMA SGDMA callback function
// * After each descriptor is processed (1600 bytes), the data is transferred to DDR3 memory in a circular buffer
void CWstream2_callback_function(void * context)
{
    IOWR_ALTERA_AVALON_PIO_DATA(LED_PIO_BASE, 1);
    // tx tells the circular buffer which sgdma was processed
    tx=1;
    alt_avalon_sgdma_do_async_transfer(DDR3_DMA, &DDR3_DMA_desc[0]);
}

 

// DDR3_DMA callback function
// * Re-initialize the descriptor for the next memory transfer
// * Check the memory position to see if the circular buffer needs to start over
void DDR3_callback_function(void * context)
{
    // update the write address of the circular buffer
    write_addr = write_addr + 0x640;
    if (write_addr >= 0x3e80)
    {
        DDR3_write_addr = (alt_u32 *)(DDR3_BASE);
        write_addr = 0;
    }
    else
    {
        DDR3_write_addr = (alt_u32 *)(DDR3_BASE + write_addr);
    }

    // Check which sgdma was processed so we can know where to read the data from
    if (tx == 1)
    {
        alt_avalon_sgdma_construct_mem_to_mem_desc(&DDR3_DMA_desc[0], &DDR3_DMA_desc[1], CW_DAQ1_write_addr, DDR3_write_addr, 1600, 0, 0);
        IOWR_ALTERA_AVALON_PIO_DATA(LED_PIO_BASE, 3);
        IOWR_ALTERA_AVALON_PIO_DATA(LED_PIO_BASE, 4);
    }
    else
    {
        alt_avalon_sgdma_construct_mem_to_mem_desc(&DDR3_DMA_desc[0], &DDR3_DMA_desc[1], CW_DAQ2_write_addr, DDR3_write_addr, 1600, 0, 0);
        IOWR_ALTERA_AVALON_PIO_DATA(LED_PIO_BASE, 2);
    }

    // Hand desc[0] back to the hardware and terminate the chain at desc[1]
    DDR3_DMA_desc[0].control=128;
    DDR3_DMA_desc[1].control=0;
}

 

 

// Main code entry
// * Write a message to the LCD display for fun
// * Initialize pushbutton callbacks
// * Clear memory space
// * Start DMA engines
// * Sit in a while loop and wait for callbacks
int main(void)
{
    // Initialize circular DDR3 transfer variables
    write_addr = 0;

    // This is a software reset to the ping-pong selector hardware counter.
    // Pulling this low and then returning it to the high state resets the counter to 0.
    IOWR_ALTERA_AVALON_PIO_DATA(PIO_PROGRAM_BASE, 0);

    // Reset the sgdma controllers
    // This is a software reset; not sure if it is needed.
    // The documentation states this is a last-resort control, so we should remove this if we can.
    // Embedded Peripherals IP User Guide, pg 25-12
    IOWR_32DIRECT(SGDMA_ST1_BASE, 16, 0x10000);
    IOWR_32DIRECT(SGDMA_ST2_BASE, 16, 0x10000);
    usleep(1000);
    IOWR_32DIRECT(SGDMA_ST1_BASE, 16, 0x10000);
    IOWR_32DIRECT(SGDMA_ST2_BASE, 16, 0x10000);
    usleep(1000);

 

    // Open the CW streaming scatter-gather DMA controllers
    CWstream_DMA1 = alt_avalon_sgdma_open("/dev/sgdma_st1");
    if(CWstream_DMA1 == NULL)
        printf("Could not open the CW SG-DMA1\n");

    CWstream_DMA2 = alt_avalon_sgdma_open("/dev/sgdma_st2");
    if(CWstream_DMA2 == NULL)
        printf("Could not open the CW SG-DMA2\n");

    // Open the memory-to-memory scatter-gather DMA controller that fills the circular buffer
    DDR3_DMA = alt_avalon_sgdma_open("/dev/sgdma_ddr3");
    if(DDR3_DMA == NULL)
        printf("Could not open the DDR3 SG-DMA\n");

    // Set the write addresses for data transfers
    CW_DAQ1_write_addr = (alt_u32 *)(DAQ1_MEM_BASE);
    CW_DAQ2_write_addr = (alt_u32 *)(DAQ2_MEM_BASE);
    DDR3_write_addr = (alt_u32 *)(DDR3_BASE);

 

    // Set up the CW stream 1 descriptor
    // The descriptor points back to itself for continuous operation
    // The PARK bit of the control register must be set in order to re-use the descriptor (done in the callback registration)
    alt_avalon_sgdma_construct_stream_to_mem_desc(&CWstream_DMA1_desc[0], &CWstream_DMA1_desc[0], CW_DAQ1_write_addr, 1600, 0);
    // Set the OWNED_BY_HW bit to 1 on desc[0]
    CWstream_DMA1_desc[0].control=128;

    // Set up the CW stream 2 descriptor
    alt_avalon_sgdma_construct_stream_to_mem_desc(&CWstream_DMA2_desc[0], &CWstream_DMA2_desc[0], CW_DAQ2_write_addr, 1600, 0);
    CWstream_DMA2_desc[0].control=128;

    // Set up the DDR3 descriptor
    alt_avalon_sgdma_construct_mem_to_mem_desc(&DDR3_DMA_desc[0], &DDR3_DMA_desc[1], CW_DAQ1_write_addr, DDR3_write_addr, 1600, 0, 0);
    // Set the OWNED_BY_HW bit to 1 on desc[0], and set the bit to 0 on desc[1]
    // This stops the descriptor chain at desc[1] and flags an interrupt for its callback function
    DDR3_DMA_desc[0].control=128;
    DDR3_DMA_desc[1].control=0;

 

    // Register callback functions
    // CWstream1 callback
    // The PARK_MSK bit prevents the OWNED_BY_HW bit from being cleared after the descriptor is processed
    alt_avalon_sgdma_register_callback(CWstream_DMA1,
        &CWstream1_callback_function,
        (ALTERA_AVALON_SGDMA_CONTROL_IE_GLOBAL_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_PARK_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_IE_DESC_COMPLETED_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_IE_CHAIN_COMPLETED_MSK),
        NULL);

    // CWstream2 callback
    alt_avalon_sgdma_register_callback(CWstream_DMA2,
        &CWstream2_callback_function,
        (ALTERA_AVALON_SGDMA_CONTROL_IE_GLOBAL_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_PARK_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_IE_DESC_COMPLETED_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_IE_CHAIN_COMPLETED_MSK),
        NULL);

    // DDR3 callback
    alt_avalon_sgdma_register_callback(DDR3_DMA,
        &DDR3_callback_function,
        (ALTERA_AVALON_SGDMA_CONTROL_IE_GLOBAL_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_IE_CHAIN_COMPLETED_MSK),
        NULL);

 

    usleep(1000);
    // Release the ping-pong selector counter from reset
    IOWR_ALTERA_AVALON_PIO_DATA(PIO_PROGRAM_BASE, 1);

    // Start the CW transfers
    alt_avalon_sgdma_do_async_transfer(CWstream_DMA1, &CWstream_DMA1_desc[0]);
    alt_avalon_sgdma_do_async_transfer(CWstream_DMA2, &CWstream_DMA2_desc[0]);

    // Sit in a while loop while callbacks happen
    while (1)
    {
    }

    // Should not reach this code
    printf("\nExit\n\n");
    return 0;
}
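The wrap-around arithmetic used in DDR3_callback_function can be isolated and checked on its own. The sketch below uses the same two constants as the code above (0x640 = 1600 bytes per chunk, 0x3e80 = 16000 bytes, so ten chunks per lap); the function and macro names are made up for this example.

```c
#include <stdint.h>

#define CHUNK_BYTES 0x640u  /* 1600 bytes moved per transfer */
#define RING_BYTES  0x3e80u /* 16000 bytes = 10 chunks in the circular buffer */

/* Advance the byte offset into the circular buffer by one chunk,
 * wrapping to the start once the end of the buffer is reached. */
uint32_t ring_advance(uint32_t offset)
{
    offset += CHUNK_BYTES;
    if (offset >= RING_BYTES)
        offset = 0;
    return offset;
}
```

Starting from offset 0, ten calls visit offsets 1600, 3200, and so on up to 14400, and the tenth call returns to 0.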
0 Kudos
Highlighted
Valued Contributor III
31 Views

You might be exposed to a race condition; it really depends on how many descriptors are in the chain and how fast the CPU can maintain control over the SGDMA. The problem with circular buffering like this is that if the processor gets bogged down, it's possible that the SGDMA ends up overwriting data that your CPU is not done using. 

 

What you are describing sounds like it would be more appropriate to implement using ping-pong buffering, where one buffer is being filled while the other one is being consumed by the processor or whatever uses the data afterwards. With ping-pong buffering you could just have two linked lists, one per buffer, and each time the SGDMA hits the end of the list you just point it over to the other list and start it back up.
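The ping-pong bookkeeping described above can be sketched in a few lines. This is a host-side model only: the struct, the function names, and the overrun return value of -1 are assumptions for illustration, and on real hardware the "DMA done" hook would be the SGDMA callback.

```c
/* Two buffers: the DMA fills one while the consumer reads the other. */
typedef struct {
    int fill;         /* index (0 or 1) of the buffer the DMA is filling */
    int consuming[2]; /* nonzero while the consumer still needs that buffer */
} pingpong;

/* Call when the DMA reports the current buffer is full.
 * Returns the index of the full buffer, or -1 if the other buffer is still
 * being consumed (the race condition mentioned above). */
int pingpong_on_dma_done(pingpong *p)
{
    int full = p->fill;
    int next = full ^ 1;
    if (p->consuming[next])
        return -1; /* the DMA would overwrite data that is still in use */
    p->consuming[full] = 1;
    p->fill = next;
    return full;
}

/* Call when the consumer has finished with buffer idx. */
void pingpong_release(pingpong *p, int idx)
{
    p->consuming[idx] = 0;
}
```

The -1 path is the useful part: it turns a silent overwrite into something the software can detect and count.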
0 Kudos
Highlighted
Valued Contributor III
31 Views

Thanks for your response BadOmen. 

 

I did use a ping pong approach (I think). I first stream 1600 bytes of data into onchip memory "DAQ1_MEM" with the stream-to-memory sgdma called "CWstream_DMA1", and when that completes, the callback function "CWstream1_callback_function" starts to move that data into DDR3 with the memory-to-memory sgdma called "DDR3_DMA". While the DDR3 memory-to-memory transfer is happening, the second stream-to-memory sgdma "CWstream_DMA2" starts to stream 1600 bytes of data in onchip memory "DAQ2_MEM". Again, when CWstream_DMA2 is done gathering its data, the callback function "CWstream2_callback_function" starts the memory-to-memory sgdma "DDR3_DMA" to move the data into the next location of DDR3 memory.  

 

You are correct to mention the race condition, because one has to make sure that the callbacks and transfers into DDR3 can finish before one of the streaming sgdma's is done. I found that the length of time of these callbacks and transfers is dependent on CPU speed, code memory location, and probably a bunch of other stuff. I used SignalTap to look at some flags I set up (LED_PIO) in the callback functions to see when the transfers finished, to make sure I didn't run into a condition where callbacks were happening out of order. 

 

I'm using the Cyclone V dev kit and run the entire Nios-based system on a 50 MHz clock. I spread each 1600-byte stream-to-memory transfer over a 100 µs time period to give the callbacks and transfers enough time. This setup essentially allows me to sample 8 x 16-bit waveforms at 1 MHz. I think I could double the number of waveforms if I pushed it further. 

 

Thanks again for your comments BadOmen, and I appreciate any more comments or discussion.
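As a sanity check on the timing budget, the time to fill one 1600-byte buffer follows directly from the channel count, sample width, and sample rate quoted above. The helper below is a hypothetical function written for this check, not part of the posted design.

```c
#include <stdint.h>

/* Time in nanoseconds to fill a buffer, given the number of channels, the
 * sample width in bits (a multiple of 8), and the per-channel sample rate.
 * All arguments must be nonzero. */
uint64_t buffer_fill_time_ns(uint32_t channels, uint32_t bits_per_sample,
                             uint32_t sample_rate_hz, uint32_t buffer_bytes)
{
    uint64_t bytes_per_second =
        (uint64_t)channels * (bits_per_sample / 8u) * sample_rate_hz;
    return ((uint64_t)buffer_bytes * 1000000000ull) / bytes_per_second;
}
```

For 8 channels of 16-bit samples at 1 MHz the stream produces 16 MB/s, so a 1600-byte buffer fills in 100000 ns (100 µs), which is about 5000 clock cycles at 50 MHz for the callback and the memory-to-memory transfer to complete.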
0 Kudos
Highlighted
Valued Contributor III
31 Views

I'm wondering if it makes more sense to stream the data directly into SDRAM and avoid having the additional on-chip RAM and DMA in your design. If you are worried about overflow, then just place an ST FIFO between the streaming source and the ST->MM SGDMA. From a programming perspective that would be a lot easier to manage since there is just a single DMA to coordinate.

0 Kudos
Highlighted
Valued Contributor III
31 Views

That would be a better way to get the data, but this is actually a test architecture for transfer to a SATA HD. I haven't purchased a SATA IP yet, but we would eventually like to store a day or two of all this sampled data on disk. Instead of the mem-to-mem sgdma, the transfer would be controlled by the SATA IP. I'm not sure if it's possible yet, but if the SATA writes are fast enough, it should work.

0 Kudos
Highlighted
Valued Contributor III
31 Views

Dear all, 

 

I built an ST->Mem->ST system using a stream-to-memory SGDMA and a memory-to-stream SGDMA. On my Nios, each callback runs only once. Am I doing something wrong? 

 

Thanks for your help 

 

doddy 

 

#include <stdio.h>
#include <unistd.h> /* usleep() */
#include "altera_avalon_sgdma_descriptor.h"
#include "altera_avalon_sgdma_regs.h"
#include "altera_avalon_sgdma.h"
#include "system.h"
#include "alt_types.h"

 

 

alt_sgdma_dev *sgdma_rx_dev; 

alt_sgdma_dev *sgdma_tx_dev; 

 

alt_sgdma_descriptor *desc; 

alt_sgdma_descriptor *desc_2; 

 

alt_u32 *buffer; 

 

int rx=0, tx=0, rxfnl=0; 

void sgdma_rx_isr(void * context, u_long intnum); 

void sgdma_tx_isr(void * context, u_long intnum); 

 

int main()
{
    desc = (alt_sgdma_descriptor *)DESCRIPTOR_MEMORY_BASE;

    sgdma_rx_dev = alt_avalon_sgdma_open(SGDMA_RX_NAME);
    if(!sgdma_rx_dev)
    {
        printf(" Error opening RX SGDMA\n");
        return -1;
    }

    buffer = (alt_u32 *) (DESCRIPTOR_MEMORY_BASE);

    /* Reset RX-side SGDMA */
    IOWR_ALTERA_AVALON_SGDMA_CONTROL(SGDMA_RX_BASE, 0x10000);
    usleep(1000);
    IOWR_ALTERA_AVALON_SGDMA_CONTROL(SGDMA_RX_BASE, 0x10000);
    usleep(1000);

    alt_avalon_sgdma_construct_stream_to_mem_desc(
        &desc[0], // main descriptor
        &desc[0], // next descriptor
        buffer,   // receive address
        0,        // length; when 0, the transfer ends when EOP is received
        0);       // write_fixed

    desc[0].control = 128;

    alt_avalon_sgdma_register_callback(
        sgdma_rx_dev,
        (alt_avalon_sgdma_callback) &sgdma_rx_isr,
        (ALTERA_AVALON_SGDMA_CONTROL_IE_GLOBAL_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_PARK_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_IE_DESC_COMPLETED_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_IE_CHAIN_COMPLETED_MSK ),
        0);

    desc_2 = (alt_sgdma_descriptor *)DESCRIPTOR_MEMORY_FNL_BASE;
    sgdma_tx_dev = alt_avalon_sgdma_open(SGDMA_TX_NAME);//"/dev/tx_sgdma");
    if(!sgdma_tx_dev)
    {
        printf(" Error opening TX SGDMA\n");
        return -1;
    }

    /* Reset TX-side SGDMA */
    IOWR_ALTERA_AVALON_SGDMA_CONTROL(SGDMA_TX_BASE, 0);
    IOWR_ALTERA_AVALON_SGDMA_STATUS(SGDMA_TX_BASE, 0xFF);

    printf("Melewati reset register\n");

    alt_avalon_sgdma_construct_mem_to_stream_desc(
        &desc_2[0],
        &desc_2[0],
        buffer,
        (4096), //4096 for 1024 data, 1 data = 4byte
        0,
        1,
        1,
        0);

    desc_2[0].control = 128;

    alt_avalon_sgdma_register_callback(
        sgdma_tx_dev,
        (alt_avalon_sgdma_callback) &sgdma_tx_isr,
        (ALTERA_AVALON_SGDMA_CONTROL_IE_GLOBAL_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_PARK_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_IE_DESC_COMPLETED_MSK |
         ALTERA_AVALON_SGDMA_CONTROL_IE_CHAIN_COMPLETED_MSK ),
        0);

    alt_avalon_sgdma_do_async_transfer(sgdma_rx_dev, &desc[0]);

    printf("Wait for stream in....\n");

    while(1);

    /* Not reached. desc_2 points at fixed descriptor memory, not heap memory, so it is not freed. */
    return 0;
}

 

 

void sgdma_rx_isr(void * context, u_long intnum)
{
    alt_avalon_sgdma_stop(sgdma_rx_dev);

    alt_avalon_sgdma_do_async_transfer(sgdma_tx_dev, &desc_2[0]);
    printf("in callback rx\n");
}

void sgdma_tx_isr(void * context, u_long intnum)
{
    alt_avalon_sgdma_stop(sgdma_tx_dev);

    alt_avalon_sgdma_do_async_transfer(sgdma_rx_dev, &desc[0]);

    printf("in callback tx\n");
}

 

 

and the output looks like this: 

 

Melewati reset register 

in callback rx 

in callback tx 

Wait for stream in....
0 Kudos
Highlighted
Valued Contributor III
31 Views

I solved it myself. Thanks, all...

0 Kudos
Highlighted
Valued Contributor III
31 Views

Could you describe what was the problem and how you solved it, so that others facing the same problem can find a solution?

0 Kudos
Highlighted
Valued Contributor III
31 Views

I built a simple stream-to-memory path at 3 MHz that forwards to a stream output at 150 MHz, continuously (DC-FIFO -> SGDMA -> Memory -> SGDMA). 

But I was debugging with printf and the system got stuck (as seen in the Nios console). 

When I changed the debug output to a PIO driving LEDs, it ran. 

I assume that printf, used to show debug output in the Nios console, can't keep up with the interrupt requests from the SGDMA (they arrive at a high rate).
0 Kudos
Highlighted
Valued Contributor III
31 Views

The printf() function is probably using interrupts itself to write to the JTAG UART, so you can't call it from an ISR or a DMA transfer callback function. If this is what you are doing then it could explain why the system hangs.

0 Kudos
Highlighted
Valued Contributor III
31 Views

Yes, I agree with you, Daixiwen. 

I now use the LED PIO to trace the state for debugging purposes. 

Thank you.
0 Kudos
Highlighted
Valued Contributor III
31 Views

In general I would avoid putting any time-consuming code into an ISR. While the processor is in the ISR it can't handle other interrupts or other tasks, so you typically want to keep the amount of work done inside an ISR to a minimum. If you overload an ISR with work it can lead to problems that are hard to predict or reproduce for debug purposes.
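One common way to follow this advice is to have the callback record only that something happened and leave the printing to the main loop. The sketch below is a generic pattern, not code from this thread's design; the function and variable names are invented, and on real hardware the decrement in the main loop would need interrupts masked around it.

```c
#include <stdio.h>

/* Count of completed transfers not yet reported; written by the callback. */
static volatile int rx_done_count = 0;

/* Callback body kept minimal: no printf, no long-running work. */
void rx_done_callback(void *context)
{
    (void)context;
    rx_done_count++;
}

/* Called from the main loop: reports pending events outside interrupt context
 * and returns how many were handled. */
int drain_rx_events(void)
{
    int handled = 0;
    while (rx_done_count > 0) {
        rx_done_count--; /* on real hardware, mask interrupts around this update */
        handled++;
        printf("rx transfer complete\n");
    }
    return handled;
}
```

The main loop would then call drain_rx_events() instead of spinning in an empty while(1), so the slow console output never runs while interrupts are blocked.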

0 Kudos
Highlighted
Valued Contributor III
31 Views

Hello guys, I have a problem with the SGDMA that appeared recently. I have been using the SGDMA module for almost 4 years (thanks to BadOmen) and so far it has worked very well without any problems. Recently, we have an application that requires a higher data transfer rate, and I used an ST-MM SGDMA to write data from ADCs to SDRAM. 

I ran a test with 25 GB of data and it is all OK except for 16 KB of data. In that particular 16 KB, every 4th byte is written as zero, and after that 16 KB everything is OK again. This happens only once during the 25 GB of data. I have 3 different SGDMAs and the other 2 are OK. 

The transfer size is programmed to 16777216 bytes and I am transferring 5191560 bytes each time. I am using packet support and a ready latency of 1. I know this problem looks vague, but any ideas or comments would be greatly appreciated. Many thanks, 

Aidin.
0 Kudos
Highlighted
Valued Contributor III
31 Views

I forgot to mention that the data width is 8 bits.

0 Kudos
Highlighted
Valued Contributor III
31 Views

My suggestion would be to try to capture the event on SignalTap in order to gain better understanding of where the issue is. 

Tap signals on: 

- SDRAM slave port 

- SGDMA master port 

- SGDMA Avalon-ST sink (from ADC) 

 

and figure out at which stage the 00's are getting introduced. 

 

If the 00's never appear on the Avalon-ST interface but are reliably there on the SGDMA Avalon-MM master port, then yes, it sounds like a bug in the SGDMA and you would need to dig inside it a bit further.
0 Kudos
Highlighted
Valued Contributor III
31 Views

If you haven't updated the IP I highly recommend it, because I fixed a bug a year or two ago which caused FIFO corruption under some configurations. I forget which configurations are affected, but the symptom was that the FIFO inside one of the masters would transition from empty to full and vice versa, which would cause data to be lost or garbage to be written. 

 

The IP is available in Qsys as of 14.0 so if you are using that version I recommend using it instead. Unfortunately the driver isn't available yet so you would have to continue using the old driver. 

 

Also if you want to maximize memory bandwidth I recommend making the DMA wider and using a data format adapter to convert the 8-bit data from the ADC to a wider beat so that the DMA can use more of the memory bandwidth each clock cycle (I'm assuming your RAM isn't 8-bit)
0 Kudos