Nios® V/II Embedded Design Suite (EDS)

DMA SPEED PROBLEM!

Altera_Forum
Honored Contributor II
2,114 Views

I have a system with 3 SDRAM controllers:

The 1st is an x16 SDRAM for storing code and data (MAIN_SDRAM).

The 2nd and 3rd are x8 SDRAMs for video memory (V1_SDRAM, V2_SDRAM).

All SDRAMs run at 100 MHz.

I tried to copy one page from MAIN_SDRAM to the V1 or V2 SDRAM using the following code:

/* ... in main() ... */
unsigned char *p1 = (unsigned char *)SDRAM_V1_BASE;
unsigned char *p2 = (unsigned char *)SDRAM_V2_BASE;
static unsigned char array[800];   /* source page, placed in the code/data memory */

memcpy(p1, array, 800);            /* CPU copy into video memory V1 */
/* ... */

I ran it in ModelSim and calculated the transfer speed.

Time needed to transfer 800 bytes: 57810000 ps

Transfer speed = 800 / (57810000 * 10^(-12) * 2^20) = 13.19736124 MB/s!

 

After that I tried to implement a DMA (WR_DMA) for the copy

and ran the following code:

 

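#include <stdio.h>
#include "system.h"                  /* WR_DMA_BASE */
#include "altera_avalon_dma_regs.h"  /* DMA register access macros */

void my_memcpy(void *where, const void *from, int howmuch)
{
    /* Wait until any previous transfer has finished. */
    while (IORD_ALTERA_AVALON_DMA_STATUS(WR_DMA_BASE) & ALTERA_AVALON_DMA_STATUS_BUSY_MSK)
        ;

    IOWR_ALTERA_AVALON_DMA_CONTROL(WR_DMA_BASE, 0);        /* stop the DMA */
    IOWR_ALTERA_AVALON_DMA_STATUS(WR_DMA_BASE, 0);         /* clear the status register */
    IOWR_ALTERA_AVALON_DMA_LENGTH(WR_DMA_BASE, howmuch);   /* length in bytes */
    IOWR_ALTERA_AVALON_DMA_RADDRESS(WR_DMA_BASE, (unsigned int)from);
    IOWR_ALTERA_AVALON_DMA_WADDRESS(WR_DMA_BASE, (unsigned int)where);

    /* Start a byte-wide transfer that ends when the length reaches zero. */
    IOWR_ALTERA_AVALON_DMA_CONTROL(WR_DMA_BASE, ALTERA_AVALON_DMA_CONTROL_GO_MSK |
                                                ALTERA_AVALON_DMA_CONTROL_BYTE_MSK |
                                                ALTERA_AVALON_DMA_CONTROL_WEEN_MSK |
                                                ALTERA_AVALON_DMA_CONTROL_LEEN_MSK);

    /* Block until the transfer completes. */
    while (IORD_ALTERA_AVALON_DMA_STATUS(WR_DMA_BASE) & ALTERA_AVALON_DMA_STATUS_BUSY_MSK)
        ;

#ifdef DEBUG
    printf("\nDMA packet\nfrom 0x%x\nwhere - 0x%x\nhowmuch - %d",
           (unsigned int)from, (unsigned int)where, howmuch);
#endif
}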

 

 

/* ... in main() ... */
unsigned char *p1 = (unsigned char *)SDRAM_V1_BASE;
unsigned char *p2 = (unsigned char *)SDRAM_V2_BASE;
static unsigned char array[800];   /* source page, placed in the code/data memory */

my_memcpy(p1, array, 800);         /* DMA copy into video memory V1 */
/* ... */

 

Time needed to transfer 800 bytes: 22450000 ps

Transfer speed = 800 / (22450000 * 10^(-12) * 2^20) = 33.98394001 MB/s!

Either way, this is a very low value. It corresponds to random (non-burst) access to the SDRAM.
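The two speed figures above can be reproduced with a small helper. This is only a sketch of the arithmetic used in the post, assuming 1 MB = 2^20 bytes as in the formulas; the function name `throughput_mb_per_s` is made up for illustration.

```c
#include <assert.h>

#define BYTES_PER_MB  1048576.0   /* 2^20, the divisor used in the post */
#define PS_PER_SECOND 1e12        /* picoseconds in one second */

/* Throughput in MB/s for `bytes` moved in `picoseconds` of simulated time. */
double throughput_mb_per_s(double bytes, double picoseconds)
{
    assert(picoseconds > 0.0);
    return bytes / (picoseconds / PS_PER_SECOND) / BYTES_PER_MB;
}
```

Calling `throughput_mb_per_s(800, 57810000)` gives about 13.2 MB/s for the memcpy case, and `throughput_mb_per_s(800, 22450000)` gives about 34.0 MB/s for the DMA case.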

 

As I understand it, the DMA simply does not use the SDRAM's burst mode!

Is it possible to do something with the DMA or the processor to fix that?

(In fact, I would rather not use the DMA for this.)
7 Replies
Altera_Forum

The DMA transfer does go faster than the Nios2 when transferring from the 16-bit code/data memory, but because it has to compete with the Nios2 instruction and data masters, throughput isn't what you expect.

 

Some ideas to improve throughput: 

 

1) Increase your Nios2 instruction cache size 

2) Define an onchip memory, put the source array in it 

3) Connect the DMA write master _only_ to the 8-bit destination memory or memories, if possible. 

4) On the "Advanced" tab of the DMA GUI, check "byte transfers" only. 

 

I don't think you'll be able to achieve a much higher transfer rate using the Nios2 alone (no DMA).

By the way, there's no need to set the DMA WEEN bit, at least for this example.
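The remark about the WEEN bit can be illustrated by building the control word with and without it. This is a host-side sketch only: the `DEMO_*` masks are stand-ins for the `ALTERA_AVALON_DMA_CONTROL_*_MSK` macros, and their values are an assumption about the register layout that should be checked against `altera_avalon_dma_regs.h`. The helper name `dma_control_word` is hypothetical.

```c
#include <stdint.h>

/* Stand-ins for the real control-register masks (assumed values, verify
 * against altera_avalon_dma_regs.h before relying on them). */
#define DEMO_DMA_CONTROL_BYTE_MSK 0x01u  /* byte-wide transfers */
#define DEMO_DMA_CONTROL_GO_MSK   0x08u  /* start the transfer */
#define DEMO_DMA_CONTROL_WEEN_MSK 0x40u  /* end on write end-of-packet */
#define DEMO_DMA_CONTROL_LEEN_MSK 0x80u  /* end when length reaches zero */

/* Build the control word for a length-terminated byte transfer.
 * WEEN is included only when the caller asks for it. */
uint32_t dma_control_word(int enable_ween)
{
    uint32_t word = DEMO_DMA_CONTROL_GO_MSK |
                    DEMO_DMA_CONTROL_BYTE_MSK |
                    DEMO_DMA_CONTROL_LEEN_MSK;
    if (enable_ween) {
        word |= DEMO_DMA_CONTROL_WEEN_MSK;
    }
    return word;
}
```

For a plain memory-to-memory copy like the one in the question, `dma_control_word(0)` would be the value written to the control register, since the transfer ends on length alone.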
Altera_Forum

One other idea, if you must share Nios II program/data memory with your DMA source: in SOPC Builder go to View --> "Arbitration Priorities". Now look at the integer for the connection between the DMA and the SDRAM, and make it much larger than the CPU's. For example, if the CPU is set to 1, set the DMA master to 8 or 16 (or higher).

What this does is tell the SOPC Builder arbitration logic, during contention between CPU and DMA, to allow 8 or 16 transfers to/from the DMA for every single transfer from the CPU. This should greatly improve throughput during times of contention, as the SDRAM won't be "thrashing" around, constantly going from one bank to another. Altera app note 184 (now somewhat dated) discusses this briefly; some more comprehensive documentation on this will be put up on the Altera web site soon.
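As a rough illustration of the share ratio described above, the helper below computes what fraction of granted transfers goes to the DMA while both masters are requesting. It is a simplified model that assumes continuous contention and ignores SDRAM bank switches and refreshes; the name `dma_share_fraction` is invented for this sketch.

```c
#include <assert.h>

/* Fraction of granted transfers that go to the DMA master while the DMA
 * and the CPU both request the SDRAM continuously. */
double dma_share_fraction(unsigned dma_shares, unsigned cpu_shares)
{
    assert(dma_shares + cpu_shares > 0u);
    return (double)dma_shares / (double)(dma_shares + cpu_shares);
}
```

With the CPU at 1 share, `dma_share_fraction(8, 1)` is about 0.89 and `dma_share_fraction(16, 1)` is about 0.94, compared with 0.5 for equal shares.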
Altera_Forum

Yes, I understand those ideas.

But I think none of them can speed up the transfer much beyond 33 MB/s,

because the DMA doesn't use burst mode, while the processor can use it (I saw this in the timing diagram).

In burst mode, the x8 SDRAM transfer rate would be 70-80 MB/s.

Is it possible to enable burst mode during a DMA transfer?
Altera_Forum

Alex, 

 

What I am suggesting with arbitration priorities will give you performance as good as can be had with a burst, provided that the arbitration 'share' you enter into SOPC Builder matches your burst size. If you assign 100 to the DMA and 1 to the CPU, and the DMA then attempts to transfer 100 (or more) words, it will get uninterrupted access to SDRAM for 100 accesses at a time. Since the accesses are sequential, after the initial few clocks to get the data moving you'll store or load one word of data per clock (of course there will be delays for switching banks and occasional refreshes, as there would be in any SDRAM interface, but these are not too great).

 

That said, our current SDRAM controller does not have explicit burst support, but again, if the addresses presented to it are sequential and the master is latency-aware (which the DMA is), you'll get burst performance.
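To see how far the measured numbers in this thread are from the "one word per clock" case, the transfer times can be turned into clock cycles per byte. This is a sketch under the figures given in the first post (800 bytes, 100 MHz SDRAM clock, so a 10000 ps period); the function name `cycles_per_byte` is made up here.

```c
#include <assert.h>

/* Average clock cycles spent per byte transferred.
 * clock_period_ps is the clock period in picoseconds (10000 for 100 MHz). */
double cycles_per_byte(double bytes, double picoseconds, double clock_period_ps)
{
    assert(bytes > 0.0 && clock_period_ps > 0.0);
    return (picoseconds / clock_period_ps) / bytes;
}
```

For the original measurements this gives roughly 7.2 cycles per byte for the memcpy (57810000 ps) and roughly 2.8 cycles per byte for the DMA (22450000 ps). An x8 SDRAM streaming one byte per clock would sit near 1.0, before bank-switch and refresh overhead.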

 

The disadvantage of the above is that you rob the CPU of SDRAM access, assuming that a single SDRAM chip is shared between Nios data, instructions, and the DMA buffer. In the code sample you gave this is no problem, but in a more complex multi-threaded system it may be an issue unless the other code that is running is cached.

 

Out of curiosity, what is your bandwidth requirement for this application for the DMA transfer, and what is your clock speed?
Altera_Forum

Hello, 

 

I am using Nios II 1.1 and the DMA with Quartus 4.2 SP1. 

 

In my system, only the DMA's READ_MASTER is connected to the SDRAM. The Nios executes out of SRAM.

The tri-state bridge that the SRAM sits on is the target of the DMA's WRITE_MASTER.

 

I am using the HAL paradigm and have put the DMA in a streaming mode where it reads from the same module, the SDRAM in this case.

 

I have played with the burst length, changing it from 1024 bytes up to 4096 bytes, and got only about 30 Mbit/s in both cases. I then changed the arbitration for the SDRAM to 100 shares out of 102, even though the processor fetches its instructions out of RAM. I have even increased the instruction cache to see if this would make a difference.

In all cases I peak at around 32 Mbit/s.
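Since this post quotes Mbit/s while the earlier posts use MB/s, a small conversion makes the figures comparable. The sketch assumes 1 Mbit = 10^6 bits and 1 MB = 2^20 bytes (the divisor used in the first post); the function name `mbit_to_mb_per_s` is hypothetical.

```c
#include <assert.h>

/* Convert a rate in Mbit/s (10^6 bits per second) to MB/s (2^20 bytes per second). */
double mbit_to_mb_per_s(double mbit_per_s)
{
    assert(mbit_per_s >= 0.0);
    return mbit_per_s * 1e6 / 8.0 / 1048576.0;
}
```

Under those assumptions, `mbit_to_mb_per_s(32)` is about 3.8 MB/s, well below the roughly 34 MB/s reported for the DMA case at the top of the thread.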

 

I have read most of the DMA threads about SDRAM performance and still don't see a straightforward way to get at least a few hundred Mbit/s out of the SDRAM.

 

I have read, and been told, that although my system does not formalize bursting the way Nios 5.0 does, it should still move data on every clock cycle, with overhead of course reducing this.

 

Even though this is an old topic, can someone shed light on how to get decent SDRAM read performance with a DMA?

 

Thanks
Altera_Forum

Hello, baycool!

I'd like to know your SRAM's setup time, hold time, and wait-cycle settings in SOPC Builder.

If your SRAM is an off-chip slave with latency, it may be the DMA speed bottleneck rather than the SDRAM.

At present the Avalon tri-state bus doesn't support variable-latency off-chip memory working in burst mode with a latency-aware master (Avalon bus manual, p. 78).

Have you tried that?
Altera_Forum

Hello Anidea, 

 

Thanks for your reply. 

 

I am using the SRAM device that is part of the Cyclone Nios dev kit. I understand what you are saying, but the speed of the bursts is off by a factor of 10 to 20!

 

Actually, I talked to some Altera guys today and they may have shed a little light. It looks like, for a Quartus 4.2 and Nios 1.1 system, the slaves have to "advertise" that they are capable of bursting. If the system does not pick this up (I am guessing from the component PTF), then I think it will go into a single-cycle mode, which would explain my SDRAM numbers.

 

Comments anyone? 

 

Thanks