Hi!
I wanted a fast CopyMemory function, so I wrote it in inline asm (below). I don't know how many machine cycles LDW and STW take. Why are LDW and STW so very slow?
Test kit: Cyclone, 50 MHz, STD (standard) core.

```c
void CopyMemory(void *IN_pDS, void *IN_pSC, int IN_iSZ)
{
    IN_iSZ >>= 2;   // divide by 4: each pass moves one 4-byte word
    if (IN_iSZ <= 0)
        return;     // the loop needs at least one word, or bne never sees zero

    asm volatile (
        "1:                  \n\t"   // numeric local label, safe if the function is inlined
        "ldw  r7, 0(%1)      \n\t"   // ?? cycles (STD)
        "stw  r7, 0(%0)      \n\t"   // ?? cycles (STD)
        "addi %1, %1, 4      \n\t"   // 1 cycle (STD)
        "addi %0, %0, 4      \n\t"   // 1 cycle (STD)
        "addi %2, %2, -1     \n\t"   // 1 cycle (STD)
        "bne  %2, zero, 1b   \n\t"   // 2 cycles (STD)
        : "+r"(IN_pDS), "+r"(IN_pSC), "+r"(IN_iSZ)   // pointers and count are modified by the loop
        :
        : "r7", "memory"                              // r7 is scratch; memory is written
    );
}
```
Hi,
are you using Nios II/fast? There seem to be dependencies between your instructions, so you are losing some cycles. LDW/STW should take only one cycle when they hit the data cache. Maybe you would also prefer to use DMA. What about posting a dump of the generated assembly code instead of the C inline assembly?
Sylvain
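Along the lines of Sylvain's last suggestion, a plain-C version of the same word copy lets the compiler do the instruction scheduling; compiling it with optimization (for example -O2) and disassembling the object file (for example with nios2-elf-objdump -d) shows what the compiler produces, for comparison with the hand-written loop. This is a minimal sketch under assumptions: copy_words_c is a hypothetical name, the byte count is taken as size_t, and both buffers are assumed 4-byte aligned and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain-C equivalent of CopyMemory: copies nbytes / 4 words.
   Assumes both buffers are 4-byte aligned and do not overlap. */
void copy_words_c(void *dst, const void *src, size_t nbytes)
{
    uint32_t *d = dst;
    const uint32_t *s = src;
    size_t n = nbytes / 4;   /* trailing 1..3 bytes are ignored, as in CopyMemory */

    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];         /* the compiler chooses the LDW/STW schedule */
}
```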
The cycle times for LDW and STW are in the Nios II Processor Reference Handbook, chapter 16.
There is a two-cycle load-use stall on the result of LDW, so you should move your STW two instructions further down. You could also consider calculating the end address (IN_pDS + IN_iSZ, with IN_iSZ still in bytes) before the loop and comparing against it, as that would save one add per iteration. And finally, you will gain by unrolling the loop if you are moving lots of data.
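The end-address and load-use suggestions can be sketched in plain C: the loop stops when the source pointer reaches a precomputed end address (so no counter is decremented), and the next word is loaded before the current one is stored, so each store uses a value loaded one iteration earlier. This is a minimal sketch, not a tuned routine: copy_words_pipelined is a hypothetical name, both buffers are assumed 4-byte aligned and non-overlapping, and because it is C the compiler still decides the final instruction order, so the disassembly is worth checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: copies nbytes / 4 aligned, non-overlapping words.
   The load of word i+1 is issued before the store of word i, so the
   store does not consume the result of the immediately preceding load. */
void copy_words_pipelined(void *dst, const void *src, size_t nbytes)
{
    uint32_t *d = dst;
    const uint32_t *s = src;
    const uint32_t *end = s + nbytes / 4;   /* end address computed once */

    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);

    if (s == end)
        return;                  /* nothing to copy */

    uint32_t cur = *s++;         /* prime: load the first word */
    while (s != end) {
        uint32_t next = *s++;    /* load word i+1 ... */
        *d++ = cur;              /* ... while storing word i */
        cur = next;
    }
    *d = cur;                    /* drain: store the last word */
}
```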
Is the memory SDRAM, on-chip RAM, or something else?
If it's SDRAM, then the bus turnaround (or read and write addresses falling in different columns, etc.) may be killing you. Try using additional registers so you can do several LDWs and then several STWs. Of course it would be nice if Nios II supported post- or pre-increment addressing; that would eliminate the ADDIs. You can also try DMA, as Sylvain suggests. I'm not sure whether it does multiple reads followed by multiple writes, but it does have an internal FIFO. The key is to do two or more reads back to back, and then the writes.
Ken
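Grouping several reads before several writes, combined with unrolling, might look like the following: four words are loaded into separate temporaries, then all four are stored, and the pointer updates, compare, and branch run once per 16 bytes instead of once per 4. This is a minimal sketch under assumptions: copy_words_unrolled4 is a hypothetical name, the unroll factor of 4 is arbitrary, and both buffers are assumed 4-byte aligned and non-overlapping. Since the compiler must assume the two pointers might alias, it should keep the loads ahead of the stores, but that is worth confirming in the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: copies nbytes / 4 aligned, non-overlapping words,
   four at a time: four reads back to back, then four writes. */
void copy_words_unrolled4(void *dst, const void *src, size_t nbytes)
{
    uint32_t *d = dst;
    const uint32_t *s = src;
    size_t n = nbytes / 4;                          /* number of words */
    const uint32_t *end4 = s + (n & ~(size_t)3);    /* end of the 4-word blocks */
    const uint32_t *end = s + n;

    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);

    while (s != end4) {
        uint32_t a = s[0], b = s[1], c = s[2], e = s[3];   /* grouped reads */
        d[0] = a; d[1] = b; d[2] = c; d[3] = e;            /* grouped writes */
        s += 4;
        d += 4;
    }
    while (s != end)                                /* 0 to 3 leftover words */
        *d++ = *s++;
}
```

Whether grouping the reads pays off depends on the memory behind the bus (SDRAM versus on-chip RAM), so timing both this and the simple loop on the target is the only reliable check.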