Hi all,
I'm currently trying to port my C code from an ARM7 MCU (LPC-P2148) to a Nios II on the SE1-board (Cyclone II, EP2C20F484C7). I used the arm-none-eabi-gcc compiler for the ARM7 MCU and the Nios II Eclipse platform, version 9.1, for the Nios II/e CPU. (I chose the /e CPU for cost and licence reasons.) More details can be obtained from my homepage.

A) Performance issue

ARM7: 12 MHz clock, running at 48 MHz via PLL.
Nios II/e: 50 MHz clock, running at 50 MHz.

Code used (shifting LED demo):

    #include <stdio.h>
    #include <system.h>
    #include "altera_avalon_pio_regs.h"

    void startup_leds(void);
    void delay(void);

    void startup_leds(void)
    {
        short is;
        short count = 4;
        int laufled = 0x00000001;

        for (count = 0; count < 4; count++) {
            for (is = 0; is < 16; is++) {
                IOWR_ALTERA_AVALON_PIO_DATA(PIO_0_BASE, laufled);
                // IOPIN1 = laufled<<16;   // was ARM code
                delay();                   // wait
                laufled = laufled << 1;
            }
            for (is = 0; is < 16; is++) {
                // IOPIN1 = laufled<<16;   // was ARM code
                IOWR_ALTERA_AVALON_PIO_DATA(PIO_0_BASE, laufled);
                delay();                   // wait
                laufled = laufled >> 1;
            }
        }
    }

    void delay(void)
    {
        short int wait = 6000;   // ARM7: 50000 for same speed
        while (wait) {
            wait = wait - 1;
        }
    }

    int main(void)
    {
        short count = 0;
        int delay;

        printf(" Hello from Nios II \n\r");
        startup_leds();
        while (1) {
            startup_leds();
        }
        return 0;
    }

Result: the ARM7 runs about 7 times faster than the Nios II/e! Is the Nios II/e CPU the reason? Is something configured wrongly in SOPC Builder? Is my C code correct?

The ARM7 statement, which needs an additional shift by 16,

    IOPIN1 = laufled<<16;

was converted to:

    IOWR_ALTERA_AVALON_PIO_DATA(PIO_0_BASE, laufled);

B) What is the recommended way to read data from a PIO, or to test the state of a single bit? For example, I want to check the status of bit 16. The ARM7 C statement is:

    if (!( IOPIN0 & 0x00010000 )) ....

What is the Nios II statement? I have been playing around with IORD_ALTERA_AVALON_PIO_DATA(base), but without any success so far. Is it a problem with SOPC Builder and the optional PIO settings?
Every answer is welcome.
Regards, Reinhard
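For question B, a minimal sketch of the bit test, assuming the PIO is named pio_0 in SOPC Builder (so that system.h provides PIO_0_BASE, as in the code above). As far as I know, the data register only reads back pin state when the PIO is configured as input or bidirectional, so an output-only PIO would explain reads that never change. The host-side stand-ins (fake_pio_data and the replacement macro) and the helper name bit16_is_low are hypothetical and exist only so the logic can be tried off-target; the use of __NIOS2__ to detect the target compiler is also an assumption.

```c
#include <stdint.h>

#ifdef __NIOS2__
/* On the target: use the real HAL headers. */
#include <system.h>
#include "altera_avalon_pio_regs.h"
#else
/* Host-side stand-ins (hypothetical, for trying the logic off-target only). */
static uint32_t fake_pio_data;
#define PIO_0_BASE 0
#define IORD_ALTERA_AVALON_PIO_DATA(base) (fake_pio_data)
#endif

#define BIT16_MASK 0x00010000u

/* Returns 1 when bit 16 of the PIO data register reads as 0,
 * mirroring the ARM7 test  if (!(IOPIN0 & 0x00010000)). */
static int bit16_is_low(void)
{
    return !(IORD_ALTERA_AVALON_PIO_DATA(PIO_0_BASE) & BIT16_MASK);
}
```

On the board this would be used as: if (bit16_is_low()) { ... }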
29 Replies
Let me see ... Nios will do the loop in 6 clocks ... assuming the timing measurement by PDP11GY is accurate, then ARM must be doing the loop in 1 clock to get the 7 to 1 timing.
This fails the sanity test.
Once again, the 7 to 1 timing applies to a nios ii/e core (this is the core PDP11GY used).
Have a look at the NIOS II performance benchmarks: www.altera.com/literature/ds/ds_nios2_perf.pdf
NIOS II/e 0.15 MIPS/MHz
NIOS II/s 0.64 MIPS/MHz
NIOS II/f 1.13 MIPS/MHz
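To put those figures in context, here is a small sketch that scales them to the 50 MHz clock from the original post. The function name dmips_at is made up for illustration; the three constants are the values quoted above.

```c
/* MIPS/MHz figures quoted above from the Altera benchmark sheet. */
#define NIOS2_E_MIPS_PER_MHZ 0.15
#define NIOS2_S_MIPS_PER_MHZ 0.64
#define NIOS2_F_MIPS_PER_MHZ 1.13

/* Estimated benchmark MIPS for a core at a given clock in MHz. */
static double dmips_at(double mips_per_mhz, double clock_mhz)
{
    return mips_per_mhz * clock_mhz;
}
```

At 50 MHz this gives roughly 7.5 for the /e, 32 for the /s and 56.5 for the /f, so by these figures the /f is about 7.5 times faster than the /e at the same clock.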
--- Quote Start ---
Let me see ... Nios will do the loop in 6 clocks ... assuming the timing measurement by PDP11GY is accurate, then ARM must be doing the loop in 1 clock to get the 7 to 1 timing. This fails the sanity test.
--- Quote End ---
Actually, for the /e it is more likely to be 12 clocks, or 18 if the 'short' causes an additional 'andi rn,rn,0xffff' instruction. That is before counting the Avalon MM delays for reading the instructions; I suspect the fastest you'll see (from an M9K memory block) is one wait state.
I can't remember the ARM instruction set that well, and don't know the exact timings. The branch cost (particularly when mispredicted) will depend very much on the ARM architecture.
So one wait state and 6 clocks for 3 instructions is about a 7:1 timing difference against even a nios /f core.
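The clock counts above can be checked against the loop counts in the original post (6000 iterations on the Nios II/e, 50000 on the ARM7 "for same speed"). The sketch below assumes both delays really do take the same wall-clock time and ignores instruction-fetch wait states; the function names are invented for illustration.

```c
/* Numbers taken from the original post. */
#define NIOS_LOOPS 6000.0    /* delay count used on the Nios II/e */
#define ARM_LOOPS  50000.0   /* delay count used on the ARM7      */
#define NIOS_MHZ   50.0
#define ARM_MHZ    48.0

/* Duration of a delay loop in microseconds. */
static double delay_us(double iterations, double clocks_per_iter,
                       double clock_mhz)
{
    return iterations * clocks_per_iter / clock_mhz;
}

/* ARM clocks per iteration implied when both delays last equally long. */
static double implied_arm_clocks(double nios_clocks_per_iter)
{
    double t_us = delay_us(NIOS_LOOPS, nios_clocks_per_iter, NIOS_MHZ);
    return t_us * ARM_MHZ / ARM_LOOPS;
}
```

With 12 clocks per iteration on the /e the delay lasts about 1440 us and the ARM7 would need about 1.4 clocks per iteration; with 18 clocks it is about 2160 us and about 2.1 clocks. Under these assumptions the second figure looks more plausible for a decrement-and-branch loop than the 1 clock derived from the 6-clock estimate.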
--- Quote Start ---
Once again, the 7 to 1 timing applies to a nios ii/e core (this is the core PDP11GY used). Have a look at the NIOS II performance benchmarks www.altera.com/literature/ds/ds_nios2_perf.pdf NIOS II/e 0.15 MIPS/MHz NIOS II/s 0.64 MIPS/MHz NIOS II/f 1.13 MIPS/MHz
--- Quote End ---
So all of those things used to get the cycles/loop reduced to 6 do not apply. And knowing the MIPS without knowing the benchmark is meaningless. And the cost budget only covered the NIOS II/e, so PDP11GY could not afford the NIOS II/f.
I sure would like to understand how a single core can achieve 1.13 MIPS/MHz. It must complete more than one instruction per cycle, i.e. 2 per cycle about 13% of the time?
As far as the benchmark goes, it's the standard Dhrystone MIPS (http://en.wikipedia.org/wiki/dhrystone) that almost everyone uses, including ARM.
A couple of suggestions.
1) Don't force a short int for NIOS. Compilers are free to choose a number of different int sizes based on the 'natural' size of the int for the targeted processor. Short int on the ARM is probably selecting the ARM register size, whereas the NIOS is having to do extra work to mask each access to 16 bits rather than the 32-bit register size. Just use a plain int and you will get each processor's 'natural' register size.
2) Try forcing a 32-bit int for each processor and see what happens: uint32_t for NIOS, and the same for ARM (or maybe long int if stdint is not available for the ARM compiler). The results might be interesting.
3) For fastest counting, try
    while(wait--); // yes, this is valid C code.
                   // Test is for NON-zero, not true/false
instead of all that extra code. It's easier for a compiler to optimize.
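A small sketch of what suggestions 1 to 3 amount to, using fixed-width types from stdint.h. The helper names dec16, dec32 and delay_u32 are made up here. Note that a real busy-wait usually needs a volatile counter (or compiling without optimisation), otherwise the compiler may remove the empty loop; that caveat is my addition and was not raised in the thread.

```c
#include <stdint.h>

/* 16-bit counter: on a 32-bit CPU the result has to be truncated back
 * to 16 bits, which is the extra masking work mentioned above. */
static uint16_t dec16(uint16_t v)
{
    return (uint16_t)(v - 1u);
}

/* 32-bit counter: matches the register width, no truncation needed. */
static uint32_t dec32(uint32_t v)
{
    return v - 1u;
}

/* Delay loop with a register-width counter, in the while(wait--) form.
 * The iteration count is returned only so the sketch can be checked;
 * a real delay would drop it. */
static uint32_t delay_u32(uint32_t wait)
{
    uint32_t iterations = 0;
    while (wait--) {
        iterations++;
    }
    return iterations;
}
```

The wrap-around values show the difference in width: decrementing zero gives 0xFFFF for the 16-bit counter and 0xFFFFFFFF for the 32-bit one.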
Actually the compiler will transform the loop you gave into:
    if (wait != 0) {
        do
            wait = wait - 1;
        while (wait != 0);
    }
Depending on the code layout, the first conditional might get optimised for the 'wait == 0' case. You get better code from a do ... while () loop. So you want
    while (--wait);
However, for the example code, the compiler knew the value of the constant, so it won't have compiled the initial test.
--- Quote Start ---
I sure would like to understand how a single core can achieve 1.13 MIPS/MHZ. It must complete more than one instruction per cycle, i.e. 2 per cycle about 13% of the time?
--- Quote End ---
The 1.13 MIPS/MHz are DMIPS/MHz. DMIPS is a calculated value based on the benchmark test. A value >1 does not necessarily mean that the CPU can perform more than one instruction per cycle. The MIPS/MHz value for the Nios /f core should be less than 1, otherwise the NIOS II/f would be a superscalar architecture.
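To make the "calculated value" concrete: DMIPS is conventionally the Dhrystone score divided by 1757, the Dhrystones per second of the VAX 11/780 reference machine, which is defined as 1 MIPS. The sketch below only shows that arithmetic; the function names are invented, and any score fed into it here is hypothetical, not a measured Nios II result.

```c
/* Dhrystones per second of the VAX 11/780 reference machine (1 DMIPS). */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Convert a raw Dhrystone score into DMIPS. */
static double dmips(double dhrystones_per_sec)
{
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* Normalise DMIPS by the clock frequency in MHz. */
static double dmips_per_mhz(double dhrystones_per_sec, double clock_mhz)
{
    return dmips(dhrystones_per_sec) / clock_mhz;
}
```

For example, a hypothetical score of 198541 Dhrystones/s at 100 MHz works out to 113 DMIPS, or 1.13 DMIPS/MHz. The figure compares benchmark work against the reference machine, so it can exceed 1 without the core retiring more than one instruction per clock.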
--- Quote Start ---
So you want while (--wait);
--- Quote End ---
That one does the really long wait for a value of zero, which might be what is wanted. I'll get around to looking at NIOS code for these eventually.
I tested this on several CISC processors and also HiTech C on several varieties of PIC RISC. For short int, I know the CISCs had a one-instruction 'decrement, branch if non-zero'. I think the PIC did too (it was a long time ago now, so I may be mixing up my memories), but for whatever reason, the while(var--) took 1 CPU cycle per count, plus overhead. The while(--var) was probably the same except for no short-cut exit. Other variations took significantly longer. NIOS might be different.
Either way, the biggest reduction came from selecting the processor's register size as the variation of 'int' to use for the counter. Equal variable sizes on processors with different-sized registers isn't a fair test.
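The difference between the two loop forms, including the "really long wait" for zero, can be seen by counting iterations. This is a sketch with a 16-bit unsigned counter so that the wrap-around case finishes quickly; the function names count_post and count_pre are made up for illustration.

```c
#include <stdint.h>

/* while (wait--): test first, then decrement.
 * Runs 'wait' times; a start value of zero exits immediately. */
static uint32_t count_post(uint16_t wait)
{
    uint32_t n = 0;
    while (wait--) {
        n++;
    }
    return n;
}

/* while (--wait): decrement first, then test.
 * Runs 'wait - 1' times; a start value of zero wraps to 65535 first. */
static uint32_t count_pre(uint16_t wait)
{
    uint32_t n = 0;
    while (--wait) {
        n++;
    }
    return n;
}
```

With a start value of 5 the first form runs 5 times and the second 4 times. With a start value of 0 the first form does not run at all, while the second runs 65535 times for this 16-bit counter (and correspondingly longer for a 32-bit one).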