
Can I accelerate?

Altera_Forum
Honored Contributor II
1,958 Views

Hello, everyone!

My Nios II C code is:

#include "system.h"
#include "altera_avalon_pio_regs.h"

int main()
{
    unsigned count;
    while (1) {
        count = IORD_ALTERA_AVALON_PIO_DATA(PIO_IN_BASE);
        IOWR_ALTERA_AVALON_PIO_DATA(PIO_OUT_BASE, count);
    }
    return 0;
}

In my project, I use a counter module to count clock cycles. The Nios II reads 32-bit PIO data from the counter module, then writes the data back. The counter module records the data written back. After analysing the record, I found that each pass through the while loop costs 14 clocks. That's too slow!

How can I reduce the number of clocks per loop?

Would assembly code do better?
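
For anyone repeating the measurement: a minimal sketch of how the per-loop cost could be derived from the recorded values, assuming the record is simply the sequence of counter values the loop wrote back (one per iteration). The helper name `clocks_per_iteration` is hypothetical and not part of any Altera library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: average clocks per loop iteration, estimated from
 * consecutive counter values that the loop wrote back.
 * Unsigned 32-bit subtraction keeps the difference correct across a single
 * counter wrap-around. */
static uint32_t clocks_per_iteration(const uint32_t *samples, size_t n)
{
    if (n < 2) {
        return 0; /* need at least two samples to form a difference */
    }
    uint64_t total = 0;
    for (size_t i = 1; i < n; i++) {
        total += (uint32_t)(samples[i] - samples[i - 1]);
    }
    return (uint32_t)(total / (n - 1));
}
```

With one read and one write per iteration, dividing the result by two gives a rough per-access figure, which ignores the branch and any pipeline stalls.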
8 Replies
Altera_Forum
Honored Contributor II
300 Views

That means it costs 7 clocks to read from the PIO or to write to it.

I ran the same test. With the Standard or Fast Nios II core, 7 clocks are used, even with a 64 KB instruction cache configured. With the Economy core it takes even more clocks.

Can anyone analyse what the Avalon bus is doing during these 7 clocks?
Altera_Forum
Honored Contributor II
300 Views

For a simple example like this, assembler code probably won't do better than optimised C. I assume you are using a release build - if not then please select one.

 

But if you want to check then please open up a Nios II shell, change to <project dir>/Release and type `make obj/<filename>.s`. The .s file shows the assembly language code which the compiler has generated. 

 

Please post the assembler code and we can comment more usefully on what's going on.
Altera_Forum
Honored Contributor II
300 Views

Thanks.

I ran the test again with a release build, and the read plus the write now take 10 clocks. The assembly language code is:

    .file "count_pio.c"
    .section .text
    .align 2
    .global main
    .type main, @function
main:
    movhi r3, %hiadj(67584)
    addi r3, r3, %lo(67584)
.L2:
    ldwio r2, 0(r3)
    stwio r2, 0(r3)
    br .L2
    .size main, .-main
    .ident "GCC: (GNU) 3.4.1 (Altera Nios II 1.1 b137)"
Altera_Forum
Honored Contributor II
300 Views

I wasn't using a release build, but my assembly language code is just the same as Where200's.

Altera_Forum
Honored Contributor II
300 Views

Thank you, wombat.

I ran this example again with a release build on the Standard/Fast Nios II. Reading and writing now take only 6 clocks.
Altera_Forum
Honored Contributor II
300 Views

One point of interest - in the code which is generated: 

.L2:
    ldwio r2, 0(r3)
    stwio r2, 0(r3)
    br .L2

the processor will stall for two cycles after the ldwio because it can't use the value read from memory until two cycles later. It can use other registers though, so the compiler will usually be able to insert other instructions here to keep the CPU busy.
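
To make the stall arithmetic concrete, here is a toy cycle counter. It is only a sketch under simplifying assumptions: one instruction issues per cycle, a loaded value becomes usable two cycles late (the figure quoted above), each instruction reads at most one register of interest, and the branch and the Avalon bus wait states are ignored. The `struct insn` type and `count_cycles` function are hypothetical illustrations, not a model of the real Nios II pipeline.

```c
#include <stddef.h>

/* One instruction in a toy in-order pipeline model. Register -1 means "none". */
struct insn {
    int dst;     /* register written, or -1 */
    int src;     /* register read, or -1 */
    int is_load; /* nonzero if the result arrives late */
};

#define LOAD_USE_DELAY 2 /* assumed extra cycles before a loaded value is usable */
#define NUM_REGS 32

/* Count the cycles needed to issue n instructions, one per cycle, stalling
 * whenever an instruction reads a register whose load has not completed. */
static int count_cycles(const struct insn *prog, size_t n)
{
    int ready[NUM_REGS] = {0}; /* cycle at which each register becomes usable */
    int cycle = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].src >= 0 && ready[prog[i].src] > cycle) {
            cycle = ready[prog[i].src]; /* stall until the operand is ready */
        }
        cycle++; /* issue this instruction */
        if (prog[i].dst >= 0) {
            ready[prog[i].dst] = cycle + (prog[i].is_load ? LOAD_USE_DELAY : 0);
        }
    }
    return cycle;
}
```

In this model the three-instruction loop body (load into r2, store from r2, branch) takes 5 cycles, and so does a five-instruction body with two independent instructions placed between the load and the store. The two stall cycles are simply replaced by useful work.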
Altera_Forum
Honored Contributor II
300 Views

Uncached memory accesses are generally slow with Nios II. For fast I/O you can use custom instructions that access the hardware directly. With that you can come down to 2 to 3 cycles per access. 

 

Regards, 

 

Thomas
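
A rough sketch of what the software side of this could look like, assuming two custom instructions have been added to the CPU: one that returns the counter value and one that takes the value to write back. The opcode indices `CI_READ_COUNTER_N` and `CI_WRITE_RESULT_N` and the wrapper names are hypothetical; real indices come from the generated system.h. `__builtin_custom_in` and `__builtin_custom_ni` are the Nios II GCC built-ins for custom instructions; whether a given toolchain version provides them should be checked. The `#else` branch is only a host-side stand-in so the sketch can be compiled off-target.

```c
#include <stdint.h>

/* Hypothetical custom-instruction indices; take the real ones from system.h. */
#define CI_READ_COUNTER_N 0
#define CI_WRITE_RESULT_N 1

#if defined(__nios2__)
/* On a Nios II target each built-in compiles to a single `custom` instruction. */
static inline uint32_t fast_read(void)
{
    return (uint32_t)__builtin_custom_in(CI_READ_COUNTER_N);
}
static inline void fast_write(uint32_t v)
{
    __builtin_custom_ni(CI_WRITE_RESULT_N, (int)v);
}
#else
/* Host-side stand-ins (assumption: they only mimic the data flow, not timing). */
static uint32_t sim_counter;
static uint32_t sim_last_written;
static inline uint32_t fast_read(void) { return sim_counter++; }
static inline void fast_write(uint32_t v) { sim_last_written = v; }
#endif

/* Copy `iterations` values from the input to the output through the custom
 * instructions and return the last value copied. */
static uint32_t copy_loop(unsigned iterations)
{
    uint32_t v = 0;
    while (iterations--) {
        v = fast_read();
        fast_write(v);
    }
    return v;
}
```

The matching hardware (custom instruction logic wired to the counter) is not shown and would have to be built in the FPGA design; the cycle counts actually achieved depend on that logic.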
Altera_Forum
Honored Contributor II
300 Views

The custom instruction idea is interesting. 

In addition, you should consider building a dedicated hardware unit (interfaced as a custom instruction, an Avalon peripheral, or a PIO) that takes the workload off the Nios core and returns only precomputed results to it. That way you will be less dependent on the I/O speed.

I don't know your application, of course, but often some rethinking of the architecture can move more functionality into hardware, and the speed increase can be dramatic.

Given the details, I am sure many people on the Nios forum could offer suggestions in that direction as well.

regards 

henning