Nios® V/II Embedded Design Suite (EDS)
Support for Embedded Development Tools, Processors (SoCs and Nios® V/II processor), Embedded Development Suites (EDSs), Boot and Configuration, Operating Systems, C and C++

NIOS C compilation and optimization issue

Altera_Forum
Honored Contributor II
1,357 views

Nios II Software Build Tools for Eclipse, version 11.1 SP2.

OK, this may look foolish, but it happened.

I was asked to provide latency figures for an algorithm, and I got different figures in the following two trials.

Routine() is the software code for the algorithm.

Trial 1:

Routine() was declared and defined in the same source file as main(). The compiled code had a latency of N cycles.

Trial 2:

1. Defined Routine() in a separate file, Routine.c, added its declaration to a header file, Routine.h, and included the header in the main source file, mainsource.c (good programming practice). When the source files were compiled, the latency roughly doubled, to around 2*N cycles.

2. Merged Routine.c back into mainsource.c and compiled. The latency returned to around N cycles.

I think I am missing something related to the compiler/linker settings. Any ideas?
5 Replies
Altera_Forum
Honored Contributor II
675 views

Perhaps you are calling the function with compile-time known constants - so the compiler is optimising away a lot of the code. 

 

If the function isn't 'static' that shouldn't happen. 

Or pass the parameters from statically initialised global variables - which the compiler can't know aren't changed before the call. 
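
A minimal sketch of the suggestion above, in standard C rather than Nios II-specific code. The names `routine`, `test_input`, and `call_with_runtime_input` are hypothetical and are not taken from the original project. When the function body is visible in the same file as the call, the compiler may specialise the call for inputs it can prove constant; reading the test vector through a `volatile` global forces real loads at run time, so both file layouts should be measured on the same work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Routine(): sums the squares of n samples. */
uint32_t routine(const uint32_t *samples, uint32_t n)
{
    assert(samples != NULL);
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        acc += samples[i] * samples[i];
    }
    return acc;
}

/* Test vector in a volatile global: the compiler must assume it can change. */
volatile uint32_t test_input[4] = { 1, 2, 3, 4 };

/* Copies the inputs with real loads, then calls the routine under test. */
uint32_t call_with_runtime_input(void)
{
    uint32_t local[4];
    for (uint32_t i = 0; i < 4; i++) {
        local[i] = test_input[i];   /* run-time load, not a folded constant */
    }
    return routine(local, 4);
}
```

Timing `call_with_runtime_input()` in both trials, instead of a call with literal arguments, removes constant folding as a variable. It does not stop the compiler from inlining `routine` when it is in the same file.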

 

There is also the possibility that the compiler optimisations are enabled for one file, but not the other - unlikely if you are rebuilding these with the IDE. 

 

If all else fails, you'll need to look at the object code; that tends to show what is actually happening.
Altera_Forum
Honored Contributor II
675 views

 

--- Quote Start ---  

Perhaps you are calling the function with compile-time known constants - so the compiler is optimising away a lot of the code. 

 

If the function isn't 'static' that shouldn't happen. 

Or pass the parameters from statically initialised global variables - which the compiler can't know aren't changed before the call. 

 

AlphaKha> Let me provide some more detail: 

In both cases, the global variables are declared and initialized in the main file, mainsource.c, to provide an easy-to-use list of test vectors, and the values are passed to the routine through pointers. The only difference is that in trial 2 the routine code is cut from mainsource.c and pasted into another source file for clarity and program organization. I don't see why the compiler would know about the constants in one case but not in the other. This may be the clue to the problem. Could you please comment on this?

 

There is also the possibility that the compiler optimisations are enabled for one file, but not the other - unlikely if you are rebuilding these with the IDE. 

 

AlphaKh> I have ensured that compiler optimizations are enabled for all files.

 

If all else fails, you'll need to look at the object code; that tends to show what is actually happening.

--- Quote End ---  

 

 

AlphaKh > The objdump output shows the same code size in both cases. I haven't checked the instructions line by line, but they appear to be identical except that the address offsets differ.
Altera_Forum
Honored Contributor II
675 views

No need to quote the whole post...

 

If the object code is the same, then maybe the difference is caused by additional instruction cache loads because code is now sharing instruction cache lines (the overall address map will probably have changed). 

The data addresses are unlikely to be significantly different - but might be. 

 

Running code from tightly coupled instruction/data memory (and with the dynamic branch prediction disabled) I got consistent timings that match the expected values.
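
One way to check the cache explanation is to time the first call separately from later calls. The sketch below is an assumption-laden illustration, not the poster's harness: `time_runs`, `warm_minimum`, `dummy_routine`, and `RUNS` are hypothetical names, and the portable `clock()` stands in for whatever cycle counter the target provides (on a Nios II HAL system that would typically be the timestamp driver in `sys/alt_timestamp.h`). `clock()` is coarse, so on a host machine the numbers only show the method.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define RUNS 8   /* hypothetical number of timed calls */

typedef void (*routine_fn)(void);

/* Times fn RUNS times; ticks[0] is the first (possibly cold-cache) call. */
void time_runs(routine_fn fn, clock_t ticks[RUNS])
{
    assert(fn != NULL);
    for (int i = 0; i < RUNS; i++) {
        clock_t start = clock();
        fn();
        ticks[i] = clock() - start;
    }
}

/* Smallest of runs 1..RUNS-1: an estimate of the warm-cache latency. */
clock_t warm_minimum(const clock_t ticks[RUNS])
{
    clock_t best = ticks[1];
    for (int i = 2; i < RUNS; i++) {
        if (ticks[i] < best) {
            best = ticks[i];
        }
    }
    return best;
}

/* Placeholder workload so the sketch compiles on its own. */
static volatile unsigned sink;

void dummy_routine(void)
{
    for (unsigned i = 0; i < 1000u; i++) {
        sink += i;
    }
}
```

If `ticks[0]` is much larger than `warm_minimum(ticks)` in one file layout but not the other, instruction or data cache fills are a plausible cause; if every run shows the same gap, the difference lies elsewhere.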
Altera_Forum
Honored Contributor II
675 views

With ITCM, the instruction cache has no effect, so the cache is out of the question. Is there any other reason for this?

Altera_Forum
Honored Contributor II
675 views

You'll probably have to work it out yourself. 

Might be worth simplifying the test to see what makes a big difference. 

Sometimes repeating a code fragment can help identify which bits are slower than expected.
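
The repetition idea can be sketched as follows, again in portable C with `clock()` as a stand-in timer. All names here (`time_repeated`, `per_repeat_cost`, `sample_fragment`, `fragment_calls`) are hypothetical. Timing a fragment `reps` times and then `2 * reps` times and taking the difference cancels the fixed overhead of the timer calls and loop setup, leaving an estimate of the cost per repetition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*fragment_fn)(void);

/* Runs frag reps times back to back and returns the elapsed clock ticks. */
clock_t time_repeated(fragment_fn frag, uint32_t reps)
{
    assert(frag != NULL);
    clock_t start = clock();
    for (uint32_t i = 0; i < reps; i++) {
        frag();
    }
    return clock() - start;
}

/* Estimates ticks per repetition as (T(2*reps) - T(reps)) / reps. */
double per_repeat_cost(fragment_fn frag, uint32_t reps)
{
    assert(reps > 0);
    clock_t t1 = time_repeated(frag, reps);
    clock_t t2 = time_repeated(frag, 2u * reps);
    return ((double)t2 - (double)t1) / (double)reps;
}

/* Placeholder fragment; counts its calls so the harness can be checked. */
volatile uint32_t fragment_calls = 0;

void sample_fragment(void)
{
    fragment_calls++;
}
```

Comparing `per_repeat_cost` for individual fragments of the routine in each file layout can narrow down which part accounts for the extra cycles. With a coarse timer the estimate is noisy, so use a large `reps`.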