Altera_Forum
Honored Contributor I
724 Views

NIOS C compilation and optimization issue

Tools: 11.1 SP2, Eclipse-based Nios II Software Build Tools

 

 

OK, this may look foolish, but it has happened.

I was asked to provide the latency figures for an algorithm. I got different latency figures in the following two trials: 

 

 

Routine() -> SW code for the algorithm 

 

 

Trial 1: 

 

 

Routine() was declared and defined in the same source file as main(). The compiled code had a latency of N cycles.

 

Trial 2: 

1. Defined Routine() in a separate file, Routine.c, added its declaration to a header file, Routine.h, and included the header in the main source file, mainsource.c (good programming practice).

When the source files were compiled, the latency roughly doubled, to around 2*N cycles.

 

 

2. Merged Routine.c back into mainsource.c and compiled. The latency came back to around N cycles.

 

I think I am missing something related to the compiler/linker settings. Any ideas?
5 Replies
Altera_Forum
Honored Contributor I
42 Views

Perhaps you are calling the function with compile-time known constants - so the compiler is optimising away a lot of the code. 

 

If the function isn't 'static' that shouldn't happen. 

Or pass the parameters from statically initialised global variables - which the compiler can't know aren't changed before the call. 

 

There is also the possibility that the compiler optimisations are enabled for one file, but not the other - unlikely if you are rebuilding these with the IDE. 

 

If all else fails you'll need to look at the object code - that tends to show up what is actually happening.
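
A minimal sketch of the point above, not the poster's code: when a routine sits in the same file as its caller and the inputs are visible constants, the compiler may inline the call and fold much of the work away. Reading the test vectors through `volatile` globals and marking the routine non-inlinable keeps the measured call honest. The names `routine_sketch`, `test_vec`, `test_len` and `run_once` are hypothetical, and the `noinline` attribute assumes a GCC-style compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC-style toolchain. On other compilers the macro is empty. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical stand-in for Routine(): sum of squares of n inputs. */
NOINLINE uint32_t routine_sketch(const uint32_t *in, uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += in[i] * in[i];
    return acc;
}

/* Statically initialised test vectors. 'volatile' forces real loads,
 * so the compiler cannot treat the values as compile-time constants. */
volatile uint32_t test_vec[4] = {1, 2, 3, 4};
volatile uint32_t test_len = 4;

/* One measured call: copy the inputs at run time, then call the routine. */
uint32_t run_once(void)
{
    uint32_t local[4];
    uint32_t n = test_len;

    assert(n <= 4);
    for (uint32_t i = 0; i < n; i++)
        local[i] = test_vec[i];
    return routine_sketch(local, n);
}
```

If the timing of `run_once()` stays the same whether `routine_sketch()` lives in this file or a separate one, constant folding and inlining are probably not the cause of the difference.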
Altera_Forum
Honored Contributor I
42 Views

 

--- Quote Start ---  

Perhaps you are calling the function with compile-time known constants - so the compiler is optimising away a lot of the code. 

 

If the function isn't 'static' that shouldn't happen. 

Or pass the parameters from statically initialised global variables - which the compiler can't know aren't changed before the call. 

 

AlphaKha> Let me provide some more detail: 

In both cases, the global variables are declared and initialized in the main file, mainsource.c, to provide an easy-to-use list of test vectors, and the values are passed to Routine() through pointers. The only difference is that in trial 2 the routine code is cut from mainsource.c and pasted into another source file for clarity and program organization. It is not clear to me how this lets the compiler know about the constants in one case but not in the other. This may be the clue to the problem. Could you please comment on this?

 

There is also the possibility that the compiler optimisations are enabled for one file, but not the other - unlikely if you are rebuilding these with the IDE. 

 

AlphaKh> I have made sure that compiler optimizations are enabled for all files.

 

If all else fails you'll need to look at the object code - that tends to show up what is actually happening.

--- Quote End ---  

 

 

AlphaKh > The objdump file shows the same code size in both cases. I haven't checked the instructions line by line, but they appear to be identical, with the one exception that the address offsets are different.
Altera_Forum
Honored Contributor I
42 Views

No need to quote all the post .... 

 

If the object code is the same, then maybe the difference is caused by additional instruction cache loads because code is now sharing instruction cache lines (the overall address map will probably have changed). 

The data addresses are unlikely to be significantly different - but might be. 

 

Running code from tightly coupled instruction/data memory (and with the dynamic branch prediction disabled) I got consistent timings that match the expected values.
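
To make the cache-line argument concrete, here is a rough sketch of the arithmetic. It counts how many cache lines a block of code touches at a given start address, and which line index an address maps to in a direct-mapped cache. The function names are hypothetical, and the 32-byte line and 4 KB cache size used in the usage note are assumptions for illustration, so check the actual processor configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache lines touched by the byte range [start, start + size).
 * line_size must be a non-zero power of two. */
uint32_t cache_lines_touched(uint32_t start, uint32_t size, uint32_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    if (size == 0)
        return 0;
    uint32_t first = start / line_size;
    uint32_t last = (start + size - 1) / line_size;
    return last - first + 1;
}

/* Line index an address maps to, assuming a direct-mapped cache.
 * Two addresses with the same index evict each other. */
uint32_t cache_index(uint32_t addr, uint32_t line_size, uint32_t cache_size)
{
    assert(line_size != 0 && cache_size >= line_size);
    return (addr % cache_size) / line_size;
}
```

With 32-byte lines, a 64-byte loop starting at 0x1000 touches 2 lines, but the same loop starting at 0x1010 touches 3. With a 4 KB direct-mapped cache, code at 0x1000 and code at 0x2000 map to the same index and would evict each other. Moving Routine() to another file can change both figures even when the instructions are identical.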
Altera_Forum
Honored Contributor I
42 Views

With ITCM, the instruction cache has no effect, so the cache is out of the question. Is there any other reason for this?

Altera_Forum
Honored Contributor I
42 Views

You'll probably have to work it out yourself. 

Might be worth simplifying the test to see what makes a big difference. 

Sometimes repeating a code fragment can help identify which bits are slower than expected.
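
One way to apply the repetition idea, as a sketch only: time k repetitions and 2k repetitions of a fragment and take the difference, so the fixed cost of reading the timer cancels out. All names here are hypothetical. `fake_now` and `fake_fragment` are host-side stand-ins; on the target the tick source would be whatever free-running counter is available (for example the HAL timestamp driver, if one is configured in the system).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*tick_fn)(void);   /* returns a free-running tick count */
typedef void (*fragment_fn)(void);   /* the code fragment under test */

/* Ticks taken by 'reps' back-to-back calls, including timer overhead. */
uint32_t time_reps(tick_fn now, fragment_fn frag, uint32_t reps)
{
    uint32_t t0 = now();
    for (uint32_t i = 0; i < reps; i++)
        frag();
    return now() - t0;
}

/* Per-repetition cost with the fixed overhead cancelled:
 * (T(2k) - T(k)) / k. Per-iteration loop overhead is still included. */
uint32_t cost_per_rep(tick_fn now, fragment_fn frag, uint32_t k)
{
    assert(k > 0);
    uint32_t t1 = time_reps(now, frag, k);
    uint32_t t2 = time_reps(now, frag, 2 * k);
    return (t2 - t1) / k;
}

/* Host-side stand-ins (assumptions): each timer read costs 3 ticks,
 * each fragment call costs 10 ticks. */
uint32_t fake_ticks;
uint32_t fake_now(void) { fake_ticks += 3; return fake_ticks; }
void fake_fragment(void) { fake_ticks += 10; }
```

Running `cost_per_rep()` on the routine in both build layouts, and then on smaller pieces of it, should show whether the extra cycles scale with the work or are a one-off cost such as the call itself or a first-touch effect.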