Nios® V/II Embedded Design Suite (EDS)
Support for Embedded Development Tools, Processors (SoCs and Nios® V/II processor), Embedded Development Suites (EDSs), Boot and Configuration, Operating Systems, C and C++

Unnecessary instructions

Altera_Forum
Honored Contributor II
1,723 Views

Hello, it's my first post. 

I have a question about GCC-generated code: I see several unnecessary instructions in it. 

 

--- sample code --- 
unsigned short uh; 
uh = IORD_16DIRECT(base,ofst); 
printf("",uh); 

--- GCC generated (part) --- 
ldhuio r18,0(r16) 
andi r20, r18, 65535 * 
call printf 

The "ldhuio" instruction is a zero-extending load, so r18 is already zero-extended. The following "andi" is unnecessary. 

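A quick way to check this reasoning is to model the two instructions on a host in plain C. In the sketch below (the function names are hypothetical, not part of any Nios II API), a register is a 32-bit value, ldhuio is modeled as a 16-bit load that clears the upper half, and `andi ..., 65535` keeps only the low 16 bits. The exhaustive check over all 65536 halfword values confirms that the mask never changes a value produced by ldhuio.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of "ldhuio": a 16-bit load that zero-extends
   into a 32-bit register (upper 16 bits cleared). */
static uint32_t model_ldhuio(uint16_t mem_halfword)
{
    return (uint32_t)mem_halfword;
}

/* Hypothetical model of "andi rB, rA, 65535": keep only the low 16 bits. */
static uint32_t model_andi_65535(uint32_t reg)
{
    return reg & 0xFFFFu;
}

/* Returns true if the andi leaves every possible ldhuio result unchanged,
   i.e. the andi after ldhuio is redundant. Checks all 65536 inputs. */
static bool andi_redundant_after_ldhuio(void)
{
    for (uint32_t raw = 0; raw <= 0xFFFFu; raw++) {
        uint32_t loaded = model_ldhuio((uint16_t)raw);
        if (model_andi_65535(loaded) != loaded)
            return false;
    }
    return true;
}
```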
 

Here is another sample. 

--- sample code --- 
short sh; 
sh = __builtin_ldhio(base); 
printf("",sh); 

--- GCC generated (part) --- 
ldhio r17, 0(r16) 
slli r7, r17, 16 * 
srai r21, r7, 16 * 
call printf 

Likewise, "ldhio" is a sign-extending load, so r17 is already sign-extended. The next two shifts are wasted. 
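The same kind of host-side model works for the signed case. The sketch assumes a 32-bit register and uses hypothetical helper names. The slli/srai pair (shift left by 16, then arithmetic shift right by 16) is modeled as sign-extending the low 16 bits, written with well-defined C arithmetic rather than a right shift of a negative number, whose result C leaves implementation-defined.

```c
#include <stdint.h>
#include <stdbool.h>

/* Sign-extend the low 16 bits of a 32-bit value without relying on
   implementation-defined right shifts of negative numbers. */
static int32_t sext16(uint32_t bits)
{
    return (int32_t)((bits & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Hypothetical model of "ldhio": a 16-bit load that sign-extends. */
static int32_t model_ldhio(uint16_t mem_halfword)
{
    return sext16(mem_halfword);
}

/* Hypothetical model of "slli rT, rA, 16" followed by "srai rB, rT, 16":
   the left shift moves the low 16 bits to the top, and the arithmetic
   right shift brings them back while copying bit 15 into the upper half,
   which is a sign extension of the low 16 bits. */
static int32_t model_slli_srai_16(int32_t reg)
{
    return sext16((uint32_t)reg);
}

/* Returns true if the shift pair leaves every possible ldhio result
   unchanged, i.e. the two shifts after ldhio are redundant. */
static bool shifts_redundant_after_ldhio(void)
{
    for (uint32_t raw = 0; raw <= 0xFFFFu; raw++) {
        int32_t loaded = model_ldhio((uint16_t)raw);
        if (model_slli_srai_16(loaded) != loaded)
            return false;
    }
    return true;
}
```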

 

Does anyone have ideas for eliminating these unnecessary instructions? 

Regards, 

marm
11 Replies
Altera_Forum

Aloha! 

 

What optimization (-O) setting did you use for this? Normally, GCC should be able to remove things like redundant zero-extend instructions as part of its optimization passes.
Altera_Forum

Hi! 

 

Sorry, I forgot to mention the options. 

I tested -O1, -O2 and -O3. The register usage differs, but the code sequences are similar. 

The other options are the IDE defaults.
Altera_Forum

Don't use IORD_16DIRECT. It's a macro that expands to the builtin function __builtin_ldhuio (see io.h), and it's the builtin that (apparently) adds the extra instruction. 

 

If the extra instruction bothers you, you can always try something like: 

#define CACHE_BYPASS(addr)  ((addr)|0x80000000)
#define ADDR                0x09000024
...
unsigned short uh = *(volatile unsigned short*)CACHE_BYPASS(ADDR);
printf ("%04x\n",uh); 
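For reference, the bit-31 trick only changes the address: OR-ing in 0x80000000 gives an alias of the same location which, on the Nios II cores discussed in this thread, bypasses the data cache. The sketch assumes 32-bit addresses; `cache_bypass` and `read_u16` are hypothetical helpers, not HAL functions. Only the address arithmetic is meaningful off-target, so the volatile read is exercised on an ordinary variable.

```c
#include <stdint.h>

/* Hypothetical helper: set bit 31 of a 32-bit address. On the Nios II
   cores discussed here this aliases the same location with the data
   cache bypassed; the Handbook warns future cores may use bit 31
   for other purposes. */
static uint32_t cache_bypass(uint32_t addr)
{
    return addr | 0x80000000u;
}

/* Hypothetical helper: volatile 16-bit read, so the compiler must
   actually perform the load every time it is called. */
static unsigned short read_u16(const volatile unsigned short *p)
{
    return *p;
}
```

On a Nios II target, the equivalent of the read above would be something like `read_u16((const volatile unsigned short *)(uintptr_t)cache_bypass(0x09000024u))`.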

 

Regards, 

--Scott
Altera_Forum

Thank you, Scott. 

 

But I am concerned about future compatibility, because the Software Developer's Handbook contains the following sentence: 

page 7-7: "Future Nios II core may use bit 31 for other purposes." 

So I hesitate to use the bit-31 method.
Altera_Forum

Unfortunately, the "extra" instruction after the ldhuio is not due to the macro. The gcc compiler shipped with Nios II 1.0 and Nios II 1.0.1 isn't very good at recognizing that the 8- and 16-bit unsigned load instructions (I/O or normal) zero out the upper bits, so it inserts "extra" andi instructions when they are not needed. 

I've reported this to our compiler engineer as an enhancement request. I believe this will be fixed in Nios II 1.1 (due out late November), when we upgrade to a new version of gcc. 

 

As for the warning in the documentation about not using bit 31 to bypass the data cache, I had them put that in to make it easier to add a possible future MMU option for Nios II. As the CPU architect, it's my job to plan ahead.
Altera_Forum

James, 

 

Is version 1.0.1 available to the public? 

 

Thanks, 

Ken
Altera_Forum

Nios II 1.0.1 will be released any day now, but I think you want Nios II 1.1, which won't be out until late November. Maybe someone in Nios Marketing can explain the availability of Nios II?
Altera_Forum

Hi James. 

Thank you for your clear comment. I see this may be fixed in a future version. 

 

Thanks, 

marm
Altera_Forum

Dear marm, 

 

This is on our list of things to try to improve over the long term (i.e. not Nios II 1.1). The issue that James is referring to is unrelated and has to do with volatile accesses of bitfields.  

 

The reason it is lower priority is that the optimization does not affect the normal use of these operations. I think the confusion here comes from the definition of the __builtin_ldXio family of functions: they all return values of type "int". So when you assign the result to a short or byte-sized variable, the compiler must safely truncate it to that type. The correct thing to do, therefore, is to store the returned value in an int and operate on it as an int. It is almost always better to use int as the data type, because it is the fastest data type to operate on. 
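Jonah's suggestion can be sketched as follows: keep the builtin's int result in an int rather than narrowing it to short. GCC documents `__builtin_ldhio` on Nios II as taking a `volatile const void *` and returning int; the `#else` branch is a hypothetical host stand-in so the sketch compiles and can be tested off-target. `LOAD_S16_IO`, `host_ldhio` and `read_signed_halfword` are illustrative names only.

```c
#include <stdint.h>

#if defined(__nios2__)
/* Nios II GCC builtin: 16-bit sign-extending I/O load that returns int. */
#define LOAD_S16_IO(addr) __builtin_ldhio(addr)
#else
/* Hypothetical host stand-in so the sketch builds off-target:
   a 16-bit load sign-extended to int, mimicking ldhio. */
static int host_ldhio(volatile const void *addr)
{
    return *(volatile const int16_t *)addr;
}
#define LOAD_S16_IO(addr) host_ldhio(addr)
#endif

/* Keep the builtin's int result as an int instead of narrowing it to
   short. The value already lies in the signed 16-bit range, so no
   truncation or re-extension is needed before using it. */
static int read_signed_halfword(volatile const void *reg)
{
    int value = LOAD_S16_IO(reg);
    return value;
}
```

Whether a particular GCC release then omits the slli/srai pair is best confirmed by compiling with -S and reading the generated assembly; per marm's later reply, accesses through a volatile pointer may still be extended.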

 

Does that make sense?  

 

Jonah
Altera_Forum

Scott, 

Without "volatile", the extension operations (andi, slli+srai) are eliminated. With "volatile", the extension is kept, and the generated code is similar to what the macro produces. 

I must use volatile because the target address is in shared memory. 

It seems there is little way to avoid this issue... assembler?
Altera_Forum

Thank you, Jonah. 

Technically, that makes sense. 

But I do not understand why this issue does not have a higher priority. I think a RISC processor's performance depends not only on the hardware but also on the compiler. 

These instructions cost a lot of performance on the /e core, or on any Nios II without a DSP block (x30@16bit, x50@8bit data). 

I hope compiler enhancements will get a higher priority. 

Regards, 

marm