
Unnecessary instructions

Altera_Forum

Hello, this is my first post.

I have a question about GCC-generated code: it contains several unnecessary instructions.

 

--- sample code ---
unsigned short uh;
uh = IORD_16DIRECT(base,ofst);
printf("",uh);

--- GCC generated (part) ---
ldhuio r18,0(r16)
andi r20, r18, 65535 *
call printf

 

The "ldhuio" instruction is a zero-extending load, so r18 is already zero-extended and the following "andi" (marked *) is unnecessary.

 

Here is another sample. 

--- sample code ---
short sh;
sh = __builtin_ldhio(base);
printf("",sh);

--- GCC generated (part) ---
ldhio r17, 0(r16)
slli r7, r17, 16 *
srai r21, r7, 16 *
call printf

 

Likewise, "ldhio" is a sign-extending load, so r17 is already sign-extended and the two shifts that follow (marked *) are wasted.

 

Does anyone have ideas to eliminate these unnecessary instructions? 

Regards, 

marm
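
Editor's note: the claim in the post above can be sanity-checked on a host machine. The sketch below is not Nios II code. The model_* functions are hypothetical stand-ins that imitate what each instruction leaves in a 32-bit register, assuming the semantics described in the post (ldhuio zero-extends, ldhio sign-extends). It checks every 16-bit value and confirms that the masked or shifted result equals the loaded value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of ldhuio: halfword loaded and zero-extended. */
uint32_t model_ldhuio(uint16_t mem) { return (uint32_t)mem; }

/* Hypothetical model of ldhio: halfword loaded and sign-extended.
   The uint16_t -> int16_t cast wraps on common two's-complement compilers. */
int32_t model_ldhio(uint16_t mem) { return (int32_t)(int16_t)mem; }

/* Effect of "andi rX, rY, 65535": keep only the low 16 bits. */
uint32_t model_andi_ffff(uint32_t r) { return r & 0xFFFFu; }

/* Effect of "slli 16" followed by "srai 16": sign-extend the low 16 bits.
   Written with xor/subtract so it does not rely on shifting negative values. */
int32_t model_slli_srai_16(int32_t r) {
    uint32_t low = (uint32_t)r & 0xFFFFu;
    return (int32_t)(low ^ 0x8000u) - 0x8000;
}

/* Exhaustive check: after either load, the extra instructions change nothing. */
void check_all_halfwords(void) {
    for (uint32_t m = 0; m <= 0xFFFFu; ++m) {
        uint32_t zx = model_ldhuio((uint16_t)m);
        int32_t  sx = model_ldhio((uint16_t)m);
        assert(model_andi_ffff(zx) == zx);
        assert(model_slli_srai_16(sx) == sx);
    }
}
```

Calling check_all_halfwords() from a test program exercises all 65536 inputs. It only supports the arithmetic argument and says nothing about what a particular compiler version will emit.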
Altera_Forum

Aloha! 

 

What optimization (-O) setting did you use for this? Normally, GCC should be able to remove things like zero-extend instructions as part of its optimization passes.
Altera_Forum

Hi! 

 

Sorry, I forgot to mention the options.

I tested -O1, -O2 and -O3. The register usage differs, but the code sequences are similar.

The other options are the IDE defaults.
Altera_Forum

Don't use IORD_16DIRECT ... it's a macro that expands to the builtin function __builtin_ldhuio (see io.h). It's the builtin that (apparently) adds the extra instruction.

If the extra instruction bothers you, you can always try something like:

 

#define CACHE_BYPASS(addr)  ((addr)|0x80000000)
#define ADDR                0x09000024
...
unsigned short uh = *(volatile unsigned short*)CACHE_BYPASS(ADDR);
printf ("%04x\n",uh);

 

Regards, 

--Scott
Altera_Forum

Thank you, Scott. 

 

But I am worried about future compatibility, because the Software Developer's Handbook contains the following sentence:

page 7-7 : "Future Nios II core may use bit 31 for other purposes."

So I am hesitant to use the bit 31 method.
Altera_Forum

Unfortunately the "extra" instruction after the ldhuio is not due to the macro. The gcc compiler shipped with Nios II 1.0 and Nios II 1.0.1 isn't very good about understanding that the 8 and 16 bit load unsigned instructions (I/O or normal) zero out the top bits and so it puts in "extra" andi instructions when not needed. I've reported this to our compiler engineer as an enhancement request. I believe this will be fixed in Nios II 1.1 (due out late November) when we upgrade to a new version of gcc.

As for the warning in the documentation about not using bit 31 to bypass the data cache, I had them put that in to help make it easier for a possible future MMU option for Nios II. Being the CPU architect, it's my job to plan ahead.
Altera_Forum

James, 

 

Is version 1.0.1 available to the public? 

 

Thanks, 

Ken
Altera_Forum

Nios II 1.0.1 will be released any day now, but I think you want Nios II 1.1, which won't be out until late November. Maybe someone in Nios Marketing can explain the availability of Nios II?
Altera_Forum

Hi James. 

Thank you for your clear comment. I understand that this might be solved in a future version.

 

Thanks, 

marm
Altera_Forum

Dear marm, 

 

This is on our list of things to try to improve over the long term (i.e. not Nios II 1.1). The issue that James is referring to is unrelated and has to do with volatile accesses of bitfields.  

 

The reason it is lower priority is that the optimization does not affect the normal use of these operations. I think the confusion here comes from the definition of the __builtin_ldXio family of functions. They all return values of type "int". Therefore when you assign it to a short or byte data type, the compiler needs to safely truncate it to that value. Therefore, the correct thing to do is store the returned value in an int and operate with it in that mode. It is almost always better to use int as the data type because it is the fastest data type to operate on.  

 

Does that make sense?  

 

Jonah
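
Editor's note: a minimal host-side sketch of the suggestion in the post above. fake_ldhio is a hypothetical stand-in for __builtin_ldhio so that the example runs without Nios II hardware. Like the builtin as described in the post, it returns int. The two readers differ only in the type used to hold the result. According to the explanation above, the narrowing to short is where the compiler adds its truncation code.

```c
#include <stdint.h>

/* Hypothetical stand-in for __builtin_ldhio: reads a halfword and
   returns it as int, already sign-extended. Not the real builtin. */
int fake_ldhio(const volatile void *addr) {
    return (int)*(const volatile int16_t *)addr;
}

/* Result kept in int: no narrowing conversion is requested. */
int read_as_int(const volatile void *addr) {
    int v = fake_ldhio(addr);
    return v + 1;
}

/* Result narrowed to short first: the compiler must convert int to
   short, then widen it again for the addition. */
int read_as_short(const volatile void *addr) {
    short s = (short)fake_ldhio(addr);
    return s + 1;
}
```

Both functions return the same value for any halfword in memory, so keeping the result in an int loses nothing for 16-bit data. Whether the extra instructions actually disappear depends on the compiler version and is not shown by this sketch.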
Altera_Forum

Scott, 

Without "volatile", the extension operations (andi, slli+srai) are eliminated. With "volatile", they are still emitted, and the generated code is similar to what the macro produces.

I must use volatile because the target address is in shared memory.

It seems there are few ways to avoid this issue... assembler?
Altera_Forum

Thank you Jonah. 

That makes sense technically.

But I do not understand why this issue does not have a higher priority. I think a RISC processor's performance depends not only on the hardware but also on the compiler.

This code will cost a lot of performance on the /e core, or on any Nios II without a DSP block. (x30@16bit, x50@8bit data).

I hope the compiler enhancement will get a higher priority.

Regards, 

marm