Hello, it's my first post.
I have a question about GCC-generated code. We can find several unnecessary instructions.

--- sample code ---
unsigned short uh;
uh = IORD_16DIRECT(base,ofst);
printf("",uh);

--- GCC generated (part) ---
ldhuio r18,0(r16)
andi r20, r18, 65535   *
call printf

"ldhuio" is a zero-extending load, so r18 is already zero-extended. The following "andi" is unnecessary.

Here is another sample.

--- sample code ---
short sh;
sh = __builtin_ldhio(base);
printf("",sh);

--- GCC generated (part) ---
ldhio r17, 0(r16)
slli r7, r17, 16   *
srai r21, r7, 16   *
call printf

Likewise, "ldhio" is a sign-extending load, so r17 is already sign-extended. The following two shifts are wasted.

Does anyone have ideas for eliminating these unnecessary instructions?

Regards,
marm
Don't use the IORD_16DIRECT ... it's a macro that expands to the builtin function __builtin_ldhuio (see io.h). It's the builtin that (apparently) adds the extra instruction. If the extra instruction bothers you, you can always try something like:

#define CACHE_BYPASS(addr) ((addr)|0x80000000)
#define ADDR 0x09000024
...
unsigned short uh = *(volatile unsigned short*)CACHE_BYPASS(ADDR);
printf ("%04x\n",uh);
Regards, --Scott
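The volatile-pointer read in the snippet above can also be wrapped in a small helper. This is only a sketch under assumptions: `read_u16`, `fake_reg` and `read_u16_demo` are hypothetical names (not part of io.h or the HAL), and `fake_reg` is an ordinary variable standing in for the device register so the code builds on a host PC. On the target, the address passed in would be CACHE_BYPASS(ADDR). It shows the access pattern only and makes no claim about which instructions GCC emits for it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read a 16-bit location through a volatile pointer,
   so the compiler must perform the load every time it is called. */
static inline uint16_t read_u16(uintptr_t addr)
{
    return *(volatile uint16_t *)addr;
}

/* Stand-in for a device register (assumption: lets the sketch run on a host). */
static volatile uint16_t fake_reg = 0xBEEF;

/* Reads the stand-in register and prints it in the same format as the post. */
void read_u16_demo(void)
{
    uint16_t uh = read_u16((uintptr_t)&fake_reg);
    assert(uh == 0xBEEF);
    printf("%04x\n", uh);
}
```

Calling read_u16_demo() reads the stand-in register once and prints its value as four hex digits.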
Thank you, Scott.
But I wonder about future compatibility, because the Software Developer's Handbook says on page 7-7: "Future Nios II core may use bit 31 for other purposes." So I am hesitant to use the bit 31 method.

Unfortunately the "extra" instruction after the ldhuio is not due to the macro.
The gcc compiler shipped with Nios II 1.0 and Nios II 1.0.1 isn't very good at understanding that the 8 and 16 bit unsigned load instructions (I/O or normal) zero out the top bits, so it puts in "extra" andi instructions when they are not needed. I've reported this to our compiler engineer as an enhancement request. I believe this will be fixed in Nios II 1.1 (due out late November) when we upgrade to a new version of gcc. As for the warning in the documentation about not using bit 31 to bypass the data cache, I had them put that in to help make it easier for a possible future MMU option for Nios II. Being the CPU architect, it's my job to plan ahead.

Nios II 1.0.1 will be released any day now but I think you want Nios II 1.1, which won't be out until late November. Maybe someone in Nios Marketing can explain the availability of Nios II?

Dear marm,
This is on our list of things to try to improve over the long term (i.e. not Nios II 1.1). The issue that James is referring to is unrelated and has to do with volatile accesses of bitfields. The reason it is lower priority is that the optimization does not affect the normal use of these operations. I think the confusion here comes from the definition of the __builtin_ldXio family of functions. They all return values of type "int". Therefore, when you assign the result to a short or byte data type, the compiler needs to safely truncate it to that type. The correct thing to do is to store the returned value in an int and operate on it in that mode. It is almost always better to use int as the data type because it is the fastest data type to operate on. Does that make sense?
Jonah

Scott,
Without "volatile", the extension operations (andi, slli+srai) are eliminated. With "volatile", the extension is still performed, and code similar to the macro version is generated. I must use volatile because the target address is shared memory. It seems there are few ways to avoid this issue... assembler?

Thank you Jonah.
That makes sense technically. But I do not understand why this issue does not have a higher priority. I think a RISC processor's performance depends not only on the hardware but also on the compiler. This code will reduce performance considerably on the /e core, or on any Nios II without a DSP block. (x30@16bit, x50@8bit data). I hope the compiler enhancement will get a higher priority.
Regards, marm
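For reference, Jonah's suggestion earlier in the thread (keep the int returned by the load builtin in an int instead of narrowing it to short) might look roughly like this. It is a sketch under assumptions: `fake_ldhio` is a hypothetical host-side stand-in for __builtin_ldhio, and `sample_reg`, `read_via_short`, `read_via_int` and `check_same_value` are made-up names for illustration. Both paths return the same value. The difference is only that the narrowing path asks the compiler for a conversion to short, which is where the GCC versions discussed in this thread reportedly add the slli/srai pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __builtin_ldhio so the sketch builds on a host:
   does a signed 16-bit load and returns the result as int. */
static int fake_ldhio(const volatile void *addr)
{
    return *(const volatile int16_t *)addr;
}

/* Stand-in for a shared-memory location (assumption, for illustration only). */
volatile int16_t sample_reg = -2;

/* Narrowing path: the int result is converted to short, then widened again. */
int read_via_short(const volatile void *addr)
{
    short sh = (short)fake_ldhio(addr);
    return sh;
}

/* Suggested path: keep the loaded value in an int, no conversion requested. */
int read_via_int(const volatile void *addr)
{
    int v = fake_ldhio(addr);
    return v;
}

/* Both paths yield the same value because the load already sign-extends. */
void check_same_value(void)
{
    assert(read_via_short(&sample_reg) == read_via_int(&sample_reg));
}
```

On the target, fake_ldhio would be replaced by the real builtin. Whether the extra instructions disappear depends on the compiler version, so it is worth checking the generated assembly.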