Hello, it's my first post.
I have a question about GCC-generated code: it contains several unnecessary instructions.

--- sample code ---
unsigned short uh;
uh = IORD_16DIRECT(base,ofst);
printf("",uh);
--- GCC generated (part) ---
ldhuio r18,0(r16)
andi r20, r18, 65535 *
call printf

The "ldhuio" instruction is a zero-extending load, so r18 is already zero-extended and the following "andi" is unnecessary.

Here is another sample.

--- sample code ---
short sh;
sh = __builtin_ldhio(base);
printf("",sh);
--- GCC generated (part) ---
ldhio r17, 0(r16)
slli r7, r17, 16 *
srai r21, r7, 16 *
call printf

Likewise, "ldhio" is a sign-extending load, so r17 is already sign-extended and the two shifts that follow are wasted.

Does anyone have ideas for eliminating these unnecessary instructions?

Regards, marm
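The redundancy claimed above can be checked on a host machine with a small portable sketch. The helper names here (load16_zero_extend, load16_sign_extend, mask16, sign_extend16) are hypothetical stand-ins that model what the Nios II instructions compute; they are not part of any Altera header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a zero-extending 16-bit load (like ldhuio):
   the upper 16 bits of the 32-bit result are zero. */
static uint32_t load16_zero_extend(const uint16_t *p) {
    return (uint32_t)*p;
}

/* Hypothetical model of a sign-extending 16-bit load (like ldhio):
   the upper 16 bits of the result copy bit 15. */
static int32_t load16_sign_extend(const int16_t *p) {
    return (int32_t)*p;
}

/* Models "andi rX, rY, 65535": keep only the low 16 bits. */
static uint32_t mask16(uint32_t r) {
    return r & 0xFFFFu;
}

/* Models "slli 16" followed by "srai 16": sign-extend the low 16 bits.
   Written with XOR/subtract so it avoids shifting negative values in C. */
static int32_t sign_extend16(int32_t r) {
    int32_t low = (int32_t)((uint32_t)r & 0xFFFFu);
    return (low ^ 0x8000) - 0x8000;
}

/* Returns 1 if masking changes nothing after a zero-extending load. */
static int andi_is_redundant(uint16_t v) {
    uint32_t r = load16_zero_extend(&v);
    return mask16(r) == r;
}

/* Returns 1 if the shift pair changes nothing after a sign-extending load. */
static int shifts_are_redundant(int16_t v) {
    int32_t r = load16_sign_extend(&v);
    return sign_extend16(r) == r;
}

/* Spot checks at the edges of the 16-bit range. */
void check_redundancy(void) {
    assert(andi_is_redundant(0xFFFF));
    assert(shifts_are_redundant(-1));
}
```

Both checks hold for every 16-bit input, which is why the extra instructions do no useful work once the load itself has already extended the value.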
11 Replies
Aloha!
Which optimization (-O) setting did you use for this? Normally, GCC should be able to remove things like zero-extend instructions as part of its optimization passes.
Hi!
Sorry, I forgot to list the options. I tested -O1, -O2, and -O3. Register usage differs, but the code sequences are similar. The other options are the IDE defaults.
Don't use IORD_16DIRECT ... it's a macro that expands to the builtin function __builtin_ldhuio (see io.h). It's the builtin that (apparently) adds the extra instruction. If the extra instruction bothers you, you can always try something like:

#define CACHE_BYPASS(addr) ((addr)|0x80000000)
#define ADDR 0x09000024
...
unsigned short uh = *(volatile unsigned short*)CACHE_BYPASS(ADDR);
printf ("%04x\n",uh);

Regards, --Scott
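For readers who want to try the address arithmetic without hardware, here is a minimal host-side sketch of the same idea. The function names (cache_bypass, read16, read_fake_reg) and the fake_reg variable are hypothetical; on a real Nios II target the pointer would come from the bypassed address rather than from a host variable, and whether bit 31 bypasses the cache is taken from the macro in the post, not verified here.

```c
#include <assert.h>
#include <stdint.h>

/* Set bit 31 of a 32-bit address, mirroring the CACHE_BYPASS macro. */
static uint32_t cache_bypass(uint32_t addr) {
    return addr | UINT32_C(0x80000000);
}

/* Read a 16-bit location through a volatile pointer so the access
   cannot be optimized away; the result is widened to unsigned int. */
static unsigned read16(const volatile uint16_t *reg) {
    return *reg;
}

/* Hypothetical stand-in for a device register so the sketch runs on a host. */
static volatile uint16_t fake_reg = 0xBEEF;

unsigned read_fake_reg(void) {
    return read16(&fake_reg);
}

/* Spot check of the address computation for the ADDR used in the post. */
void check_bypass(void) {
    assert(cache_bypass(UINT32_C(0x09000024)) == UINT32_C(0x89000024));
}
```

Using a typed helper instead of a macro keeps the address computation in unsigned 32-bit arithmetic, which avoids surprises if the argument happens to be a signed expression.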
Thank you, Scott.
But I wonder about future compatibility, because the Software Developer's Handbook says on page 7-7: "Future Nios II core may use bit 31 for other purposes." So I am hesitant to use the bit 31 method.
Unfortunately the "extra" instruction after the ldhuio is not due to the macro.
The gcc compiler shipped with Nios II 1.0 and Nios II 1.0.1 isn't very good at understanding that the 8- and 16-bit unsigned load instructions (I/O or normal) zero out the top bits, so it puts in "extra" andi instructions when they are not needed. I've reported this to our compiler engineer as an enhancement request. I believe this will be fixed in Nios II 1.1 (due out late November) when we upgrade to a new version of gcc.

As for the warning in the documentation about not using bit 31 to bypass the data cache, I had them put that in to make things easier for a possible future MMU option for Nios II. Being the CPU architect, it's my job to plan ahead.
James,
Is version 1.0.1 available to the public? Thanks, Ken
Nios II 1.0.1 will be released any day now, but I think you want Nios II 1.1, which won't be out until late November. Maybe someone in Nios Marketing can explain the availability of Nios II?
Hi James.
Thank you for your clear comment. I see this might be solved in a future version. Thanks, marm
Dear marm,
This is on our list of things to try to improve over the long term (i.e. not Nios II 1.1). The issue that James is referring to is unrelated and has to do with volatile accesses of bitfields. The reason this one is lower priority is that the optimization does not affect the normal use of these operations.

I think the confusion here comes from the definition of the __builtin_ldXio family of functions. They all return values of type "int". Therefore, when you assign the result to a short or byte data type, the compiler needs to safely truncate it to fit that type. The correct thing to do is to store the returned value in an int and operate on it in that mode. It is almost always better to use int as the data type because it is the fastest data type to operate on.

Does that make sense? Jonah
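Jonah's point about the return type can be sketched in portable C. The function fake_ldhuio below is a hypothetical stand-in for __builtin_ldhuio so that the example runs on a host; the only property taken from the post is that the builtin returns int. The parameter type, the fake_reg variable, and the two reader functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __builtin_ldhuio: reads 16 bits and
   returns them zero-extended in an int, as the post describes. */
static int fake_ldhuio(const volatile void *addr) {
    return (int)*(const volatile uint16_t *)addr;
}

/* Hypothetical stand-in for a device register. */
static volatile uint16_t fake_reg = 0x1234;

/* Narrow destination: the int result is converted to unsigned short,
   which is the truncation step the compiler has to make safe. */
static unsigned short read_narrow(void) {
    unsigned short uh = (unsigned short)fake_ldhuio(&fake_reg);
    return uh;
}

/* Suggested form: keep the value in an int, so no narrowing
   conversion is requested at all. */
static int read_wide(void) {
    int v = fake_ldhuio(&fake_reg);
    return v;
}

/* Both forms yield the same value; only the requested type differs. */
void check_reads(void) {
    assert(read_wide() == 0x1234);
    assert(read_narrow() == 0x1234);
}
```

Whether a particular GCC release actually drops the extension code for the int version is something to confirm in the generated assembly; the sketch only shows where the narrowing conversion enters the source.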
Scott,
Without "volatile", the extension operations (andi, slli+srai) are eliminated. With "volatile", the extension is still emitted, and the generated code is similar to what the macro produces. I must use volatile because the target address is shared memory. It seems there are few ways to avoid this issue... assembler?
Thank you Jonah.
That makes sense technically. But I do not understand why this issue does not have a higher priority. I think a RISC processor's performance depends not only on the hardware but also on the compiler. This code will reduce performance considerably on the /e core or on any Nios II without a DSP block. (x30@16bit, x50@8bit data). I hope the compiler enhancement will get a higher priority. Regards, marm