I'm running the Intel C++ compiler on Debian amd64 with a 2.6 kernel. The compiler fails to apply tail-call optimization to the following code:
/*---------------------------*/
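#include <stdio.h>
/* noinline keeps both functions as real calls so the tail call stays visible */
void foo() __attribute__((noinline));
void bar() __attribute__((noinline));
void bar() { printf("f() "); }
void foo() { bar(); } /* the call to bar() is in tail position */
int main(int argc, char *argv[])
{
    foo();
    return 0;
}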
/*---------------------------*/
gcc 4.2 with -O3 generates the following assembly instructions for foo():
xor %eax,%eax
jmpq 4004a0
and the Intel compiler with -fast generates this:
push %rsi
callq 4002a0
pop %rcx
retq
Am I missing some compiler option here? Can someone please explain this to me?
Thank you.
3 Replies
I might be missing something here too. In your example, the motivation for disabling the usual optimizations in both compilers with __attribute__((noinline)) isn't obvious. If the functions were too big to be inlined, tail-call optimization wouldn't seem to gain much, and it could still hinder profiling. No doubt more compelling cases could be set up, at least with tail recursion, where the special optimizations in gcc would look attractive.
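For illustration, a tail-recursive case of the kind alluded to above might look like the sketch below. The function name sum_to and its accumulator parameter are hypothetical and do not come from the original post. The recursive call is the last action in the function, so a compiler that performs tail-call optimization could reuse the current stack frame instead of growing the stack.

```c
#include <stdint.h>

/* Hypothetical example (not from the original post):
 * sum the integers 1..n, carrying the running total in acc. */
uint64_t sum_to(uint64_t n, uint64_t acc) __attribute__((noinline));

uint64_t sum_to(uint64_t n, uint64_t acc)
{
    if (n == 0)
        return acc;                 /* base case: nothing left to add */
    return sum_to(n - 1, acc + n);  /* self call in tail position */
}
```

Calling sum_to(10, 0) returns 55. Whether the call compiles to a jump or to a real call can be checked by disassembling the object file, as was done for foo() in the first post.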
The reason I used __attribute__((noinline)) was to emulate a C++ virtual function call which cannot be inlined. I did not use a C++ example in my first post in the interest of clarity and simplicity.
In my C++ tests, each compiler produces the same assembly as listed in my first post: gcc performs the tail-call optimization and the Intel compiler does not. Are there any cases where the Intel compiler *does* perform it?
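As a rough sketch of the situation described here, a virtual call can be approximated in C by an indirect call through a function pointer, which the compiler generally cannot inline when the target is unknown. The names shape, rect_area, and call_area are hypothetical and are not taken from the actual C++ tests.

```c
/* Hypothetical stand-in for a C++ class with one virtual method. */
struct shape {
    int (*area)(const struct shape *self);  /* plays the role of a vtable slot */
    int w;
    int h;
};

/* One concrete "override" of the virtual method. */
int rect_area(const struct shape *self)
{
    return self->w * self->h;
}

/* The indirect call is the last action, so it sits in tail position. */
int call_area(const struct shape *s) __attribute__((noinline));

int call_area(const struct shape *s)
{
    return s->area(s);
}
```

With a struct shape initialized as { rect_area, 3, 4 }, call_area returns 12. A compiler that applies tail-call optimization here could emit an indirect jump for call_area rather than a call followed by a return.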
Thanks for bringing this to our attention. I've submitted an issue on this and will let you know when it's addressed.
Dale