Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Apparent Bug in 64-bit ABI Compliance

montyshasta
Beginner
The 64-bit calling convention for Unix dictates a function should return a floating point value in XMM0, as opposed to on the FP stack as a 32-bit function would.

With ICC version 10.0 20070809 I have found that this is honored inconsistently, depending on the optimization options used. This is a problem when a called function sets its return value in assembly, but it shows no symptoms when everything is C code. Take the following example:

/////////////////////////////////////////////////////////
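#include <iostream>

inline float Pi(void)
{
    float f;
    __asm fldpi            // push pi onto the x87 FP stack
    __asm fstp f           // pop it into f as a 32-bit float
    __asm movss xmm0, f    // place the return value in XMM0 by hand (no C++ return)
}

int main(void)
{
    std::cout << Pi() << std::endl;
}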
/////////////////////////////////////////////////////////

When compiled with -O0 this produces the expected output:
$ icc -use_msasm -O0 c.cpp
$./a.out
3.14159

Not when compiled with -O2:
$ icc -use_msasm -O2 c.cpp
$./a.out
nan

Investigating the disassembly shows why. The -O0 version retrieves the return value of Pi() from XMM0:
0x00000000004009b4: call 0x400998 <_Z2Piv>
0x00000000004009b9: movss DWORD PTR [rbp-8],xmm0
0x00000000004009be: mov eax,0x600ee0
0x00000000004009c3: movss xmm0,DWORD PTR [rbp-8]
0x00000000004009c8: mov rdi,rax
0x00000000004009cb: call 0x400898 <_ZNSolsEf@plt>

The -O2 version retrieves the return value from the FP stack:
0x0000000000400e45: fldpi
0x0000000000400e47: fstp DWORD PTR [rsp+8]
0x0000000000400e4b: movss xmm0,DWORD PTR [rsp+8]
0x0000000000400e51: fstp DWORD PTR [rsp]
0x0000000000400e54: movss xmm0,DWORD PTR [rsp]
0x0000000000400e59: mov edi,0x604640
0x0000000000400e5e: call 0x400d10 <_ZNSolsEf@plt>
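
For comparison, here is a minimal sketch of a variant that hands the value back with an ordinary C++ return instead of writing XMM0 by hand. Flowing off the end of a non-void function is undefined behavior in C++, so an optimizer is not obliged to preserve whatever the asm block left in a register. The name PiExplicit and the use of GNU-style extended asm (rather than -use_msasm syntax) are my own assumptions for illustration, and I have only reasoned about this, not tried it on ICC 10.0.

```cpp
#include <cmath>

// Hypothetical variant of Pi(): the asm only produces the value,
// and a normal return statement passes it back to the caller.
inline float PiExplicit()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    float f;
    // "=t" names the top of the x87 stack, where fldpi leaves its result.
    __asm__("fldpi" : "=t"(f));
    return f;   // the compiler picks the return register required by the ABI
#else
    // Assumed fallback for targets without x87: compute pi in plain C++.
    return std::acos(-1.0f);
#endif
}
```

Called as std::cout << PiExplicit() << std::endl; this leaves the choice between XMM0 and the FP stack to the compiler, so the result should not depend on the optimization level or on whether the call is inlined.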

PS: also submitted on premier.intel.com.
TimP
Honored Contributor III
If you are talking about a convention used under Solaris, I don't see that as a reliable guide to what happens in Windows.
What you are doing here is done much more efficiently and as accurately by standard C code
float f=3.1415927;
I can't see why you would require fldpi unless you were using 80-bit long double, but Windows support for that is inadequate, in more than one way, as you appear to have shown.
levicki
Valued Contributor I

Maybe he just got tired from typing PI? ;)
