Apparent Bug in 64-bit ABI Compliance

montyshasta · 11-15-2007

The 64-bit calling convention for Unix dictates a function should return a floating point value in XMM0, as opposed to on the FP stack as a 32-bit function would.

With ICC version 10.0 20070809 I have found that this is followed inconsistently, depending on the optimization options used. This is a problem if a called function sets its return value in assembly, though it shows no symptoms if everything is C code. Take the following example:

/////////////////////////////////////////////////////////
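#include <iostream>

inline float Pi(void)
{
    float f;
    __asm fldpi            // push pi onto the x87 FP stack
    __asm fstp f           // pop it into f as a 32-bit float
    __asm movss xmm0, f    // place the return value in XMM0 by hand
    // no return statement: the value is left in XMM0 by the asm above
}

int main(void)
{
    std::cout << Pi() << std::endl;
}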
/////////////////////////////////////////////////////////

When compiled with -O0 this produces the expected output:
$ icc -use_msasm -O0 c.cpp
$ ./a.out
3.14159

But not when compiled with -O2:
$ icc -use_msasm -O2 c.cpp
$ ./a.out
nan

Investigating the disassembly shows why. The -O0 version retrieves the return value of Pi() from XMM0:
0x00000000004009b4: call 0x400998 <_Z2Piv>
0x00000000004009b9: movss DWORD PTR [rbp-8],xmm0
0x00000000004009be: mov eax,0x600ee0
0x00000000004009c3: movss xmm0,DWORD PTR [rbp-8]
0x00000000004009c8: mov rdi,rax
0x00000000004009cb: call 0x400898 <_ZNSolsEf@plt>

The -O2 version retrieves the return value from the FP stack:
0x0000000000400e45: fldpi
0x0000000000400e47: fstp DWORD PTR [rsp+8]
0x0000000000400e4b: movss xmm0,DWORD PTR [rsp+8]
0x0000000000400e51: fstp DWORD PTR [rsp]
0x0000000000400e54: movss xmm0,DWORD PTR [rsp]
0x0000000000400e59: mov edi,0x604640
0x0000000000400e5e: call 0x400d10 <_ZNSolsEf@plt>
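
For comparison, here is a minimal sketch of a variant that does not depend on where the compiler looks for the return value. It is not the code from the report: it swaps the MS-style asm for GNU extended asm (an assumption that the compiler accepts that syntax on x86), and the name PiExplicit is hypothetical. The asm only produces the value in a variable, and an ordinary return statement hands it back, so the choice of XMM0 or the FP stack is left to the compiler.

```cpp
#include <cmath>

// Hypothetical rewrite of Pi(): the asm only fills f; the C++ return
// statement passes it back, so the compiler chooses the return register.
inline float PiExplicit(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    float f;
    // GNU extended asm, AT&T syntax: push pi onto the x87 stack, then
    // pop it into f as a single-precision value (stack is left balanced).
    __asm__ volatile ("fldpi\n\tfstps %0" : "=m"(f));
    return f;
#else
    // Fallback for targets without x87 or GNU-style asm (assumption).
    return static_cast<float>(std::acos(-1.0));
#endif
}
```

Calling PiExplicit() in place of Pi() in the main() above should then behave the same at any optimization level, since nothing in the function relies on a register being left untouched after the asm.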

PS: also submitted on premier.intel.com.

TimP · 11-16-2007

If you are talking about a convention used under Solaris, I don't see that as a reliable guide to what happens in Windows.
What you are doing here is done much more efficiently, and just as accurately, by standard C code:
float f=3.1415927;
I can't see why you would require fldpi unless you were using 80-bit long double, but Windows support for that is inadequate, in more than one way, as you appear to have shown.
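
The accuracy point can be checked with a small sketch. The helper names below are hypothetical, and the check assumes IEEE 754 single precision: the literal 3.1415927 should round to the same float as a double-precision pi narrowed to float, so nothing is lost at float width by skipping fldpi.

```cpp
#include <cmath>

// Hypothetical helpers, not from the thread.

// Pi written as a plain single-precision literal.
inline float PiLiteral(void) { return 3.1415927f; }

// Pi computed in double precision, then narrowed to float.
inline float PiFromDouble(void) { return static_cast<float>(std::acos(-1.0)); }

// True when both routes give the same float value.
inline bool LiteralIsNearestFloat(void) { return PiLiteral() == PiFromDouble(); }
```

If LiteralIsNearestFloat() returns true, the extra precision of the 80-bit value loaded by fldpi only matters when the result is kept in a type wider than float.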

levicki · 11-18-2007

Maybe he just got tired from typing PI? ;)