The 64-bit calling convention for Unix dictates that a function returns a floating-point value in XMM0, rather than on the x87 FP stack as a 32-bit function would.
With ICC version 10.0 20070809 I have found that this is followed inconsistently, depending on the optimization options used. This is a problem if a called function sets up its return value in assembly, but it shows no symptoms if everything is C code. Take the following example:
/////////////////////////////////////////////////////////
#include <iostream>
inline float Pi(void)
{
float f;
__asm fldpi
__asm fstp f
__asm movss xmm0, f
}
int main(void)
{
std::cout << Pi() << std::endl;
}
/////////////////////////////////////////////////////////
When compiled with -O0 this produces the expected output:
$ icc -use_msasm -O0 c.cpp
$./a.out
3.14159
But not when compiled with -O2:
$ icc -use_msasm -O2 c.cpp
$./a.out
nan
Investigating the disassembly shows why. The -O0 version retrieves the return value of Pi() from XMM0:
0x00000000004009b4: call 0x400998 <_Z2Piv>
0x00000000004009b9: movss DWORD PTR [rbp-8],xmm0
0x00000000004009be: mov eax,0x600ee0
0x00000000004009c3: movss xmm0,DWORD PTR [rbp-8]
0x00000000004009c8: mov rdi,rax
0x00000000004009cb: call 0x400898 <_ZNSolsEf@plt>
The -O2 version inlines Pi() and then pops the return value from the FP stack (the second fstp), overwriting XMM0:
0x0000000000400e45: fldpi
0x0000000000400e47: fstp DWORD PTR [rsp+8]
0x0000000000400e4b: movss xmm0,DWORD PTR [rsp+8]
0x0000000000400e51: fstp DWORD PTR [rsp]
0x0000000000400e54: movss xmm0,DWORD PTR [rsp]
0x0000000000400e59: mov edi,0x604640
0x0000000000400e5e: call 0x400d10 <_ZNSolsEf@plt>
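For anyone who runs into the same thing, here is a minimal sketch of one way to sidestep the problem: let the compiler place the return value instead of writing XMM0 by hand. It assumes GNU-style extended inline assembly rather than the -use_msasm syntax used above, and PiReturned is a hypothetical name, not something from the example. I have not checked it against this ICC build.

```cpp
#include <cmath>

// Hypothetical helper (not from the original example): loads pi with the
// x87 fldpi instruction where GNU-style inline asm is available, and hands
// the result back through an ordinary return statement.
inline float PiReturned()
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    float f;
    // "=t" binds the output to st(0), the top of the x87 stack. The compiler
    // takes the value from there and moves it to wherever the calling
    // convention wants the return value.
    __asm__("fldpi" : "=t"(f));
    return f;
#else
    // Fallback for targets without x87: compute pi in standard C++.
    return std::acos(-1.0f);
#endif
}
```

Because the function ends in a real return statement, the caller no longer depends on which register the inline assembly happened to leave the value in.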
PS: also submitted on premier.intel.com.
2 Replies
If you are talking about a convention used under Solaris, I don't see that as a reliable guide to what happens in Windows.
What you are doing here is done much more efficiently, and just as accurately, by standard C code:
float f=3.1415927;
I can't see why you would require fldpi unless you were using 80-bit long double, but Windows support for that is inadequate, in more than one way, as you appear to have shown.
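As a rough check of the "as accurately" point, the sketch below compares the literal with pi rounded to float from a long double constant. The function name is hypothetical, and the long double constant stands in for what fldpi loads.

```cpp
// Hypothetical check: does the 8-digit literal give the same float as pi
// rounded from extended precision?
inline bool ConstantMatchesRoundedPi()
{
    const float from_literal = 3.1415927f;
    // Pi written to more digits than a float can hold, then rounded to float.
    const float from_extended = static_cast<float>(3.14159265358979323846L);
    return from_literal == from_extended;
}
```

Both values round to the same single-precision number, so for a float result the constant loses nothing compared with loading pi at extended precision and storing it to a float.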
Maybe he just got tired from typing PI? ;)