I used SDE to learn the FMA instructions, but I ran into trouble with this...
result:
3183
3.14159265
10000
FMA:10000.000000-3183.000000*3.141593=0.310305
x87:10000.000000-3183.000000*3.141593=0.310547
Is it a bug?
Code:
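#include <stdio.h>

/* 32-bit MSVC, cdecl: arguments are on the stack, float result is returned in ST(0). */
__declspec(naked) float vfnmaddss(float a, float b, float c) {
    __asm {
        movss xmm0,dword ptr [esp+4]      ; a
        movss xmm1,dword ptr [esp+8]      ; b
        movss xmm2,dword ptr [esp+0Ch]    ; c
        vfnmaddss xmm0, xmm0, xmm1, xmm2  ; xmm0 = -(a*b) + c, one rounding
        movss dword ptr [esp+4],xmm0      ; pass the result through memory
        fld dword ptr [esp+4]             ; load it into ST(0) for the return
        ret
    }
}

int main() {
    float a,b,c;
    scanf("%f",&a);
    scanf("%f",&b);
    scanf("%f",&c);
    printf("FMA:%f-%f*%f=%f\n",c,a,b,vfnmaddss(a,b,c));
    printf("x87:%f-%f*%f=%f\n",c,a,b,-(a*b)+c);
    return 0;
}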
msvc:
x87:10000.000000-3183.000000*3.141593=0.310305
It's hard to tell which one is right: the input value of PI doesn't fit exactly into a float, and we cannot see the exact results either, because printf rounds not just PI (b) but the results too.
(apologies for the delayed response. I was just notified about this posting.)
Hi, the answers differ because the FMA instruction performs a fused multiply-add with no internal rounding step. On x87, you have an "extra" rounding between the multiply and the add. If your compiler supports the C99/POSIX fused multiply-add routine "fmaf" (the single-precision variant), you will see that it produces the same answer as the Intel SDE FMA.
% cat fma44.c
#include <stdio.h>
#include <math.h>
int main() {
    float a,b,c,d;
    b = -3183;
    c = 3.14159265;
    d = 10000;
    a = fmaf(b,c,d);
    printf("%f\n",a);
    return 0;
}
% icc -o fma44 fma44.c -lm
% ./fma44
0.310305
Hm, this does not explain why msvc ended up with the same results without fma.
Hi, for this test, MSVC uses SSE on Intel64 but x87 on IA32. If you do the computation on IA32 using the x87 floating-point hardware, then you get the answer that the fused FMA gives, since multiplying two 24-bit fractions results in a 48-bit fraction, which easily fits in the 64-bit fraction of the 80-bit float. In this case, I believe Windows defaults to using double precision for the x87 floating-point stack. Either way, the 53-bit fraction is sufficient to hold the exact product that would be used in the fused multiply-add.
If you use MSVC on Intel64 and thus use SSE single precision, then it gets the other answer.
Not surprisingly, when the input values are only approximately representable and there is numerical cancellation like this, true double precision gives yet a third answer.
Regards,
Mark
True, it was using double precision. I just thought it would still be different from FMA, due to the way it is calculated.