wowtiger
Beginner
141 Views

Low Precision VFNMADDSS on SDE (AVX Emulator)

I used SDE to learn the FMA instructions, but I ran into trouble with this...

result:
3183
3.14159265
10000
FMA:10000.000000-3183.000000*3.141593=0.310305
x87:10000.000000-3183.000000*3.141593=0.310547

Is it a bug?

Code:
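#include <stdio.h>

/* Returns -(a*b) + c, rounded once (32-bit MSVC, arguments on the stack). */
__declspec(naked) float vfnmaddss(float a, float b, float c) {
__asm {
movss xmm0,dword ptr [esp+4]      ; a
movss xmm1,dword ptr [esp+8]      ; b
movss xmm2,dword ptr [esp+0Ch]    ; c
vfnmaddss xmm0, xmm0, xmm1, xmm2  ; xmm0 = -(a*b) + c
movss dword ptr [esp+4],xmm0      ; spill so the result can be loaded onto the x87 stack
fld dword ptr [esp+4]             ; float return value goes in st(0)
ret
}
}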

int main() {
float a,b,c;
scanf("%f",&a);
scanf("%f",&b);
scanf("%f",&c);
printf("FMA:%f-%f*%f=%f\n",c,a,b,vfnmaddss(a,b,c));
printf("x87:%f-%f*%f=%f\n",c,a,b,-(a*b)+c);
}

5 Replies
gabest
Beginner

msvc:
x87:10000.000000-3183.000000*3.141593=0.310305
It's hard to tell which one is right: the input value of PI doesn't exactly fit into a float, and we cannot see the exact results either, because printf rounds not just PI (b) but the results too.
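
One way to see the values that printf hides is to print more digits. The following is a minimal sketch, assuming a C99 compiler and IEEE 754 single precision; the helper names `round_trips` and `show_float` are made up for this illustration. Nine significant digits are enough to identify any float uniquely, while fewer digits may not be.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if printing x with `digits` significant
   digits and parsing the text back yields exactly the same float. */
int round_trips(float x, int digits) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtof(buf, NULL) == x;
}

/* Hypothetical helper: prints a float with 9 significant digits and in
   hexadecimal (%a), which shows the stored value without decimal rounding. */
void show_float(const char *name, float x) {
    printf("%s = %.9g (%a)\n", name, x, x);
}
```

Calling `show_float("b", 3.14159265f)` shows that the value actually stored in b is about 3.14159274, not 3.14159265.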
MarkC_Intel
Moderator

(apologies for the delayed response. I was just notified about this posting.)

Hi, the answers differ because the FMA uses a fused multiply-add without an internal rounding step. On x87, you have an "extra" rounding between the multiply and the add. If your compiler supports the C99/POSIX fused multiply-add routine "fmaf" for single precision, you'll see that it gives the same answer that the Intel SDE FMA produces.

% cat fma44.c
#include <stdio.h>
#include <math.h>
int main() {
float a,b,c,d;
b = -3183;
c = 3.14159265;
d = 10000;

a = fmaf(b,c,d);
printf("%f\n",a);
return 0;
}

% icc -o fma44 fma44.c -lm

% ./fma44
0.310305
gabest
Beginner

Hm, this does not explain why msvc ended up with the same results without fma.
MarkC_Intel
Moderator

Hi. For this test, MSVC uses SSE on Intel64 but uses x87 on IA32. If you do the computation on IA32 using the x87 floating-point hardware, then you get the answer that the fused FMA gives, since multiplying two 24-bit fractions results in a 48-bit fraction, and that easily fits in the 64-bit fraction of the 80-bit float. In this case, I believe Windows defaults to using double precision for the x87 floating-point stack. Either way, the 53-bit fraction is sufficient to hold the exact product that would be used in the fused multiply-add.

If you use MSVC on Intel64 and thus use SSE single precision, then it gets the other answer.

Not surprisingly, when there are approximate numerical representations of the input values and numerical cancellation like this, true double precision gives yet a third answer.

Regards,
Mark
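
The precision argument in this reply can be checked with a short sketch. It assumes IEEE 754 float and double and a C99 compiler; the function names are made up for this illustration. The first function keeps float inputs but holds the product in a double, as the x87 does with 53-bit precision; the second uses double for the inputs as well.

```c
/* Float inputs, double intermediate: the product of two 24-bit
   significands needs at most 48 bits, so it fits in the 53-bit double
   significand and the multiply introduces no rounding error. */
float float_inputs_double_math(float a, float b, float c) {
    double product = (double)a * (double)b;
    return (float)((double)c - product);
}

/* Double inputs: b is now the double nearest to the typed decimal,
   which is a different number from the float nearest to it. */
double all_double(double a, double b, double c) {
    return c - a * b;
}
```

For the inputs in this thread, the first function returns the same 0.310305357 as the fused multiply-add, and the second returns about 0.310595. In general the first is not a full substitute for fmaf, because the sum is rounded to double and then again to float.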
gabest
Beginner

True, it was using double precision; I just thought it would still be different from FMA, due to the way it is calculated.