I used SDE to learn the FMA instructions, but I ran into trouble with this...

result:

3183

3.14159265

10000

FMA:10000.000000-3183.000000*3.141593=0.310305

x87:10000.000000-3183.000000*3.141593=0.310547

Is it a bug?

Code:

#include <stdio.h>

__declspec(naked) float vfnmaddss(float a, float b, float c) {
    __asm {
        movss xmm0, dword ptr [esp+4]      ; a
        movss xmm1, dword ptr [esp+8]      ; b
        movss xmm2, dword ptr [esp+0Ch]    ; c
        vfnmaddss xmm0, xmm0, xmm1, xmm2   ; xmm0 = -(a*b) + c, rounded once
        movss dword ptr [esp+4], xmm0
        fld dword ptr [esp+4]              ; return the float in st(0)
        ret
    }
}

int main() {
    float a, b, c;
    scanf("%f", &a);
    scanf("%f", &b);
    scanf("%f", &c);
    printf("FMA:%f-%f*%f=%f\n", c, a, b, vfnmaddss(a, b, c));
    printf("x87:%f-%f*%f=%f\n", c, a, b, -(a*b)+c);
    return 0;
}

5 Replies

msvc:

x87:10000.000000-3183.000000*3.141593=0.310305

It's hard to tell which one is right: the input value of PI doesn't fit exactly into a float, and we cannot see the exact results either, because printf rounds not just PI (b) but the results too.

(Apologies for the delayed response. I was only just notified about this posting.)

Hi, the answers differ because the FMA performs a fused multiply-add with no internal rounding step. On x87, you have an "extra" rounding between the multiply and the add. If you have a compiler that supports the POSIX fused multiply-add routine, called "fmaf" for single precision, you'll see that you get the same answer that the Intel SDE FMA produces.

% cat fma44.c
#include <stdio.h>
#include <math.h>
int main() {
    float a,b,c,d;
    b = -3183;
    c = 3.14159265;
    d = 10000;
    a = fmaf(b,c,d);
    printf("%f\n",a);
    return 0;
}
% icc -o fma44 fma44.c -lm
% ./fma44
0.310305

Hm, this does not explain why msvc ended up with the same results without fma.

Hi, for this test, MSVC uses SSE on Intel64 but uses x87 on IA32. If you do the computation on IA32 using the x87 floating-point hardware, then you get the answer that the fused FMA gives, since multiplying two 24-bit fractions results in a 48-bit fraction, which easily fits in the 64-bit fraction of the 80-bit float. In this case, I believe Windows defaults to using double precision for the x87 floating-point stack. Either way, the 53-bit fraction is sufficient to hold the exact product that would be used in the fused multiply-add.

If you use MSVC on Intel64 and thus use SSE single precision, then it gets the other answer.

Not surprisingly, when the input values are only approximately represented and there is numerical cancellation like this, true double precision gives yet a third answer.

Regards,

Mark
