OpenCL* for CPU

mad() for float always returns 0

sschuberth
Hi,

After manually converting most multiply-add expressions in my kernel to mad() calls for testing, the Intel OpenCL implementation started returning wrong results, while other vendors' implementations still work fine. Try this code in the offline compiler tool that comes with SDK 1.1.0.0 (64-bit):

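__kernel void a(int2 in_res,float2 pos)
{
    // Split pos into its integer part (coord) and fractional part (t)
    int2 coord=convert_int2(pos);
    float2 t=pos-convert_float2(coord),s=1.0f-t;

    // Uncomment the mad() line and comment out the next one to reproduce the bug;
    // volatile keeps the otherwise unused store from being optimized away
    //volatile float value=mad(t.x,t.y,s.x);
    volatile float value=t.x*t.y+s.x;
}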

The above generates correct code, something like

_a: # @a
# BB#0:
sub RSP, 36
movq XMM0, QWORD PTR [RSP + 84]
cvttps2dq XMM1, XMM0
cvtdq2ps XMM1, XMM1
subps XMM0, XMM1
movss XMM1, DWORD PTR [RIP + LCPI3_0]
subss XMM1, XMM0
pshufd XMM2, XMM0, 1
mulss XMM2, XMM0
addss XMM2, XMM1
movss DWORD PTR [RSP + 32], XMM2
add RSP, 36
ret

for the non-vectorized version. If you use the commented-out mad() call instead, I get this (again for the non-vectorized version):

_a: # @a
# BB#0:
sub RSP, 36
mov DWORD PTR [RSP + 32], 0
add RSP, 36
ret

So a constant value of 0 is written to "value", which seems quite wrong to me ;-)

The auto-vectorized versions suffer from the same bug, by the way.

PS: Interestingly, passing "-cl-mad-enable" to the compiler does not change anything; in particular, it does not make the issue appear for the first version.
sschuberth
Hm, no reply yet? Can any Intel official reproduce the issue?
Boaz_O_Intel
Employee
Hey,

Thanks for the detailed report. We will investigate the issue and fix it.

Boaz