Hi,
after manually converting most multiply-add expressions in my kernel to mad() calls for testing, the Intel OpenCL implementation started to return wrong results, although other vendors' implementations work fine. Try this code in the offline compiler tool that ships with SDK 1.1.0.0 (64-bit):
__kernel void a(int2 in_res, float2 pos)
{
    int2 coord = convert_int2(pos);
    float2 t = pos - convert_float2(coord), s = 1.0f - t;
    //volatile float value = mad(t.x, t.y, s.x);
    volatile float value = t.x * t.y + s.x;
}
The above generates correct code, something like
_a: # @a
# BB#0:
sub RSP, 36
movq XMM0, QWORD PTR [RSP + 84]
cvttps2dq XMM1, XMM0
cvtdq2ps XMM1, XMM1
subps XMM0, XMM1
movss XMM1, DWORD PTR [RIP + LCPI3_0]
subss XMM1, XMM0
pshufd XMM2, XMM0, 1
mulss XMM2, XMM0
addss XMM2, XMM1
movss DWORD PTR [RSP + 32], XMM2
add RSP, 36
ret
for the non-vectorized version. With the commented-out mad() call instead, I get this (again for the non-vectorized version):
_a: # @a
# BB#0:
sub RSP, 36
mov DWORD PTR [RSP + 32], 0
add RSP, 36
ret
So a constant value of 0 is written to "value", which seems quite wrong to me ;-)
The auto-vectorized versions suffer from the same bug, by the way.
PS: Interestingly, passing "-cl-mad-enable" to the compiler does not change anything: the first version (the one using t.x*t.y+s.x) still compiles correctly with it.
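For reference, here is a minimal host-side sketch in plain C (not OpenCL C) of the arithmetic the kernel performs, so the value a device writes can be compared against something. The function name expected_value is hypothetical and not part of any SDK. It assumes convert_int2() uses its default round-toward-zero behaviour, which matches a C cast and the cvttps2dq instruction in the listing above.

```c
/* Host-side reference for the kernel arithmetic.
 * expected_value is a hypothetical helper name, not an OpenCL or SDK API. */
float expected_value(float pos_x, float pos_y)
{
    /* convert_int2() truncates toward zero by default, like a C cast */
    int coord_x = (int)pos_x;
    int coord_y = (int)pos_y;

    /* t = pos - convert_float2(coord): the fractional parts */
    float t_x = pos_x - (float)coord_x;
    float t_y = pos_y - (float)coord_y;

    /* s = 1.0f - t; only s.x is used by the kernel */
    float s_x = 1.0f - t_x;

    /* the plain multiply-add form from the working kernel version */
    return t_x * t_y + s_x;
}
```

For pos = (1.25, 2.5) this gives t = (0.25, 0.5), s.x = 0.75 and a result of 0.875. mad() is allowed reduced accuracy, so a small deviation from this reference would be acceptable, but a constant 0 regardless of pos is not.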
2 Replies
Hm, no reply yet? Can any Intel official reproduce the issue?
Hey,
Thanks for the detailed report. We will investigate the issue and fix it.
Boaz