Hi all,
Is there a way to get reproducible float results from kernels across all devices?
I'm running an OpenCL kernel on different devices (different CPUs and GPUs), and the computation results for floats differ.
On a system with an i7-3770 CPU, its integrated Intel HD 4000 GPU, and an AMD Cape Verde GPU, every combination of OpenCL platform (AMD or Intel) and device produces bit-for-bit identical results.
On another system with an i3-4010U CPU (with integrated HD 4400 GPU), the Intel OpenCL platform on the GPU produces the same results as the first system, but for the CPU, results differ.
The kernel compiler command line is always "-cl-fp32-correctly-rounded-divide-sqrt".
Hi Lukas,
Can you please provide a minimal reproducer for the problem?
Thanks,
Arik
Hi Arik,
Sorry for the delay. Here is the kernel code where we're seeing the issue:
__kernel void warpCoord(__global float2 *restrict outputPos,
                        const float4 A,       // fx, fy, cx, cy
                        const float4 coeff_1, // k1, k2, p1, p2
                        const float4 coeff_2, // k3, k4, k5, k6
                        const float4 ir_r1,
                        const float4 ir_r2,
                        const float4 ir_r3)
{
    int4 iSP;
    iSP.s0 = get_global_id(0);
    iSP.s1 = get_global_id(1);
    iSP.s2 = get_global_size(0); // frame width
    iSP.s3 = get_global_size(1); // frame height

    float3 pos = (float3)(iSP.x * ir_r1.x + iSP.y * ir_r1.y + ir_r1.z,
                          iSP.x * ir_r2.x + iSP.y * ir_r2.y + ir_r2.z,
                          iSP.x * ir_r3.x + iSP.y * ir_r3.y + ir_r3.z);
    pos /= pos.z;

    float x2 = pos.x * pos.x;
    float y2 = pos.y * pos.y;
    float r2 = x2 + y2;
    float _2xy = 2 * pos.x * pos.y;

    float kr = (1 + ((coeff_2.s0 * r2 + coeff_1.s1) * r2 + coeff_1.s0) * r2) /
               (1 + ((coeff_2.s3 * r2 + coeff_2.s2) * r2 + coeff_2.s1) * r2);

    float2 pos_dist = (float2)(
        A.s0 * (pos.x * kr + coeff_1.s2 * _2xy + coeff_1.s3 * (r2 + 2 * x2)) + A.s2,
        A.s1 * (pos.y * kr + coeff_1.s2 * (r2 + 2 * y2) + coeff_1.s3 * _2xy) + A.s3);

    float2 fDP = (float2)(pos_dist.x, pos_dist.y);
    outputPos[iSP.s0 + iSP.s1 * iSP.s2] = fDP;
}
The kernel arguments to reproduce this are:
- A = {1, 1, 0, 0}
- coeff_1 = {0, 0, 0, 0}
- coeff_2 = {0, 0, 0, 0}
- ir_r1 = {1, 0, 0, 0}
- ir_r2 = {0, 1, 0, 0}
- ir_r3 = {0, 0, 1, 0}
- outputPos is a buffer with global_size(0) * global_size(1) * sizeof(cl_float2) bytes
The expected output is for each element in outputPos to contain its 2D position in the array.
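For cross-checking a device's output, a host-side reference in plain C might look like the sketch below. It mirrors the kernel arithmetic for one work-item and compares results bit by bit. The names host_float2, warp_coord_ref, same_bits and count_identity_mismatches are made up for illustration and are not part of any OpenCL API. With the arguments listed above, all distortion terms are multiplied by zero, so the result should equal the work-item position exactly. For other arguments the host compiler may round or fuse operations differently from a device, so treat this only as a sanity check, not as a ground truth.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-in for cl_float2. */
typedef struct { float x, y; } host_float2;

/* Mirrors the kernel arithmetic for the work-item at (gx, gy).
   Each array holds one float4 kernel argument in .s0 .. .s3 order. */
host_float2 warp_coord_ref(int gx, int gy, const float A[4],
                           const float c1[4], const float c2[4],
                           const float ir1[4], const float ir2[4],
                           const float ir3[4])
{
    float px = gx * ir1[0] + gy * ir1[1] + ir1[2];
    float py = gx * ir2[0] + gy * ir2[1] + ir2[2];
    float pz = gx * ir3[0] + gy * ir3[1] + ir3[2];
    px /= pz;
    py /= pz;

    float x2 = px * px;
    float y2 = py * py;
    float rr = x2 + y2;            /* r2 in the kernel */
    float two_xy = 2 * px * py;    /* _2xy in the kernel */

    float kr = (1 + ((c2[0] * rr + c1[1]) * rr + c1[0]) * rr) /
               (1 + ((c2[3] * rr + c2[2]) * rr + c2[1]) * rr);

    host_float2 out;
    out.x = A[0] * (px * kr + c1[2] * two_xy + c1[3] * (rr + 2 * x2)) + A[2];
    out.y = A[1] * (py * kr + c1[2] * (rr + 2 * y2) + c1[3] * two_xy) + A[3];
    return out;
}

/* Returns 1 if both floats have the same bit pattern, 0 otherwise. */
int same_bits(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

/* Evaluates the reference over a width x height frame with the
   reproducer arguments and counts elements whose result is not
   bit-identical to their own (x, y) position. */
size_t count_identity_mismatches(int width, int height)
{
    const float A[4]   = {1, 1, 0, 0};
    const float c1[4]  = {0, 0, 0, 0};
    const float c2[4]  = {0, 0, 0, 0};
    const float ir1[4] = {1, 0, 0, 0};
    const float ir2[4] = {0, 1, 0, 0};
    const float ir3[4] = {0, 0, 1, 0};
    size_t bad = 0;

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            host_float2 p = warp_coord_ref(x, y, A, c1, c2, ir1, ir2, ir3);
            if (!same_bits(p.x, (float)x) || !same_bits(p.y, (float)y))
                ++bad;
        }
    }
    return bad;
}
```

The same same_bits check can be applied element by element to the buffers read back from two devices, which makes it easy to report the first index at which they diverge.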