<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Floating point reproducibility across devices in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Floating-point-reproducibility-across-devices/m-p/1019724#M3238</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;Is there a way to get reproducible float results from kernels across all devices?&lt;/P&gt;

&lt;P&gt;I'm running an OpenCL kernel on different devices (different CPUs and GPUs), and the floating-point results differ between them.&lt;/P&gt;

&lt;P&gt;On a system with an i7-3770 CPU, its integrated Intel HD 4000 GPU, and an AMD Capeverde GPU, all possible combinations of OpenCL platform (AMD or Intel) and device lead to bit-by-bit identical results.&lt;/P&gt;

&lt;P&gt;On another system with an i3-4010U CPU (with integrated HD 4400 GPU), the Intel OpenCL platform on the GPU produces the same results as the first system, but on the CPU the results differ.&lt;/P&gt;

&lt;P&gt;The kernel is always built with the option "-cl-fp32-correctly-rounded-divide-sqrt".&lt;/P&gt;</description>
    <pubDate>Fri, 08 Aug 2014 16:30:54 GMT</pubDate>
    <dc:creator>Lukas_E_1</dc:creator>
    <dc:date>2014-08-08T16:30:54Z</dc:date>
    <item>
      <title>Floating point reproducibility across devices</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Floating-point-reproducibility-across-devices/m-p/1019724#M3238</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;Is there a way to get reproducible float results from kernels across all devices?&lt;/P&gt;

&lt;P&gt;I'm running an OpenCL kernel on different devices (different CPUs and GPUs), and the floating-point results differ between them.&lt;/P&gt;

&lt;P&gt;On a system with an i7-3770 CPU, its integrated Intel HD 4000 GPU, and an AMD Capeverde GPU, all possible combinations of OpenCL platform (AMD or Intel) and device lead to bit-by-bit identical results.&lt;/P&gt;

&lt;P&gt;On another system with an i3-4010U CPU (with integrated HD 4400 GPU), the Intel OpenCL platform on the GPU produces the same results as the first system, but on the CPU the results differ.&lt;/P&gt;

&lt;P&gt;The kernel is always built with the option "-cl-fp32-correctly-rounded-divide-sqrt".&lt;/P&gt;</description>
      <pubDate>Fri, 08 Aug 2014 16:30:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Floating-point-reproducibility-across-devices/m-p/1019724#M3238</guid>
      <dc:creator>Lukas_E_1</dc:creator>
      <dc:date>2014-08-08T16:30:54Z</dc:date>
    </item>
    <item>
      <title>Hi Lukas,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Floating-point-reproducibility-across-devices/m-p/1019725#M3239</link>
      <description>&lt;P&gt;Hi Lukas,&lt;/P&gt;

&lt;P&gt;Can you please provide a minimal reproducer for the problem?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Arik&lt;/P&gt;</description>
      <pubDate>Wed, 13 Aug 2014 06:46:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Floating-point-reproducibility-across-devices/m-p/1019725#M3239</guid>
      <dc:creator>Arik_N_Intel</dc:creator>
      <dc:date>2014-08-13T06:46:28Z</dc:date>
    </item>
    <item>
      <title>Hi Arik,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Floating-point-reproducibility-across-devices/m-p/1019726#M3240</link>
      <description>&lt;P&gt;Hi Arik,&lt;/P&gt;

&lt;P&gt;Sorry for the delay. Here is the kernel code in which we see the issue:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;__kernel void warpCoord(__global float2 *restrict outputPos,
                        const float4 A,       // fx, fy, cx, cy
                        const float4 coeff_1, // k1, k2, p1, p2
                        const float4 coeff_2, // k3, k4, k5, k6
                        const float4 ir_r1, const float4 ir_r2,
                        const float4 ir_r3)
{
    int4 iSP;
    iSP.s0 = get_global_id(0);
    iSP.s1 = get_global_id(1);
    iSP.s2 = get_global_size(0); // frame width
    iSP.s3 = get_global_size(1); // frame height

    float3 pos = (float3)(iSP.x * ir_r1.x + iSP.y * ir_r1.y + ir_r1.z,
                          iSP.x * ir_r2.x + iSP.y * ir_r2.y + ir_r2.z,
                          iSP.x * ir_r3.x + iSP.y * ir_r3.y + ir_r3.z);
    pos /= pos.z;

    float x2 = pos.x * pos.x;
    float y2 = pos.y * pos.y;
    float r2 = x2 + y2;
    float _2xy = 2 * pos.x * pos.y;
    float kr = (1 + ((coeff_2.s0 * r2 + coeff_1.s1) * r2 + coeff_1.s0) * r2) /
               (1 + ((coeff_2.s3 * r2 + coeff_2.s2) * r2 + coeff_2.s1) * r2);
    float2 pos_dist = (float2)(
        A.s0 * (pos.x * kr + coeff_1.s2 * _2xy + coeff_1.s3 * (r2 + 2 * x2)) +
            A.s2,
        A.s1 * (pos.y * kr + coeff_1.s2 * (r2 + 2 * y2) + coeff_1.s3 * _2xy) +
            A.s3);

    float2 fDP = (float2)(pos_dist.x, pos_dist.y);

    outputPos[iSP.s0 + iSP.s1 * iSP.s2] = fDP;
}&lt;/PRE&gt;

&lt;P&gt;The kernel arguments to reproduce this are:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;A = {1, 1, 0, 0}&lt;/LI&gt;
	&lt;LI&gt;coeff_1 = {0, 0, 0, 0}&lt;/LI&gt;
	&lt;LI&gt;coeff_2 = {0, 0, 0, 0}&lt;/LI&gt;
	&lt;LI&gt;ir_r1 = {1, 0, 0, 0}&lt;/LI&gt;
	&lt;LI&gt;ir_r2 = {0, 1, 0, 0}&lt;/LI&gt;
	&lt;LI&gt;ir_r3 = {0, 0, 1, 0}&lt;/LI&gt;
	&lt;LI&gt;outputPos is a buffer with global_size(0) * global_size(1) * sizeof(cl_float2) bytes&lt;/LI&gt;
&lt;/UL&gt;


&lt;P&gt;The expected output is that each element of outputPos contains its own 2D position (x, y) in the array.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Sep 2014 10:52:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Floating-point-reproducibility-across-devices/m-p/1019726#M3240</guid>
      <dc:creator>Lukas_E_1</dc:creator>
      <dc:date>2014-09-15T10:52:18Z</dc:date>
    </item>
  </channel>
</rss>

