<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Robert, in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/cos-returns-sin-values-with-fast-relaxed-math/m-p/1086870#M4814</link>
    <description>&lt;P&gt;Hi Robert,&lt;/P&gt;

&lt;P&gt;I tried your suggestions, but every value in the output buffer is still a sin() value.&lt;/P&gt;

&lt;P&gt;One way to obtain these values from cos() would be to compute cos(Pi/2 - x) or cos(x + 3Pi/2), but the SPIR and assembly generated through CodeBuilder (ioc64?) look fine to me (though I'm definitely not familiar with LLVM, SPIR, or HD4600 asm...).&lt;/P&gt;

&lt;P&gt;I tried retrieving the kernel binary after my app had built it, using clGetProgramInfo(... CL_PROGRAM_BINARIES ...), to verify that it matched the offline compiler output, but it is a binary blob so I can't really parse it... I attached the file in case it helps (I added .txt to the name so that I could upload it).&lt;/P&gt;

&lt;P&gt;Modified buffer setup:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;    INTEL_COUNT = 1024;
    
    intel_bug_input_buffer  = cl::Buffer {context, CL_MEM_READ_ONLY, sizeof(float) * INTEL_COUNT};
    intel_bug_output_buffer = cl::Buffer {context, CL_MEM_WRITE_ONLY, sizeof(float) * INTEL_COUNT};

    intel_bug_input = new float[INTEL_COUNT];
    intel_bug_output = new float[INTEL_COUNT];
    for (int i = 0; i &amp;lt; INTEL_COUNT; ++i) {
        intel_bug_input[i] = 1;
        intel_bug_output[i] = 0;
    }&lt;/PRE&gt;

&lt;P&gt;And modified calling code:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;    queue.enqueueWriteBuffer(intel_bug_input_buffer, CL_TRUE, 0, sizeof(float) * INTEL_COUNT, intel_bug_input);
    cl::make_kernel&amp;lt;cl::Buffer&amp;amp;, cl::Buffer&amp;amp;&amp;gt; intel_bug(render.prog, "test_bug");
    cl::EnqueueArgs enqueueArgs1(queue, cl::NDRange(INTEL_COUNT));
    intel_bug(enqueueArgs1, intel_bug_input_buffer, intel_bug_output_buffer).wait();
    queue.enqueueReadBuffer(intel_bug_output_buffer, CL_TRUE, 0, sizeof(float) * INTEL_COUNT, intel_bug_output);
    for (int i = 0; i &amp;lt; INTEL_COUNT; i += 128) {
        printf("[%d]: cos(%f) returned %f\n", i, intel_bug_input[i], intel_bug_output[i]);
    }&lt;/PRE&gt;

</description>
    <pubDate>Fri, 26 Feb 2016 02:29:00 GMT</pubDate>
    <dc:creator>Oueoue__Teybeo</dc:creator>
    <dc:date>2016-02-26T02:29:00Z</dc:date>
    <item>
      <title>cos() returns sin() values with -cl-fast-relaxed-math</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/cos-returns-sin-values-with-fast-relaxed-math/m-p/1086868#M4812</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;My system is&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;OS: Windows 7 64-bit
CPU: i7-4720HQ
iGPU: HD 4600
Graphics driver: 15.38.28.4332 and 15.36.24.64.4264
OpenCL SDK: 2016 revision&lt;/PRE&gt;

&lt;P&gt;&lt;STRONG&gt;Problem: when compiled with -cl-fast-relaxed-math, the cos() function behaves like the sin() function on HD4600.&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;Minimal reproducer code:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;void kernel test_bug(global float* input, global float* output) {

    float x = input[0];
    float cos_rand = cos(x);
    output[0] = cos_rand;
}&lt;/PRE&gt;

&lt;P&gt;The buffer initialization (storing only one float):&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;    intel_bug_input_buffer  = cl::Buffer {context, CL_MEM_READ_ONLY, sizeof(float)};
    intel_bug_output_buffer = cl::Buffer {context, CL_MEM_WRITE_ONLY, sizeof(float)};
    intel_bug_input = new float[1];
    intel_bug_input[0] = 1;
    intel_bug_output = new float[1];
    intel_bug_output[0] = 0;&lt;/PRE&gt;

&lt;P&gt;The calling host code:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;    queue.enqueueWriteBuffer(intel_bug_input_buffer, CL_TRUE, 0, sizeof(float), intel_bug_input);
    cl::make_kernel&amp;lt;cl::Buffer&amp;amp;, cl::Buffer&amp;amp;&amp;gt; intel_bug(render.prog, "test_bug");
    cl::EnqueueArgs enqueueArgs1(queue, cl::NDRange(1, 1));
    intel_bug(enqueueArgs1, intel_bug_input_buffer, intel_bug_output_buffer).wait();
    queue.enqueueReadBuffer(intel_bug_output_buffer, CL_TRUE, 0, sizeof(float), intel_bug_output);
    printf("cos(%f) returned %f\n", intel_bug_input[0], intel_bug_output[0]);&lt;/PRE&gt;

&lt;P&gt;Testing with different input values and different devices gives the following results:&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;HD4600&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;cos(0.000000) returned 0.000000
cos(1.000000) returned 0.841489
cos(2.000000) returned 0.909297
cos(3.140000) returned 0.001593&lt;/PRE&gt;

&lt;P&gt;&lt;STRONG&gt;i7-4720HQ&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;cos(0.000000) returned 1.000000
cos(1.000000) returned 0.540302
cos(2.000000) returned -0.416147
cos(3.140000) returned -0.999999&lt;/PRE&gt;

&lt;P&gt;&lt;STRONG&gt;Nvidia 965M&lt;/STRONG&gt;&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;cos(0.000000) returned 1.000000
cos(1.000000) returned 0.540302
cos(2.000000) returned -0.416147
cos(3.140000) returned -0.999999&lt;/PRE&gt;
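
&lt;P&gt;The pattern in these readings can also be checked on the host alone. The sketch below is only an illustration and was not part of the test app: closer_to is a hypothetical helper, and the arrays simply repeat the HD4600 inputs and outputs listed above. It compares each device value against the host's std::sin and std::cos.&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;#include &amp;lt;cmath&amp;gt;
#include &amp;lt;cstdio&amp;gt;

// Inputs and HD4600 outputs copied from the results listed in this post.
static const float kInputs[]  = {0.0f, 1.0f, 2.0f, 3.14f};
static const float kOutputs[] = {0.000000f, 0.841489f, 0.909297f, 0.001593f};

// Hypothetical helper (an assumption, not from the original app): returns 's'
// when y is closer to sin(x) than to cos(x), and 'c' otherwise.
char closer_to(float x, float y) {
    const float d_sin = std::fabs(y - std::sin(x));
    const float d_cos = std::fabs(y - std::cos(x));
    return d_sin &amp;lt; d_cos ? 's' : 'c';
}

int main() {
    for (int i = 0; i &amp;lt; 4; ++i) {
        std::printf("x=%f device=%f host sin=%f host cos=%f closer to: %c\n",
                    kInputs[i], kOutputs[i],
                    std::sin(kInputs[i]), std::cos(kInputs[i]),
                    closer_to(kInputs[i], kOutputs[i]));
    }
    return 0;
}&lt;/PRE&gt;

&lt;P&gt;The same comparison could be applied to a buffer read back from any device, which avoids having to judge the values by eye.&lt;/P&gt;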

&lt;P&gt;I tried using the OpenCL C native_cos() instead and, looking at the generated asm with CodeBuilder, found something strange:&lt;/P&gt;

&lt;P&gt;With -cl-fast-relaxed-math, cos() incorrectly returns sin() values while native_cos() returns correct values. But the generated assembly is exactly the same! It uses "math.cos" asm instructions. Can execution differ with the same assembly? Unfortunately, I was looking at the code generated by CodeBuilder but had to build and run the kernel from my app (using the exact same kernel), as I couldn't run it in CodeBuilder, so maybe my app actually produced different code. Doubtful, but I can't know for sure.&lt;/P&gt;

&lt;P&gt;Without the flag, both are correct; native_cos() still generates "math.cos" asm instructions, while cos() generates hundreds of instructions.&lt;/P&gt;

&lt;P&gt;I can see how someone could have mistaken a cos for a sin in the driver, given how closely they are related, but I can't believe I'm the only one using cos() with -cl-fast-relaxed-math on this hardware/software configuration...&lt;/P&gt;

&lt;P&gt;PS: I tried but no, sin() correctly returns sin() values...&lt;/P&gt;

&lt;P&gt;PPS: attached is the asm generated for this kernel. It is exactly the same for the three configurations: [native_cos], [native_cos + -cl-fast-relaxed-math], and [cos + -cl-fast-relaxed-math].&lt;/P&gt;</description>
      <pubDate>Wed, 24 Feb 2016 22:38:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/cos-returns-sin-values-with-fast-relaxed-math/m-p/1086868#M4812</guid>
      <dc:creator>Oueoue__Teybeo</dc:creator>
      <dc:date>2016-02-24T22:38:12Z</dc:date>
    </item>
    <item>
      <title>Hi Teybeo,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/cos-returns-sin-values-with-fast-relaxed-math/m-p/1086869#M4813</link>
      <description>&lt;P&gt;Hi Teybeo,&lt;/P&gt;

&lt;P&gt;I tried really hard to reproduce the issue on my IVB, HSW and BDW boxes running Windows 8.1 and Windows 10 without any success. Everything appears to be working properly.&lt;/P&gt;

&lt;P&gt;I wonder what happens when you try the following kernel:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;kernel void test_bug(global float* input, global float* output) {
    int i = get_global_id(0);
    float x = input[i];
    float cos_rand = cos(x);
    output[i] = cos_rand;
}
&lt;/PRE&gt;

&lt;P&gt;with buffer sizes and nd ranges of size 1024?&lt;/P&gt;

&lt;P&gt;I consulted our driver architect; he looked at the underlying cos code and stated that it is impossible for it to return sin values.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Feb 2016 20:16:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/cos-returns-sin-values-with-fast-relaxed-math/m-p/1086869#M4813</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2016-02-25T20:16:11Z</dc:date>
    </item>
    <item>
      <title>Hi Robert,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/cos-returns-sin-values-with-fast-relaxed-math/m-p/1086870#M4814</link>
      <description>&lt;P&gt;Hi Robert,&lt;/P&gt;

&lt;P&gt;I tried your suggestions, but every value in the output buffer is still a sin() value.&lt;/P&gt;

&lt;P&gt;One way to obtain these values from cos() would be to compute cos(Pi/2 - x) or cos(x + 3Pi/2), but the SPIR and assembly generated through CodeBuilder (ioc64?) look fine to me (though I'm definitely not familiar with LLVM, SPIR, or HD4600 asm...).&lt;/P&gt;
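
&lt;P&gt;For reference, the phase shifts that map a cosine onto a sine follow from the angle-sum identities. This short derivation is added only as a sketch of the reasoning; it says nothing about what the driver actually does:&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;\cos\left(\frac{\pi}{2} - x\right) = \cos\frac{\pi}{2}\cos x + \sin\frac{\pi}{2}\sin x = \sin x

\cos\left(x + \frac{3\pi}{2}\right) = \cos x\cos\frac{3\pi}{2} - \sin x\sin\frac{3\pi}{2} = \sin x&lt;/PRE&gt;

&lt;P&gt;Both use \cos\frac{\pi}{2} = \cos\frac{3\pi}{2} = 0, \sin\frac{\pi}{2} = 1 and \sin\frac{3\pi}{2} = -1, so a wrong argument offset of either kind would be enough to turn every cos(x) into sin(x).&lt;/P&gt;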

&lt;P&gt;I tried retrieving the kernel binary after my app had built it, using clGetProgramInfo(... CL_PROGRAM_BINARIES ...), to verify that it matched the offline compiler output, but it is a binary blob so I can't really parse it... I attached the file in case it helps (I added .txt to the name so that I could upload it).&lt;/P&gt;

&lt;P&gt;Modified buffer setup:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;    INTEL_COUNT = 1024;
    
    intel_bug_input_buffer  = cl::Buffer {context, CL_MEM_READ_ONLY, sizeof(float) * INTEL_COUNT};
    intel_bug_output_buffer = cl::Buffer {context, CL_MEM_WRITE_ONLY, sizeof(float) * INTEL_COUNT};

    intel_bug_input = new float[INTEL_COUNT];
    intel_bug_output = new float[INTEL_COUNT];
    for (int i = 0; i &amp;lt; INTEL_COUNT; ++i) {
        intel_bug_input[i] = 1;
        intel_bug_output[i] = 0;
    }&lt;/PRE&gt;

&lt;P&gt;And modified calling code:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;    queue.enqueueWriteBuffer(intel_bug_input_buffer, CL_TRUE, 0, sizeof(float) * INTEL_COUNT, intel_bug_input);
    cl::make_kernel&amp;lt;cl::Buffer&amp;amp;, cl::Buffer&amp;amp;&amp;gt; intel_bug(render.prog, "test_bug");
    cl::EnqueueArgs enqueueArgs1(queue, cl::NDRange(INTEL_COUNT));
    intel_bug(enqueueArgs1, intel_bug_input_buffer, intel_bug_output_buffer).wait();
    queue.enqueueReadBuffer(intel_bug_output_buffer, CL_TRUE, 0, sizeof(float) * INTEL_COUNT, intel_bug_output);
    for (int i = 0; i &amp;lt; INTEL_COUNT; i += 128) {
        printf("[%d]: cos(%f) returned %f\n", i, intel_bug_input[i], intel_bug_output[i]);
    }&lt;/PRE&gt;

</description>
      <pubDate>Fri, 26 Feb 2016 02:29:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/cos-returns-sin-values-with-fast-relaxed-math/m-p/1086870#M4814</guid>
      <dc:creator>Oueoue__Teybeo</dc:creator>
      <dc:date>2016-02-26T02:29:00Z</dc:date>
    </item>
  </channel>
</rss>

