<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Wrong code created by auto-vectorizer? in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805935#M833</link>
    <description>Hi sschuberth,&lt;BR /&gt;Are you still seeing the issue with the auto-vectorizer?&lt;BR /&gt;If so, can you send us a minimal reproducing example?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Sion</description>
    <pubDate>Sun, 01 May 2011 12:36:43 GMT</pubDate>
    <dc:creator>Sion_Berkowits1</dc:creator>
    <dc:date>2011-05-01T12:36:43Z</dc:date>
    <item>
      <title>Wrong code created by auto-vectorizer?</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805930#M828</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;with the current SDK version 1.1.0.10515, 64-bit, running on Vista x64 on a Core i7 920, I'm getting wrong results with one of my kernels if I make a trivial change to enable the compiler to perform auto-vectorization. Here's the original kernel code which the compiler is unable to auto-vectorize:&lt;BR /&gt;&lt;BR /&gt;__constant sampler_t sampler=&lt;BR /&gt; CLK_NORMALIZED_COORDS_FALSE|CLK_ADDRESS_CLAMP|CLK_FILTER_LINEAR;&lt;BR /&gt;&lt;BR /&gt;__kernel void backproject(&lt;BR /&gt; // Stripped some code here ...&lt;BR /&gt; __read_only image2d_t input,&lt;BR /&gt; __global float* output)&lt;BR /&gt;{&lt;BR /&gt; // Stripped some code here ...&lt;BR /&gt;&lt;BR /&gt; float4 value=read_imagef(input,sampler,(float2)(un+0.5f,vn+0.5f));&lt;BR /&gt;&lt;BR /&gt; // Comment-out the below "if" statement to enable auto-vectorization with Intel OpenCL.&lt;BR /&gt; if (value.x)&lt;BR /&gt; {&lt;BR /&gt; int out_x=get_global_size(0);&lt;BR /&gt; int out_y=get_global_size(1);&lt;BR /&gt; output[z*out_y*out_x+y*out_x+x]+=native_recip(wn*wn)*value.x;&lt;BR /&gt; }&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;The code above compiles fine, though with the note "Kernel &amp;lt;backproject&amp;gt; was not vectorized", and gives the correct result. 
If I now remove the conditional branch by commenting out the marked "if" statement like this:&lt;BR /&gt;&lt;BR /&gt;__constant sampler_t sampler=&lt;BR /&gt; CLK_NORMALIZED_COORDS_FALSE|CLK_ADDRESS_CLAMP|CLK_FILTER_LINEAR;&lt;BR /&gt;&lt;BR /&gt;__kernel void backproject(&lt;BR /&gt; // Stripped some code here ...&lt;BR /&gt; __read_only image2d_t input,&lt;BR /&gt; __global float* output)&lt;BR /&gt;{&lt;BR /&gt; // Stripped some code here ...&lt;BR /&gt;&lt;BR /&gt; float4 value=read_imagef(input,sampler,(float2)(un+0.5f,vn+0.5f));&lt;BR /&gt;&lt;BR /&gt; // Comment-out the below "if" statement to enable auto-vectorization with Intel OpenCL.&lt;BR /&gt; //if (value.x)&lt;BR /&gt; {&lt;BR /&gt; int out_x=get_global_size(0);&lt;BR /&gt; int out_y=get_global_size(1);&lt;BR /&gt; output[z*out_y*out_x+y*out_x+x]+=native_recip(wn*wn)*value.x;&lt;BR /&gt; }&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Then the code compiles with the note "Kernel &amp;lt;backproject&amp;gt; was successfully vectorized", but it gives the wrong result.&lt;BR /&gt;&lt;BR /&gt;I've looked at the assembly generated for the latter kernel. 
The part where the image is read looks like&lt;BR /&gt;&lt;BR /&gt; mov RBX, QWORD PTR [RCX + 16]&lt;BR /&gt; mov R14, QWORD PTR [RSP + 368]&lt;BR /&gt; mov R15, QWORD PTR [RSP + 304]&lt;BR /&gt; mov R12D, 17&lt;BR /&gt; mov RCX, R15&lt;BR /&gt; mov EDX, 17&lt;BR /&gt; movapd XMM2, XMM0&lt;BR /&gt; call __Z11read_imagefP10_image2d_tjU8__vector2f&lt;BR /&gt; movaps XMM10, XMM0&lt;BR /&gt; movaps XMM0, XMM7&lt;BR /&gt; shufps XMM0, XMM8, 17&lt;BR /&gt; pshufd XMM0, XMM0, 8&lt;BR /&gt; mov RCX, R15&lt;BR /&gt; mov EDX, 17&lt;BR /&gt; movapd XMM2, XMM0&lt;BR /&gt; call __Z11read_imagefP10_image2d_tjU8__vector2f&lt;BR /&gt; movlhps XMM10, XMM0&lt;BR /&gt; movaps XMM0, XMM7&lt;BR /&gt; unpckhps XMM0, XMM8&lt;BR /&gt; mov RCX, R15&lt;BR /&gt; mov EDX, 17&lt;BR /&gt; movapd XMM2, XMM0&lt;BR /&gt; call __Z11read_imagefP10_image2d_tjU8__vector2f&lt;BR /&gt; movaps XMM11, XMM0&lt;BR /&gt; shufps XMM7, XMM8, 51&lt;BR /&gt; pshufd XMM0, XMM7, 8&lt;BR /&gt; mov RCX, R15&lt;BR /&gt; mov EDX, R12D&lt;BR /&gt; movapd XMM2, XMM0&lt;BR /&gt; call __Z11read_imagefP10_image2d_tjU8__vector2f&lt;BR /&gt; movlhps XMM11, XMM0&lt;BR /&gt; shufps XMM10, XMM11, -120&lt;BR /&gt; mulps XMM9, XMM9&lt;BR /&gt; rcpps XMM0, XMM9&lt;BR /&gt; mulps XMM0, XMM10&lt;BR /&gt; add EDI, ESI&lt;BR /&gt; add EBX, R14D&lt;BR /&gt; mov RAX, QWORD PTR [RSP + 328]&lt;BR /&gt; imul EBX, DWORD PTR [RAX + 40]&lt;BR /&gt; add EBX, EDI&lt;BR /&gt; imul EBX, DWORD PTR [RAX + 32]&lt;BR /&gt;&lt;BR /&gt;I believe the problem is that I'm uploading new data to the image object between host calls to backproject(), i.e. all calls to backproject() operate on different image data (though on the same image object). 
But in the vectorized version, the same image data is read four times because the kernel does not return to the host between image reads to get the image data uploaded.&lt;BR /&gt;&lt;BR /&gt;I know that I can explicitly disable auto-vectorization by prefixing the kernel definition with&lt;BR /&gt;&lt;BR /&gt;__kernel __attribute__((vec_type_hint(float4)))&lt;BR /&gt;&lt;BR /&gt;but for compatibility with other OpenCL implementations I believe that the default should just work and only safe auto-vectorizations should be done, although I have no clue how to define "safe" as the OpenCL compiler does not know the host code.&lt;BR /&gt;</description>
      <pubDate>Mon, 07 Feb 2011 17:55:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805930#M828</guid>
      <dc:creator>sschuberth</dc:creator>
      <dc:date>2011-02-07T17:55:22Z</dc:date>
    </item>
    <item>
      <title>Wrong code created by auto-vectorizer?</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805931#M829</link>
      <description>Hi,&lt;BR /&gt;Thanks for bringing up this issue.&lt;BR /&gt;We would like to try and reproduce the problem you are experiencing. Would it be possible for you to post the entire kernel code, or some reduced version of the kernel which still reproduces the problem?&lt;BR /&gt;&lt;BR /&gt;Thank you,&lt;BR /&gt;Sion</description>
      <pubDate>Mon, 14 Feb 2011 08:31:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805931#M829</guid>
      <dc:creator>Sion_Berkowits1</dc:creator>
      <dc:date>2011-02-14T08:31:04Z</dc:date>
    </item>
    <item>
      <title>Wrong code created by auto-vectorizer?</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805932#M830</link>
      <description>I'll try to come up with a minimal reproducing example in a few days.&lt;BR /&gt;&lt;BR /&gt;Note, however, that what I initially believed to be the reason for the issue is not a problem: The loop-unrolling affects work-group-level parallelism, so work that would have been done by separate work-items within a work-group is moved into a work-item, and the number of work-items within a work-group is dynamically reduced (I believe the "*_Vectorized" version is only called if the local X work-size as passed to clEnqueueNDRangeKernel() is a multiple of 4). As one cannot modify image data in between runs of different work-items within a work-group, but only in between calls to clEnqueueNDRangeKernel(), it &lt;B&gt;should&lt;/B&gt; work. But still, I'm getting wrong results.</description>
      <pubDate>Mon, 14 Feb 2011 14:08:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805932#M830</guid>
      <dc:creator>sschuberth</dc:creator>
      <dc:date>2011-02-14T14:08:31Z</dc:date>
    </item>
    <item>
      <title>Wrong code created by auto-vectorizer?</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805933#M831</link>
      <description>Thanks.&lt;BR /&gt;Have you tried turning vectorization on and off with vec_type_hint, instead of removing the "if" statement?&lt;BR /&gt;It may provide more insight, as it does not modify the code that is actually executed.&lt;BR /&gt;</description>
      <pubDate>Mon, 14 Feb 2011 14:17:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805933#M831</guid>
      <dc:creator>Sion_Berkowits1</dc:creator>
      <dc:date>2011-02-14T14:17:59Z</dc:date>
    </item>
    <item>
      <title>Wrong code created by auto-vectorizer?</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805934#M832</link>
      <description>I just tried, but the generated code (vectorized vs. non-vectorized using vec_type_hint) is still too complex for me to see through ...</description>
      <pubDate>Mon, 14 Feb 2011 14:30:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805934#M832</guid>
      <dc:creator>sschuberth</dc:creator>
      <dc:date>2011-02-14T14:30:23Z</dc:date>
    </item>
    <item>
      <title>Wrong code created by auto-vectorizer?</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805935#M833</link>
      <description>Hi sschuberth,&lt;BR /&gt;Are you still seeing the issue with the auto-vectorizer?&lt;BR /&gt;If so, can you send us a minimal reproducing example?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Sion</description>
      <pubDate>Sun, 01 May 2011 12:36:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805935#M833</guid>
      <dc:creator>Sion_Berkowits1</dc:creator>
      <dc:date>2011-05-01T12:36:43Z</dc:date>
    </item>
    <item>
      <title>Wrong code created by auto-vectorizer?</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805936#M834</link>
      <description>Hi Sion,&lt;BR /&gt;&lt;BR /&gt;sorry for the delay. I just tried the OpenCL SDK 1.1 build 12772. For the auto-vectorized code I still get wrong results, but something has changed since build 10515: the output is closer to the correct result now, though it is still wrong.&lt;BR /&gt;&lt;BR /&gt;It's hard for me to create a minimal example as my program is embedded into a huge framework and depends on large input data. But I'll see what I can do.&lt;BR /&gt;</description>
      <pubDate>Mon, 09 May 2011 12:27:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Wrong-code-created-by-auto-vectorizer/m-p/805936#M834</guid>
      <dc:creator>sschuberth</dc:creator>
      <dc:date>2011-05-09T12:27:06Z</dc:date>
    </item>
  </channel>
</rss>

