<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic When you already have in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/Problems-encountred-during-vectorization-of-code-using-SSE/m-p/780011#M251</link>
    <description>When you have already vectorized the code with intrinsics, the compiler cannot auto-vectorize it: the vectorization is already done. If you want to investigate auto-vectorization, you have to go back to the scalar version.</description>
    <pubDate>Sun, 09 Sep 2012 19:59:44 GMT</pubDate>
    <dc:creator>Thomas_W_Intel</dc:creator>
    <dc:date>2012-09-09T19:59:44Z</dc:date>
    <item>
      <title>Problems encountered during vectorization of code using SSE intrinsics</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Problems-encountred-during-vectorization-of-code-using-SSE/m-p/780010#M250</link>
      <description>&lt;P&gt;I have been struggling to vectorize a particular application for 
some time now, and I have tried everything from auto-vectorization to 
hand-coded SSE intrinsics. But somehow I am unable to obtain a speedup on 
my stencil-based application.&lt;/P&gt;

Following is a snippet of my current code, which I have vectorized using SSE intrinsics:&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="prettyprint lang-c"&gt;&lt;CODE&gt;//#pragma ivdep&lt;BR /&gt; for ( i = STENCIL; i &amp;lt; z - STENCIL; i+=4 )&lt;BR /&gt; {&lt;BR /&gt;  it = it2 + i;&lt;BR /&gt;  __m128 center = _mm_mul_ps(_mm_load_ps(&amp;amp;p2[it]),C00_i);&lt;BR /&gt;&lt;BR /&gt;  u_j4 = _mm_load_ps(&amp;amp;p2[i+j*it_j-it_j4+k*it_k]); //Line 180&lt;BR /&gt;  u_j3 = _mm_load_ps(&amp;amp;p2[i+j*it_j-it_j3+k*it_k]);&lt;BR /&gt;  u_j2 = _mm_load_ps(&amp;amp;p2[i+j*it_j-it_j2+k*it_k]);&lt;BR /&gt;  u_j1 = _mm_load_ps(&amp;amp;p2[i+j*it_j-it_j +k*it_k]);&lt;BR /&gt;  u_j8 = _mm_load_ps(&amp;amp;p2[i+j*it_j+it_j4+k*it_k]);&lt;BR /&gt;  u_j7 = _mm_load_ps(&amp;amp;p2[i+j*it_j+it_j3+k*it_k]);&lt;BR /&gt;  u_j6 = _mm_load_ps(&amp;amp;p2[i+j*it_j+it_j2+k*it_k]);&lt;BR /&gt;  u_j5 = _mm_load_ps(&amp;amp;p2[i+j*it_j+it_j +k*it_k]);&lt;BR /&gt;&lt;BR /&gt;  __m128 tmp2i = _mm_mul_ps(_mm_add_ps(u_j4,u_j8),X4_i);&lt;BR /&gt;  __m128 tmp3 = _mm_mul_ps(_mm_add_ps(u_j3,u_j7),X3_i);&lt;BR /&gt;  __m128 tmp4 = _mm_mul_ps(_mm_add_ps(u_j2,u_j6),X2_i);&lt;BR /&gt;  __m128 tmp5 = _mm_mul_ps(_mm_add_ps(u_j1,u_j5),X1_i);&lt;BR /&gt;&lt;BR /&gt;  __m128 tmp6 = _mm_add_ps(_mm_add_ps(tmp2i,tmp3),_mm_add_ps(tmp4,tmp5));&lt;BR /&gt;  __m128 tmp7 = _mm_add_ps(tmp6,center);&lt;BR /&gt;&lt;BR /&gt;  _mm_store_ps(&amp;amp;tmp2[i],tmp7); //Line 196&lt;BR /&gt;&lt;BR /&gt; }&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;When I compile the above code with icc, without &lt;CODE&gt;#pragma ivdep&lt;/CODE&gt;, I get the following messages:&lt;BR /&gt;
&lt;CODE&gt;remark: loop was not vectorized: existence of vector dependence.&lt;BR /&gt;vector dependence: assumed FLOW dependence between tmp2 line 196 and tmp2 line 196.&lt;BR /&gt;vector dependence: assumed ANTI dependence between tmp2 line 196 and tmp2 line 196.&lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;When I compile it with icc and &lt;CODE&gt;#pragma ivdep&lt;/CODE&gt; enabled, I get the following message:&lt;BR /&gt;
&lt;CODE&gt;remark: loop was not vectorized: unsupported data type. //Line 180&lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;Why is a dependence reported for Line 196? How can I eliminate the reported vector dependence?&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jul 2012 15:14:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Problems-encountred-during-vectorization-of-code-using-SSE/m-p/780010#M250</guid>
      <dc:creator>priyanka06</dc:creator>
      <dc:date>2012-07-23T15:14:02Z</dc:date>
    </item>
    <item>
      <title>When you already have</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Problems-encountred-during-vectorization-of-code-using-SSE/m-p/780011#M251</link>
      <description>When you have already vectorized the code with intrinsics, the compiler cannot auto-vectorize it: the vectorization is already done. If you want to investigate auto-vectorization, you have to go back to the scalar version.</description>
      <pubDate>Sun, 09 Sep 2012 19:59:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Problems-encountred-during-vectorization-of-code-using-SSE/m-p/780011#M251</guid>
      <dc:creator>Thomas_W_Intel</dc:creator>
      <dc:date>2012-09-09T19:59:44Z</dc:date>
    </item>
    <item>
      <title>Consider writing your sse</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Problems-encountred-during-vectorization-of-code-using-SSE/m-p/780012#M252</link>
      <description>Consider writing your SSE intrinsics to use fewer temporaries and to interleave the loads with the multiplies.
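
A minimal sketch of what this suggestion could look like, not the original poster's code: every name in it (stencil_row_scalar, stencil_row_sse, HALF_WIDTH, pitch, the coefficient array c) is hypothetical. It assumes one output row whose neighbour rows sit at plus or minus r*pitch floats from the centre row, and it uses unaligned loads and stores so that the sketch does not depend on any alignment guarantee. Each pair of loads is added, multiplied and folded into a single accumulator right away, instead of being held in eight named temporaries.

<![CDATA[
```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE: _mm_loadu_ps, _mm_mul_ps, _mm_add_ps, ... */

#define HALF_WIDTH 4 /* four neighbours on each side, as in the question */

/* Scalar reference for one output row (hypothetical helper).
 * row points at the centre row; neighbour rows sit at +/- r*pitch floats. */
void stencil_row_scalar(const float *row, float *out, ptrdiff_t n,
                        ptrdiff_t pitch, float c0, const float c[HALF_WIDTH])
{
    for (ptrdiff_t i = 0; i < n; ++i) {
        float acc = row[i] * c0;
        for (ptrdiff_t r = 1; r <= HALF_WIDTH; ++r)
            acc += (row[i - r * pitch] + row[i + r * pitch]) * c[r - 1];
        out[i] = acc;
    }
}

/* SSE version: one accumulator, each load pair is consumed immediately. */
void stencil_row_sse(const float *row, float *out, ptrdiff_t n,
                     ptrdiff_t pitch, float c0, const float c[HALF_WIDTH])
{
    const __m128 vc0 = _mm_set1_ps(c0);
    __m128 vc[HALF_WIDTH];
    for (int r = 0; r < HALF_WIDTH; ++r)
        vc[r] = _mm_set1_ps(c[r]); /* broadcast coefficients once */

    ptrdiff_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Centre term starts the accumulator. */
        __m128 acc = _mm_mul_ps(_mm_loadu_ps(row + i), vc0);
        for (ptrdiff_t r = 1; r <= HALF_WIDTH; ++r) {
            /* Load the symmetric pair, then add, multiply, accumulate. */
            __m128 pair = _mm_add_ps(_mm_loadu_ps(row + i - r * pitch),
                                     _mm_loadu_ps(row + i + r * pitch));
            acc = _mm_add_ps(acc, _mm_mul_ps(pair, vc[r - 1]));
        }
        _mm_storeu_ps(out + i, acc); /* unaligned store: an assumption */
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; ++i) {
        float acc = row[i] * c0;
        for (ptrdiff_t r = 1; r <= HALF_WIDTH; ++r)
            acc += (row[i - r * pitch] + row[i + r * pitch]) * c[r - 1];
        out[i] = acc;
    }
}
```
]]>

The caller has to guarantee that HALF_WIDTH rows above and below the centre row are readable. Whether this ordering is actually faster than the original depends on the compiler and the hardware, since the compiler may already reschedule the loads, so it should be measured rather than assumed.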

Jim Dempsey</description>
      <pubDate>Mon, 10 Sep 2012 00:15:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Problems-encountred-during-vectorization-of-code-using-SSE/m-p/780012#M252</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2012-09-10T00:15:07Z</dc:date>
    </item>
  </channel>
</rss>

