<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic vectorization for-loop with data dependency in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/vectorization-for-loop-with-data-dependency/m-p/1227486#M7727</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;
&lt;P&gt;I tried the following simple for-loop, which has a loop-carried data dependency:&lt;/P&gt;
&lt;P&gt;#pragma omp simd&lt;/P&gt;
&lt;P&gt;for (i = 1; i &amp;lt; 256; ++i) a[i] = 3.125 * a[i-1];&lt;/P&gt;
&lt;P&gt;Compiling with icc and the options (-xCORE-AVX512 -qopt-zmm-usage=high -qopenmp-simd) for a Skylake-SP CPU, the loop appears to be vectorized: the generated code uses vmovups for the loads and stores and vmulps for the multiplication.&lt;/P&gt;
&lt;P&gt;So vectorization may still be possible for some loops with a data dependency. Am I correct?&lt;/P&gt;
&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
    <pubDate>Wed, 11 Nov 2020 16:05:34 GMT</pubDate>
    <dc:creator>XinWu</dc:creator>
    <dc:date>2020-11-11T16:05:34Z</dc:date>
    <item>
      <title>vectorization for-loop with data dependency</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/vectorization-for-loop-with-data-dependency/m-p/1227486#M7727</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;
&lt;P&gt;I tried the following simple for-loop, which has a loop-carried data dependency:&lt;/P&gt;
&lt;P&gt;#pragma omp simd&lt;/P&gt;
&lt;P&gt;for (i = 1; i &amp;lt; 256; ++i) a[i] = 3.125 * a[i-1];&lt;/P&gt;
&lt;P&gt;Compiling with icc and the options (-xCORE-AVX512 -qopt-zmm-usage=high -qopenmp-simd) for a Skylake-SP CPU, the loop appears to be vectorized: the generated code uses vmovups for the loads and stores and vmulps for the multiplication.&lt;/P&gt;
&lt;P&gt;So vectorization may still be possible for some loops with a data dependency. Am I correct?&lt;/P&gt;
&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2020 16:05:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/vectorization-for-loop-with-data-dependency/m-p/1227486#M7727</guid>
      <dc:creator>XinWu</dc:creator>
      <dc:date>2020-11-11T16:05:34Z</dc:date>
    </item>
    <item>
      <title>Re: vectorization for-loop with data dependency</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/vectorization-for-loop-with-data-dependency/m-p/1227512#M7728</link>
      <description>&lt;P&gt;I found the problem.&lt;/P&gt;
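&lt;P&gt;A rough sketch of what goes wrong, emulated in Python rather than taken from the actual icc output. It assumes 16 float lanes per 512-bit register and a vector load that completes before any store of the same chunk; the function names (serial, naive_simd, closed_form) are made up for illustration. It compares the serial recurrence with a forced chunk-wise update, and also shows a dependency-free rewrite a[i] = a[0] * 3.125**i that could be vectorized safely.&lt;/P&gt;

```python
import math

FACTOR = 3.125
WIDTH = 16  # assumption: 16 single-precision lanes in one 512-bit register


def serial(a):
    """Reference recurrence: each element uses the freshly written predecessor."""
    out = list(a)
    for i in range(1, len(out)):
        out[i] = FACTOR * out[i - 1]
    return out


def naive_simd(a, width=WIDTH):
    """Emulate forced vectorization: load a whole chunk of a[i-1], then multiply and store."""
    out = list(a)
    for start in range(1, len(out), width):
        stop = min(start + width, len(out))
        # The "vector load" reads stale values, before any store in this chunk.
        loaded = out[start - 1:stop - 1]
        out[start:stop] = [FACTOR * x for x in loaded]
    return out


def closed_form(a):
    """Dependency-free rewrite: every index depends only on a[0]."""
    return [a[0] * FACTOR ** i for i in range(len(a))]


a = [1.0] + [0.0] * 255
ref = serial(a)

# Does the forced chunk-wise update reproduce the serial result?
print("naive_simd matches serial:", naive_simd(a) == ref)

# Does the closed form agree with the serial result up to rounding?
print("closed_form matches serial:",
      all(math.isclose(x, y, rel_tol=1e-9) for x, y in zip(closed_form(a), ref)))
```

&lt;P&gt;In this emulation only the first lane of each chunk sees an up-to-date predecessor, so the chunk-wise version diverges from the serial one, while the closed form agrees up to floating-point rounding.&lt;/P&gt;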
&lt;P&gt;The compiler may generate vectorized instructions (e.g. vmovups and vmulps) for a loop with a data dependency, but the computed numerical results are completely wrong.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2020 17:41:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/vectorization-for-loop-with-data-dependency/m-p/1227512#M7728</guid>
      <dc:creator>XinWu</dc:creator>
      <dc:date>2020-11-11T17:41:10Z</dc:date>
    </item>
  </channel>
</rss>

