<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Scalar Loop : consider using SIMD directive. in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Scalar-Loop-consider-using-SIMD-directive/m-p/1162577#M7956</link>
    <description>&lt;P&gt;I have code for a BiCCG sparse matrix solver that I have tried to parallelise using OpenMP. The snippet is available as a gist:&amp;nbsp;&lt;A href="https://gist.github.com/data-panda/079cfb076092a5289945c9b3b0881fa9"&gt;https://gist.github.com/data-panda/079cfb076092a5289945c9b3b0881fa9&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;I profiled the code with Intel Advisor: 55% of the total compute time is spent in the matrix solver, and most of its loops are not vectorized properly, because of which I am getting poor scaling. The most intensive loops that are not vectorized properly are in the function iterate_hyd_p(): loop 3 (24%), loop 1 (20%), loop 4 (16%), loop 2 (16%), loop 6 (5%), and loop 5 (2.5%, the only loop that is autovectorized). In the diagnostics, nearly all loops share the suggestion &lt;STRONG&gt;underutilization of FMA instructions&lt;/STRONG&gt;, which I guess can be addressed with proper compiler flags; even the vectorized loop suffers from this. The main problem, however, is a second set of diagnostics common to all the non-vectorized loops, all of which refer to line&amp;nbsp;17 (#pragma omp parallel):&lt;/P&gt;

&lt;P&gt;1. Scalar loop, outer loop was not autovectorized: consider using SIMD directives&lt;/P&gt;

&lt;P&gt;2. Vector dependence prevents vectorization, loop was predicate optimized version 6&lt;/P&gt;

&lt;P&gt;iterate_hyd_p() is called repeatedly until a convergence criterion is met, as you can see from the main calling function&amp;nbsp;biccg_periodic(double convg_criteria).&lt;/P&gt;

&lt;P&gt;Any pointers on what to look for?&lt;/P&gt;</description>
    <pubDate>Fri, 27 Jul 2018 14:44:53 GMT</pubDate>
    <dc:creator>Aniruddha_P_</dc:creator>
    <dc:date>2018-07-27T14:44:53Z</dc:date>
    <item>
      <title>Scalar Loop : consider using SIMD directive.</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Scalar-Loop-consider-using-SIMD-directive/m-p/1162577#M7956</link>
      <description>&lt;P&gt;I have code for a BiCCG sparse matrix solver that I have tried to parallelise using OpenMP. The snippet is available as a gist:&amp;nbsp;&lt;A href="https://gist.github.com/data-panda/079cfb076092a5289945c9b3b0881fa9"&gt;https://gist.github.com/data-panda/079cfb076092a5289945c9b3b0881fa9&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;I profiled the code with Intel Advisor: 55% of the total compute time is spent in the matrix solver, and most of its loops are not vectorized properly, because of which I am getting poor scaling. The most intensive loops that are not vectorized properly are in the function iterate_hyd_p(): loop 3 (24%), loop 1 (20%), loop 4 (16%), loop 2 (16%), loop 6 (5%), and loop 5 (2.5%, the only loop that is autovectorized). In the diagnostics, nearly all loops share the suggestion &lt;STRONG&gt;underutilization of FMA instructions&lt;/STRONG&gt;, which I guess can be addressed with proper compiler flags; even the vectorized loop suffers from this. The main problem, however, is a second set of diagnostics common to all the non-vectorized loops, all of which refer to line&amp;nbsp;17 (#pragma omp parallel):&lt;/P&gt;

&lt;P&gt;1. Scalar loop, outer loop was not autovectorized: consider using SIMD directives&lt;/P&gt;

&lt;P&gt;2. Vector dependence prevents vectorization, loop was predicate optimized version 6&lt;/P&gt;

&lt;P&gt;iterate_hyd_p() is called repeatedly until a convergence criterion is met, as you can see from the main calling function&amp;nbsp;biccg_periodic(double convg_criteria).&lt;/P&gt;

&lt;P&gt;Any pointers on what to look for?&lt;/P&gt;</description>
      <pubDate>Fri, 27 Jul 2018 14:44:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Scalar-Loop-consider-using-SIMD-directive/m-p/1162577#M7956</guid>
      <dc:creator>Aniruddha_P_</dc:creator>
      <dc:date>2018-07-27T14:44:53Z</dc:date>
    </item>
    <item>
      <title>Hello Aniruddha P.</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Scalar-Loop-consider-using-SIMD-directive/m-p/1162578#M7957</link>
      <description>&lt;P&gt;Hello&amp;nbsp;Aniruddha P.&lt;/P&gt;&lt;P&gt;I'm not sure, but it seems to me that to vectorize the loops you need to replace #pragma omp parallel for and #pragma omp for with #pragma omp parallel for simd and #pragma omp for simd, respectively. Give this a try.&lt;/P&gt;&lt;P&gt;Thanks. :)&lt;/P&gt;</description>
      <pubDate>Sat, 04 Apr 2020 16:08:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Scalar-Loop-consider-using-SIMD-directive/m-p/1162578#M7957</guid>
      <dc:creator>ArthurRatz</dc:creator>
      <dc:date>2020-04-04T16:08:29Z</dc:date>
    </item>
  </channel>
</rss>

