<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Load on ifort/OpenMP with partial parallelization in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Load-on-ifort-OpenMP-with-partial-parallelization/m-p/868086#M2767</link>
    <description>I suppose the threads sit in a spin-wait loop until they time out, with the aim of letting worker threads resume quickly and helping to maintain core affinity.&lt;BR /&gt;I don't know whether the KMP_BLOCKTIME waits are accounted for separately when you run with openmp_profile. You can enable that either by re-linking or, for the default dynamically linked OpenMP library on Linux, by setting the profiling shared object in LD_PRELOAD. This collects statistics on parallel regions and writes them to the file guide.gvs.&lt;BR /&gt;</description>
    <pubDate>Fri, 18 Sep 2009 14:03:06 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2009-09-18T14:03:06Z</dc:date>
    <item>
      <title>Load on ifort/OpenMP with partial parallelization</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Load-on-ifort-OpenMP-with-partial-parallelization/m-p/868083#M2764</link>
      <description>Hi all,&lt;BR /&gt;&lt;BR /&gt;I am using ifort 11.1.046 on Linux Intel64.&lt;BR /&gt;I am just starting to add OpenMP to parallelize a simulation code of mine.&lt;BR /&gt;The behavior that confuses me is this:&lt;BR /&gt;If I parallelize only a minor component of the code (&amp;lt;5% of total runtime) using a fixed number of threads, for example four, I would expect the load for execution of the entire program to be around 100% (0.95*1 + 0.05*4 = 1.15). Instead, it is consistently at 400%.&lt;BR /&gt;Load reporting using "top" and automatic fan adjustment agree with this assessment. Timing results (using 'time' or the built-in subroutine CPU_time) all report total user times exactly four times as high as the real execution time. However, the real-time speed-up compared to serial execution is of course negligible. What am I missing?&lt;BR /&gt;&lt;BR /&gt;Andreas&lt;BR /&gt;</description>
      <pubDate>Fri, 18 Sep 2009 09:40:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Load-on-ifort-OpenMP-with-partial-parallelization/m-p/868083#M2764</guid>
      <dc:creator>gilthe</dc:creator>
      <dc:date>2009-09-18T09:40:39Z</dc:date>
    </item>
    <item>
      <title>Re: Load on ifort/OpenMP with partial parallelization</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Load-on-ifort-OpenMP-with-partial-parallelization/m-p/868084#M2765</link>
      <description>Might your serial intervals be so short that the default settings of KMP_BLOCKTIME keep the threads active?&lt;BR /&gt;</description>
      <pubDate>Fri, 18 Sep 2009 12:43:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Load-on-ifort-OpenMP-with-partial-parallelization/m-p/868084#M2765</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-09-18T12:43:40Z</dc:date>
    </item>
    <item>
      <title>Re: Load on ifort/OpenMP with partial parallelization</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Load-on-ifort-OpenMP-with-partial-parallelization/m-p/868085#M2766</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; Might your serial intervals be so short that the default settings of KMP_BLOCKTIME keep the threads active?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Thanks, that's extremely helpful!&lt;BR /&gt;&lt;BR /&gt;I experimented with KMP_BLOCKTIME (after reading at least a bit of the documentation) and it does make a difference.&lt;BR /&gt;The execution time of the PARALLEL segments in this example is in fact very short, &amp;lt;1 ms; the serial interval is ~4 ms.&lt;BR /&gt;KMP_BLOCKTIME seems to default to 200 ms, which would explain the behavior exactly.&lt;BR /&gt;When I set KMP_BLOCKTIME to 1 ms, the load is still too high but significantly less than the maximum (and it fluctuates on the timescale on which it is recorded).&lt;BR /&gt;When a thread doesn't go to sleep, though: what is it doing? Why does it show up as load? And what is the point of this dead time? I assume it is meant to reduce overhead, but in my example performance is unaffected by KMP_BLOCKTIME settings ranging from 1 ms to 2 s.&lt;BR /&gt;&lt;BR /&gt;Thanks again,&lt;BR /&gt;Andreas&lt;BR /&gt;</description>
      <pubDate>Fri, 18 Sep 2009 13:04:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Load-on-ifort-OpenMP-with-partial-parallelization/m-p/868085#M2766</guid>
      <dc:creator>gilthe</dc:creator>
      <dc:date>2009-09-18T13:04:11Z</dc:date>
    </item>
    <item>
      <title>Re: Load on ifort/OpenMP with partial parallelization</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Load-on-ifort-OpenMP-with-partial-parallelization/m-p/868086#M2767</link>
      <description>I suppose the threads sit in a spin-wait loop until they time out, with the aim of letting worker threads resume quickly and helping to maintain core affinity.&lt;BR /&gt;I don't know whether the KMP_BLOCKTIME waits are accounted for separately when you run with openmp_profile. You can enable that either by re-linking or, for the default dynamically linked OpenMP library on Linux, by setting the profiling shared object in LD_PRELOAD. This collects statistics on parallel regions and writes them to the file guide.gvs.&lt;BR /&gt;</description>
      <pubDate>Fri, 18 Sep 2009 14:03:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Load-on-ifort-OpenMP-with-partial-parallelization/m-p/868086#M2767</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-09-18T14:03:06Z</dc:date>
    </item>
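    <item>
      <title>Sketch: estimating reported load from KMP_BLOCKTIME</title>
      <description>The spin-wait explanation in this thread can be turned into a rough estimate of the load that "top" would report. This is only a minimal sketch under stated assumptions, not how the Intel runtime accounts for time: it assumes that after each parallel segment the worker threads spin for up to KMP_BLOCKTIME (or until the next parallel segment starts) while the master thread runs the serial code. The function name estimated_load and the round timings (1 ms parallel, 4 ms serial, 4 threads) are illustrative choices based on the figures mentioned above.&lt;BR /&gt;
&lt;PRE&gt;
```python
def estimated_load(parallel_ms, serial_ms, blocktime_ms, threads):
    """Average number of busy cores over one parallel+serial cycle.

    Assumed model (hypothetical, for illustration only): all threads work
    during the parallel segment; afterwards the (threads - 1) workers spin
    for up to blocktime_ms before sleeping, while the master thread runs
    the serial code.
    """
    # Workers cannot spin past the start of the next parallel segment.
    spin_ms = min(serial_ms, blocktime_ms)
    busy_core_ms = (
        parallel_ms * threads          # every thread does useful work
        + serial_ms                    # master thread runs the serial part
        + spin_ms * (threads - 1)      # waiting workers burn CPU while spinning
    )
    return busy_core_ms / (parallel_ms + serial_ms)


for blocktime in (200.0, 1.0, 0.0):
    load = estimated_load(parallel_ms=1.0, serial_ms=4.0,
                          blocktime_ms=blocktime, threads=4)
    print(f"KMP_BLOCKTIME={blocktime:g} ms: about {load * 100:.0f}% load")
```
&lt;/PRE&gt;
Under these assumptions a block time longer than the serial interval keeps all four cores busy the whole time, which matches the 400% observation, while shorter block times lower the reported load without changing the amount of useful work.&lt;BR /&gt;</description>
    </item>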
  </channel>
</rss>

