<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to minimize thread creation overhead in Intel Fortran/OpenMP in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806537#M40468</link>
    <description>I have tried VTune, but can't get it to work for this program. See my post of 6/20, &lt;A title="here" href="http://software.intel.com/en-us/forums/showthread.php?t=106106"&gt;http://software.intel.com/en-us/forums/showthread.php?t=106106&lt;/A&gt;, for details.&lt;BR /&gt;</description>
    <pubDate>Thu, 21 Jun 2012 22:13:59 GMT</pubDate>
    <dc:creator>virtualmemory</dc:creator>
    <dc:date>2012-06-21T22:13:59Z</dc:date>
    <item>
      <title>How to minimize thread creation overhead in Intel Fortran/OpenMP?</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806528#M40459</link>
      <description>I have a compute-bound Fortran program which I am attempting to parallelize using OpenMP. The outline of this program is below. The parallel form below takes almost 3 times longer than the serial form, and according to VTune, this is due to thread creation overhead.&lt;BR /&gt;&lt;BR /&gt;I am wondering how I can bring the thread creation outside all the DO loops, yet still execute everything except the iSp loop (the parallel region in the code below) serially.&lt;BR /&gt;&lt;BR /&gt;The outer loops cannot be parallelized because the values of array a at each time depend on the values at the preceding time (this is a PDE with time and position as independent variables). Also, the 'iTry' loop has a conditional EXIT, which is usually taken.&lt;BR /&gt;&lt;BR /&gt;DO iTime=1,nTimes&lt;BR /&gt; ...&lt;BR /&gt; DO iTry=1,nTries&lt;BR /&gt;  ...&lt;BR /&gt;!$OMP PARALLEL&lt;BR /&gt;!$OMP DO&lt;BR /&gt;  DO iSp=1,nSp&lt;BR /&gt;   DO j=1,4000&lt;BR /&gt;    a(iSp,j)=...&lt;BR /&gt;   END DO ! j&lt;BR /&gt;  END DO ! iSp&lt;BR /&gt;!$OMP END DO&lt;BR /&gt;!$OMP END PARALLEL&lt;BR /&gt;  ...&lt;BR /&gt; END DO ! iTry&lt;BR /&gt; ...&lt;BR /&gt;END DO ! iTime</description>
      <pubDate>Thu, 07 Jun 2012 19:21:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806528#M40459</guid>
      <dc:creator>virtualmemory</dc:creator>
      <dc:date>2012-06-07T19:21:20Z</dc:date>
    </item>
    <item>
      <title>How to minimize thread creation overhead in Intel Fortran/OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806529#M40460</link>
      <description>If nTries is very large, give the following a try:&lt;BR /&gt;&lt;BR /&gt;!$OMP PARALLEL &lt;STRONG&gt;PRIVATE(iTime, iTry, j)&lt;/STRONG&gt;&lt;BR /&gt;DO iTime=1,nTimes&lt;BR /&gt;!$OMP MASTER&lt;BR /&gt; ...&lt;BR /&gt;!$OMP END MASTER&lt;BR /&gt; DO iTry=1,nTries&lt;BR /&gt;!$OMP MASTER&lt;BR /&gt;  ...&lt;BR /&gt;!$OMP END MASTER&lt;BR /&gt;! MASTER has no implied barrier, so wait here for the master's serial work&lt;BR /&gt;!$OMP BARRIER&lt;BR /&gt;!$OMP DO&lt;BR /&gt;  DO iSp=1,nSp&lt;BR /&gt;   DO j=1,4000&lt;BR /&gt;    a(iSp,j)=...&lt;BR /&gt;   END DO ! j&lt;BR /&gt;  END DO ! iSp&lt;BR /&gt;!$OMP END DO&lt;BR /&gt;!$OMP MASTER&lt;BR /&gt;  ...&lt;BR /&gt;!$OMP END MASTER&lt;BR /&gt; END DO ! iTry&lt;BR /&gt;!$OMP MASTER&lt;BR /&gt; ...&lt;BR /&gt;!$OMP END MASTER&lt;BR /&gt;END DO ! iTime&lt;BR /&gt;!$OMP END PARALLEL&lt;BR /&gt;&lt;BR /&gt;The above will move the parallel region outside your nTimes and nTries loops.&lt;BR /&gt;*** each thread executes the full range of nTimes and nTries...&lt;BR /&gt;*** however, only the master thread executes the ...&lt;BR /&gt;*** and there is an implied barrier at !$OMP END DO&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey</description>
      <pubDate>Thu, 07 Jun 2012 21:32:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806529#M40460</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2012-06-07T21:32:23Z</dc:date>
    </item>
    <item>
      <title>How to minimize thread creation overhead in Intel Fortran/OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806530#M40461</link>
      <description>In your original code make j PRIVATE</description>
      <pubDate>Thu, 07 Jun 2012 21:33:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806530#M40461</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2012-06-07T21:33:41Z</dc:date>
    </item>
    <item>
      <title>How to minimize thread creation overhead in Intel Fortran/OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806531#M40462</link>
      <description>Jim -&lt;BR /&gt;&lt;BR /&gt;nTries is a user-set constant, generally 4 or less. Under normal circumstances, only a single try is necessary, so the loop exits after the first try.&lt;BR /&gt;&lt;BR /&gt;There are many private variables, including j, but I did not show these in the interest of simplicity.&lt;BR /&gt;&lt;BR /&gt;I did try something like your MASTER approach, but was unable to get it to compile due to the interleaving of Fortran and OMP blocks. I will have another look.&lt;BR /&gt;&lt;BR /&gt;If I were not using OpenMP, I would just start nSp worker threads (this is Windows) and have them be idle until the iSp loop starts.</description>
      <pubDate>Thu, 07 Jun 2012 22:28:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806531#M40462</guid>
      <dc:creator>virtualmemory</dc:creator>
      <dc:date>2012-06-07T22:28:00Z</dc:date>
    </item>
    <item>
      <title>How to minimize thread creation overhead in Intel Fortran/OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806532#M40463</link>
      <description>&amp;gt;&amp;gt;If I were not using OpenMP, I would just start nSp worker threads (this is Windows) and have them be idle until the iSp loop starts.&lt;BR /&gt;&lt;BR /&gt;In OpenMP, the first !$OMP PARALLEL region creates the thread pool. In an application, this setup happens only once. To keep it out of your timing, insert the following before the timed section:&lt;BR /&gt;&lt;BR /&gt;!$OMP PARALLEL&lt;BR /&gt;write(*,*) omp_get_thread_num() ! or some code that does not optimize out&lt;BR /&gt;!$OMP END PARALLEL&lt;BR /&gt;....&lt;BR /&gt;Now run your timed session.&lt;BR /&gt;&lt;BR /&gt;Note, the initial thread startup is generally negligible...&lt;BR /&gt;unless you have some initialization going on...&lt;BR /&gt;like a large thread private area&lt;BR /&gt;and/or a large stack that gets touched&lt;BR /&gt;&lt;BR /&gt;The above code will eliminate those variables from your test.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;</description>
      <pubDate>Fri, 08 Jun 2012 02:43:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806532#M40463</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2012-06-08T02:43:15Z</dc:date>
    </item>
    <item>
      <title>How to minimize thread creation overhead in Intel Fortran/OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806533#M40464</link>
      <description>It should be sufficient to set KMP_BLOCKTIME long enough that the threads stay active, rather than going to sleep, between entries to parallel regions (the default is 200 ms). It can be set either through the environment variable or through a subroutine call.</description>
      <pubDate>Fri, 08 Jun 2012 11:03:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806533#M40464</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2012-06-08T11:03:51Z</dc:date>
    </item>
    <item>
      <title>How to minimize thread creation overhead in Intel Fortran/OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806534#M40465</link>
      <description>I tried increasing KMP_BLOCKTIME to 10,000 msec (10 sec), with no change in execution time.&lt;BR /&gt;&lt;BR /&gt;I also tried decreasing the stack size from the default 2 MB to 1 MB, also with no effect.&lt;BR /&gt;&lt;BR /&gt;I need some tools to help me understand what's happening. Parallel execution is taking about 1.5 times _longer_.&lt;BR /&gt;</description>
      <pubDate>Thu, 21 Jun 2012 20:38:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806534#M40465</guid>
      <dc:creator>virtualmemory</dc:creator>
      <dc:date>2012-06-21T20:38:52Z</dc:date>
    </item>
    <item>
      <title>How to minimize thread creation overhead in Intel Fortran/OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806535#M40466</link>
      <description>Intel VTune Amplifier XE is just what you need to analyze the thread performance</description>
      <pubDate>Thu, 21 Jun 2012 21:15:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806535#M40466</guid>
      <dc:creator>Steven_L_Intel1</dc:creator>
      <dc:date>2012-06-21T21:15:00Z</dc:date>
    </item>
    <item>
      <title>How to minimize thread creation overhead in Intel Fortran/OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806536#M40467</link>
      <description>Inspector should catch some threading errors, but, as Steve hinted, you may find some simply by Amplifier showing where all threads are contending for access to a variable.</description>
      <pubDate>Thu, 21 Jun 2012 21:30:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806536#M40467</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2012-06-21T21:30:04Z</dc:date>
    </item>
    <item>
      <title>How to minimize thread creation overhead in Intel Fortran/OpenMP</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806537#M40468</link>
      <description>I have tried VTune, but can't get it to work for this program. See my post of 6/20, &lt;A title="here" href="http://software.intel.com/en-us/forums/showthread.php?t=106106"&gt;http://software.intel.com/en-us/forums/showthread.php?t=106106&lt;/A&gt;, for details.&lt;BR /&gt;</description>
      <pubDate>Thu, 21 Jun 2012 22:13:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/How-to-minimize-thread-creation-overhead-in-Intel-Fortran-OpenMP/m-p/806537#M40468</guid>
      <dc:creator>virtualmemory</dc:creator>
      <dc:date>2012-06-21T22:13:59Z</dc:date>
    </item>
  </channel>
</rss>

