<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: OpenMP no speedup in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/878998#M74701</link>
    <description>&lt;P&gt;How many times does the first loop construct execute? What is the average wall time per pass? There is substantial overhead in thread management, and the convergence loop will incur that overhead on every pass.&lt;/P&gt;
&lt;P&gt;It's more work but you might try creating your thread pool outside the convergence loop.&lt;/P&gt;</description>
    <pubDate>Mon, 09 Jun 2008 15:17:09 GMT</pubDate>
    <dc:creator>Steve_Nuchia</dc:creator>
    <dc:date>2008-06-09T15:17:09Z</dc:date>
    <item>
      <title>OpenMP no speedup</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/878996#M74699</link>
      <description>Hi,&lt;BR /&gt;I'm trying to parallelize the following cycle:&lt;BR /&gt;...&lt;BR /&gt;converged = .false.&lt;BR /&gt;r3=gam(2)*2.0&lt;BR /&gt;do while (converged .ne. .true.)&lt;BR /&gt; !===========================================&lt;BR /&gt; ! This is parallel block #1   &lt;BR /&gt; !$omp parallel private (t1,t2,s1,s2,t3,t4,t5,t6,q1,q2,r1,r2,r11,r22)&lt;BR /&gt; !$omp do &lt;BR /&gt; do i=0,nr-1&lt;BR /&gt; do j=0,nt-1&lt;BR /&gt; &lt;BR /&gt; t1=ru(j,i,1)+run(j,i,1)&lt;BR /&gt; t5=ru(j,i,1)*ru(j,i,1)+run(j,i,1)*run(j,i,1) &lt;BR /&gt; t2=iu(j,i,1)+iun(j,i,1)&lt;BR /&gt; t6=iu(j,i,1)*iu(j,i,1)+iun(j,i,1)*iun(j,i,1)&lt;BR /&gt; q1=-gam(2)*(t5-t6)&lt;BR /&gt; r1=t5+t6 &lt;BR /&gt; q2=r3*(iu(j,i,1)*ru(j,i,1)+iun(j,i,1)*run(j,i,1)) &lt;BR /&gt; s1=ru(j,i,2)+run(j,i,2)&lt;BR /&gt; r2=ru(j,i,2)*ru(j,i,2)+run(j,i,2)*run(j,i,2)&lt;BR /&gt; s2=iu(j,i,2)+iun(j,i,2)&lt;BR /&gt; r2=r2+iu(j,i,2)*iu(j,i,2)+iun(j,i,2)*iun(j,i,2)&lt;BR /&gt; t3=gam(1)*(t1*s2-t2*s1)&lt;BR /&gt; t4=-gam(1)*(t1*s1+t2*s2)&lt;BR /&gt; r11=(r1+beta1*r2)*alf(1)&lt;BR /&gt; r22=(beta1*r1+r2)*alf(2)&lt;BR /&gt;&lt;BR /&gt; fru(j,i,1)=t3+r11*t2&lt;BR /&gt; fiu(j,i,1)=t4-r11*t1&lt;BR /&gt; fru(j,i,2)=q2+r22*s2&lt;BR /&gt; fiu(j,i,2)=q1-r22*s1&lt;BR /&gt;&lt;BR /&gt; end do&lt;BR /&gt; end do&lt;BR /&gt; !$omp end do nowait&lt;BR /&gt; !$omp end parallel&lt;BR /&gt; ! End of parallel block #1&lt;BR /&gt; !=====
======================================&lt;BR /&gt; ! Parallel block #2&lt;BR /&gt; ....&lt;BR /&gt; ! Check convergence block&lt;BR /&gt; ....&lt;BR /&gt;end do&lt;BR /&gt;...&lt;BR /&gt;Arrays are declared as follows:&lt;BR /&gt;double precision, allocatable, dimension(:,:,:):: ru,run,fru&lt;BR /&gt;double precision, allocatable, dimension(:,:,:):: iu,iun,fiu&lt;BR /&gt;allocate(ru(0:nt,0:nr,2),run(0:nt,0:nr,2),fru(0:nt,0:nr,2))&lt;BR /&gt;allocate(iu(0:nt,0:nr,2),iun(0:nt,0:nr,2),fiu(0:nt,0:nr,2))&lt;BR /&gt;&lt;BR /&gt;nt=2048, nr=250&lt;BR /&gt; &lt;BR /&gt;To
estimate the speedup, I created two Thread Profiler activities, one with
1 thread and one with 2 threads. The runs show absolutely no speedup
for parallel block #1: 21 sec with 1 thread and 20.9 sec with 2 threads,
while the speedup for parallel block #2 is more than 1.6. Am I doing something wrong in the first parallel block? &lt;BR /&gt;&lt;BR /&gt;Thanks in advance&lt;BR /&gt;</description>
      <pubDate>Mon, 02 Jun 2008 20:31:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/878996#M74699</guid>
      <dc:creator>misty12</dc:creator>
      <dc:date>2008-06-02T20:31:34Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP no speedup</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/878997#M74700</link>
      <description>&lt;P&gt;Try&lt;/P&gt;&lt;PRE&gt;!$omp parallel do private (t1,t2,s1,s2,t3,t4,t5,t6,q1,q2,r1,r2,r11,r22) schedule(static,1)&lt;BR /&gt; do i=0,nr-1&lt;BR /&gt; do j=0,nt-1&lt;BR /&gt; ...&lt;BR /&gt; end do&lt;BR /&gt; end do&lt;BR /&gt;!$omp end parallel do&lt;BR /&gt;&lt;/PRE&gt;&lt;PRE&gt;Jim Dempsey&lt;/PRE&gt;</description>
      <pubDate>Wed, 04 Jun 2008 12:04:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/878997#M74700</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2008-06-04T12:04:36Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP no speedup</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/878998#M74701</link>
      <description>&lt;P&gt;How many times does the first loop construct execute? What is the average wall time per pass? There is substantial overhead in thread management, and the convergence loop will incur that overhead on every pass.&lt;/P&gt;
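&lt;P&gt;A minimal sketch of what avoiding that per-pass overhead could look like: the parallel region is opened once around the whole convergence loop, and only the worksharing do constructs run on each pass. The program is a hypothetical stand-in (a 1-D Jacobi relaxation; the names u, unew, diff, resid, tol and max_pass are assumptions, not the poster's variables), so treat it as an illustration rather than a drop-in fix.&lt;/P&gt;
&lt;PRE&gt;
```fortran
program hoisted_parallel_region
  ! Hypothetical example: every name below is illustrative only.
  implicit none
  integer, parameter :: n = 64, max_pass = 100000
  double precision, parameter :: tol = 1.0d-8
  double precision :: u(0:n+1), unew(0:n+1), diff, resid
  integer :: i, pass
  logical :: converged

  u = 0.0d0
  u(n+1) = 1.0d0          ! fixed boundary values: u(0)=0, u(n+1)=1
  unew = u
  diff = 0.0d0
  resid = 0.0d0
  pass = 0
  converged = .false.

  ! The thread team is created once, outside the convergence loop.
  !$omp parallel default(shared) private(i)
  do while (.not. converged)
     ! Worksharing loop: each thread updates its share of the points
     ! and the largest change is combined into the shared diff.
     !$omp do reduction(max:diff)
     do i = 1, n
        unew(i) = 0.5d0*(u(i-1) + u(i+1))
        diff = max(diff, abs(unew(i) - u(i)))
     end do
     !$omp end do
     ! Copy back; the implicit barrier above guarantees unew is complete.
     !$omp do
     do i = 1, n
        u(i) = unew(i)
     end do
     !$omp end do
     ! One thread updates the shared loop control. The implicit barrier
     ! at "end single" makes the new value visible to every thread
     ! before it re-evaluates the do while condition.
     !$omp single
     pass = pass + 1
     resid = diff
     diff = 0.0d0
     converged = (resid .lt. tol) .or. (pass .ge. max_pass)
     !$omp end single
  end do
  !$omp end parallel

  print *, 'passes:', pass, ' last max change:', resid
end program hoisted_parallel_region
```
&lt;/PRE&gt;
&lt;P&gt;Built with OpenMP enabled (for example gfortran -fopenmp), the team is formed once instead of once per pass; built without it, the directives are plain comments and the program runs serially. Whether this helps depends on the runtime: many OpenMP runtimes keep their worker threads alive between parallel regions, so it is worth timing both variants before restructuring real code.&lt;/P&gt;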
&lt;P&gt;It's more work but you might try creating your thread pool outside the convergence loop.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jun 2008 15:17:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/878998#M74701</guid>
      <dc:creator>Steve_Nuchia</dc:creator>
      <dc:date>2008-06-09T15:17:09Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP no speedup</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/878999#M74702</link>
      <description>Another point: calculate the total memory bandwidth of the calculation and compare it to the memory bandwidth of your system. If it is saturating the memory controller and/or the cache &amp;lt;-&amp;gt; register data paths with one thread it will run in pretty much the same time with more threads.</description>
      <pubDate>Tue, 10 Jun 2008 14:31:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/878999#M74702</guid>
      <dc:creator>Steve_Nuchia</dc:creator>
      <dc:date>2008-06-10T14:31:13Z</dc:date>
    </item>
    <item>
      <title>Re: OpenMP no speedup</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/879000#M74703</link>
      <description>&lt;P&gt;make that shared cache &amp;lt;-&amp;gt; private cache data paths.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Jun 2008 14:32:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/OpenMP-no-speedup/m-p/879000#M74703</guid>
      <dc:creator>Steve_Nuchia</dc:creator>
      <dc:date>2008-06-10T14:32:39Z</dc:date>
    </item>
  </channel>
</rss>

