<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic openmp parallelization problem in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745911#M3920</link>
    <description>NAT = 3000&lt;BR /&gt;Number of cores = 4&lt;BR /&gt;KMAX = 10&lt;BR /&gt;MAXK eventually becomes 9980</description>
    <pubDate>Tue, 27 Apr 2010 22:41:29 GMT</pubDate>
    <dc:creator>ffgarcia</dc:creator>
    <dc:date>2010-04-27T22:41:29Z</dc:date>
    <item>
      <title>openmp parallelization problem</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745907#M3916</link>
      <description>&lt;P&gt;Dear users,&lt;/P&gt;&lt;P&gt;I recently got introduced to openMP and I would like to parallelize a serial f77 subroutine using openMP. All the examples that I've seen online focus on a single DO loop and that did not help much. I have multiple DO loops and several temporary variables in my code and I was wondering if experienced users can help me figure out what is wrong with my openmp implementation and help me define the commands (private, shared, reduction etc.) correctly. Please note that the variables FK, POS and the variables in the common block are used by other subroutines.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;SUBROUTINE FFULL(FCOMP, POS)&lt;/P&gt;&lt;P&gt;C ACCEPTS POS AS INPUT AND RETURNS FCOMP&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;IMPLICIT NONE&lt;/P&gt;&lt;P&gt;INTEGER MAXK, NMAX, NAT&lt;/P&gt;&lt;P&gt;INTEGER TOTK&lt;/P&gt;&lt;P&gt;INTEGER KMAX, KX, KY, KZ, I, KSQMAX, KSQ, J&lt;/P&gt;&lt;P&gt;PARAMETER ( MAXK = 50000, NMAX = 1000 )&lt;/P&gt;&lt;P&gt;DOUBLE PRECISION KVEC(MAXK), POS(3,NMAX),Z(NMAX)&lt;/P&gt;&lt;P&gt;DOUBLE PRECISION KAPPA, FCOMP(3, NMAX), INV_VOL&lt;/P&gt;&lt;P&gt;DOUBLE PRECISION PI, TWOPI, S1, S2, L&lt;/P&gt;&lt;P&gt;DOUBLE PRECISION RKX, RKY, RKZ, DP, ZI, ZJ&lt;/P&gt;&lt;P&gt;DOUBLE PRECISION FX, FY, FZ, RX, RY, RZ&lt;/P&gt;&lt;P&gt;DOUBLE PRECISION SUM1(MAXK), SUM2(MAXK)&lt;/P&gt;&lt;P&gt;CHARACTER*2 SYMB(NMAX)&lt;/P&gt;&lt;P&gt;COMMON/KPARAM/KAPPA,KMAX,KSQMAX&lt;/P&gt;&lt;P&gt;COMMON/BLOCK1/L, Z, NAT, SYMB&lt;/P&gt;&lt;P&gt;COMMON/BLOCK2/KVEC, SUM1, SUM2&lt;/P&gt;&lt;P&gt;C *******************************************************************&lt;/P&gt;&lt;P&gt;C DEFINE SOME VARIABLES THAT WOULD BE NEEDED LATER&lt;/P&gt;&lt;P&gt;PI=2.D0*DASIN(1.D0)&lt;/P&gt;&lt;P&gt;TWOPI = 2.D0*PI&lt;/P&gt;&lt;P&gt;INV_VOL = 1.D0/(L**3)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;C INITIALIZE FCOMP TO ZERO&lt;/P&gt;&lt;P&gt;DO I = 1, NAT&lt;/P&gt;&lt;P&gt; DO J = 1, 3&lt;/P&gt;&lt;P&gt;FCOMP(J, I) = 0.D0&lt;/P&gt;&lt;P&gt; END 
DO&lt;/P&gt;&lt;P&gt;END DO&lt;/P&gt;&lt;P&gt;C BEGIN OPENMP HERE&lt;/P&gt;&lt;P&gt;c$omp parallel&lt;/P&gt;&lt;P&gt;c$omp&amp;amp; shared(FK,POS)&lt;/P&gt;&lt;P&gt;C$omp&amp;amp; private(I,RX,RY,RZ,ZI,KX,KY,KZ,RKX,RKY,RKZ,DP,S1,S2)&lt;/P&gt;&lt;P&gt;c$omp&amp;amp; reduction(+:TOTK,FX,FY,FZ)&lt;/P&gt;&lt;P&gt;C$omp do&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;DO I =1, NAT&lt;/P&gt;&lt;P&gt; FX = 0.D0&lt;/P&gt;&lt;P&gt; FY = 0.D0&lt;/P&gt;&lt;P&gt; FZ = 0.D0&lt;/P&gt;&lt;P&gt; RX = POS(1,I)&lt;/P&gt;&lt;P&gt; RY = POS(2,I)&lt;/P&gt;&lt;P&gt; RZ = POS(3,I)&lt;/P&gt;&lt;P&gt; ZI = Z(I)&lt;/P&gt;&lt;P&gt; TOTK = 0&lt;/P&gt;&lt;P&gt; DO 24 KX = -KMAX , KMAX&lt;/P&gt;&lt;P&gt; RKX = TWOPI*DFLOAT(KX)/L&lt;/P&gt;&lt;P&gt; DO 23 KY = -KMAX, KMAX&lt;/P&gt;&lt;P&gt; RKY = TWOPI*DFLOAT(KY)/L&lt;/P&gt;&lt;P&gt; DO 22 KZ = -KMAX,KMAX&lt;/P&gt;&lt;P&gt; RKZ = TWOPI*DFLOAT(KZ)/L&lt;/P&gt;&lt;P&gt; KSQ = KX*KX + KY*KY + KZ*KZ&lt;/P&gt;&lt;P&gt;IF((KSQ.GE.KSQMAX).OR.(KSQ.EQ.0))GOTO 22&lt;/P&gt;&lt;P&gt;TOTK = TOTK + 1&lt;/P&gt;&lt;P&gt;DP=RX*RKX+RY*RKY+RZ*RKZ&lt;/P&gt;&lt;P&gt; S1 = SUM1(TOTK)*DSIN(DP)&lt;/P&gt;&lt;P&gt; S2 = SUM2(TOTK)*DCOS(DP)&lt;/P&gt;&lt;P&gt; FX = FX + KVEC(TOTK)*RKX*(S1 - S2)&lt;/P&gt;&lt;P&gt; FY = FY + KVEC(TOTK)*RKY*(S1 - S2)&lt;/P&gt;&lt;P&gt; FZ = FZ + KVEC(TOTK)*RKZ*(S1 - S2)&lt;BR /&gt;22 CONTINUE&lt;/P&gt;&lt;P&gt;23 CONTINUE&lt;/P&gt;&lt;P&gt;24 CONTINUE&lt;/P&gt;&lt;P&gt; FCOMP(1,I) = FX*2.D0*INV_VOL*ZI&lt;/P&gt;&lt;P&gt; FCOMP(2,I) = FY*2.D0*INV_VOL*ZI&lt;/P&gt;&lt;P&gt; FCOMP(3,I) = FZ*2.D0*INV_VOL*ZI&lt;/P&gt;&lt;P&gt;END DO&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;C$omp end do&lt;/P&gt;&lt;P&gt;C$omp end parallel&lt;/P&gt;&lt;P&gt;RETURN&lt;/P&gt;</description>
      <pubDate>Mon, 26 Apr 2010 00:07:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745907#M3916</guid>
      <dc:creator>ffgarcia</dc:creator>
      <dc:date>2010-04-26T00:07:44Z</dc:date>
    </item>
    <item>
      <title>openmp parallelization problem</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745908#M3917</link>
      <description>If you would simply use&lt;BR /&gt;C$omp parallel do&lt;BR /&gt;then you wouldn't have to worry about which clauses belong to "parallel" and which to "do." I would have thought compiler diagnostics might give a clue, if the example were compilable.&lt;BR /&gt;Assuming you are looking for efficiency, a little work to optimize your inner loop should make a large difference.</description>
      <pubDate>Mon, 26 Apr 2010 06:34:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745908#M3917</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2010-04-26T06:34:52Z</dc:date>
    </item>
    <item>
      <title>openmp parallelization problem</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745909#M3918</link>
      <description>At a quick glance, you need to make KSQ private also.&lt;BR /&gt;Because you are threading the outermost loop, I see no need to put TOTK, FX, FY, FZ in a REDUCTION clause; the PRIVATE clause would be sufficient. REDUCTION is only needed if the reduction is in the outermost loop, such that different parts of the reduction are carried out by different threads, which would need to be synchronized.&lt;BR /&gt;&lt;BR /&gt;A minor point: In some situations, it can be advantageous to initialize arrays in an OpenMP loop that is similar to the loop in which they will be used. But in your code as written, FCOMP does not need to be initialized at all.&lt;BR /&gt;&lt;BR /&gt;Intel sells a tool, Thread Checker, that can help to find race conditions in complex codes, such as may arise if you forget to declare a variable as private. But that's not really needed for a simple loop without function calls such as this one.</description>
      <pubDate>Mon, 26 Apr 2010 18:31:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745909#M3918</guid>
      <dc:creator>Martyn_C_Intel</dc:creator>
      <dc:date>2010-04-26T18:31:58Z</dc:date>
    </item>
    <item>
      <title>openmp parallelization problem</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745910#M3919</link>
      <description>How large is NAT?&lt;BR /&gt;How many cores are available?&lt;BR /&gt;How large is KMAX? (MAXK is a parameter, KMAX is in COMMON.)&lt;BR /&gt;&lt;BR /&gt;If NAT is relatively small and KMAX relatively large, you might consider moving the parallelization to the DO 24 loop.</description>
      <pubDate>Tue, 27 Apr 2010 12:16:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745910#M3919</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-04-27T12:16:50Z</dc:date>
    </item>
    <item>
      <title>openmp parallelization problem</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745911#M3920</link>
      <description>NAT = 3000&lt;BR /&gt;Number of cores = 4&lt;BR /&gt;KMAX = 10&lt;BR /&gt;MAXK eventually becomes 9980</description>
      <pubDate>Tue, 27 Apr 2010 22:41:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745911#M3920</guid>
      <dc:creator>ffgarcia</dc:creator>
      <dc:date>2010-04-27T22:41:29Z</dc:date>
    </item>
    <item>
      <title>openmp parallelization problem</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745912#M3921</link>
      <description>Ask the compiler to generate a vectorization report.&lt;BR /&gt;Examine the report to see if the expressions are suitably vectorized. Perhaps they are not.&lt;BR /&gt;&lt;BR /&gt;Suggestion 1:&lt;BR /&gt;&lt;BR /&gt;To improve (the probability of) vectorization, try changing your ?X, ?Y, ?Z variables into arrays:&lt;BR /&gt;&lt;BR /&gt;DOUBLE PRECISION FXYZ(3), RXYZ(3), RKXYZ(3)&lt;BR /&gt;&lt;BR /&gt;And make the appropriate changes to the source, e.g.&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;FXYZ = FXYZ + RKXYZ * (KVEC(TOTK) * (SUM1(TOTK)*DSIN(DP) - SUM2(TOTK)*DCOS(DP)))&lt;/P&gt;&lt;BR /&gt;&lt;BR /&gt;Suggestion 2:&lt;BR /&gt;&lt;BR /&gt;Calculate the product terms of DP=RX*RKX+RY*RKY+RZ*RKZ in the loop where each term varies. Then do the summation in the innermost loop.&lt;BR /&gt;&lt;BR /&gt;Suggestion 3:&lt;BR /&gt;&lt;BR /&gt;Add the constant TWOPIOL = TWOPI / L&lt;BR /&gt;&lt;BR /&gt;When using SSE you will not carry temporary calculations to 80 bits, so precalculate it once.&lt;BR /&gt;&lt;BR /&gt;Suggestion 4:&lt;BR /&gt;&lt;BR /&gt;Replace the DFLOAT(KZ) in the inner loop with:&lt;BR /&gt;&lt;BR /&gt;FKZ = DFLOAT(-KMAX-1)&lt;BR /&gt;DO 22 KZ = -KMAX,KMAX&lt;BR /&gt; FKZ = FKZ + 1.0D0&lt;BR /&gt; RKZ = FKZ * TWOPIOL ! was TWOPI*DFLOAT(KZ)/L&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey</description>
      <pubDate>Wed, 28 Apr 2010 12:32:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745912#M3921</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-04-28T12:32:00Z</dc:date>
    </item>
    <item>
      <title>openmp parallelization problem</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745913#M3922</link>
      <description>Vectorization may depend on changing the limits on KZ so as to eliminate the IF() conditional inside the loop.</description>
      <pubDate>Wed, 28 Apr 2010 13:41:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/openmp-parallelization-problem/m-p/745913#M3922</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2010-04-28T13:41:31Z</dc:date>
    </item>
  </channel>
</rss>

