<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Parallel Fortran Implementation in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Fortran-Implementation/m-p/767920#M83</link>
    <description>Hi,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I need some help parallelizing a time-domain simulation program that we implemented in Fortran. The main structure is:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;DO t=0,Nt   ! HAS TO BE SERIAL&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;SOME SERIAL CODE&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;DO I=1,N&lt;/DIV&gt;&lt;DIV&gt;CALL F(I)&lt;/DIV&gt;&lt;DIV&gt;END DO&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;SOME SERIAL CODE&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;DO I=1,N&lt;/DIV&gt;&lt;DIV&gt;CALL G(I)&lt;/DIV&gt;&lt;DIV&gt;END DO&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;SOME SERIAL CODE&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;END DO&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The big DO has to be serial (it is the time stepping). The inner DOs can be parallelized. N is usually big (8000-9000) and the F(), G(), ... routines are computationally intensive.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I tried parallelizing the inner DO loops with !$OMP PARALLEL DO directives.
It runs OK (I resolved the data dependencies, etc.), but it is a lot slower than running in serial!&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;- Is it because new threads are created and destroyed every time, which introduces overhead?&lt;/DIV&gt;&lt;DIV&gt;- Should I start with !$OMP PARALLEL at the beginning, use !$OMP DO / !$OMP END DO for the loops, and protect the serial parts with !$OMP MASTER?&lt;/DIV&gt;&lt;DIV&gt;- If yes, where should I put !$OMP PARALLEL: before or after the big DO loop (the one that needs to be serial)?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Any suggestions?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks, people...&lt;/DIV&gt;</description>
    <pubDate>Thu, 07 Apr 2011 11:39:39 GMT</pubDate>
    <dc:creator>Petros</dc:creator>
    <dc:date>2011-04-07T11:39:39Z</dc:date>
    <item>
      <title>Parallel Fortran Implementation</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Fortran-Implementation/m-p/767920#M83</link>
      <description>Hi,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I need some help parallelizing a time-domain simulation program that we implemented in Fortran. The main structure is:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;DO t=0,Nt   ! HAS TO BE SERIAL&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;SOME SERIAL CODE&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;DO I=1,N&lt;/DIV&gt;&lt;DIV&gt;CALL F(I)&lt;/DIV&gt;&lt;DIV&gt;END DO&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;SOME SERIAL CODE&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;DO I=1,N&lt;/DIV&gt;&lt;DIV&gt;CALL G(I)&lt;/DIV&gt;&lt;DIV&gt;END DO&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;SOME SERIAL CODE&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;.&lt;/DIV&gt;&lt;DIV&gt;END DO&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The big DO has to be serial (it is the time stepping). The inner DOs can be parallelized. N is usually big (8000-9000) and the F(), G(), ... routines are computationally intensive.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I tried parallelizing the inner DO loops with !$OMP PARALLEL DO directives.
It runs OK (I resolved the data dependencies, etc.), but it is a lot slower than running in serial!&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;- Is it because new threads are created and destroyed every time, which introduces overhead?&lt;/DIV&gt;&lt;DIV&gt;- Should I start with !$OMP PARALLEL at the beginning, use !$OMP DO / !$OMP END DO for the loops, and protect the serial parts with !$OMP MASTER?&lt;/DIV&gt;&lt;DIV&gt;- If yes, where should I put !$OMP PARALLEL: before or after the big DO loop (the one that needs to be serial)?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Any suggestions?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks, people...&lt;/DIV&gt;</description>
      <pubDate>Thu, 07 Apr 2011 11:39:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Fortran-Implementation/m-p/767920#M83</guid>
      <dc:creator>Petros</dc:creator>
      <dc:date>2011-04-07T11:39:39Z</dc:date>
    </item>
    <item>
      <title>Parallel Fortran Implementation</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Fortran-Implementation/m-p/767921#M84</link>
      <description>How best to parallelize this code depends on several factors that a simple sketch like the one in your opening message cannot convey.&lt;BR /&gt;&lt;BR /&gt;1) How many sections of serial code do you have?&lt;BR /&gt;2) How many threads are available (i.e. what is the ratio of threads to serial sections)?&lt;BR /&gt;3) Of the subroutines called in your serial loops, which depend on which others?&lt;BR /&gt;4) Of the subroutines that depend on earlier serial loops, which depend only on the same element and which depend on the entire earlier loop completing?&lt;BR /&gt;5) Other issues.&lt;BR /&gt;&lt;BR /&gt;Parallelization options that depend on the answers to the questions above:&lt;BR /&gt;&lt;BR /&gt;! each subroutine is element-wise independent of other elements&lt;BR /&gt;! but may depend on the sequence A-Z&lt;BR /&gt;do t=0,nT&lt;BR /&gt; !$omp parallel do&lt;BR /&gt; do i=1,N&lt;BR /&gt; call A(i)&lt;BR /&gt; call B(i)&lt;BR /&gt; ...&lt;BR /&gt; call Z(i)&lt;BR /&gt; end do&lt;BR /&gt; !$omp end parallel do&lt;BR /&gt;end do&lt;BR /&gt;&lt;BR /&gt;! each loop run in parallel but in sequence A, B, ... Z&lt;BR /&gt;do t=0,nT&lt;BR /&gt; !$omp parallel do&lt;BR /&gt; do i=1,N&lt;BR /&gt; call A(i)&lt;BR /&gt; end do&lt;BR /&gt; !$omp end parallel do&lt;BR /&gt; !$omp parallel do&lt;BR /&gt; do i=1,N&lt;BR /&gt; call B(i)&lt;BR /&gt; end do&lt;BR /&gt; !$omp end parallel do&lt;BR /&gt; ...&lt;BR /&gt; !$omp parallel do&lt;BR /&gt; do i=1,N&lt;BR /&gt; call Z(i)&lt;BR /&gt; end do&lt;BR /&gt; !$omp end parallel do&lt;BR /&gt;end do&lt;BR /&gt;&lt;BR /&gt;
! each loop run by one thread&lt;BR /&gt;do t=0,nT&lt;BR /&gt; !$omp parallel sections private(i)&lt;BR /&gt; do i=1,N&lt;BR /&gt; call A(i)&lt;BR /&gt; end do&lt;BR /&gt; !$omp section&lt;BR /&gt; do i=1,N&lt;BR /&gt; call B(i)&lt;BR /&gt; end do&lt;BR /&gt; !$omp section&lt;BR /&gt; ...&lt;BR /&gt; !$omp end parallel sections&lt;BR /&gt;end do&lt;BR /&gt;&lt;BR /&gt;The first method establishes one team per time step and slices a single fused loop, which reduces the number of team starts/stops.&lt;BR /&gt;The second method (which I assume is what you are currently doing) increases the number of team starts/stops.&lt;BR /&gt;The third method establishes one team, with one thread for each loop; its effectiveness will depend on the number of loops and the amount of processing in each loop.&lt;BR /&gt;&lt;BR /&gt;A fourth method could be a variation of method 3 in which you reorder the loops and do more than one loop within a section.&lt;BR /&gt;&lt;BR /&gt;An additional optimization may be available if, in your serial code, the early subroutines run completely independently of the later ones. An example might be performing the physics computations up to the point where positional data is updated and, after the positional information is updated, calling graphics routines to render the scene. In this situation you would code something like:&lt;BR /&gt;&lt;BR /&gt;
!$omp parallel private(t, i) ! create team outside t loop&lt;BR /&gt;do t=0,nT&lt;BR /&gt; !$omp sections ! note removal of parallel&lt;BR /&gt; do i=1,N&lt;BR /&gt; call A(i)&lt;BR /&gt; end do&lt;BR /&gt; !$omp section&lt;BR /&gt; ...&lt;BR /&gt; !$omp end sections&lt;BR /&gt; !$omp barrier&lt;BR /&gt; !$omp do&lt;BR /&gt; do i=1,N&lt;BR /&gt; call AdvancePosition(i)&lt;BR /&gt; end do&lt;BR /&gt; !$omp end do&lt;BR /&gt; !$omp master&lt;BR /&gt; call Render()&lt;BR /&gt; !$omp end master&lt;BR /&gt;end do&lt;BR /&gt;!$omp end parallel&lt;BR /&gt;&lt;BR /&gt;*** untested sketch above&lt;BR /&gt;&lt;BR /&gt;This lets the other threads begin the next step's physics calculations while the master thread renders (assuming rendering is not parallel).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Hopefully this will give you some hints as to where you can go.&lt;BR /&gt;&lt;BR /&gt;Note: add this last step _after_ you get everything else working at top speed.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey</description>
      <pubDate>Thu, 07 Apr 2011 15:24:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Fortran-Implementation/m-p/767921#M84</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2011-04-07T15:24:55Z</dc:date>
    </item>
    <item>
      <title>Parallel Fortran Implementation</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Fortran-Implementation/m-p/767922#M85</link>
      <description>Thanks! I'll try the variations and fine-tune them. I'm currently doing the 2nd one!</description>
      <pubDate>Fri, 08 Apr 2011 11:46:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Parallel-Fortran-Implementation/m-p/767922#M85</guid>
      <dc:creator>Petros</dc:creator>
      <dc:date>2011-04-08T11:46:01Z</dc:date>
    </item>
  </channel>
</rss>

