<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Basic OpenMP question in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862585#M2522</link>
    <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Michael,&lt;BR /&gt;&lt;BR /&gt;Add a private(ix) clause:&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;!$OMP PARALLEL PRIVATE(ix)&lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END DO&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALLEL&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;Roine,&lt;BR /&gt;&lt;BR /&gt;Please note that every thread executes all iterations of ix from 1 to nx; each thread simply has its own copy of ix.&lt;BR /&gt;Be cautious when adding code between the do ix and do iy loops. You can have code there, but every thread will execute it, so you cannot assume each thread is working on a different slice of the ix iteration space.&lt;BR /&gt;&lt;BR /&gt;Michael's suggestion is good in that it reduces the number of times a thread pool is formed from nx times to 1 time. When you get the newer version of the compiler, you can then consider adding Robert's suggestion of collapse(2). Until then, you could use omp_get_num_threads and omp_get_thread_num to partition a single loop over the combined iy and iz space by hand.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;</description>
    <pubDate>Wed, 21 Oct 2009 17:37:10 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2009-10-21T17:37:10Z</dc:date>
    <item>
      <title>Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862580#M2517</link>
      <description>I am having difficulty utilizing all 8 cores that I have access to on a cluster. Essentially, I have three nested loops. The two innermost loops can be worked on in parallel, so I have put an OpenMP directive between the first and the second. It looks as follows:&lt;BR /&gt;&lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP PARALLEL DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALLEL DO&lt;BR /&gt;end do&lt;BR /&gt;&lt;BR /&gt;Unfortunately, the code only seems to reach a load of ny (I check that with pbstop +l ), while I would like it to reach a load of up to ny*nx. There is a substantial amount of work to be done for each (iy,iz), so I was expecting a load far greater than ny. I have also noticed that the load is lower if I put the iz loop outside of the iy loop and if nz &amp;lt; ny. It seems like the loops do not unroll. Among the compilation flags I have -unroll. What am I missing?&lt;BR /&gt;Thank you for any hints and help. &lt;BR /&gt;</description>
      <pubDate>Thu, 24 Sep 2009 19:18:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862580#M2517</guid>
      <dc:creator>roine_vestman</dc:creator>
      <dc:date>2009-09-24T19:18:12Z</dc:date>
    </item>
    <item>
      <title>Re: Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862581#M2518</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/262391"&gt;roine.vestman@nyu.edu&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;I am having difficulty utilizing all 8 cores that I have access to on a cluster. Essentially, I have three nested loops. The two innermost loops can be worked on in parallel, so I have put an OpenMP directive between the first and the second. It looks as follows:&lt;BR /&gt;&lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP PARALLEL DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALLEL DO&lt;BR /&gt;end do&lt;BR /&gt;&lt;BR /&gt;Unfortunately, the code only seems to reach a load of ny (I check that with pbstop +l ), while I would like it to reach a load of up to ny*nx. There is a substantial amount of work to be done for each (iy,iz), so I was expecting a load far greater than ny. I have also noticed that the load is lower if I put the iz loop outside of the iy loop and if nz &amp;lt; ny. It seems like the loops do not unroll. Among the compilation flags I have -unroll. What am I missing?&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I think what you're missing is a collapse clause. Perhaps the following will give you what you want:&lt;BR /&gt;&lt;BR /&gt;
&lt;DIV&gt;do ix=1,nx&lt;BR /&gt;!$omp parallel do private(iy,iz) &lt;STRONG&gt;collapse(2)&lt;/STRONG&gt;&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$omp end parallel do&lt;BR /&gt;end do&lt;/DIV&gt;
&lt;BR /&gt;
&lt;P&gt;The description from the OpenMP 3.0 specification reads as follows:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN style="font-face: verdana;"&gt;The &lt;SPAN style="font-face: courier;"&gt;&lt;STRONG&gt;collapse&lt;/STRONG&gt;&lt;/SPAN&gt; clause may be used to specify how many loops are associated with the loop construct. ... If more than one loop is associated with the loop construct, then the iterations of all associated loops are collapsed into one larger iteration space which is then divided according to the &lt;SPAN style="font-face: courier;"&gt;&lt;STRONG&gt;schedule&lt;/STRONG&gt;&lt;/SPAN&gt; clause.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;BR /&gt;
&lt;P&gt;That should give you the span of tasks you are trying to achieve.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Sep 2009 00:06:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862581#M2518</guid>
      <dc:creator>robert-reed</dc:creator>
      <dc:date>2009-09-25T00:06:08Z</dc:date>
    </item>
    <item>
      <title>Re: Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862582#M2519</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
Thank you for the suggestion. I tried adding collapse(2) but can't get it to work. The error reads:&lt;BR /&gt;&lt;BR /&gt;fortcom: Error: vfi_common_model_main.f90, line 360: Syntax error, found IDENTIFIER 'COLLAPSE' when expecting one of: PRIVATE FIRSTPRIVATE REDUCTION SHARED IF DEFAULT COPYIN NUM_THREADS LASTPRIVATE ...&lt;BR /&gt; !$OMP PARALLEL DO DEFAULT(NONE) COLLAPSE(2) &amp;amp;&lt;BR /&gt;&lt;BR /&gt;From which version of Intel Fortran is 'collapse' supported (i.e. from which version is OpenMP 3.0 supported)? I have Intel Fortran version 10.0.023 - do I need to upgrade?&lt;BR /&gt;</description>
      <pubDate>Tue, 29 Sep 2009 05:46:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862582#M2519</guid>
      <dc:creator>roine_vestman</dc:creator>
      <dc:date>2009-09-29T05:46:15Z</dc:date>
    </item>
    <item>
      <title>Re: Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862583#M2520</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/262391"&gt;roine.vestman@nyu.edu&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;Thank you for the suggestion. I tried adding collapse(2) but can't get it to work. The error reads:&lt;BR /&gt;&lt;BR /&gt;fortcom: Error: vfi_common_model_main.f90, line 360: Syntax error, found IDENTIFIER 'COLLAPSE' when expecting one of: PRIVATE FIRSTPRIVATE REDUCTION SHARED IF DEFAULT COPYIN NUM_THREADS LASTPRIVATE ...&lt;BR /&gt;!$OMP PARALLEL DO DEFAULT(NONE) COLLAPSE(2) &amp;amp;&lt;BR /&gt;&lt;BR /&gt;From which version of Intel Fortran is 'collapse' supported (i.e. from which version is OpenMP 3.0 supported)? I have Intel Fortran version 10.0.023 - do I need to upgrade?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;AFAIK you need to update to 11.x to gain support for OpenMP 3.0 with Intel's compilers.&lt;BR /&gt;&lt;BR /&gt;I would also suggest rearranging your code the following way:&lt;BR /&gt;&lt;BR /&gt;!$OMP PARALLEL &lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END DO&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALLEL&lt;BR /&gt;&lt;BR /&gt;This avoids repeatedly creating and tearing down the parallel region, as would happen with your code pattern. Creating and tearing down the parallel region can have a serious impact on performance because of administrative and barrier overhead.&lt;BR /&gt;&lt;BR /&gt;Cheers,&lt;BR /&gt;-michael&lt;BR /&gt;</description>
      <pubDate>Tue, 29 Sep 2009 08:18:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862583#M2520</guid>
      <dc:creator>Michael_K_Intel2</dc:creator>
      <dc:date>2009-09-29T08:18:06Z</dc:date>
    </item>
    <item>
      <title>Re: Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862584#M2521</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/416135"&gt;Michael Klemm, Intel&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;AFAIK you need to update to 11.x to gain support for OpenMP 3.0 with Intel's compilers.&lt;BR /&gt;&lt;BR /&gt;I would also suggest rearranging your code the following way:&lt;BR /&gt;&lt;BR /&gt;!$OMP PARALLEL &lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END DO&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALLEL&lt;BR /&gt;&lt;BR /&gt;This avoids repeatedly creating and tearing down the parallel region, as would happen with your code pattern. Creating and tearing down the parallel region can have a serious impact on performance because of administrative and barrier overhead.&lt;BR /&gt;&lt;BR /&gt;Cheers,&lt;BR /&gt;-michael&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
I am learning OpenMP. This is an interesting point. If I have nested loops, I can place '!$omp parallel' and '!$omp end parallel' around the outermost do loop, and then apply '!$OMP do' and '!$OMP end do' to the loop that should be work-shared inside the parallel region. I then just need to worry about the data environment for the parallel region. Is this correct?&lt;BR /&gt;</description>
      <pubDate>Wed, 21 Oct 2009 16:58:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862584#M2521</guid>
      <dc:creator>maria</dc:creator>
      <dc:date>2009-10-21T16:58:43Z</dc:date>
    </item>
    <item>
      <title>Re: Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862585#M2522</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Michael,&lt;BR /&gt;&lt;BR /&gt;Add a private(ix) clause:&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;!$OMP PARALLEL PRIVATE(ix)&lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END DO&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALLEL&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;Roine,&lt;BR /&gt;&lt;BR /&gt;Please note that every thread executes all iterations of ix from 1 to nx; each thread simply has its own copy of ix.&lt;BR /&gt;Be cautious when adding code between the do ix and do iy loops. You can have code there, but every thread will execute it, so you cannot assume each thread is working on a different slice of the ix iteration space.&lt;BR /&gt;&lt;BR /&gt;Michael's suggestion is good in that it reduces the number of times a thread pool is formed from nx times to 1 time. When you get the newer version of the compiler, you can then consider adding Robert's suggestion of collapse(2). Until then, you could use omp_get_num_threads and omp_get_thread_num to partition a single loop over the combined iy and iz space by hand.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;</description>
      <pubDate>Wed, 21 Oct 2009 17:37:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862585#M2522</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2009-10-21T17:37:10Z</dc:date>
    </item>
    <item>
      <title>Re: Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862586#M2523</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/99850"&gt;jimdempseyatthecove&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;Michael,&lt;BR /&gt;&lt;BR /&gt;Add a private(ix) clause:&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;!$OMP PARALLEL PRIVATE(ix)&lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END DO&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALLEL&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;Roine,&lt;BR /&gt;&lt;BR /&gt;Please note that every thread executes all iterations of ix from 1 to nx; each thread simply has its own copy of ix.&lt;BR /&gt;Be cautious when adding code between the do ix and do iy loops. You can have code there, but every thread will execute it, so you cannot assume each thread is working on a different slice of the ix iteration space.&lt;BR /&gt;&lt;BR /&gt;Michael's suggestion is good in that it reduces the number of times a thread pool is formed from nx times to 1 time. When you get the newer version of the compiler, you can then consider adding Robert's suggestion of collapse(2). Until then, you could use omp_get_num_threads and omp_get_thread_num to partition a single loop over the combined iy and iz space by hand.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
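&lt;BR /&gt;For readers on a compiler without collapse support, the manual split Jim describes in the quote above might look roughly like the sketch below. It is only an illustration under assumptions: the sizes nx, ny and nz, the array a and the update statement are hypothetical stand-ins for the real work, and the cyclic distribution of the flattened (iy,iz) index k is just one possible choice.&lt;BR /&gt;
&lt;PRE&gt;
program manual_collapse
  use omp_lib
  implicit none
  ! Hypothetical problem sizes, chosen only for illustration
  integer, parameter :: nx = 4, ny = 3, nz = 5
  integer :: ix, iy, iz, k, tid, nthreads
  real :: a(ny, nz)

  a = 0.0

  !$omp parallel private(ix, iy, iz, k, tid, nthreads)
  tid = omp_get_thread_num()        ! 0 .. nthreads-1
  nthreads = omp_get_num_threads()  ! size of the current team

  do ix = 1, nx
     ! Every thread runs all ix iterations, but takes only every
     ! nthreads-th point of the flattened (iy,iz) space.
     do k = tid, ny*nz - 1, nthreads
        iy = k / nz + 1
        iz = mod(k, nz) + 1
        a(iy, iz) = a(iy, iz) + real(ix)   ! stand-in for the real work
     end do
     ! Mirrors the implicit barrier of END DO; it matters only when
     ! the next ix pass reads results written by other threads.
     !$omp barrier
  end do
  !$omp end parallel

  print *, sum(a)
end program manual_collapse
&lt;/PRE&gt;
The program has to be built with the compiler's OpenMP option enabled. A plain !$OMP DO over the flattened index k, with iy and iz recovered in the same way, would be an alternative that leaves the scheduling to the runtime.&lt;BR /&gt;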
&lt;BR /&gt;Thank you</description>
      <pubDate>Thu, 26 Nov 2009 14:25:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862586#M2523</guid>
      <dc:creator>mahmoudgalal1985</dc:creator>
      <dc:date>2009-11-26T14:25:16Z</dc:date>
    </item>
    <item>
      <title>Re: Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862587#M2524</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/99850"&gt;jimdempseyatthecove&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;Michael,&lt;BR /&gt;&lt;BR /&gt;Add a private(ix) clause:&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;!$OMP PARALLEL PRIVATE(ix)&lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END DO&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALLEL&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;Roine,&lt;BR /&gt;&lt;BR /&gt;Please note that every thread executes all iterations of ix from 1 to nx; each thread simply has its own copy of ix.&lt;BR /&gt;Be cautious when adding code between the do ix and do iy loops. You can have code there, but every thread will execute it, so you cannot assume each thread is working on a different slice of the ix iteration space.&lt;BR /&gt;&lt;BR /&gt;Michael's suggestion is good in that it reduces the number of times a thread pool is formed from nx times to 1 time. When you get the newer version of the compiler, you can then consider adding Robert's suggestion of collapse(2). Until then, you could use omp_get_num_threads and omp_get_thread_num to partition a single loop over the combined iy and iz space by hand.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Thanks</description>
      <pubDate>Thu, 26 Nov 2009 14:25:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Basic-OpenMP-question/m-p/862587#M2524</guid>
      <dc:creator>mahmoudgalal1985</dc:creator>
      <dc:date>2009-11-26T14:25:47Z</dc:date>
    </item>
  </channel>
</rss>

