<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Basic OpenMP question in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Basic-OpenMP-question/m-p/759322#M14809</link>
    <pubDate>Fri, 25 Sep 2009 14:46:34 GMT</pubDate>
    <dc:creator>reinhold-bader</dc:creator>
    <dc:date>2009-09-25T14:46:34Z</dc:date>
    <item>
      <title>Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Basic-OpenMP-question/m-p/759320#M14807</link>
      <description>I have three nested loops. The two innermost loops can be worked on in parallel, so I have put an OpenMP directive between the first and the second loop. It looks as follows:&lt;BR /&gt;&lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP PARALELL DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALELL DO&lt;BR /&gt;end do&lt;BR /&gt;&lt;BR /&gt;Unfortunately, the code only seems to reach a load of ny (on Linux I check this with pbstop +l), while I would like it to reach a load of ny*nx. There is a substantial amount of work to be done, so I was expecting a load far greater than ny. Among other compilation flags, I use -unroll. What am I missing? Do I need to change the scheduling, or is there something simpler that can be done?&lt;BR /&gt;&lt;BR /&gt;Thanks.&lt;BR /&gt;&lt;BR /&gt;Roine</description>
      <pubDate>Thu, 24 Sep 2009 19:00:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Basic-OpenMP-question/m-p/759320#M14807</guid>
      <dc:creator>roine_vestman</dc:creator>
      <dc:date>2009-09-24T19:00:51Z</dc:date>
    </item>
    <item>
      <title>Re: Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Basic-OpenMP-question/m-p/759321#M14808</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/262391"&gt;roine.vestman@nyu.edu&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;I have three nested loops. The two inner most can be worked on in parallel, so I have set up an OpenMP statement between the first and the second. It looks as follows:&lt;BR /&gt;&lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP PARALELL DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALELL DO&lt;BR /&gt;end do&lt;BR /&gt;&lt;BR /&gt;Unfortunately, the code only seems to reach a load of ny (in Linux I check that with pbstop +l ), while I would like it reach a load of ny*nx. There is substantial amount of work to be made, so I was expecting a load far greater than ny. Among the compilation flags I have -unroll. What am I missing? Do I need to change the scheduling or is there something easier that can be done?&lt;BR /&gt;&lt;BR /&gt;Thanks.&lt;BR /&gt;&lt;BR /&gt;Roine&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I do not know what you are asking. From your code, I would expect the iterations of iy to be split among the available cores. Are you seeing a performance problem, or are you expecting a better speedup? What exactly are you seeing, and what does your code look like?&lt;BR /&gt;&lt;BR /&gt;ron</description>
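To make the expected behaviour concrete, here is a minimal sketch. It assumes the directive is spelled correctly and that the compiler is invoked with an OpenMP flag (for example gfortran -fopenmp). The sizes nx, ny, nz, the names iy_split_demo and record_owners, and the placeholder work counter are hypothetical. The sketch records which thread ran each iteration of iy. This shows that only the iy loop is divided among threads, and that the thread count comes from the OpenMP runtime (OMP_NUM_THREADS), not from ny or nx.

```fortran
! Hypothetical sizes and a placeholder work counter; only the
! directive placement mirrors the loop nest in the question.
module iy_split_demo
  use, intrinsic :: iso_fortran_env, only: real64
  use omp_lib, only: omp_get_thread_num
  implicit none
contains
  ! owner(iy) receives the id of the thread that ran iteration iy
  ! (during the last ix pass); work(iy) counts the inner iterations.
  subroutine record_owners(nx, ny, nz, owner, work)
    integer, intent(in)       :: nx, ny, nz
    integer, intent(out)      :: owner(ny)
    real(real64), intent(out) :: work(ny)
    integer :: ix, iy, iz
    work = 0.0_real64
    do ix = 1, nx
      ! Correctly spelled directive: only the iy loop is workshared,
      ! so at most min(ny, number of threads) threads are busy.
      !$omp parallel do private(iz)
      do iy = 1, ny
        owner(iy) = omp_get_thread_num()
        do iz = 1, nz
          work(iy) = work(iy) + 1.0_real64   ! placeholder for real work
        end do
      end do
      !$omp end parallel do
    end do
  end subroutine record_owners
end module iy_split_demo

program show_split
  use, intrinsic :: iso_fortran_env, only: real64
  use omp_lib, only: omp_get_max_threads
  use iy_split_demo, only: record_owners
  implicit none
  integer, parameter :: nx = 4, ny = 8, nz = 100
  integer :: owner(ny), iy
  real(real64) :: work(ny)
  call record_owners(nx, ny, nz, owner, work)
  print '(a,i0)', 'threads available: ', omp_get_max_threads()
  do iy = 1, ny
    print '(a,i0,a,i0)', 'iy = ', iy, ' ran on thread ', owner(iy)
  end do
end program show_split
```

Running it after export OMP_NUM_THREADS=3 should show at most three distinct thread ids across the eight iy values. The exact assignment depends on the default schedule.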
      <pubDate>Thu, 24 Sep 2009 20:54:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Basic-OpenMP-question/m-p/759321#M14808</guid>
      <dc:creator>Ron_Green</dc:creator>
      <dc:date>2009-09-24T20:54:04Z</dc:date>
    </item>
    <item>
      <title>Re: Basic OpenMP question</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/Basic-OpenMP-question/m-p/759322#M14809</link>
      <description>Hello, &lt;BR /&gt;&lt;BR /&gt;It is not entirely clear to me either what your question is, but here are some observations:&lt;BR /&gt;&lt;BR /&gt;(1) Did you also misspell the OpenMP directive in your actual code? It should be !$OMP PARALLEL DO ... If so, the compiler may not have parallelized your code at all.&lt;BR /&gt;&lt;BR /&gt;(2) If by "load" you mean the system load, that is not determined by your loop bounds. It depends on the resources you give the executable, i.e. the number of threads. For example, by setting&lt;BR /&gt;export OMP_NUM_THREADS=3&lt;BR /&gt;before running, you can expect a system load of 3, provided all threads compute in parallel most of the time. It is rarely useful to set this variable to a number larger than the number of cores available in the system.&lt;BR /&gt;&lt;BR /&gt;(3) By default, !$OMP PARALLEL DO only parallelizes the loop that immediately follows it. In your case, the iterations of index IY are divided among the available threads. If NY is very small, you may indeed run into performance problems. One source of overhead is thread startup: your parallel region is entered and left once per IX iteration, i.e. NX times. Moving the parallel region outside the IX loop, with a structure like&lt;BR /&gt;&lt;BR /&gt;!$OMP PARALLEL&lt;BR /&gt;do ix=1,nx&lt;BR /&gt;!$OMP DO ....&lt;BR /&gt;do iy=1,ny&lt;BR /&gt;do iz=1,nz&lt;BR /&gt;...&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END DO&lt;BR /&gt;end do&lt;BR /&gt;!$OMP END PARALLEL&lt;BR /&gt;&lt;BR /&gt;may therefore help. If you actually want to workshare both the IY and IZ loop levels, you could add a COLLAPSE(2) clause to the OMP DO directive. However, COLLAPSE is an OpenMP 3.0 feature, so you will need a recent compiler release for it to be accepted.&lt;BR /&gt;&lt;BR /&gt;Regards&lt;BR /&gt;</description>
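Point (3) can be sketched as a small self-contained program. This is an illustration under stated assumptions, not the poster's code. The array a, its layout a(nz, ny, nx), the sizes, the names hoisted_demo and sweep, and the update expression are all hypothetical stand-ins for the elided work. The program creates the thread team once, outside the ix loop, and every thread executes the ix loop. The worksharing OMP DO with COLLAPSE(2) then divides the ny*nz (iy, iz) pairs among the threads. To build it, use an OpenMP-enabled compiler, for example gfortran -fopenmp, or ifort -openmp (older releases) or -qopenmp (newer releases).

```fortran
! Hypothetical array a and update; only the directive structure
! follows point (3): one parallel region, OMP DO with COLLAPSE(2).
module hoisted_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  subroutine sweep(nx, ny, nz, a)
    integer, intent(in)         :: nx, ny, nz
    real(real64), intent(inout) :: a(nz, ny, nx)   ! iz varies fastest
    integer :: ix, iy, iz
    !$omp parallel private(ix, iy, iz) shared(a)
    do ix = 1, nx                  ! every thread runs the ix loop
      !$omp do collapse(2) schedule(static)
      do iy = 1, ny                ! the ny*nz (iy, iz) pairs are
        do iz = 1, nz              ! divided among the threads
          a(iz, iy, ix) = a(iz, iy, ix) + real(ix + iy + iz, real64)
        end do
      end do
      !$omp end do                 ! implicit barrier keeps ix in step
    end do
    !$omp end parallel
  end subroutine sweep
end module hoisted_demo

program run_sweep
  use, intrinsic :: iso_fortran_env, only: real64
  use hoisted_demo, only: sweep
  implicit none
  integer, parameter :: nx = 3, ny = 4, nz = 5
  real(real64) :: a(nz, ny, nx)
  a = 0.0_real64
  call sweep(nx, ny, nz, a)
  print '(a,f0.1)', 'sum of a = ', sum(a)   ! prints: sum of a = 450.0
end program run_sweep
```

All threads execute the ix loop redundantly, so it should contain no work outside the OMP DO that is unsafe to repeat. The implicit barrier at !$OMP END DO keeps all threads on the same ix iteration. COLLAPSE(2) requires the iy and iz loops to be perfectly nested, as they are here.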
      <pubDate>Fri, 25 Sep 2009 14:46:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/Basic-OpenMP-question/m-p/759322#M14809</guid>
      <dc:creator>reinhold-bader</dc:creator>
      <dc:date>2009-09-25T14:46:34Z</dc:date>
    </item>
  </channel>
</rss>

