<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Thank you very much, In fact in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-approach-MPI-OpenMP-time-issue/m-p/1135566#M7725</link>
    <description>&lt;P&gt;Thank you very much. In fact, the environment variables were not defined properly. Thanks for your recommendation.&lt;/P&gt;</description>
    <pubDate>Wed, 28 Jun 2017 15:53:27 GMT</pubDate>
    <dc:creator>Julio</dc:creator>
    <dc:date>2017-06-28T15:53:27Z</dc:date>
    <item>
      <title>Hybrid approach (MPI + OpenMP) time issue</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-approach-MPI-OpenMP-time-issue/m-p/1135564#M7723</link>
      <description>&lt;P&gt;Dear Community,&lt;/P&gt;

&lt;P&gt;I have a code that I have tested, and it works perfectly with MPI. Since I am using only a 1D decomposition (I am decomposing the domain into strips), I want to use OpenMP in the other direction. The reason is that I do not want to end up working with cubes, because for IOW that is a headache. In some instances each rank has a unique size in the direction in which I use MPI to send and receive messages.&lt;/P&gt;

&lt;P&gt;Therefore, I implemented:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;!$OMP PARALLEL PRIVATE (i,j)
!$OMP DO 
 .....
!$OMP END DO
!$OMP END PARALLEL&lt;/PRE&gt;

&lt;P&gt;Since I am submitting my job on a cluster, I am setting my variable as "export OMP_NUM_THREAD=N", where N is the number of threads.&lt;/P&gt;

&lt;P&gt;The OpenMP version was also tested on its own; it worked perfectly and sped things up as I wanted. However, in the hybrid case I found very weird results. In this particular case my arrays are 4001x4001. If I spread my problem over 20 processes, I will have close to 200 nodes in the MPI direction and 4001 in the OpenMP direction. In other words, I will have 200 nodes in the horizontal direction of my computation and 4001 in the vertical direction.&lt;/P&gt;

&lt;P&gt;It turns out that the version with only MPI takes 4.37 seconds to run, while the version with 2 threads takes 469.83 seconds. Both runs use the same number of MPI processes (20). If I set OMP_NUM_THREAD=1, the time is still high. In that case I expected the time to be the same as, or close to, the pure MPI run.&lt;/P&gt;

&lt;P&gt;Both runs have the same optimization flags and so forth.&lt;/P&gt;

&lt;P&gt;I would greatly appreciate your ideas and suggestions.&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;Julio&lt;/P&gt;</description>
      <pubDate>Mon, 26 Jun 2017 21:05:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-approach-MPI-OpenMP-time-issue/m-p/1135564#M7723</guid>
      <dc:creator>Julio</dc:creator>
      <dc:date>2017-06-26T21:05:50Z</dc:date>
    </item>
    <item>
      <title>For the moment, verify that</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-approach-MPI-OpenMP-time-issue/m-p/1135565#M7724</link>
      <description>&lt;P&gt;For the moment, verify that the environment variable KMP_AFFINITY and any other affinity-specifying environment variables are .NOT. set on each and every node. You can experiment with these later.&lt;/P&gt;

&lt;P&gt;Also verify that the number of threads spawned for OpenMP is indeed what you believe is specified.&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jun 2017 11:09:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-approach-MPI-OpenMP-time-issue/m-p/1135565#M7724</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2017-06-28T11:09:30Z</dc:date>
    </item>
    <item>
      <title>Thank you very much, In fact</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-approach-MPI-OpenMP-time-issue/m-p/1135566#M7725</link>
      <description>&lt;P&gt;Thank you very much. In fact, the environment variables were not defined properly. Thanks for your recommendation.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jun 2017 15:53:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-approach-MPI-OpenMP-time-issue/m-p/1135566#M7725</guid>
      <dc:creator>Julio</dc:creator>
      <dc:date>2017-06-28T15:53:27Z</dc:date>
    </item>
    <item>
      <title>I have gotten into the habit</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-approach-MPI-OpenMP-time-issue/m-p/1135567#M7726</link>
      <description>&lt;P&gt;I have gotten into the habit of adding the "verbose" option to KMP_AFFINITY. It produces a lot of output that I don't usually need, but the absence of that output is an effective reminder that I have forgotten to define KMP_AFFINITY.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Jun 2017 12:49:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-approach-MPI-OpenMP-time-issue/m-p/1135567#M7726</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2017-06-29T12:49:58Z</dc:date>
    </item>
  </channel>
</rss>

