<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hybrid MPI/OpenMP : program seems to stall in non blocking comm in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-MPI-OpenMP-program-seems-to-stall-in-non-blocking/m-p/777314#M213</link>
    <description>I tried something:&lt;BR /&gt;I replaced all the non-blocking MPI communication calls with calls to MPI_SendRecv, like this:&lt;BR /&gt;&lt;BR /&gt;!$OMP SECTIONS&lt;BR /&gt;!&lt;BR /&gt;!$OMP SECTION&lt;BR /&gt; isendtag = 1&lt;BR /&gt; irecvtag = 1&lt;BR /&gt; CALL MPI_SendRecv (data, type, ...)&lt;BR /&gt;!&lt;BR /&gt;!$OMP SECTION&lt;BR /&gt; isendtag = 2&lt;BR /&gt; irecvtag = 2&lt;BR /&gt; CALL MPI_SendRecv (data, type, ...)&lt;BR /&gt;&lt;BR /&gt;...&lt;BR /&gt;!$OMP END SECTIONS&lt;BR /&gt;&lt;BR /&gt;and it works: the application runs and finishes correctly after the expected number of time iterations.&lt;BR /&gt;Not all results are correct yet, but the code no longer hangs.&lt;BR /&gt;&lt;BR /&gt;Are there any special settings to keep in mind when using non-blocking communications inside an OpenMP parallel region?</description>
    <pubDate>Tue, 13 Dec 2011 16:07:08 GMT</pubDate>
    <dc:creator>mguy44</dc:creator>
    <dc:date>2011-12-13T16:07:08Z</dc:date>
    <item>
      <title>Hybrid MPI/OpenMP : program seems to stall in non blocking communications</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-MPI-OpenMP-program-seems-to-stall-in-non-blocking/m-p/777313#M212</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;I have an MPI Fortran 90 CFD application, parallelized in X-Y with a 2D Cartesian topology, that works well, and I decided to parallelize it in Z using OpenMP.&lt;BR /&gt;With the 2D MPI topology, each subdomain may have up to 8 neighbours; there is no periodicity. That is:&lt;BR /&gt;NW NN NE&lt;BR /&gt;WW ME EE&lt;BR /&gt;SW SS SE&lt;BR /&gt;where NW is North West, SE is South East, and so on.&lt;BR /&gt;ME is equal to my_MPI_Rank2d, the MPI rank of the current process.&lt;BR /&gt;my_OMP_Thd contains the OpenMP rank of each thread in the thread team of each MPI process.&lt;BR /&gt;&lt;BR /&gt;A call to MPI_Init_Thread returns MPI_THREAD_MULTIPLE as the level of thread support provided by MPI.&lt;BR /&gt;&lt;BR /&gt;The MPI communications are non-blocking (MPI_ISend, MPI_IRecv) and are all placed in a SECTIONS ... END SECTIONS construct, with only one call per SECTION. So, for each MPI process, the communications with the (up to 8) neighbours are distributed among the threads of the team. After these calls, the MASTER thread calls MPI_WaitAll. Each thread keeps the information about its own requests in private storage. 
That is:&lt;BR /&gt;&lt;BR /&gt;...&lt;BR /&gt;computing stuff&lt;BR /&gt;&lt;BR /&gt;!$OMP BARRIER&lt;BR /&gt;&lt;BR /&gt; nb_requests_local = 0&lt;BR /&gt;!$OMP SECTIONS&lt;BR /&gt;!$OMP SECTION&lt;BR /&gt; write (400+my_MPI_Rank2d*10+my_OMP_Thd,*) 'before IRecv WW'&lt;BR /&gt; call flush (400+my_MPI_Rank2d*10+my_OMP_Thd)&lt;BR /&gt; nb_requests_local = nb_requests_local+1&lt;BR /&gt; CALL MPI_IRecv ( data, type, array_requests_local(nb_requests_local) )&lt;BR /&gt; write (400+my_MPI_Rank2d*10+my_OMP_Thd,*) 'after IRecv WW'&lt;BR /&gt; call flush (400+my_MPI_Rank2d*10+my_OMP_Thd)&lt;BR /&gt;&lt;BR /&gt;!$OMP SECTION&lt;BR /&gt; write (400+my_MPI_Rank2d*10+my_OMP_Thd,*) 'before ISend EE'&lt;BR /&gt; call flush (400+my_MPI_Rank2d*10+my_OMP_Thd)&lt;BR /&gt; nb_requests_local = nb_requests_local+1&lt;BR /&gt; CALL MPI_ISend (data, type, array_requests_local(nb_requests_local))&lt;BR /&gt; write (400+my_MPI_Rank2d*10+my_OMP_Thd,*) 'after ISend EE'&lt;BR /&gt; call flush (400+my_MPI_Rank2d*10+my_OMP_Thd)&lt;BR /&gt;&lt;BR /&gt;!$OMP SECTION&lt;BR /&gt;...&lt;BR /&gt;!$OMP END SECTIONS NOWAIT&lt;BR /&gt;&lt;BR /&gt; write (400+my_MPI_Rank2d*10+my_OMP_Thd,*) 'after SECTIONS NOWAIT'&lt;BR /&gt; call flush (400+my_MPI_Rank2d*10+my_OMP_Thd)&lt;BR /&gt;&lt;BR /&gt;!$OMP CRITICAL&lt;BR /&gt;! update and fill a shared array with the requests held by each thread in private storage&lt;BR /&gt;!$OMP END CRITICAL&lt;BR /&gt;&lt;BR /&gt; write (400+my_MPI_Rank2d*10+my_OMP_Thd,*) 'after CRITICAL'&lt;BR /&gt; call flush (400+my_MPI_Rank2d*10+my_OMP_Thd)&lt;BR /&gt;&lt;BR /&gt;!$OMP BARRIER&lt;BR /&gt;&lt;BR /&gt;!$OMP MASTER&lt;BR /&gt; CALL MPI_WaitAll ()&lt;BR /&gt;!$OMP END MASTER&lt;BR /&gt;&lt;BR /&gt; write (400+my_MPI_Rank2d*10+my_OMP_Thd,*) 'after WaitAll'&lt;BR /&gt;
 call flush (400+my_MPI_Rank2d*10+my_OMP_Thd)&lt;BR /&gt;&lt;BR /&gt;!$OMP BARRIER&lt;BR /&gt;&lt;BR /&gt;The write/flush calls are there for debugging and will of course be removed afterwards, but for now they help show what goes wrong.&lt;BR /&gt;I run this code on an SGI Altix machine, using 2 nodes, each with 2 six-core processors.&lt;BR /&gt;I run it with 12 MPI processes, 6 per node. Each MPI process creates a team of 2 threads.&lt;BR /&gt;&lt;BR /&gt;What is strange is that the OpenMP threads seem to block inside non-blocking MPI calls. In the fort.4xx files, I get output like:&lt;BR /&gt;==&amp;gt; fort.400 &amp;lt;==&lt;BR /&gt;before IRecv WW&lt;BR /&gt;After IRecv WW&lt;BR /&gt;Before ISend EE&lt;BR /&gt;After ISend EE&lt;BR /&gt;Before IRecv EE &amp;lt;&amp;lt;&amp;lt;&amp;lt; end of this file&lt;BR /&gt;&lt;BR /&gt;==&amp;gt; fort.401 &amp;lt;==&lt;BR /&gt;Before IRecv SW&lt;BR /&gt;After IRecv SW&lt;BR /&gt;Before ISend NE &amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt; end of this file&lt;BR /&gt;&lt;BR /&gt;....&lt;BR /&gt;&lt;BR /&gt;All 24 threads behave like this: they enter the communication routine and make some MPI calls (with real neighbours, not only MPI_PROC_NULL ones), though not necessarily the same number of calls for each thread. None of them reaches the write statement after the END SECTIONS directive.&lt;BR /&gt;&lt;BR /&gt;The data exchanged between the MPI processes are ghost cells of a 4D array (5,Nx,Ny,Nz), i.e. faces or 'corner columns' at least 3 layers deep. Send buffers may overlap, but receive buffers do not. Typical sizes are Nx=112, Ny=204, Nz=32.&lt;BR /&gt;&lt;BR /&gt;I use ifort (IFORT) 12.1.0 20111011 and Intel MPI 4.0.0.028.&lt;BR /&gt;&lt;BR /&gt;1. I checked the topology.&lt;BR /&gt;2. I checked the data-sharing attributes of the different variables.&lt;BR /&gt;3. I tried replacing the SECTIONS construct with a set of SINGLE / END SINGLE NOWAIT constructs, but it misbehaves too.&lt;BR /&gt;4. I used ITAC and the -mpi_check option, but got nothing useful.&lt;BR /&gt;5. I ran the code on 12 cores with only 1 thread per MPI process: it works like the pure MPI code.&lt;BR /&gt;&lt;BR /&gt;But I don't understand why it freezes.&lt;BR /&gt;&lt;BR /&gt;Any help will be appreciated.&lt;BR /&gt;&lt;BR /&gt;If you need further information, please let me know.&lt;BR /&gt;&lt;BR /&gt;Regards</description>
      <pubDate>Mon, 12 Dec 2011 16:05:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-MPI-OpenMP-program-seems-to-stall-in-non-blocking/m-p/777313#M212</guid>
      <dc:creator>mguy44</dc:creator>
      <dc:date>2011-12-12T16:05:33Z</dc:date>
    </item>
    <item>
      <title>Hybrid MPI/OpenMP : program seems to stall in non blocking comm</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-MPI-OpenMP-program-seems-to-stall-in-non-blocking/m-p/777314#M213</link>
      <description>I tried something:&lt;BR /&gt;I replaced all the non-blocking MPI communication calls with calls to MPI_SendRecv, like this:&lt;BR /&gt;&lt;BR /&gt;!$OMP SECTIONS&lt;BR /&gt;!&lt;BR /&gt;!$OMP SECTION&lt;BR /&gt; isendtag = 1&lt;BR /&gt; irecvtag = 1&lt;BR /&gt; CALL MPI_SendRecv (data, type, ...)&lt;BR /&gt;!&lt;BR /&gt;!$OMP SECTION&lt;BR /&gt; isendtag = 2&lt;BR /&gt; irecvtag = 2&lt;BR /&gt; CALL MPI_SendRecv (data, type, ...)&lt;BR /&gt;&lt;BR /&gt;...&lt;BR /&gt;!$OMP END SECTIONS&lt;BR /&gt;&lt;BR /&gt;and it works: the application runs and finishes correctly after the expected number of time iterations.&lt;BR /&gt;Not all results are correct yet, but the code no longer hangs.&lt;BR /&gt;&lt;BR /&gt;Are there any special settings to keep in mind when using non-blocking communications inside an OpenMP parallel region?</description>
      <pubDate>Tue, 13 Dec 2011 16:07:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-MPI-OpenMP-program-seems-to-stall-in-non-blocking/m-p/777314#M213</guid>
      <dc:creator>mguy44</dc:creator>
      <dc:date>2011-12-13T16:07:08Z</dc:date>
    </item>
    <item>
      <title>Hybrid MPI/OpenMP : program seems to stall in non blocking comm</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-MPI-OpenMP-program-seems-to-stall-in-non-blocking/m-p/777315#M214</link>
      <description>This may be more likely to get a reply on the HPC/clustering forum where experts in Intel MPI participate.</description>
      <pubDate>Tue, 13 Dec 2011 17:13:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hybrid-MPI-OpenMP-program-seems-to-stall-in-non-blocking/m-p/777315#M214</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2011-12-13T17:13:58Z</dc:date>
    </item>
  </channel>
</rss>

