<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Sylvain, in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Problem-runing-Intel-MPI-w-IB/m-p/1026780#M4115</link>
    <description>&lt;P&gt;Sylvain,&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;Some questions:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;1) Is this a hybrid parallel application that uses both MPI and, say, OpenMP?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;2) Can you run the MPI application outside of the scheduler?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;3) Have you tried running the application with "mpiexec -check_mpi -n ..."?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;4) How many MPI ranks are you using?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;5) Is this symptom reproducible with 1 MPI rank?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;6) Instead of the environment variable setting I_MPI_FABRICS="shm:ofa", could you please try using "-genv I_MPI_FABRICS shm:tcp"?&lt;/P&gt;</description>
    <pubDate>Tue, 07 Jul 2015 23:49:21 GMT</pubDate>
    <dc:creator>Steve_H_Intel1</dc:creator>
    <dc:date>2015-07-07T23:49:21Z</dc:date>
    <item>
      <title>Problem running Intel MPI w/ IB</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Problem-runing-Intel-MPI-w-IB/m-p/1026779#M4114</link>
      <description>&lt;P&gt;The same code, submitted to the queue (SGE) on our cluster, crashes right away some of the time (roughly 25% of the cases?) with the following error message:&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;libmpifort.so.12   00002AC615FAD9BC  Unknown               Unknown  Unknown
magic.exe          00000000004D4B01  step_time_mod_mp_         335  m_step_time.F90
magic.exe          00000000004FA8EA  MAIN__                    301  magic.F90
magic.exe          00000000004042CE  Unknown               Unknown  Unknown
libc.so.6          0000003C23C1D994  Unknown               Unknown  Unknown
magic.exe          00000000004041E9  Unknown               Unknown  Unknown
[mpiexec@compute-8-21.local] control_cb (../../pm/pmiserv/pmiserv_cb.c:764): assert (!closed) failed
[mpiexec@compute-8-21.local] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@compute-8-21.local] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:480): error waiting for event
[mpiexec@compute-8-21.local] main (../../ui/mpich/mpiexec.c:945): process manager error waiting for completion
&lt;/PRE&gt;

&lt;P&gt;Line 335 is a 'call mpi_barrier()', hence the libmpifort.so.12 frame, I presume.&lt;/P&gt;

&lt;P&gt;Since we use InfiniBand (I_MPI_FABRICS="shm:ofa"), I checked that the IB is working with the exact same host list, using a trivial ring-passing test program (in C and in F90). The ring-passing programs complete fine, every time. Any clue how to investigate this?&lt;/P&gt;

&lt;P&gt;The 'magic.exe' program (a large third-party scientific simulation code) produces the following warning(s), although it continues running when it starts OK; this could be unrelated.&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;[95] ERROR - handle_read_individual(): Get one packet, but need to be packetized  10628, 1 4604, 12280
[95] ERROR - handle_read_individual(): Get one packet, but need to be packetized  10628, 1 4604, 12280
&lt;/PRE&gt;

&lt;P&gt;Any help appreciated.&lt;/P&gt;

&lt;P&gt;Sylvain,&lt;/P&gt;

&lt;P&gt;BTW:&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;% ldd magic.exe
        linux-vdso.so.1 =&amp;gt;  (0x00007ffff37ef000)
        libmpifort.so.12 =&amp;gt; /software/intel_2015/impi/5.0.1.035/intel64/lib/libmpifort.so.12 (0x00002ab829b40000)
        libmpi.so.12 =&amp;gt; /software/intel_2015/impi/5.0.1.035/intel64/lib/debug/libmpi.so.12 (0x00002ab829dcd000)
        libdl.so.2 =&amp;gt; /lib64/libdl.so.2 (0x0000003e4de00000)
        librt.so.1 =&amp;gt; /lib64/librt.so.1 (0x0000003e4ea00000)
        libpthread.so.0 =&amp;gt; /lib64/libpthread.so.0 (0x0000003e4e200000)
        libm.so.6 =&amp;gt; /lib64/libm.so.6 (0x0000003e4da00000)
        libc.so.6 =&amp;gt; /lib64/libc.so.6 (0x0000003e4d600000)
        libgcc_s.so.1 =&amp;gt; /lib64/libgcc_s.so.1 (0x0000003e5c800000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003e4d200000)
&lt;/PRE&gt;

&lt;P&gt;mpirun is aliased to /software/intel_2015/impi/5.0.1.035/bin64/mpirun, so it should not be a problem of mixing MPI implementations (we do support Intel, PGI and GNU).&lt;/P&gt;

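&lt;P&gt;The ldd check above can also be scripted. What follows is only a minimal sketch in Python, assuming the install root is the one implied by the mpirun alias quoted in this post; the helper names (mpi_library_paths, libraries_outside_root) and the MPI_ROOT constant are hypothetical, and the sample text is a subset of the ldd output shown above. It reports any libmpi* library that resolves outside that root, which is what a mix of MPI implementations would look like.&lt;/P&gt;

```python
import re

# Sample `ldd` lines copied from the post above; in practice the text would
# come from running `ldd magic.exe` on a compute node.
LDD_OUTPUT = """\
        libmpifort.so.12 => /software/intel_2015/impi/5.0.1.035/intel64/lib/libmpifort.so.12 (0x00002ab829b40000)
        libmpi.so.12 => /software/intel_2015/impi/5.0.1.035/intel64/lib/debug/libmpi.so.12 (0x00002ab829dcd000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003e4d600000)
"""

# Assumption: the install root implied by the mpirun alias quoted in the post.
MPI_ROOT = "/software/intel_2015/impi/5.0.1.035/"


def mpi_library_paths(ldd_text):
    """Return {library name: resolved path} for libraries named libmpi*."""
    paths = {}
    for line in ldd_text.splitlines():
        match = re.match(r"\s*(libmpi\S*)\s+=>\s+(\S+)", line)
        if match:
            paths[match.group(1)] = match.group(2)
    return paths


def libraries_outside_root(paths, root):
    """Return the sorted names of libraries that do not resolve under root."""
    return sorted(name for name, path in paths.items() if not path.startswith(root))


paths = mpi_library_paths(LDD_OUTPUT)
stray = libraries_outside_root(paths, MPI_ROOT)
print("MPI libraries found:", sorted(paths))
print("resolved outside", MPI_ROOT + ":", stray or "none")
```

&lt;P&gt;A prefix check like this only confirms where the libraries are loaded from; it does not distinguish the lib and lib/debug variants that appear in the output above.&lt;/P&gt;
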
</description>
      <pubDate>Thu, 04 Jun 2015 15:08:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Problem-runing-Intel-MPI-w-IB/m-p/1026779#M4114</guid>
      <dc:creator>Sylvain_Korzennik</dc:creator>
      <dc:date>2015-06-04T15:08:51Z</dc:date>
    </item>
    <item>
      <title>Sylvain,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Problem-runing-Intel-MPI-w-IB/m-p/1026780#M4115</link>
      <description>&lt;P&gt;Sylvain,&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;Some questions:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;1) Is this a hybrid parallel application that uses both MPI and, say, OpenMP?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;2) Can you run the MPI application outside of the scheduler?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;3) Have you tried running the application with "mpiexec -check_mpi -n ..."?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;4) How many MPI ranks are you using?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;5) Is this symptom reproducible with 1 MPI rank?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;6) Instead of the environment variable setting I_MPI_FABRICS="shm:ofa", could you please try using "-genv I_MPI_FABRICS shm:tcp"?&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jul 2015 23:49:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Problem-runing-Intel-MPI-w-IB/m-p/1026780#M4115</guid>
      <dc:creator>Steve_H_Intel1</dc:creator>
      <dc:date>2015-07-07T23:49:21Z</dc:date>
    </item>
  </channel>
</rss>

