<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic No worries, good to hear that in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123391#M5514</link>
    <description>&lt;P&gt;No worries, good to hear that everything is working now.&amp;nbsp; Let us know if it shows up again.&lt;/P&gt;</description>
    <pubDate>Thu, 12 May 2016 14:29:10 GMT</pubDate>
    <dc:creator>James_T_Intel</dc:creator>
    <dc:date>2016-05-12T14:29:10Z</dc:date>
    <item>
      <title>MPI error [../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:2482]</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123384#M5507</link>
      <description>&lt;P&gt;I've a coarray program which I compile for distributed memory execution.&lt;/P&gt;&lt;P&gt;I then run it on a single 16-core node with different numbers of processes.&lt;/P&gt;&lt;P&gt;It runs fine with 2, 4, and 8 processes, but gives the following error with 16 processes.&lt;/P&gt;&lt;P&gt;Can I get any clue from the error message?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Anton&lt;/P&gt;&lt;P&gt;===&amp;gt; co_back1.x&lt;BR /&gt;-genvall -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 2 ./co_back1.x&lt;BR /&gt;188.85user 29.04system 1:55.30elapsed 188%CPU (0avgtext+0avgdata 66640maxresident)k&lt;BR /&gt;1624inputs+945432outputs (2major+13770minor)pagefaults 0swaps&lt;BR /&gt;===&amp;gt; co_back1.x&lt;BR /&gt;-genvall -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 4 ./co_back1.x&lt;BR /&gt;263.14user 94.69system 2:51.91elapsed 208%CPU (0avgtext+0avgdata 71376maxresident)k&lt;BR /&gt;0inputs+2791464outputs (0major+22881minor)pagefaults 0swaps&lt;BR /&gt;===&amp;gt; co_back1.x&lt;BR /&gt;-genvall -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 8 ./co_back1.x&lt;BR /&gt;420.93user 292.96system 2:41.95elapsed 440%CPU (0avgtext+0avgdata 88192maxresident)k&lt;BR /&gt;0inputs+8998288outputs (0major+48387minor)pagefaults 0swaps&lt;BR /&gt;===&amp;gt; co_back1.x&lt;BR /&gt;-genvall -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 16 ./co_back1.x&lt;BR /&gt;application called MPI_Abort(comm=0x84000000, 3) - process 0&lt;BR /&gt;[1:node43-038] unexpected disconnect completion event from [0:node43-038]&lt;BR /&gt;[1:node43-038][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:2482] Intel MPI fatal error&lt;BR /&gt;: OpenIB-cma DTO operation posted for [0:node43-038] completed with error. status=0x1. cookie=0x150008000&lt;BR /&gt;0&lt;BR /&gt;Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c at line 2485: 0&lt;BR /&gt;internal ABORT - process 1&lt;/P&gt;
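&lt;P&gt;For reference, this kind of build is typically an Intel Fortran distributed-memory coarray compile driven by a config file; the sketch below is illustrative only (the source file name and cafconfig.txt are assumptions, not taken from this post), and the mpirun-style option lines echoed above are what such a config file usually contains.&lt;/P&gt;&lt;PRE class="brush:bash;"&gt;# Illustrative sketch of a distributed-memory coarray build (file names are hypothetical)
ifort -coarray=distributed -coarray-config-file=cafconfig.txt co_back1.f90 -o co_back1.x

# cafconfig.txt holds the launch options the coarray runtime passes to mpirun, e.g.:
#   -genvall -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 16 ./co_back1.x&lt;/PRE&gt;</description>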
      <pubDate>Wed, 11 May 2016 12:37:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123384#M5507</guid>
      <dc:creator>AShte</dc:creator>
      <dc:date>2016-05-11T12:37:59Z</dc:date>
    </item>
    <item>
      <title>This looks like a</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123385#M5508</link>
      <description>&lt;P&gt;This looks like a communication fabric error. Try running the following command:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;mpirun -genvall -genv I_MPI_FABRICS shm:dapl -genv I_MPI_HYDRA_DEBUG 1 -n 16 -machinefile ./nodes IMB-MPI1&lt;/PRE&gt;

&lt;P&gt;James.&lt;/P&gt;</description>
      <pubDate>Wed, 11 May 2016 19:14:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123385#M5508</guid>
      <dc:creator>James_T_Intel</dc:creator>
      <dc:date>2016-05-11T19:14:00Z</dc:date>
    </item>
    <item>
      <title>I get these errors:</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123386#M5509</link>
      <description>&lt;P&gt;I get these errors:&lt;/P&gt;

&lt;P&gt;[proxy:0:0@node43-038] HYDU_create_process (../../utils/launch/launch.c:622): execvp error on file I_MPI_HYDRA_DEBUG (No such file or directory)&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;Anton&lt;/P&gt;</description>
      <pubDate>Thu, 12 May 2016 07:49:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123386#M5509</guid>
      <dc:creator>AShte</dc:creator>
      <dc:date>2016-05-12T07:49:23Z</dc:date>
    </item>
    <item>
      <title>Hi Anton,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123387#M5510</link>
      <description>&lt;P&gt;Hi Anton,&lt;/P&gt;

&lt;P&gt;There's a misprint in James' command. Could you please try the following:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;mpirun -genvall -genv I_MPI_FABRICS shm:dapl -genv I_MPI_HYDRA_DEBUG 1 -n 16 -machinefile ./nodes IMB-MPI1&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Also, could you please provide some details about your environment (Intel MPI version, OS, OFED/DAPL versions)?&lt;/P&gt;</description>
      <pubDate>Thu, 12 May 2016 08:01:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123387#M5510</guid>
      <dc:creator>Artem_R_Intel1</dc:creator>
      <dc:date>2016-05-12T08:01:11Z</dc:date>
    </item>
    <item>
      <title>Thanks, that command worked.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123388#M5511</link>
      <description>&lt;P&gt;Thanks, that command worked. The output is long, so I put it here:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://eis.bris.ac.uk/~mexas/9lap.o4579303" target="_blank"&gt;http://eis.bris.ac.uk/~mexas/9lap.o4579303&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Other info you asked for:&lt;/P&gt;

&lt;P&gt;$ mpirun --version&lt;/P&gt;

&lt;P&gt;Intel(R) MPI Library for Linux* OS, Version 5.1.3 Build 20160120 (build id: 14053)&lt;/P&gt;

&lt;P&gt;$ uname -a&lt;BR /&gt;
	Linux newblue2 2.6.32-220.23.1.el6.x86_64 #1 SMP Mon Jun 18 09:58:09 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux&lt;/P&gt;

&lt;P&gt;Regarding "OFED/DAPL versions" - not sure how to find this.&lt;/P&gt;

&lt;P&gt;Is this helpful?&lt;/P&gt;

&lt;P&gt;$ cat /etc/dat.conf&lt;/P&gt;

&lt;P&gt;OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""&lt;BR /&gt;
	OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""&lt;BR /&gt;
	OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""&lt;BR /&gt;
	OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 2" ""&lt;BR /&gt;
	OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""&lt;BR /&gt;
	OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""&lt;BR /&gt;
	OpenIB-ipath0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ipath0 1" ""&lt;BR /&gt;
	OpenIB-ipath0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ipath0 2" ""&lt;BR /&gt;
	OpenIB-ehca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ehca0 1" ""&lt;BR /&gt;
	OpenIB-iwarp u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""&lt;BR /&gt;
	OpenIB-cma-roe-eth2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" ""&lt;BR /&gt;
	OpenIB-cma-roe-eth3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth3 0" ""&lt;BR /&gt;
	OpenIB-scm-roe-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""&lt;BR /&gt;
	OpenIB-scm-roe-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""&lt;BR /&gt;
	ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""&lt;BR /&gt;
	ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""&lt;BR /&gt;
	ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""&lt;BR /&gt;
	ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""&lt;BR /&gt;
	ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""&lt;BR /&gt;
	ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""&lt;BR /&gt;
	ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""&lt;BR /&gt;
	ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""&lt;BR /&gt;
	ofa-v2-ehca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1" ""&lt;BR /&gt;
	ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""&lt;BR /&gt;
	ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 1" ""&lt;BR /&gt;
	ofa-v2-mlx4_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 2" ""&lt;BR /&gt;
	ofa-v2-mthca0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 1" ""&lt;BR /&gt;
	ofa-v2-mthca0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 2" ""&lt;BR /&gt;
	ofa-v2-cma-roe-eth2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""&lt;BR /&gt;
	ofa-v2-cma-roe-eth3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth3 0" ""&lt;BR /&gt;
	ofa-v2-scm-roe-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""&lt;BR /&gt;
	ofa-v2-scm-roe-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""&lt;BR /&gt;
	&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;Anton&lt;/P&gt;</description>
      <pubDate>Thu, 12 May 2016 08:21:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123388#M5511</guid>
      <dc:creator>AShte</dc:creator>
      <dc:date>2016-05-12T08:21:09Z</dc:date>
    </item>
    <item>
      <title>I've corrected the command</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123389#M5512</link>
      <description>&lt;P&gt;I've corrected the command line in my previous post; thanks, Artem.&lt;/P&gt;

&lt;P&gt;Use the following commands to get OFED and DAPL versions:&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;ofed_info
rpm -qa | grep dapl&lt;/PRE&gt;

&lt;P&gt;Since the IMB test ran successfully, try running your program directly.&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;mpirun -genvall -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_DEBUG 1 -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 16 ./co_back1.x&lt;/PRE&gt;

&lt;P&gt;You can redirect the output to a file and attach the file directly to your post here.&lt;/P&gt;
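
&lt;P&gt;For example (a sketch, assuming a bash shell; run16.log is a hypothetical file name):&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;# Capture both stdout and stderr of the debug run in one file
mpirun -genvall -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_DEBUG 1 -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 16 ./co_back1.x &amp;gt; run16.log 2&amp;gt;&amp;amp;1&lt;/PRE&gt;</description>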
      <pubDate>Thu, 12 May 2016 13:58:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123389#M5512</guid>
      <dc:creator>James_T_Intel</dc:creator>
      <dc:date>2016-05-12T13:58:52Z</dc:date>
    </item>
    <item>
      <title>I'm really sorry, but I</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123390#M5513</link>
      <description>&lt;P&gt;I'm really sorry, but I cannot reproduce the error anymore.&lt;/P&gt;

&lt;P&gt;I can now run with 2, 4, 8, 10, 16, 20, 25 and 40 images (MPI processes) over two 16-core nodes.&lt;/P&gt;

&lt;P&gt;Perhaps there was some transient problem.&lt;/P&gt;

&lt;P&gt;I apologise for wasting your time, and thank you for the valuable debugging hints.&lt;/P&gt;

&lt;P&gt;Anton&lt;/P&gt;</description>
      <pubDate>Thu, 12 May 2016 14:26:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123390#M5513</guid>
      <dc:creator>AShte</dc:creator>
      <dc:date>2016-05-12T14:26:47Z</dc:date>
    </item>
    <item>
      <title>No worries, good to hear that</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123391#M5514</link>
      <description>&lt;P&gt;No worries, good to hear that everything is working now.&amp;nbsp; Let us know if it shows up again.&lt;/P&gt;</description>
      <pubDate>Thu, 12 May 2016 14:29:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-error-src-mpid-ch3-channels-nemesis-netmod-dapl-dapl-poll-rc/m-p/1123391#M5514</guid>
      <dc:creator>James_T_Intel</dc:creator>
      <dc:date>2016-05-12T14:29:10Z</dc:date>
    </item>
  </channel>
</rss>

