<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: &quot;failed to ping mpd&quot; with intel MPI in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905656#M2260</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; If you're expecting the default device to fail, why not specify the device you want? I've run into ssm performing better than fail over.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;OK, so how should I specify tcp/ip? I tried this:&lt;BR /&gt;&lt;BR /&gt;export I_MPI_DEVICE=rdssm:sock&lt;BR /&gt;&lt;BR /&gt;It failed to ping again as before. Is my syntax wrong?&lt;BR /&gt;</description>
    <pubDate>Fri, 31 Jul 2009 20:11:25 GMT</pubDate>
    <dc:creator>sdettrick</dc:creator>
    <dc:date>2009-07-31T20:11:25Z</dc:date>
    <item>
      <title>"failed to ping mpd" with intel MPI</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905654#M2258</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I am sometimes able to run parallel jobs, but very often they fail with errors - most often with:&lt;BR /&gt;&lt;BR /&gt;mpdboot_cl1n052 (handle_mpd_output 575): failed to ping mpd on cl1n038; recvd output={}&lt;BR /&gt;&lt;BR /&gt;but sometimes the error is:&lt;BR /&gt;&lt;BR /&gt;mpdboot_cl1n003 (handle_mpd_output 583): failed to connect to mpd on cl1n040&lt;BR /&gt;&lt;BR /&gt;The node names (cl1nNNN) in the error messages are not always the same, so I suspect it is something systemic.&lt;BR /&gt;&lt;BR /&gt;The mpd commands I use are:&lt;BR /&gt;&lt;BR /&gt;mpdallexit&lt;BR /&gt;mpdboot -n 64 -r ssh -f ${NODEFILE}&lt;BR /&gt;mpdtrace&lt;BR /&gt;mpiexec -np 64 ./a.out&lt;BR /&gt;mpdallexit&lt;BR /&gt;&lt;BR /&gt;Can anyone give a suggestion? I should say that we have tcp and infiniband, but our infiniband is broken at the moment. Typically intel MPI doesn't mind that very much, and fails over to tcp. In case it helps, /etc/hosts is appended to this email.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Sean&lt;BR /&gt;&lt;BR /&gt;127.0.0.1 localhost.localdomain localhost&lt;BR /&gt;10.11.12.7 files.tae.mysite.com files.mysite.com files&lt;BR /&gt;&lt;BR /&gt;# special IPv6 addresses&lt;BR /&gt;::1 localhost ipv6-localhost ipv6-loopback&lt;BR /&gt;&lt;BR /&gt;fe00::0 ipv6-localnet&lt;BR /&gt;&lt;BR /&gt;ff00::0 ipv6-mcastprefix&lt;BR /&gt;ff02::1 ipv6-allnodes&lt;BR /&gt;ff02::2 ipv6-allrouters&lt;BR /&gt;ff02::3 ipv6-allhosts&lt;BR /&gt;#The following was added by scance. Do not remove:&lt;BR /&gt;10.0.1.1 cl1n001&lt;BR /&gt;10.0.1.10 cl1n010&lt;BR /&gt;10.0.1.11 cl1n011&lt;BR /&gt;10.0.1.12 cl1n012&lt;BR /&gt;10.0.1.13 cl1n013&lt;BR /&gt;10.0.1.14 cl1n014&lt;BR /&gt;10.0.1.15 cl1n015&lt;BR /&gt;10.0.1.16 cl1n016&lt;BR /&gt;10.0.1.17 cl1n017&lt;BR /&gt;10.0.1.18 cl1n018&lt;BR /&gt;10.0.1.19 cl1n019&lt;BR /&gt;10.0.1.2 cl1n002&lt;BR /&gt;10.0.1.20 cl1n020&lt;BR /&gt;10.0.1.21 cl1n021&lt;BR /&gt;10.0.1.22 cl1n022&lt;BR /&gt;10.0.1.23 cl1n023&lt;BR /&gt;10.0.1.24 cl1n024&lt;BR /&gt;10.0.1.25 cl1n025&lt;BR /&gt;10.0.1.26 cl1n026&lt;BR /&gt;10.0.1.27 cl1n027&lt;BR /&gt;10.0.1.28 cl1n028&lt;BR /&gt;10.0.1.29 cl1n029&lt;BR /&gt;10.0.1.3 cl1n003&lt;BR /&gt;10.0.1.30 cl1n030&lt;BR /&gt;10.0.1.31 cl1n031&lt;BR /&gt;10.0.1.32 cl1n032&lt;BR /&gt;10.0.1.33 cl1n033&lt;BR /&gt;10.0.1.34 cl1n034&lt;BR /&gt;10.0.1.35 cl1n035&lt;BR /&gt;10.0.1.36 cl1n036&lt;BR /&gt;10.0.1.37 cl1n037&lt;BR /&gt;10.0.1.38 cl1n038&lt;BR /&gt;10.0.1.39 cl1n039&lt;BR /&gt;10.0.1.4 cl1n004&lt;BR /&gt;10.0.1.40 cl1n040&lt;BR /&gt;10.0.1.41 cl1n041&lt;BR /&gt;10.0.1.42 cl1n042&lt;BR /&gt;10.0.1.43 cl1n043&lt;BR /&gt;10.0.1.44 cl1n044&lt;BR /&gt;10.0.1.45 cl1n045&lt;BR /&gt;10.0.1.46 cl1n046&lt;BR /&gt;10.0.1.47 cl1n047&lt;BR /&gt;10.0.1.48 cl1n048&lt;BR /&gt;10.0.1.49 cl1n049&lt;BR /&gt;10.0.1.5 cl1n005&lt;BR /&gt;10.0.1.50 cl1n050&lt;BR /&gt;10.0.1.51 cl1n051&lt;BR /&gt;10.0.1.52 cl1n052&lt;BR /&gt;10.0.1.53 cl1n053&lt;BR /&gt;10.0.1.54 cl1n054&lt;BR /&gt;10.0.1.55 cl1n055&lt;BR /&gt;10.0.1.56 cl1n056&lt;BR /&gt;10.0.1.57 cl1n057&lt;BR /&gt;10.0.1.58 cl1n058&lt;BR /&gt;10.0.1.59 cl1n059&lt;BR /&gt;10.0.1.6 cl1n006&lt;BR /&gt;10.0.1.60 cl1n060&lt;BR /&gt;10.0.1.61 cl1n061&lt;BR /&gt;10.0.1.62 cl1n062&lt;BR /&gt;10.0.1.63 cl1n063&lt;BR /&gt;10.0.1.64 cl1n064&lt;BR /&gt;10.0.1.7 cl1n007&lt;BR /&gt;10.0.1.8 cl1n008&lt;BR /&gt;10.0.1.9 cl1n009&lt;BR /&gt;10.0.10.1 taz3.americas.sgi.com taz3&lt;BR /&gt;10.0.40.1 cl1n001-bmc&lt;BR /&gt;10.0.40.10 cl1n010-bmc&lt;BR /&gt;10.0.40.11 
cl1n011-bmc&lt;BR /&gt;10.0.40.12 cl1n012-bmc&lt;BR /&gt;10.0.40.13 cl1n013-bmc&lt;BR /&gt;10.0.40.14 cl1n014-bmc&lt;BR /&gt;10.0.40.15 cl1n015-bmc&lt;BR /&gt;10.0.40.16 cl1n016-bmc&lt;BR /&gt;10.0.40.17 cl1n017-bmc&lt;BR /&gt;10.0.40.18 cl1n018-bmc&lt;BR /&gt;10.0.40.19 cl1n019-bmc&lt;BR /&gt;10.0.40.2 cl1n002-bmc&lt;BR /&gt;10.0.40.20 cl1n020-bmc&lt;BR /&gt;10.0.40.21 cl1n021-bmc&lt;BR /&gt;10.0.40.22 cl1n022-bmc&lt;BR /&gt;10.0.40.23 cl1n023-bmc&lt;BR /&gt;10.0.40.24 cl1n024-bmc&lt;BR /&gt;10.0.40.25 cl1n025-bmc&lt;BR /&gt;10.0.40.26 cl1n026-bmc&lt;BR /&gt;10.0.40.27 cl1n027-bmc&lt;BR /&gt;10.0.40.28 cl1n028-bmc&lt;BR /&gt;10.0.40.29 cl1n029-bmc&lt;BR /&gt;10.0.40.3 cl1n003-bmc&lt;BR /&gt;10.0.40.30 cl1n030-bmc&lt;BR /&gt;10.0.40.31 cl1n031-bmc&lt;BR /&gt;10.0.40.32 cl1n032-bmc&lt;BR /&gt;10.0.40.33 cl1n033-bmc&lt;BR /&gt;10.0.40.34 cl1n034-bmc&lt;BR /&gt;10.0.40.35 cl1n035-bmc&lt;BR /&gt;10.0.40.36 cl1n036-bmc&lt;BR /&gt;10.0.40.37 cl1n037-bmc&lt;BR /&gt;10.0.40.38 cl1n038-bmc&lt;BR /&gt;10.0.40.39 cl1n039-bmc&lt;BR /&gt;10.0.40.4 cl1n004-bmc&lt;BR /&gt;10.0.40.40 cl1n040-bmc&lt;BR /&gt;10.0.40.41 cl1n041-bmc&lt;BR /&gt;10.0.40.42 cl1n042-bmc&lt;BR /&gt;10.0.40.43 cl1n043-bmc&lt;BR /&gt;10.0.40.44 cl1n044-bmc&lt;BR /&gt;10.0.40.45 cl1n045-bmc&lt;BR /&gt;10.0.40.46 cl1n046-bmc&lt;BR /&gt;10.0.40.47 cl1n047-bmc&lt;BR /&gt;10.0.40.48 cl1n048-bmc&lt;BR /&gt;10.0.40.49 cl1n049-bmc&lt;BR /&gt;10.0.40.5 cl1n005-bmc&lt;BR /&gt;10.0.40.50 cl1n050-bmc&lt;BR /&gt;10.0.40.51 cl1n051-bmc&lt;BR /&gt;10.0.40.52 cl1n052-bmc&lt;BR /&gt;10.0.40.53 cl1n053-bmc&lt;BR /&gt;10.0.40.54 cl1n054-bmc&lt;BR /&gt;10.0.40.55 cl1n055-bmc&lt;BR /&gt;10.0.40.56 cl1n056-bmc&lt;BR /&gt;10.0.40.57 cl1n057-bmc&lt;BR /&gt;10.0.40.58 cl1n058-bmc&lt;BR /&gt;10.0.40.59 cl1n059-bmc&lt;BR /&gt;10.0.40.6 cl1n006-bmc&lt;BR /&gt;10.0.40.60 cl1n060-bmc&lt;BR /&gt;10.0.40.61 cl1n061-bmc&lt;BR /&gt;10.0.40.62 cl1n062-bmc&lt;BR /&gt;10.0.40.63 cl1n063-bmc&lt;BR /&gt;10.0.40.64 cl1n064-bmc&lt;BR /&gt;10.0.40.7 cl1n007-bmc&lt;BR /&gt;10.0.40.8 cl1n008-bmc&lt;BR /&gt;10.0.40.9 cl1n009-bmc&lt;BR /&gt;10.11.12.9 taz.mysite.com taz&lt;BR /&gt;192.168.10.1 linux.site linux&lt;BR /&gt;#End scance-section&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 31 Jul 2009 18:39:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905654#M2258</guid>
      <dc:creator>sdettrick</dc:creator>
      <dc:date>2009-07-31T18:39:00Z</dc:date>
    </item>
    <item>
      <title>Re: "failed to ping mpd" with intel MPI</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905655#M2259</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
If you're expecting the default device to fail, why not specify the device you want? I've seen ssm perform better than failover.&lt;BR /&gt;</description>
      <pubDate>Fri, 31 Jul 2009 19:08:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905655#M2259</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-07-31T19:08:19Z</dc:date>
    </item>
    <item>
      <title>Re: "failed to ping mpd" with intel MPI</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905656#M2260</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; If you're expecting the default device to fail, why not specify the device you want? I've run into ssm performing better than fail over.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;OK, so how should I specify tcp/ip? I tried this:&lt;BR /&gt;&lt;BR /&gt;export I_MPI_DEVICE=rdssm:sock&lt;BR /&gt;&lt;BR /&gt;It failed to ping again as before. Is my syntax wrong?&lt;BR /&gt;</description>
      <pubDate>Fri, 31 Jul 2009 20:11:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905656#M2260</guid>
      <dc:creator>sdettrick</dc:creator>
      <dc:date>2009-07-31T20:11:25Z</dc:date>
    </item>
    <item>
      <title>Re: "failed to ping mpd" with intel MPI</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905657#M2261</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/305577"&gt;sdettrick&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;OK, so how should I specify tcp/ip? I tried this:&lt;BR /&gt;&lt;BR /&gt;export I_MPI_DEVICE=rdssm:sock&lt;BR /&gt;&lt;BR /&gt;It failed to ping again as before. Is my syntax wrong?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
Not being an expert, I thought if you expect rdssm to fail, you would set&lt;BR /&gt;I_MPI_DEVICE=ssm&lt;BR /&gt;</description>
      <pubDate>Fri, 31 Jul 2009 20:27:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905657#M2261</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-07-31T20:27:39Z</dc:date>
    </item>
    <item>
      <title>Re: "failed to ping mpd" with intel MPI</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905658#M2262</link>
      <description>&lt;P&gt;Hi Sean,&lt;/P&gt;
&lt;P&gt;To specify using TCP/IP, set I_MPI_DEVICE=ssm. This will run over sockets across nodes and use the shm device within a node.&lt;/P&gt;
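&lt;P&gt;For example (a minimal sketch, reusing the run commands from the original post):&lt;/P&gt;
&lt;PRE&gt;export I_MPI_DEVICE=ssm   # sockets between nodes, shared memory within a node
mpiexec -np 64 ./a.out&lt;/PRE&gt;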
&lt;P&gt;Additionally, the error you're seeing could be due to a failed connection to the node, an inability to start the mpd daemon on the remote node, etc. Can you verify that you're using the latest version, Intel MPI Library 3.2 Update 1? You can do so by running "mpiexec -V".&lt;/P&gt;
&lt;P&gt;Also, make sure no leftover mpd python processes exist on the nodes. You can check by running "ps aux | grep mpd". Go ahead and kill any leftover mpd.py processes you find.&lt;/P&gt;
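&lt;P&gt;A sketch of that cleanup, assuming passwordless ssh to the nodes listed in ${NODEFILE} (pkill -f is just one way to kill the stragglers):&lt;/P&gt;
&lt;PRE&gt;for node in $(cat ${NODEFILE}); do
    echo "== $node =="
    ssh $node 'ps aux | grep mpd'   # look for leftover mpd python processes
    # if any mpd.py processes remain, kill them, e.g.:
    # ssh $node 'pkill -f mpd.py'
done&lt;/PRE&gt;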
&lt;P&gt;Regards,&lt;BR /&gt;~Gergana&lt;/P&gt;</description>
      <pubDate>Fri, 31 Jul 2009 20:27:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905658#M2262</guid>
      <dc:creator>Gergana_S_Intel</dc:creator>
      <dc:date>2009-07-31T20:27:54Z</dc:date>
    </item>
    <item>
      <title>Re: "failed to ping mpd" with intel MPI</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905659#M2263</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/198675"&gt;Gergana Slavova (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;
&lt;P&gt;Also, make sure no leftover mpd python processes exist on the nodes. You can do so by running "ps aux | grep mpd". Go ahead and kill any left over mpd.py procs you find.&lt;/P&gt;
&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
You may simplify the cleanup task by running mpdallexit to close down your mpd, before looking for the rogue python processes.&lt;BR /&gt;</description>
      <pubDate>Sat, 01 Aug 2009 00:03:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905659#M2263</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-08-01T00:03:51Z</dc:date>
    </item>
    <item>
      <title>Maybe it is related to SELinux</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905660#M2264</link>
      <description>&lt;P&gt;Maybe it is related to SELinux or the firewall. You can stop those services and try again.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Apr 2013 07:28:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/quot-failed-to-ping-mpd-quot-with-intel-MPI/m-p/905660#M2264</guid>
      <dc:creator>Xiangzheng_S_Intel</dc:creator>
      <dc:date>2013-04-17T07:28:39Z</dc:date>
    </item>
  </channel>
</rss>