<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: mpdboot fails to start nodes with different users. in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907456#M2283</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/448532"&gt;tahgroupiastate.edu&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Cleared the entries and had them regenerated&lt;BR /&gt;&lt;BR /&gt;The problem still persists&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Could you try to figure out the problem by using mpdcheck and mpdringtest?&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt; Dmitry&lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Fri, 23 Oct 2009 05:06:18 GMT</pubDate>
    <dc:creator>Dmitry_K_Intel2</dc:creator>
    <dc:date>2009-10-23T05:06:18Z</dc:date>
    <item>
      <title>mpdboot fails to start nodes with different users.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907451#M2278</link>
      <description>I am trying to figure out why a few nodes in my cluster are acting differently.&lt;BR /&gt;&lt;BR /&gt;We are running Rocks 5.2 with RHEL 5&lt;BR /&gt;We use torque/maui as our queuing system.&lt;BR /&gt;Users submit jobs that use&lt;BR /&gt;MPI version 3.2.1.009&lt;BR /&gt;&lt;BR /&gt;When I start a job as a user with this&lt;BR /&gt;mpdboot --rsh=ssh -d -v -n 16 -f /scr/username/testinput.nodes.mpd&lt;BR /&gt;&lt;BR /&gt;I get the usual&lt;BR /&gt;---&lt;BR /&gt;LAUNCHED mpd on compute-0-13 via compute-0-15&lt;BR /&gt;debug: launch cmd= ssh -x -n compute-0-13 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTTYPE=$HOSTTYPE MACHTYPE=$MACHTYPE HOST=$HOST OSTYPE=$OSTYPE /opt/intel/impi/3.2.1.009/bin64/mpd.py -h compute-0-15 -p 41983 --ifhn=10.1.3.241 --ncpus=1 --myhost=compute-0-13 --myip=10.1.3.241 -e -d -s 16 &lt;BR /&gt;debug: mpd on compute-0-13 on port 58382&lt;BR /&gt;RUNNING: mpd on compute-0-13&lt;BR /&gt;debug: info for running mpd: {'ip': '10.1.3.241', 'ncpus': 1, 'list_port': 58382, 'entry_port': 41983, 'host': 'compute-0-13', 'entry_host': 'compute-0-15', 'ifhn': '', 'pid': 19147}&lt;BR /&gt;---&lt;BR /&gt;&lt;BR /&gt;for most nodes.&lt;BR /&gt;However, when it gets to here&lt;BR /&gt;&lt;BR /&gt;---&lt;BR /&gt;LAUNCHED mpd on compute-0-6 via compute-0-11&lt;BR /&gt;debug: launch cmd= ssh -x -n compute-0-6 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTTYPE=$HOSTTYPE MACHTYPE=$MACHTYPE HOST=$HOST OSTYPE=$OSTYPE /opt/intel/impi/3.2.1.009/bin64/mpd.py -h compute-0-11 -p 51916 --ifhn=10.1.3.248 --ncpus=1 --myhost=compute-0-6 --myip=10.1.3.248 -e -d -s 16 &lt;BR /&gt;debug: mpd on compute-0-6 on port 47012&lt;BR /&gt;---&lt;BR /&gt;mpdboot_compute-0-15.local (handle_mpd_output 828): Failed to establish a socket connection with compute-0-6:47012 : (111, 'Connection refused')&lt;BR /&gt;mpdboot_compute-0-15.local (handle_mpd_output 845): failed to connect to mpd on compute-0-6&lt;BR /&gt;---&lt;BR /&gt;&lt;BR /&gt;it fails. I have tried taking compute-0-6 out of the system,
and it throws similar errors for compute-0-5 and so forth, all the way down to compute-0-0.&lt;BR /&gt;&lt;BR /&gt;When I run the same job as root&lt;BR /&gt;mpdboot --rsh=ssh -d -v -n 16 -f /scr/username/testinput.nodes.mpd&lt;BR /&gt;it starts fine.&lt;BR /&gt;&lt;BR /&gt;We have ssh set up so that it does not require a password to log in, and I have successfully logged in without a password from the mpdboot node.&lt;BR /&gt;&lt;BR /&gt;I am a relatively new cluster administrator, and &lt;BR /&gt;I was hoping someone could help point me towards the solution to this problem.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 21 Oct 2009 15:18:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907451#M2278</guid>
      <dc:creator>tahgroupiastate_edu</dc:creator>
      <dc:date>2009-10-21T15:18:21Z</dc:date>
    </item>
    <item>
      <title>Re: mpdboot fails to start nodes with different users.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907452#M2279</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
Did you check for a bad or stale entry in .ssh/known_hosts for the account?&lt;BR /&gt;</description>
      <pubDate>Wed, 21 Oct 2009 17:33:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907452#M2279</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-10-21T17:33:46Z</dc:date>
    </item>
    <item>
      <title>Re: mpdboot fails to start nodes with different users.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907453#M2280</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; Did you check for a bad or stale entry in .ssh/known_hosts for the account?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
every node has the same known_hosts file&lt;BR /&gt;</description>
      <pubDate>Wed, 21 Oct 2009 20:08:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907453#M2280</guid>
      <dc:creator>tahgroupiastate_edu</dc:creator>
      <dc:date>2009-10-21T20:08:43Z</dc:date>
    </item>
    <item>
      <title>Re: mpdboot fails to start nodes with different users.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907454#M2281</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/448532"&gt;tahgroupiastate.edu&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
every node has the same known_hosts file&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
but it has a separate entry for each node. You could check, for example, that ssh is working to the troublesome nodes with that known_hosts file and that account. It's often as simple as removing the bad entries and letting them be regenerated.&lt;BR /&gt;</description>
      <pubDate>Wed, 21 Oct 2009 22:15:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907454#M2281</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-10-21T22:15:52Z</dc:date>
    </item>
    <item>
      <title>Re: mpdboot fails to start nodes with different users.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907455#M2282</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
but it has a separate entry for each node. You could check, for example, that ssh is working to the troublesome nodes with that known_hosts file and that account. It's often as simple as removing the bad entries and letting them be regenerated.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Cleared the entries and had them regenerated&lt;BR /&gt;&lt;BR /&gt;The problem still persists&lt;BR /&gt;</description>
      <pubDate>Thu, 22 Oct 2009 14:16:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907455#M2282</guid>
      <dc:creator>tahgroupiastate_edu</dc:creator>
      <dc:date>2009-10-22T14:16:06Z</dc:date>
    </item>
    <item>
      <title>Re: mpdboot fails to start nodes with different users.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907456#M2283</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/448532"&gt;tahgroupiastate.edu&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Cleared the entries and had them regenerated&lt;BR /&gt;&lt;BR /&gt;The problem still persists&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Could you try to figure out the problem by using mpdcheck and mpdringtest?&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt; Dmitry&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 23 Oct 2009 05:06:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907456#M2283</guid>
      <dc:creator>Dmitry_K_Intel2</dc:creator>
      <dc:date>2009-10-23T05:06:18Z</dc:date>
    </item>
    <item>
      <title>Re: mpdboot fails to start nodes with different users.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907457#M2284</link>
      <description>&lt;P&gt;Hi tahgroup,&lt;/P&gt;
&lt;P&gt;If you believe this might be an issue with the way that ssh is set up for your users on the cluster, you can try using the Expect script we provide with the Intel MPI Library installation. It's called &lt;STRONG&gt;sshconnectivity.exp&lt;/STRONG&gt; and it should be located in the original directory where the contents of the l_mpi_p_3.2.1.009 package were untarred. Of course, you would need to install the &lt;A href="http://expect.nist.gov/" target="_blank"&gt;&lt;CODE&gt;expect&lt;/CODE&gt; software&lt;/A&gt; first in order to run the script.&lt;/P&gt;
&lt;P&gt;If you do decide to go this route, the script will set up secure shell connectivity across the entire cluster for that particular user account. To run it, all you have to do is provide a list of hosts:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;$ ./sshconnectivity.exp machines.LINUX&lt;/BLOCKQUOTE&gt;
&lt;P&gt;where &lt;CODE&gt;machines.LINUX&lt;/CODE&gt; contains the hostnames of all nodes on the cluster (including the head node), one per line.&lt;/P&gt;
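&lt;P&gt;To re-check passwordless logins by hand for one account, here is a minimal sketch in Python. It is not part of the Intel MPI Library; the function names are hypothetical, and it assumes Python 3.7 or later, an OpenSSH client, and the same one-hostname-per-line file as &lt;CODE&gt;machines.LINUX&lt;/CODE&gt;. It only tests that ssh can run a command without prompting, so a clean result does not rule out other causes.&lt;/P&gt;
&lt;PRE&gt;
```python
import subprocess


def read_hosts(path):
    """Return the hostnames listed in a file, one per line, skipping blank lines."""
    hosts = []
    with open(path) as handle:
        for line in handle:
            name = line.strip()
            if name:
                hosts.append(name)
    return hosts


def ssh_command(host, timeout=5):
    """Build an ssh call that fails instead of prompting for a password.

    BatchMode and ConnectTimeout are standard OpenSSH client options;
    the 5 second timeout is an arbitrary choice for this sketch.
    """
    return [
        "ssh", "-x", "-n",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=%d" % timeout,
        host, "true",
    ]


def check_hosts(path):
    """Try every host in the file and return the ones where ssh exited non-zero."""
    failed = []
    for host in read_hosts(path):
        result = subprocess.run(ssh_command(host), capture_output=True)
        if result.returncode != 0:
            failed.append(host)
    return failed
```
&lt;/PRE&gt;
&lt;P&gt;Run as the affected user, &lt;CODE&gt;check_hosts("machines.LINUX")&lt;/CODE&gt; returns the hosts where ssh could not log in non-interactively; an empty list means every host accepted the login.&lt;/P&gt;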
&lt;P&gt;This is just another option, if you're stuck. Let us know how it goes.&lt;/P&gt;
&lt;P&gt;Regards,&lt;BR /&gt;~Gergana&lt;/P&gt;</description>
      <pubDate>Tue, 27 Oct 2009 00:02:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907457#M2284</guid>
      <dc:creator>Gergana_S_Intel</dc:creator>
      <dc:date>2009-10-27T00:02:20Z</dc:date>
    </item>
    <item>
      <title>Re: mpdboot fails to start nodes with different users.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907458#M2285</link>
      <description>Sorry it took a while to get back to this.&lt;BR /&gt;I was away from the problem for a bit.&lt;BR /&gt;&lt;BR /&gt;Anyway,&lt;BR /&gt;I ran ./sshconnectivity.exp machines.LINUX&lt;BR /&gt;It reported that all nodes connect properly.&lt;BR /&gt;&lt;BR /&gt;What other things could be causing the problem besides ssh?&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 02 Nov 2009 15:32:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907458#M2285</guid>
      <dc:creator>tahgroupiastate_edu</dc:creator>
      <dc:date>2009-11-02T15:32:44Z</dc:date>
    </item>
    <item>
      <title>Re: mpdboot fails to start nodes with different users.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907459#M2286</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/423452"&gt;Dmitry Kuzmin (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Could you try to figure out the problem by using mpdcheck and mpdringtest?&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt; Dmitry&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I tried this method as the user.&lt;BR /&gt;I have attached the results of the mpdcheck test.&lt;BR /&gt;Essentially, mpdcheck reports no errors.&lt;BR /&gt;mpdringtest won't work until I can get the ring up.&lt;BR /&gt;mpdboot still fails with a connection refused error.&lt;BR /&gt;</description>
      <pubDate>Tue, 03 Nov 2009 18:57:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/mpdboot-fails-to-start-nodes-with-different-users/m-p/907459#M2286</guid>
      <dc:creator>tahgroupiastate_edu</dc:creator>
      <dc:date>2009-11-03T18:57:09Z</dc:date>
    </item>
  </channel>
</rss>

