<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic IntelMPI first execution crashes,  mpd process on remote host d in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805347#M891</link>
    <description>Hi Dmitry,&lt;BR /&gt;&lt;BR /&gt;If the first node crashes, I would not be able to know the MPD_CON_EXT value on the other nodes.&lt;BR /&gt;&lt;BR /&gt;mpdcleanup -a may terminate other mpd rings that do not include the first node.&lt;BR /&gt;The main purpose of MPD_CON_EXT is to start an individual mpd ring for each application.&lt;BR /&gt;&lt;BR /&gt;So it seems mpdcleanup -a may not work for me.&lt;BR /&gt;&lt;BR /&gt;What I expect is that the mpd ring terminates itself automatically if one mpd in the ring&lt;BR /&gt;stops responding, and that each mpd process exits cleanly, with no leftover processes.&lt;BR /&gt;&lt;BR /&gt;Thanks.&lt;BR /&gt;&lt;BR /&gt;- Jin</description>
    <pubDate>Fri, 04 Jun 2010 14:31:03 GMT</pubDate>
    <dc:creator>Jin_Ma</dc:creator>
    <dc:date>2010-06-04T14:31:03Z</dc:date>
    <item>
      <title>IntelMPI first execution crashes,  mpd process on remote host does not exit</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805345#M889</link>
      <description>During my tests I found that if the first execution crashes, an mpd.py process on the remote host does not exit automatically.&lt;BR /&gt;&lt;BR /&gt;Here are the steps, assuming you have two hosts, host1 and host2.&lt;BR /&gt;&lt;BR /&gt;1.&lt;BR /&gt;Start an mpd ring on the two hosts as a normal user:&lt;BR /&gt;&lt;P&gt;export MPD_CON_EXT=1234&lt;/P&gt;&lt;P&gt;mpdboot -n 2 -f $hfile&lt;/P&gt;where hfile lists the two hosts, host1 and host2.&lt;BR /&gt;&lt;BR /&gt;2.&lt;BR /&gt;On host1, kill -9 the mpd process and all Intel MPI processes in one shot.&lt;BR /&gt;&lt;BR /&gt;3.&lt;BR /&gt;On host2 (the remote host), you see a leftover mpd.py process.&lt;BR /&gt;&lt;BR /&gt;Is there a way to make the mpd ring exit cleanly by itself?&lt;BR /&gt;&lt;BR /&gt;Thanks.&lt;BR /&gt;&lt;BR /&gt;- Jin</description>
      <pubDate>Thu, 03 Jun 2010 19:48:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805345#M889</guid>
      <dc:creator>Jin_Ma</dc:creator>
      <dc:date>2010-06-03T19:48:33Z</dc:date>
    </item>
    <item>
      <title>IntelMPI first execution crashes,  mpd process on remote host d</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805346#M890</link>
      <description>Hi Jin,&lt;BR /&gt;&lt;BR /&gt;Could you try a script like this:&lt;BR /&gt;&lt;BR /&gt;export MPD_CON_EXT=1234&lt;BR /&gt;mpdboot -n 2 -f $hfile # (you might need -r ssh)&lt;BR /&gt;mpiexec -n NNN ./my_application&lt;BR /&gt;mpdcleanup -a&lt;BR /&gt;&lt;BR /&gt;I hope it works.&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt; Dmitry&lt;BR /&gt;</description>
      <pubDate>Fri, 04 Jun 2010 05:45:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805346#M890</guid>
      <dc:creator>Dmitry_K_Intel2</dc:creator>
      <dc:date>2010-06-04T05:45:27Z</dc:date>
    </item>
    <item>
      <title>IntelMPI first execution crashes,  mpd process on remote host d</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805347#M891</link>
      <description>Hi Dmitry,&lt;BR /&gt;&lt;BR /&gt;If the first node crashes, I would not be able to know the MPD_CON_EXT value on the other nodes.&lt;BR /&gt;&lt;BR /&gt;mpdcleanup -a may terminate other mpd rings that do not include the first node.&lt;BR /&gt;The main purpose of MPD_CON_EXT is to start an individual mpd ring for each application.&lt;BR /&gt;&lt;BR /&gt;So it seems mpdcleanup -a may not work for me.&lt;BR /&gt;&lt;BR /&gt;What I expect is that the mpd ring terminates itself automatically if one mpd in the ring&lt;BR /&gt;stops responding, and that each mpd process exits cleanly, with no leftover processes.&lt;BR /&gt;&lt;BR /&gt;Thanks.&lt;BR /&gt;&lt;BR /&gt;- Jin</description>
      <pubDate>Fri, 04 Jun 2010 14:31:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805347#M891</guid>
      <dc:creator>Jin_Ma</dc:creator>
      <dc:date>2010-06-04T14:31:03Z</dc:date>
    </item>
    <item>
      <title>IntelMPI first execution crashes,  mpd process on remote host d</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805348#M892</link>
      <description>Hi Jin,&lt;BR /&gt;&lt;BR /&gt;It seems I misunderstood the problem.&lt;BR /&gt;If you need to run each application with its own ring, you can use the 'mpirun' utility. It uses a unique MPD_CON_EXT internally, creates an mpd ring, and destroys that ring when the application finishes. All mpds related to this task will be killed automatically.&lt;BR /&gt;Other mpds will not be affected.&lt;BR /&gt;&lt;BR /&gt;If you set your own MPD_CON_EXT and start a new ring using 'mpdboot', the mpds will live until you kill them. They will NOT be killed automatically.&lt;BR /&gt;If one mpd stops responding, your mpd ring will be one node smaller.&lt;BR /&gt;&lt;BR /&gt;Would you like to change the existing behaviour?&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt; Dmitry&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 07 Jun 2010 12:45:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805348#M892</guid>
      <dc:creator>Dmitry_K_Intel2</dc:creator>
      <dc:date>2010-06-07T12:45:56Z</dc:date>
    </item>
    <item>
      <title>IntelMPI first execution crashes,  mpd process on remote host d</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805349#M893</link>
      <description>Dmitry,&lt;BR /&gt;&lt;BR /&gt;I checked mpirun. It is a script, and it does the same thing:&lt;BR /&gt;it constructs a host file, sets a unique MPD_CON_EXT, and then calls the mpdboot command.&lt;BR /&gt;In this case, if the first host (where mpirun is running) crashes,&lt;BR /&gt;I'd assume I would still see the problem I described: leftover mpd.py processes&lt;BR /&gt;on the other hosts. Is that right? Is that behavior by design?&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;BR /&gt;&lt;BR /&gt;Here is the relevant part of the mpirun script:&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;# Start an exclusive MPD ring by setting an unique MPD_CON_EXT variable&lt;/P&gt;&lt;P&gt;if [ -n "$ENVIRONMENT" -a -n "$QSUB_REQID" -a -n "$QSUB_NODEINF" ] ; then&lt;/P&gt;&lt;P&gt;export MPD_CON_EXT=$QSUB_REQID # Called under Fujitsu NQS&lt;/P&gt;&lt;P&gt;else&lt;/P&gt;&lt;P&gt;export MPD_CON_EXT=`date +%y%m%d.%H%M%S`&lt;/P&gt;&lt;P&gt;fi&lt;/P&gt;&lt;P&gt;#echo "mpdboot -n $np_boot $hosts_opt $other_mpdboot_opt"&lt;/P&gt;&lt;P&gt;#echo "HOSTFILE:"&lt;/P&gt;&lt;P&gt;#cat $hosts_file&lt;/P&gt;&lt;P&gt;mpdboot -n $np_boot $hosts_opt $other_mpdboot_opt&lt;/P&gt;&lt;P&gt;#mpdtrace&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jun 2010 19:33:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805349#M893</guid>
      <dc:creator>Jin_Ma</dc:creator>
      <dc:date>2010-06-07T19:33:04Z</dc:date>
    </item>
    <item>
      <title>IntelMPI first execution crashes,  mpd process on remote host d</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805350#M894</link>
      <description>Hi Jin,&lt;BR /&gt;&lt;BR /&gt;If you are talking about abnormal termination, then yes: mpirun will not be able to stop all the mpds, and the mpds do not know that they need to be killed.&lt;BR /&gt;Under the existing logic, a node (mpd) can disappear from a ring at any time without this being treated as a problem.&lt;BR /&gt;&lt;BR /&gt;Version 4.0 of the Intel MPI Library includes an experimental process manager called Hydra.&lt;BR /&gt;You can run 'mpiexec.hydra' instead of 'mpirun' (all the other parameters are the same). All processes running on remote nodes should be killed automatically in case of any abnormal termination of the application. Please give it a try.&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt; Dmitry&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 08 Jun 2010 12:46:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/IntelMPI-first-execution-crashes-mpd-process-on-remote-host-does/m-p/805350#M894</guid>
      <dc:creator>Dmitry_K_Intel2</dc:creator>
      <dc:date>2010-06-08T12:46:32Z</dc:date>
    </item>
  </channel>
</rss>

