<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Gbit / Infiniband mix in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800558#M813</link>
    <description>Hennes,&lt;BR /&gt;&lt;BR /&gt;Have you changed anything on the cluster? Have you changed Intel MPI?&lt;BR /&gt;Intel MPI Library should work with ssh, but you need a passwordless connection. So, from node1 you need to be able to run 'mpiexec -n 1 -host node2 hostname'.&lt;BR /&gt;Is it reproducible on other nodes?&lt;BR /&gt;I'm not sure that the issue is related to the library; it seems to me that you need to check the eth0 settings.&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt; Dmitry&lt;BR /&gt;</description>
    <pubDate>Fri, 24 Jun 2011 14:30:49 GMT</pubDate>
    <dc:creator>Dmitry_K_Intel2</dc:creator>
    <dc:date>2011-06-24T14:30:49Z</dc:date>
    <item>
      <title>Gbit / Infiniband mix</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800555#M810</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I am running a heterogeneous cluster: half the nodes use Gbit Ethernet and the other half use InfiniBand. For a year or so everything went well, but recently the Gbit nodes have started complaining about the lack of InfiniBand (see below). This phenomenon is limited to Intel MPI (impi) code; GNU MPI still runs fine.&lt;BR /&gt;The problem appears unrelated to the queuing system: a direct launch fails in the same way as an SGE-submitted one.&lt;BR /&gt;Any help would be greatly appreciated.&lt;BR /&gt;...&lt;BR /&gt;compute-0-15.local:19848: open_hca: rdma_bind ERR No such device. Is eth0 configured?&lt;BR /&gt;compute-0-15.local:19847: open_hca: rdma_bind ERR No such device. Is eth0 configured?&lt;BR /&gt;compute-0-15.local:19845: open_hca: getaddr_netdev ERROR: No such device. Is ib1 configured?&lt;BR /&gt;compute-0-15.local:19845: open_hca: device mthca0 not found&lt;BR /&gt;compute-0-15.local:19845: open_hca: device mthca0 not found&lt;BR /&gt;compute-0-15.local:19845: open_hca: device mlx4_0 not found&lt;BR /&gt;compute-0-15.local:19845: open_hca: device mlx4_0 not found&lt;BR /&gt;compute-0-15.local:19845: open_hca: device ipath0 not found&lt;BR /&gt;compute-0-15.local:19845: open_hca: device ipath0 not found&lt;BR /&gt;compute-0-15.local:19845: open_hca: device ehca0 not found&lt;BR /&gt;compute-0-15.local:19845: open_hca: rdma_bind ERR No such device. Is eth0 configured?&lt;BR /&gt;[cli_0]: got unexpected response to put :cmd=unparseable_msg rc=-1&lt;BR /&gt;:&lt;BR /&gt;[cli_0]: aborting job:&lt;BR /&gt;Fatal error in MPI_Init_thread: Other MPI error, error stack:&lt;BR /&gt;MPIR_Init_thread(283): Initialization failed&lt;BR /&gt;MPIDD_Init(98).......: channel initialization failed&lt;BR /&gt;MPIDI_CH3_Init(163)..: generic failure with errno = 336068751&lt;BR /&gt;(unknown)(): Other MPI error&lt;BR /&gt;rank 0 in job 1 xxxxl_51508 caused collective abort of all ranks&lt;BR /&gt; exit status of rank 0: return code 13&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 23 Jun 2011 12:52:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800555#M810</guid>
      <dc:creator>Hennes_Hoffmann</dc:creator>
      <dc:date>2011-06-23T12:52:21Z</dc:date>
    </item>
    <item>
      <title>Gbit / Infiniband mix</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800556#M811</link>
      <description>Hi Hennes,&lt;BR /&gt;&lt;BR /&gt;Intel MPI Library does not support heterogeneous environments. If you are using version 4.x, you need to add I_MPI_FABRICS=shm:tcp to the list of environment variables.&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt; Dmitry&lt;BR /&gt;</description>
      <pubDate>Fri, 24 Jun 2011 07:06:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800556#M811</guid>
      <dc:creator>Dmitry_K_Intel2</dc:creator>
      <dc:date>2011-06-24T07:06:01Z</dc:date>
    </item>
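    <!--
    A minimal sketch of applying the I_MPI_FABRICS=shm:tcp suggestion from a Python launcher script. It assumes Intel MPI 4.x's `mpiexec` is on PATH and accepts `-n` and the `-env NAME VALUE` form quoted later in this thread; the helper names `tcp_only_env` and `mpiexec_argv` are hypothetical. It sets the variable in the environment handed to the launcher and also passes it explicitly with `-env`.

```python
import os


def tcp_only_env(base=None):
    """Return a copy of the environment with Intel MPI limited to shm + tcp.

    shm is meant for ranks on the same node, tcp for traffic between nodes.
    The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["I_MPI_FABRICS"] = "shm:tcp"
    return env


def mpiexec_argv(nprocs, program, *args):
    """Build an mpiexec command line that also passes the fabric choice via -env."""
    return ["mpiexec", "-n", str(nprocs),
            "-env", "I_MPI_FABRICS", "shm:tcp",
            program, *args]
```

    On a Gbit-only node this might be launched as `subprocess.run(mpiexec_argv(4, "./a.out"), env=tcp_only_env(), check=True)`, where `./a.out` stands in for the real MPI program.
    -->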
    <item>
      <title>Gbit / Infiniband mix</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800557#M812</link>
      <description>Hi Dmitry,&lt;BR /&gt;&lt;BR /&gt;Thanks for your reply. I should clarify that the different fabrics have their own queues. The error posted above shows up when a job is submitted to Gbit-only nodes. Strangely, everything worked well for a year. A recent reboot of the head node broke it, but now it seems impossible to figure out which update in particular is causing this. Is there a point in trying SoftiWARP on the Gbit nodes?&lt;BR /&gt;-env I_MPI_FABRICS shm:tcp does not seem to work with -r ssh. Is there a way to make it work?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Hennes</description>
      <pubDate>Fri, 24 Jun 2011 11:50:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800557#M812</guid>
      <dc:creator>Hennes_Hoffmann</dc:creator>
      <dc:date>2011-06-24T11:50:56Z</dc:date>
    </item>
    <item>
      <title>Gbit / Infiniband mix</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800558#M813</link>
      <description>Hennes,&lt;BR /&gt;&lt;BR /&gt;Have you changed anything on the cluster? Have you changed Intel MPI?&lt;BR /&gt;Intel MPI Library should work with ssh, but you need a passwordless connection. So, from node1 you need to be able to run 'mpiexec -n 1 -host node2 hostname'.&lt;BR /&gt;Is it reproducible on other nodes?&lt;BR /&gt;I'm not sure that the issue is related to the library; it seems to me that you need to check the eth0 settings.&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt; Dmitry&lt;BR /&gt;</description>
      <pubDate>Fri, 24 Jun 2011 14:30:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800558#M813</guid>
      <dc:creator>Dmitry_K_Intel2</dc:creator>
      <dc:date>2011-06-24T14:30:49Z</dc:date>
    </item>
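    <!--
    The two checks suggested above, passwordless ssh between nodes and the state of eth0, can be scripted roughly as follows. This is a sketch under stated assumptions: the OpenSSH client (whose `BatchMode=yes` option makes ssh fail instead of prompting for a password), a Linux node that exposes interfaces under /sys/class/net, and placeholder host names such as node2. It tests plain ssh rather than `mpiexec -n 1 -host node2 hostname` itself, and the function names are hypothetical.

```python
import os
import subprocess


def ssh_batch_argv(host, command="hostname"):
    """ssh command line that fails instead of prompting if key-based login is missing."""
    return ["ssh", "-o", "BatchMode=yes", host, command]


def passwordless_ok(host, timeout=10):
    """True if 'ssh host hostname' succeeds without asking for a password."""
    try:
        result = subprocess.run(ssh_batch_argv(host),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # ssh binary missing, or the host did not answer in time.
        return False
    return result.returncode == 0


def interface_state(name, sysfs="/sys/class/net"):
    """Linux only: operstate of an interface ('up', 'down', ...), or None if absent."""
    path = os.path.join(sysfs, name, "operstate")
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None
```

    On each node, `passwordless_ok("node2")` should be True and `interface_state("eth0")` would typically return "up"; a result of None means the kernel reports no such interface, which would be consistent with the "No such device. Is eth0 configured?" messages in the first post.
    -->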
    <item>
      <title>Gbit / Infiniband mix</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800559#M814</link>
      <description>Dmitry,&lt;BR /&gt;&lt;BR /&gt;The cluster remained unchanged (except for RHEL 5.5 updates on the head node), and Intel MPI is 4.0.0.028, installed about 12 months ago and left untouched since then. Passwordless connection works from and to all nodes. The eth0 settings appear valid to me, and GNU MPI is running fine on all nodes. ssh by itself works; it only breaks with "-env I_MPI_FABRICS shm:tcp". I already noticed this last year when I did some unrelated tests.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Hennes</description>
      <pubDate>Fri, 24 Jun 2011 17:33:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800559#M814</guid>
      <dc:creator>Hennes_Hoffmann</dc:creator>
      <dc:date>2011-06-24T17:33:08Z</dc:date>
    </item>
    <item>
      <title>Gbit / Infiniband mix</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800560#M815</link>
      <description>Hi Hennes,&lt;BR /&gt;Could you submit a ticket at premier.intel.com and attach the output of a run with I_MPI_DEBUG=20?&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt; Dmitry&lt;BR /&gt;</description>
      <pubDate>Mon, 27 Jun 2011 08:10:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Gbit-Infiniband-mix/m-p/800560#M815</guid>
      <dc:creator>Dmitry_K_Intel2</dc:creator>
      <dc:date>2011-06-27T08:10:22Z</dc:date>
    </item>
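    <!--
    To collect the I_MPI_DEBUG=20 output requested above, a small wrapper can set the variable and capture stdout and stderr in a single log file to attach to the ticket. A sketch, assuming the command is given as an argument list (for example one built for mpiexec); `run_with_debug` and the log file name are hypothetical.

```python
import os
import subprocess


def run_with_debug(argv, logfile, level=20):
    """Run argv with I_MPI_DEBUG set and write its combined output to logfile.

    Returns the exit status instead of raising, so a failing MPI_Init
    still leaves a complete log behind.
    """
    env = dict(os.environ)
    env["I_MPI_DEBUG"] = str(level)
    with open(logfile, "w") as log:
        result = subprocess.run(argv, env=env, stdout=log,
                                stderr=subprocess.STDOUT)
    return result.returncode
```

    For example, `run_with_debug(["mpiexec", "-n", "2", "./a.out"], "impi_debug.log")`, with `./a.out` standing in for the failing program.
    -->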
  </channel>
</rss>

