<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:Intel MPI RC transport hangs in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1261353#M7873</link>
    <description>&lt;P&gt;Hi Vineet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We haven't heard back from you.&lt;/P&gt;&lt;P&gt;Have you updated to the latest version of MPI?&lt;/P&gt;&lt;P&gt;Let us know if you face any problems while updating.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Thu, 04 Mar 2021 09:45:11 GMT</pubDate>
    <dc:creator>PrasanthD_intel</dc:creator>
    <dc:date>2021-03-04T09:45:11Z</dc:date>
    <item>
      <title>Intel MPI RC transport hangs</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1258463#M7829</link>
      <description>&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;Hello,&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;I am facing an issue with an MPI program hanging when using Intel MPI.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;Characteristics of the system:&lt;/FONT&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;Intel MPI version: 2018 Update 4 Build 20180823&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;Network type: Mellanox InfiniBand HDR100&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;Network topology: Dragonfly&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;CPU: AMD Epyc 7742&lt;/FONT&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;When I use only 2 nodes (256 processes), the code works fine. With 8 nodes, however, the behaviour is erratic: most of the time it hangs, but sometimes it fails with a segmentation fault.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;The stack trace at the time of hanging shows that the processes are stuck at:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;dapl_rc_vc_progress_short_msg_20() at ../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_poll_rc.c:483&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;However, if I enable UD transport via export I_MPI_DAPL_UD=on, it works fine. With UD, the code works even on 10k procs.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;My question is: how can I find out what causes the computation to hang (or segfault) with RC (RDMA)? And how can I fix it?&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;I would prefer to take advantage of RC up to at least 8 nodes, and then for larger runs, I can switch to UD (if needed) to save memory.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;Please note that I do not face this problem with Open MPI or MVAPICH2.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;Thanks in advance.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;Best,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3"&gt;Vineet&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Feb 2021 13:03:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1258463#M7829</guid>
      <dc:creator>vineetsoni</dc:creator>
      <dc:date>2021-02-23T13:03:38Z</dc:date>
    </item>
    <item>
      <title>Re:Intel MPI RC transport hangs</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1259591#M7853</link>
      <description>&lt;P&gt;Hi Vineet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The Intel MPI version you are using is old and no longer supported. For the list of supported versions, refer to &lt;A href="https://software.intel.com/content/www/us/en/develop/articles/intel-parallel-studio-xe-supported-and-unsupported-product-versions.html" rel="noopener noreferrer" target="_blank"&gt;Intel® Parallel Studio XE &amp;amp; Intel® oneAPI Toolkits...&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;Starting with the 2019 release, the Intel® MPI Library switched from the Open Fabrics Alliance* (OFA) framework to the Open Fabrics Interfaces* (OFI) framework.&lt;/P&gt;&lt;P&gt;Can you upgrade to the latest version? There have been many bug fixes and performance improvements since the 2018 version.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 26 Feb 2021 06:48:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1259591#M7853</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2021-02-26T06:48:05Z</dc:date>
    </item>
    <item>
      <title>Re:Intel MPI RC transport hangs</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1261353#M7873</link>
      <description>&lt;P&gt;Hi Vineet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We haven't heard back from you.&lt;/P&gt;&lt;P&gt;Have you updated to the latest version of MPI?&lt;/P&gt;&lt;P&gt;Let us know if you face any problems while updating.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 04 Mar 2021 09:45:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1261353#M7873</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2021-03-04T09:45:11Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Intel MPI RC transport hangs</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1261413#M7875</link>
      <description>&lt;P&gt;Hi Prasanth,&lt;/P&gt;
&lt;P&gt;Unfortunately, I am not the administrator of the machine, so I do not have control over it. I can try to install a newer version of Intel MPI in my home directory, but that would not be a practical solution, as other MPI implementations already work with RC.&lt;/P&gt;
&lt;P&gt;I wanted to know whether there are any (hidden) RC/RDMA-related environment variables that could help fix this issue.&lt;/P&gt;
&lt;P&gt;Anyway, I will ask the system admin why they recommend using only UD with Intel MPI; they must have faced the same problem.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Vineet&lt;/P&gt;</description>
      <pubDate>Thu, 04 Mar 2021 13:27:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1261413#M7875</guid>
      <dc:creator>vineetsoni</dc:creator>
      <dc:date>2021-03-04T13:27:57Z</dc:date>
    </item>
    <item>
      <title>Re: Intel MPI RC transport hangs</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1263740#M7911</link>
      <description>&lt;P&gt;Hi Vineet,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The selection between UD and RC depends on the number of ranks, the number of nodes, and the fabric provider being used. Generally, Intel MPI selects RC for small-scale runs and UD for large-scale runs.&lt;/P&gt;
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;Prasanth&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Mar 2021 09:56:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1263740#M7911</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2021-03-18T09:56:25Z</dc:date>
    </item>
    <item>
      <title>Re:Intel MPI RC transport hangs</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1265500#M7937</link>
      <description>&lt;P&gt;Hi Vineet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;You can refer to this article (&lt;A href="https://software.intel.com/content/www/us/en/develop/articles/tuning-the-intel-mpi-library-advanced-techniques.html" rel="noopener noreferrer" target="_blank"&gt;Tuning the Intel® MPI Library: Advanced Techniques&lt;/A&gt;), which explains why UD is selected for large-scale runs and how to further tune DAPL for them.&lt;/P&gt;&lt;P&gt;Let us know if you find this helpful.&lt;/P&gt;&lt;P&gt;If you still want to use RC for larger runs, let me know.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 18 Mar 2021 10:04:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1265500#M7937</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2021-03-18T10:04:29Z</dc:date>
    </item>
    <item>
      <title>Re:Intel MPI RC transport hangs</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1267716#M7977</link>
      <description>&lt;P&gt;Hi Vineet,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We are closing this thread on the assumption that your issue has been resolved. We will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 25 Mar 2021 09:10:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Intel-MPI-RC-transport-hangs/m-p/1267716#M7977</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2021-03-25T09:10:57Z</dc:date>
    </item>
  </channel>
</rss>

