<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SHM failures with one node in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1404177#M9727</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We can observe in your code that there are no wait calls for the non-blocking send and receive operations. Each MPI_Isend and MPI_Irecv must be completed with MPI_Wait (or MPI_Waitall/MPI_Test) before the buffers are reused and before MPI_Finalize is called.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We also recommend launching your MPI application with 2 or more ranks:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;I_MPI_FABRICS=shm mpirun -n 2 ./a.out&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please try the attached sample code, which we were able to run successfully with the FI provider set to shm or ofi. Please see the screenshot below for more details:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="VarshaS_Intel_0-1659009380758.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/32083iFC66B104FB2FF08B/image-size/medium?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="VarshaS_Intel_0-1659009380758.png" alt="VarshaS_Intel_0-1659009380758.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;
&lt;P&gt;Varsha&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 28 Jul 2022 11:58:31 GMT</pubDate>
    <dc:creator>VarshaS_Intel</dc:creator>
    <dc:date>2022-07-28T11:58:31Z</dc:date>
    <item>
      <title>SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1394552#M9623</link>
      <description>&lt;P&gt;Hello, I've been having issues with some test programs failing with segmentation faults.&lt;/P&gt;
&lt;P&gt;It occurs while using the Intel-provided oneAPI HPC Kit container (&lt;A href="https://hub.docker.com/r/intel/oneapi-hpckit/" target="_blank" rel="noopener"&gt;https://hub.docker.com/r/intel/oneapi-hpckit/&lt;/A&gt;) with I_MPI_FABRICS='shm' (which seems to be required when running in a container). The failures only occur on jobs running on one node, on a call to MPI_IRECV.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I noticed this thread had a similar issue:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/Intel-oneAPI-2021-4-SHM-Issue/m-p/1324805" target="_blank" rel="noopener"&gt;https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/Intel-oneAPI-2021-4-SHM-Issue/m-p/1324805&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's claimed there that the issue is fixed in the newer version, but both in the container and when run directly, the newest versions give these errors whenever 'shm' is set. Switching to 'ofi' fixes the issue when run directly, but that's not an option in the container. Here are my ifort and MPI versions:&lt;/P&gt;
&lt;P&gt;ifort (IFORT) 2021.6.0 20220226&lt;/P&gt;
&lt;P&gt;Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jun 2022 16:44:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1394552#M9623</guid>
      <dc:creator>rem0</dc:creator>
      <dc:date>2022-06-22T16:44:14Z</dc:date>
    </item>
    <item>
      <title>Re: SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1394834#M9624</link>
      <description>&lt;P&gt;Since this is an MPI question, moving this to the oneAPI HPC Toolkit forum.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 23 Jun 2022 14:39:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1394834#M9624</guid>
      <dc:creator>Barbara_P_Intel</dc:creator>
      <dc:date>2022-06-23T14:39:36Z</dc:date>
    </item>
    <item>
      <title>Re:SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1395654#M9642</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for posting in Intel Communities.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please provide us with the OS and CPU details of the system you are using?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Also, could you please provide us with a complete sample reproducer along with the steps to reproduce the issue at our end?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Varsha&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 27 Jun 2022 12:27:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1395654#M9642</guid>
      <dc:creator>VarshaS_Intel</dc:creator>
      <dc:date>2022-06-27T12:27:31Z</dc:date>
    </item>
    <item>
      <title>Re:SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1398197#M9667</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We have not heard back from you. Could you please provide us with all the details mentioned in the previous post?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Varsha &lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 06 Jul 2022 11:40:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1398197#M9667</guid>
      <dc:creator>VarshaS_Intel</dc:creator>
      <dc:date>2022-07-06T11:40:45Z</dc:date>
    </item>
    <item>
      <title>Re: Re:SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1398265#M9669</link>
      <description>&lt;LI-CODE lang="fortran"&gt;program shmbugtest

  use mpi

  implicit none

  integer, allocatable :: get_data(:), put_data(:)
  integer :: ierr
  integer :: req

  call MPI_Init(ierr)

  allocate(get_data(1), put_data(1))
  put_data(1) = -1

  call MPI_Isend( put_data, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, req, ierr)
  call MPI_Irecv( get_data, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, req, ierr)

  call MPI_Finalize(ierr)

  print *, get_data(1)

end program
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's a simple program that shows the bug. It prints '-1' if I_MPI_FABRICS is set to 'ofi', but seg faults with the following output when set to 'shm':&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;forrtl: severe (71): integer divide by zero
Image              PC                Routine            Line     Source
a.out              00000000004044BB  Unknown            Unknown  Unknown
libpthread-2.28.s  00007F0445C3FCE0  Unknown            Unknown  Unknown
libmpi.so.12.0.0   00007F04468790A4  Unknown            Unknown  Unknown
libmpi.so.12.0.0   00007F0446752035  MPI_Isend          Unknown  Unknown
libmpifort.so.12.  00007F0447B8E9F0  PMPI_ISEND         Unknown  Unknown
a.out              0000000000403511  MAIN__             16       shm.F90
a.out              0000000000403162  Unknown            Unknown  Unknown
libc-2.28.so       00007F044551FCF3  __libc_start_main  Unknown  Unknown
a.out              000000000040306E  Unknown            Unknown  Unknown&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This happened on both machines I tested on: CentOS 8 (Stream) with an AMD EPYC 7742, and CentOS 7 with an Intel Xeon Platinum 8160. It also happens in the Docker Hub containers ('shm' seems to be required to use mpirun in the container, which is the main reason this is an issue).&lt;/P&gt;</description>
      <pubDate>Wed, 06 Jul 2022 15:41:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1398265#M9669</guid>
      <dc:creator>rem0</dc:creator>
      <dc:date>2022-07-06T15:41:15Z</dc:date>
    </item>
    <item>
      <title>Re: Re:SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1398282#M9670</link>
      <description>&lt;P&gt;Sorry, I thought I sent in a reply earlier; it looks like it didn't go through.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's a short example I wrote up:&lt;/P&gt;
&lt;LI-CODE lang="fortran"&gt;program shmbugtest

  use mpi

  implicit none

  integer, allocatable :: get_data(:), put_data(:)
  integer :: ierr
  integer :: req

  call MPI_Init(ierr)

  allocate(get_data(1), put_data(1))
  put_data(1) = -1

  call MPI_Isend( put_data, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, req, ierr)
  call MPI_Irecv( get_data, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, req, ierr)

  call MPI_Finalize(ierr)

  print *, get_data(1)

end program
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It prints and exits successfully when I_MPI_FABRICS is set to 'ofi', but fails with a seg fault when set to 'shm'. This happens with the newest compiler/MPI (oneAPI 2022.2) on two systems: CentOS 8 with an AMD EPYC 7742, and CentOS 7 with a Xeon Platinum 8160. My main issue is running in the container, since 'shm' is required to use mpirun there.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Jul 2022 16:41:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1398282#M9670</guid>
      <dc:creator>rem0</dc:creator>
      <dc:date>2022-07-06T16:41:09Z</dc:date>
    </item>
    <item>
      <title>Re:SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1399993#M9685</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for the details and information.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please let us know what steps you followed after downloading the image?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Varsha&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 13 Jul 2022 12:58:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1399993#M9685</guid>
      <dc:creator>VarshaS_Intel</dc:creator>
      <dc:date>2022-07-13T12:58:59Z</dc:date>
    </item>
    <item>
      <title>Re: SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1400091#M9687</link>
      <description>&lt;P&gt;To cause the error? Just compile the code above and run:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;export I_MPI_FABRICS=shm
mpirun -n 1 ./a.out&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The issue is not container-specific, so maybe I shouldn't have mentioned it; it's just where I first encountered it.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jul 2022 20:17:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1400091#M9687</guid>
      <dc:creator>rem0</dc:creator>
      <dc:date>2022-07-13T20:17:24Z</dc:date>
    </item>
    <item>
      <title>Re:SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1402965#M9717</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We are working on your issue. We will get back to you soon.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Varsha&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 25 Jul 2022 04:31:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1402965#M9717</guid>
      <dc:creator>VarshaS_Intel</dc:creator>
      <dc:date>2022-07-25T04:31:32Z</dc:date>
    </item>
    <item>
      <title>Re: SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1404177#M9727</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We can observe in your code that there are no wait calls for the non-blocking send and receive operations. Each MPI_Isend and MPI_Irecv must be completed with MPI_Wait (or MPI_Waitall/MPI_Test) before the buffers are reused and before MPI_Finalize is called.&lt;/P&gt;
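&lt;P&gt;For example, a minimal sketch of that pattern (with distinct request handles, completed by MPI_Waitall before the buffers are read and before MPI_Finalize):&lt;/P&gt;
&lt;LI-CODE lang="fortran"&gt;program wait_example

  use mpi

  implicit none

  integer :: get_data(1), put_data(1)
  integer :: ierr
  integer :: reqs(2)

  call MPI_Init(ierr)
  put_data(1) = -1

  ! Post the non-blocking send and receive with separate request handles.
  call MPI_Isend(put_data, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, reqs(1), ierr)
  call MPI_Irecv(get_data, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, reqs(2), ierr)

  ! Complete both operations before reusing the buffers or finalizing.
  call MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE, ierr)

  print *, get_data(1)

  call MPI_Finalize(ierr)

end program&lt;/LI-CODE&gt;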
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We also recommend launching your MPI application with 2 or more ranks:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;I_MPI_FABRICS=shm mpirun -n 2 ./a.out&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please try the attached sample code, which we were able to run successfully with the FI provider set to shm or ofi. Please see the screenshot below for more details:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="VarshaS_Intel_0-1659009380758.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/32083iFC66B104FB2FF08B/image-size/medium?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="VarshaS_Intel_0-1659009380758.png" alt="VarshaS_Intel_0-1659009380758.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;
&lt;P&gt;Varsha&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jul 2022 11:58:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1404177#M9727</guid>
      <dc:creator>VarshaS_Intel</dc:creator>
      <dc:date>2022-07-28T11:58:31Z</dc:date>
    </item>
    <item>
      <title>Re:SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1406670#M9746</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We have not heard back from you. Could you please provide an update on your issue?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Varsha&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 08 Aug 2022 06:09:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1406670#M9746</guid>
      <dc:creator>VarshaS_Intel</dc:creator>
      <dc:date>2022-08-08T06:09:27Z</dc:date>
    </item>
    <item>
      <title>Re:SHM failures with one node</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1408474#M9752</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We have not heard back from you. This thread will no longer be monitored by Intel. If you need any additional information, please post a new question.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Varsha&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 16 Aug 2022 07:11:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/SHM-failures-with-one-node/m-p/1408474#M9752</guid>
      <dc:creator>VarshaS_Intel</dc:creator>
      <dc:date>2022-08-16T07:11:52Z</dc:date>
    </item>
  </channel>
</rss>

