<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Issue when using intel MPI through my debugger (mdb) in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1707493#M12187</link>
    <description>&lt;P&gt;Hello, I tried the latest version of Intel MPI (2021.16.0) and still no luck. Do you have any suggestions?&lt;/P&gt;</description>
    <pubDate>Tue, 05 Aug 2025 08:04:43 GMT</pubDate>
    <dc:creator>tommmm</dc:creator>
    <dc:date>2025-08-05T08:04:43Z</dc:date>
    <item>
      <title>Issue when using intel MPI through my debugger (mdb)</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1659550#M12044</link>
      <description>&lt;H3&gt;Background and summary of problem&lt;/H3&gt;&lt;P&gt;I am the developer of a debugging tool called mdb (&lt;A href="https://github.com/TomMelt/mdb?tab=readme-ov-file" target="_blank"&gt;https://github.com/TomMelt/mdb?tab=readme-ov-file&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;This tool is written in Python and is essentially a wrapper around various debugging backends. It currently works with gdb, cuda-gdb and lldb.&lt;/P&gt;&lt;P&gt;I mostly use Open MPI, but I have collaborators who use Intel oneAPI MPI.&lt;/P&gt;&lt;P&gt;When testing my tool (mdb) with Intel MPI, I get a crash when I try to step over the initialization of MPI (when launched with Intel MPI's mpirun), e.g.,&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;MPI_Init(NULL, NULL);&lt;/LI-CODE&gt;&lt;P&gt;The error I get is:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;0:      Continuing.
0:      [cli_0]: write_line error; fd=9 buf=:cmd=init pmi_version=1 pmi_subversion=1
0:      :
0:      system msg for write_line failure : Bad file descriptor
0:      [cli_0]: Unable to write to PMI_fd
0:      [cli_0]: write_line error; fd=9 buf=:cmd=get_appnum
0:      :
0:      system msg for write_line failure : Bad file descriptor
0:      Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
0:      MPIR_Init_thread(176):
0:      MPID_Init(1439)......:
0:      MPIR_pmi_init(131)...: PMI_Get_appnum returned -1
0:      [cli_0]: write_line error; fd=9 buf=:cmd=abort exitcode=1090831
0:      :
0:      system msg for write_line failure : Bad file descriptor
0:
0:      Program received signal SIGSEGV, Segmentation fault.
0:      MPIR_Err_return_comm (comm_ptr=0x7ffff61f3a60 &amp;lt;_IO_stdfile_2_lock&amp;gt;, fcname=0x7fffffff3ff0 "system msg for write_line failure : Bad file descriptor\n", errcode=1090831) at ../../src/mpi/errhan/errutil.c:309&lt;/LI-CODE&gt;&lt;P&gt;My debug wrapper works by running gdb as a subprocess. This works with Open MPI but fails with Intel MPI, and I have no idea why. Is there an environment variable I could set to make it work?&lt;/P&gt;&lt;P&gt;This was using intel-oneapi-mpi installed via Spack with the following settings:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;intel-oneapi-mpi@2021.10.0+envmods~external-libfabric~generic-names~ilp64 build_system=generic&lt;/LI-CODE&gt;&lt;P&gt;I am currently investigating it on my local laptop (running Ubuntu 22.04), but I have also tested on our local HPC cluster and it doesn't work there either.&lt;/P&gt;&lt;H3&gt;Steps to re-create the error:&lt;/H3&gt;&lt;P&gt;Download and install mdb (you may optionally want to create a venv first):&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;git clone https://github.com/TomMelt/mdb.git
cd mdb
pip install -e .[termgraph]&lt;/LI-CODE&gt;&lt;P&gt;Build the sample C++ MPI binary:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;cd examples
mpicxx -g -O0 simple-mpi-cpp.cpp -o simple-mpi-cpp.exe&lt;/LI-CODE&gt;&lt;P&gt;From one terminal, run the launcher:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;mdb launch -n 2 -t simple-mpi-cpp.exe&lt;/LI-CODE&gt;&lt;P&gt;It will output some text, something like:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;running on host: 127.0.1.1
to connect to the debugger run:
mdb attach -h 127.0.1.1 -p 2000

connecting to debuggers ... (2/2)
all debug clients connected&lt;/LI-CODE&gt;&lt;P&gt;In another terminal, copy and paste the mdb attach command:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;mdb attach -h 127.0.1.1 -p 2000&lt;/LI-CODE&gt;&lt;P&gt;You should then be able to step through the code using the command "command n", which sends the next command ("n") to all processes. The output will look something like:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;mdb attach -h 127.0.1.1 -p 2000
mdb - mpi debugger - built on various backends. Type ? for more info. To exit interactive mode type "q", "quit", "Ctrl+D" or "Ctrl+]".
(mdb 0-1) command n
0:      24        var = 0.;
************************************************************************
1:      24        var = 0.;

(mdb 0-1) 
0:      26        MPI_Init(NULL, NULL);
************************************************************************
1:      26        MPI_Init(NULL, NULL);

(mdb 0-1) 
0:      [cli_0]: write_line error; fd=9 buf=:cmd=init pmi_version=1 pmi_subversion=1
0:      :
0:      system msg for write_line failure : Bad file descriptor
0:      [cli_0]: Unable to write to PMI_fd
0:      [cli_0]: write_line error; fd=9 buf=:cmd=get_appnum
0:      :
0:      system msg for write_line failure : Bad file descriptor
0:      Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
0:      MPIR_Init_thread(176):
0:      MPID_Init(1439)......:
0:      MPIR_pmi_init(131)...: PMI_Get_appnum returned -1
0:      [cli_0]: write_line error; fd=9 buf=:cmd=abort exitcode=1090831
0:      :
0:      system msg for write_line failure : Bad file descriptor
0:
0:      Program received signal SIGSEGV, Segmentation fault.
0:      MPIR_Err_return_comm (comm_ptr=0x7ffff61f3a60 &amp;lt;_IO_stdfile_2_lock&amp;gt;, fcname=0x7fffffff3fd0 "system msg for write_line failure : Bad file descriptor\n", errcode=1090831) at ../../src/mpi/errhan/errutil.c:309
0:      309     ../../src/mpi/errhan/errutil.c: No such file or directory.
************************************************************************
1:      [cli_1]: write_line error; fd=10 buf=:cmd=init pmi_version=1 pmi_subversion=1
1:      :
1:      system msg for write_line failure : Bad file descriptor
1:      [cli_1]: Unable to write to PMI_fd
1:      [cli_1]: write_line error; fd=10 buf=:cmd=get_appnum
1:      :
1:      system msg for write_line failure : Bad file descriptor
1:      Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
1:      MPIR_Init_thread(176):
1:      MPID_Init(1439)......:
1:      MPIR_pmi_init(131)...: PMI_Get_appnum returned -1
1:      [cli_1]: write_line error; fd=10 buf=:cmd=abort exitcode=1090831
1:      :
1:      system msg for write_line failure : Bad file descriptor
1:
1:      Program received signal SIGSEGV, Segmentation fault.
1:      MPIR_Err_return_comm (comm_ptr=0x7ffff61f3a60 &amp;lt;_IO_stdfile_2_lock&amp;gt;, fcname=0x7fffffff3fd0 "system msg for write_line failure : Bad file descriptor\n", errcode=1090831) at ../../src/mpi/errhan/errutil.c:309
1:      309     ../../src/mpi/errhan/errutil.c: No such file or directory.

(mdb 0-1)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;Please let me know if you have any suggestions. Thanks for taking the time to read my query and let me know if I can provide any more information.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jan 2025 17:23:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1659550#M12044</guid>
      <dc:creator>tommmm</dc:creator>
      <dc:date>2025-01-22T17:23:13Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when using intel MPI through my debugger (mdb)</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1661506#M12050</link>
      <description>&lt;P&gt;FWIW, I also tried running with lldb as the backend, and it fails too.&lt;/P&gt;&lt;P&gt;To test lldb, use the following launch command instead of the one above:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;mdb launch -n 2 -t simple-mpi-cpp.exe -b lldb&lt;/LI-CODE&gt;&lt;P&gt;I get a similar error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;$ mdb attach -h 127.0.1.1 -p 2000
mdb - mpi debugger - built on various backends. Type ? for more info. To exit interactive mode type "q", "quit", "Ctrl+D" or "Ctrl+]".
(mdb 0-1) command n
0:      Process 85331 stopped
0:      * thread #1, name = 'simple-mpi-cpp.', stop reason = step over
0:          frame #0: 0x0000555555555368 simple-mpi-cpp.exe`main at simple-mpi-cpp.cpp:26:11
0:         23  
0:         24     var = 0.;
0:         25  
0:      -&amp;gt; 26     MPI_Init(NULL, NULL);
0:                        ^
0:         27     MPI_Comm_size(MPI_COMM_WORLD, &amp;amp;size_of_cluster);
0:         28     MPI_Comm_rank(MPI_COMM_WORLD, &amp;amp;process_rank);
0:         29  
************************************************************************
1:      Process 85332 stopped
1:      * thread #1, name = 'simple-mpi-cpp.', stop reason = step over
1:          frame #0: 0x0000555555555368 simple-mpi-cpp.exe`main at simple-mpi-cpp.cpp:26:11
1:         23  
1:         24     var = 0.;
1:         25  
1:      -&amp;gt; 26     MPI_Init(NULL, NULL);
1:                        ^
1:         27     MPI_Comm_size(MPI_COMM_WORLD, &amp;amp;size_of_cluster);
1:         28     MPI_Comm_rank(MPI_COMM_WORLD, &amp;amp;process_rank);
1:         29  

(mdb 0-1) 
0:      [cli_0]: write_line error; fd=9 buf=:cmd=init pmi_version=1 pmi_subversion=1
0:      :
0:      system msg for write_line failure : Bad file descriptor
0:      [cli_0]: Unable to write to PMI_fd
0:      [cli_0]: write_line error; fd=9 buf=:cmd=get_appnum
0:      :
0:      system msg for write_line failure : Bad file descriptor
0:      Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
0:      MPIR_Init_thread(176):
0:      MPID_Init(1439)......:
0:      MPIR_pmi_init(131)...: PMI_Get_appnum returned -1
0:      [cli_0]: write_line error; fd=9 buf=:cmd=abort exitcode=1090831
0:      :
0:      system msg for write_line failure : Bad file descriptor
0:      Process 85331 stopped
0:      * thread #1, name = 'simple-mpi-cpp.', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
0:          frame #0: 0x00007ffff6612be1 libmpi.so.12`MPIR_Err_return_comm(comm_ptr=0x00007ffff61f3a60, fcname="system msg for write_line failure : Bad file descriptor\n", errcode=1090831) at errutil.c:309
************************************************************************
1:      [cli_1]: write_line error; fd=10 buf=:cmd=init pmi_version=1 pmi_subversion=1
1:      :
1:      system msg for write_line failure : Bad file descriptor
1:      [cli_1]: Unable to write to PMI_fd
1:      [cli_1]: write_line error; fd=10 buf=:cmd=get_appnum
1:      :
1:      system msg for write_line failure : Bad file descriptor
1:      Abort(1090831) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
1:      MPIR_Init_thread(176):
1:      MPID_Init(1439)......:
1:      MPIR_pmi_init(131)...: PMI_Get_appnum returned -1
1:      [cli_1]: write_line error; fd=10 buf=:cmd=abort exitcode=1090831
1:      :
1:      system msg for write_line failure : Bad file descriptor
1:      Process 85332 stopped
1:      * thread #1, name = 'simple-mpi-cpp.', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
1:          frame #0: 0x00007ffff6612be1 libmpi.so.12`MPIR_Err_return_comm(comm_ptr=0x00007ffff61f3a60, fcname="system msg for write_line failure : Bad file descriptor\n", errcode=1090831) at errutil.c:309&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 29 Jan 2025 10:29:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1661506#M12050</guid>
      <dc:creator>tommmm</dc:creator>
      <dc:date>2025-01-29T10:29:43Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when using intel MPI through my debugger (mdb)</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1663320#M12060</link>
      <description>&lt;P&gt;FYI, I decided to check mpich as well.&lt;/P&gt;&lt;P&gt;mpich initially had the same problem, and it appears to be related to &lt;A href="https://github.com/pmodels/mpich/issues/2063" target="_self"&gt;https://github.com/pmodels/mpich/issues/2063&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;I managed to get mpich working by adding the flag &lt;SPAN&gt;"--pmi-port"&lt;/SPAN&gt;.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;mpich now works, but I cannot find a similar flag for Intel MPI. Do you have any ideas?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Feb 2025 09:56:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1663320#M12060</guid>
      <dc:creator>tommmm</dc:creator>
      <dc:date>2025-02-05T09:56:04Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when using intel MPI through my debugger (mdb)</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1692704#M12158</link>
      <description>&lt;P&gt;Do you have any updates on this issue?&lt;/P&gt;</description>
      <pubDate>Tue, 27 May 2025 11:28:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1692704#M12158</guid>
      <dc:creator>tommmm</dc:creator>
      <dc:date>2025-05-27T11:28:07Z</dc:date>
    </item>
    <item>
      <title>Re: Issue when using intel MPI through my debugger (mdb)</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1707493#M12187</link>
      <description>&lt;P&gt;Hello, I tried the latest version of Intel MPI (2021.16.0) and still no luck. Do you have any suggestions?&lt;/P&gt;</description>
      <pubDate>Tue, 05 Aug 2025 08:04:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-when-using-intel-MPI-through-my-debugger-mdb/m-p/1707493#M12187</guid>
      <dc:creator>tommmm</dc:creator>
      <dc:date>2025-08-05T08:04:43Z</dc:date>
    </item>
  </channel>
</rss>

