I am trying to debug my MPI application. This has been built on CentOS 6.7 using Intel MPI libraries (version info below).
I am trying to run gdb in xterm (one xterm per process). However, I get errors when the application calls MPI_INIT(). I launch the run as follows:
$ mpirun -np <N> xterm -e gdb --args <application along with arguments>
However, I get the errors pasted below for one of the processes. Interestingly, regardless of the number of processes I run, the error always occurs in the process with rank 2. The application runs successfully when I run it without gdb: "mpirun -np <N> <application with arguments>".
I am looking for help figuring out how to make this work. I am trying to move my application from OpenMPI to Intel MPI, and this is a critical piece that needs to work before we can adopt it. The total number of ranks will typically be small for us (< 16), so one xterm per rank is manageable. Eventually we will run gdb under emacs, which provides a much better debugging experience.
Appreciate any help that we can get.
Thanks,
Vipul
[cli_2]: write_line error; fd=17 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_2]: Unable to write to PMI_fd
[cli_2]: write_line error; fd=17 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Abort(1091087) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136):
MPID_Init(709).......:
MPIR_pmi_init(105)...: PMI_Get_appnum returned -1
[cli_2]: write_line error; fd=17 buf=:cmd=abort exitcode=1091087
:
system msg for write_line failure : Bad file descriptor
Intel(R) MPI Library for Linux* OS, Version 2019 Update 7 Build 20200312 (id: 5dc2dd3e9)
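The log above shows every write to descriptor 17 failing with "Bad file descriptor", which suggests that the PMI descriptor did not survive somewhere between mpirun and the application. A minimal sketch for checking this follows. It assumes that the launcher exports `PMI_FD` and `PMI_RANK` to each rank (as MPICH-style Hydra launchers do; the thread does not confirm this for this setup) and that `/proc` is available. The function names and the script name `pmi_fd_check.sh` are hypothetical.

```shell
#!/bin/sh
# pmi_fd_check.sh (hypothetical name): report whether the PMI descriptor
# is still open in the current shell.
# Assumption: the launcher exports PMI_FD and PMI_RANK to every rank.

# fd_is_open FD: succeed if descriptor FD is open in this shell.
# Relies on the Linux /proc filesystem.
fd_is_open() {
    [ -e "/proc/$$/fd/$1" ] || [ -L "/proc/$$/fd/$1" ]
}

# report_pmi_fd: print one line describing the state of $PMI_FD.
report_pmi_fd() {
    if [ -z "${PMI_FD:-}" ]; then
        echo "rank ${PMI_RANK:-?}: PMI_FD is not set"
    elif fd_is_open "$PMI_FD"; then
        echo "rank ${PMI_RANK:-?}: PMI_FD=$PMI_FD is open"
    else
        echo "rank ${PMI_RANK:-?}: PMI_FD=$PMI_FD is closed"
    fi
}
```

One way to use it, again as an assumption rather than a verified recipe: append `report_pmi_fd` to the script and run it once directly under mpirun (`mpirun -n 5 ./pmi_fd_check.sh`) and once inside the terminal (`mpirun -n 5 xterm -hold -e ./pmi_fd_check.sh`). If rank 2 reports "open" in the first case and "closed" in the second, the descriptor is being lost when xterm starts rather than inside gdb.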
Hi Vipul,
Thanks for reaching out to us.
We tried to replicate the issue with our sample program and found no error.
Since you get the Bad file descriptor error only while debugging with gdb, could you try debugging with -gtool instead?
mpirun -n 16 -gtool "gdb:3,5,7-9=attach" ./myprog
Also, do you get a similar error when using gdb without xterm?
mpirun -gdb -n 4 ./test
For details on how to use -gtool and gdb please refer:
Is it possible for you to provide the code or a sample reproducer so we can test from our side?
Regards
Prasanth
Hi Prasanth,
Thanks for your response.
Yes, I am able to run the application in gdb using the -gdb option of mpirun. However, I get the error I reported when I try to run gdb in a separate window.
I am able to reproduce this problem with a simple C test (the ring example from OpenMPI). As I mentioned earlier, strangely, the error occurs only in the process with rank 2.
I have attached the C application code for your reference. I run it as
$ mpirun -n 5 xterm -e gdb --args /med/d/vipulk/sandboxes/try/ring_c 10
The application is compiled using gcc 6.2 with the command below:
/med/build/gcc/gcc-6.2.0/rhel6/bin/gcc -I<>/INTELMPI/compilers_and_libraries_2020.1.217/linux/mpi/intel64/include /med/d/vipulk/sandboxes/intelmpi/ring_c.c -o /med/d/vipulk/sandboxes/try/ring_c -L<>/INTELMPI/compilers_and_libraries_2020.1.217/linux/mpi/intel64/lib/release -lmpi
Thanks,
Vipul
Is there an mpirun option that stops it from capturing the stdout/stderr of the application?
My thinking is that, in my setup, the stdout of the application has been captured by gdb, but mpirun may also be trying to capture it, and that this causes the problem.
I have been trying to work around this issue by running mpirun separately and then attaching gdb to the application processes independently. This works, but stdout/stderr is still captured by mpirun, so I cannot see the output in my gdb session.
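The attach workaround described above can be sketched as a small helper that opens one xterm with gdb per local rank. This is only an illustration: the function names and the `DRY_RUN` variable are hypothetical, and it assumes `pgrep`, `xterm`, and `gdb` are in the path and that all ranks run on the local host. Because each rank keeps the descriptors it received when mpirun launched it, attaching gdb later does not move its stdout/stderr into the gdb window.

```shell
#!/bin/sh
# Hypothetical helper for the attach workaround: start the job with
# mpirun as usual, then attach one gdb (in its own xterm) to each rank.

# attach_cmd PID: print the command that opens one xterm with gdb attached.
attach_cmd() {
    printf 'xterm -e gdb -p %s\n' "$1"
}

# attach_all NAME: attach to every local process whose name is exactly NAME.
# With DRY_RUN=1 the commands are printed instead of executed.
attach_all() {
    for pid in $(pgrep -x "$1"); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            attach_cmd "$pid"
        else
            xterm -e gdb -p "$pid" &
        fi
    done
}
```

For the ring example this might be used as `mpirun -n 5 ./ring_c 10 &` followed by `attach_all ring_c`. The ranks will usually be past MPI_INIT by the time gdb attaches, so this helps only with problems that occur later in the run.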
Thanks,
Vipul
Hi Vipul,
We tried the ring program you provided, and it ran without any errors in gdb under xterm.
I have attached a screenshot of process 2 running.
Also, is there a specific use case that requires running gdb in an external window?
I am not aware of any mpirun option that disables capturing stdout/stderr, and I doubt that it is possible.
Could you post your command line, or screenshots of the steps leading to the file descriptor error?
Thanks
Prasanth
Hi Vipul,
Is your problem resolved? If not, please let us know.
Also, could you provide the exact command line you use to reproduce the error? We do not get the error you reported with the same program.
Regards
Prasanth
Hi Prasanth,
Sorry about the late reply.
I am using the command 'mpirun -n 7 xterm -e gdb --args ring_c 10' to run the application under gdb.
I didn't reply earlier because I was having trouble reproducing the issue. It turns out that, by chance, I was running on a different host with a different gdb in the path, and the issue did not reproduce there.
The gdb that works is version 7.6.1-114.el7 (/bin/gdb on that host, which runs CentOS 7.6.1810). On other machines I have tried a newer gdb version (8.2), but it gives the same problem (the process with PMI_RANK 2 fails in MPI_INIT).
So it appears to be related to the gdb version (or maybe to some other configuration that I do not yet understand). Do you have any ideas that I could try?
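Since the behavior seems to depend on which gdb is first in the path on each host, it may help to record that for every rank before comparing runs. The following is a small sketch; the function name `gdb_report` and the script name `gdb_report.sh` are hypothetical.

```shell
#!/bin/sh
# gdb_report: print the host name, the gdb found first in PATH, and the
# first line of its version banner, or a note if no gdb is found.
gdb_report() {
    host=$(uname -n)
    if gdb_path=$(command -v gdb); then
        echo "$host: $gdb_path: $("$gdb_path" --version | head -n 1)"
    else
        echo "$host: gdb not found in PATH"
    fi
}
```

With a call to `gdb_report` appended, the script could be run as `mpirun -n 7 ./gdb_report.sh` on both the working and the failing host, which makes it easy to see whether the two runs really pick up different gdb builds.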
Thanks,
Vipul
Hi Vipul,
We tested the code you provided with gdb 8.2 under xterm multiple times but did not get the error.
This looks like a problem with GDB rather than with Intel MPI.
We are transferring your query to the subject matter experts for better suggestions.
Regards
Prasanth
Hi Prasanth,
Thanks for trying this out. My experiments so far have not helped me understand the cause.
So please do let me know if you find anything. It is also possible that there is a setup or configuration issue on my side, but I cannot figure out what it is.
Thanks,
Vipul