Hello,
I am trying to debug my MPI code with mpirun -gdb or mpirun -gtool. Either way, after I enter the command, nothing comes up and the terminal just hangs until I terminate it with Ctrl+C.
For example, I use the following command to debug my code with the -gtool option:
mpirun -n 3 -gtool "gdb:0,1=attach" ./MY_EXECUTABLE
Then, nothing happens at all.
I am debugging on my desktop, which has 6 CPU cores (12 threads with hyper-threading on), and I have already run
export OMP_NUM_THREADS=4
BTW, when running
which mpirun
here is what I got
/opt/intel/oneapi/mpi/2021.1.1//bin/mpirun
Why are there two slashes (//) before bin? I simply put the following line in my .zshrc file (I am using zsh):
source /opt/intel/oneapi/setvars.sh
Can anyone tell me what the problem is? Thanks very much.
Hi,
Thanks for reaching out to us.
We are able to reproduce the issue at our end. We are working on it and will get back to you soon.
Thanks & Regards,
Santosh
Hi xsl,
Can you please check if the following command returns anything?
$ which gdb
My guess is that GDB is either not installed on your system or not reachable through the paths in your PATH environment variable. If you know where gdb lives, please use its full path on the command line. (As an aside, the doubled slash in your mpirun path is harmless: the operating system treats consecutive slashes in a path as a single one; it usually just means the setup script joined two path components where one already ended in a slash.)
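If gdb does turn out to be missing, installing it through the distribution's package manager is usually sufficient. For example, on a Debian-derived system (the package name below is the common one, but it may differ on your distribution):
$ sudo apt install gdb
$ which gdb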
For reference, the following is the expected behavior with the Intel MPI Library + GDB,
$ mpirun -n 3 -bootstrap ssh -gtool "gdb:0,1=attach" IMB-MPI1
mpigdb: attaching to 2244710 IMB-MPI1 epb801
mpigdb: attaching to 2244711 IMB-MPI1 epb801
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) bt
[0] #0 0x000014c2c9101805 in read () from /lib64/libc.so.6
[1] #0 0x000014e98d2fb805 in read () from /lib64/libc.so.6
[0] #1 0x000014c2ca3e509f in read (__fd=<optimized out>, __buf=<optimized out>,
[1] #1 0x000014e98e5df09f in read (__fd=<optimized out>, __buf=<optimized out>,
....
Best regards,
Amar
Hi Amar,
Yes, you are right. It turns out that I did not have gdb installed. Previously I had only used the debugger that comes with Intel (something like gdb-ib) for debugging my sequential codes. I now have the new oneAPI suite installed, but I had not used its debugger yet.
I now get
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb)
when running mpirun -n 3 -gtool "gdb:0,1=attach" ./executable
But why is the debugger not responding? I cannot move forward with any commands such as 'run' or 'break'.
Hi xsl,
Perhaps your application is causing this. Please try running gdb on another MPI application, for example the minimal test program sketched below, and report your findings.
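If you do not have another MPI application at hand, a minimal test program like the following sketch is sufficient (the file name hello.c is arbitrary; compile it with an Intel MPI compiler wrapper such as mpiicc, and keep -g so debug symbols are available):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks */
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                          /* shut the runtime down */
    return 0;
}

$ mpiicc -g hello.c -o hello
$ mpirun -n 3 -gtool "gdb:0,1=attach" ./hello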
Best regards,
Amar
Hi Amar,
I tried another MPI program and got the same errors: "Cannot access memory" and "gdb won't attach to a process with not specified rank".
Hi xsl,
[1] Can you please retest with the IMB-MPI1 binary located in your Intel MPI Library installation, i.e. $I_MPI_ROOT/bin?
[2] In addition, can you please share the output from the following commands,
$ which gdb
$ gdb --version
Best regards,
Amar
Hi Amar,
Thanks for the reply. I tried $I_MPI_ROOT/bin/mpirun and the result is the same.
'which gdb' gives me '/usr/bin/gdb', and 'gdb --version' returns
GNU gdb (Uos 8.2.1.1-1+security) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Hi xsl,
Thanks for confirming. Did you mean $I_MPI_ROOT/bin/IMB-MPI1 rather than $I_MPI_ROOT/bin/mpirun in your last note? Please confirm; if not, please rerun. To clarify, in my last comment I wanted you to run the following command,
$ mpirun -n 3 -gtool "gdb:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1
Assuming that this is what you ran, the application doesn't seem to be causing this issue. Let's therefore also test with the Intel Distribution for GDB (gdb-oneapi instead of gdb). Can you please run the following command and share your findings,
$ mpirun -n 3 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1
Best regards,
Amar
Hi Amar,
Thanks very much for your comments. I clearly did not do what you asked in the last test. I believe I have now done so, but it still does not seem to work:
➜ dyno_input export OMP_NUM_THREADS=2
➜ dyno_input mpirun -n 6 -gtool "gdb:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 ./dyno dyno_ga.inp ../dyno_output/test 1
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) ^Cmpigdb: ending..
mpigdb: kill Cannot
mpigdb: kill Cannot
[mpiexec@Taishan] Sending Ctrl-C to processes as requested
[mpiexec@Taishan] Press Ctrl-C again to force abort
^C
➜ dyno_input mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 ./dyno dyno_ga.inp ../dyno_output/test 1
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) ^Cmpigdb: ending..
mpigdb: kill Cannot
mpigdb: kill Cannot
[mpiexec@Taishan] Sending Ctrl-C to processes as requested
[mpiexec@Taishan] Press Ctrl-C again to force abort
^C
➜ dyno_input mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 ./dyno
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb)
Hi xsl,
Thanks for your note.
You still seem to be running the following command line, which is not what I meant,
mpirun -n 6 -gtool "gdb:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 ./dyno dyno_ga.inp ../dyno_output/test 1
Please don't try to run ./dyno for this test. Kindly test gdb with IMB-MPI1 alone, using the following command line, which is complete,
$ mpirun -n 6 -gtool "gdb:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1
Please do not add additional parameters to the above command and run it as it is. Kindly report your findings.
Best regards,
Amar
Hi Amar,
I am sorry that I missed your point. I think I am now doing what you asked:
➜ dyno_input mpirun -n 6 -gtool "gdb:0,1=attach" $I_MPI_ROOT/bin/IMB_MPI1
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb)
Thanks, xsl.
[1] Can you please check with the following command as well?
mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB_MPI1
[2] Also, can you please share the output from the following command,
mpirun -n 6 $I_MPI_ROOT/bin/IMB_MPI1
[3] Also, can you please explain what ➜ dyno_input means? Is it just your prompt, or are you running these commands in a custom environment?
[4] Can you please also share the output of the following command,
ps -p $$
Hi Amar,
Thanks very much for your comments. The ➜ dyno_input is just my zsh prompt. Here is what I got from running all the commands you gave:
➜ ~ mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB_MPI1
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) ^Cmpigdb: ending..
mpigdb: kill Cannot
mpigdb: kill Cannot
[mpiexec@Taishan] Sending Ctrl-C to processes as requested
[mpiexec@Taishan] Press Ctrl-C again to force abort
^C
➜ ~ mpirun -n 6 $I_MPI_ROOT/bin/IMB_MPI1
[proxy:0:0@Taishan] HYD_spawn (../../../../../src/pm/i_hydra/libhydra/spawn/intel/hydra_spawn.c:145): execvp error on file /opt/intel/oneapi/mpi/2021.1.1/bin/IMB_MPI1 (No such file or directory)
[proxy:0:0@Taishan] HYD_spawn (../../../../../src/pm/i_hydra/libhydra/spawn/intel/hydra_spawn.c:145): execvp error on file /opt/intel/oneapi/mpi/2021.1.1/bin/IMB_MPI1 (No such file or directory)
[proxy:0:0@Taishan] HYD_spawn (../../../../../src/pm/i_hydra/libhydra/spawn/intel/hydra_spawn.c:145): execvp error on file /opt/intel/oneapi/mpi/2021.1.1/bin/IMB_MPI1 (No such file or directory)
[proxy:0:0@Taishan] HYD_spawn (../../../../../src/pm/i_hydra/libhydra/spawn/intel/hydra_spawn.c:145): execvp error on file /opt/intel/oneapi/mpi/2021.1.1/bin/IMB_MPI1 (No such file or directory)
➜ ~ ps -p $$
PID TTY TIME CMD
12667 pts/4 00:00:00 zsh
Hi xsl,
Thanks for reporting your findings. There was a typo in my last reply: the correct binary name is IMB-MPI1, not IMB_MPI1.
Please rerun the following commands and report your findings,
- mpirun -n 6 $I_MPI_ROOT/bin/IMB-MPI1 allreduce
- mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 allreduce
- If the above fail, could you please run gdb on a non-MPI application (a quick example follows) and report whether gdb behaves as expected?
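For instance, any local binary serves as a sanity check; /bin/ls below is used purely as an example of a non-MPI program:
$ gdb /bin/ls
(gdb) run
(gdb) quit
If gdb loads the binary, runs it to completion, and returns to its prompt, the debugger itself is healthy.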
Many thanks,
Amar
Hi Amar,
Thanks very much for your comments. Here is the output of everything you listed. How should I run a non-MPI program under gdb? I just ran gdb with the program as its argument; that result is attached at the end (the program is non-MPI, although it has the same name).
➜ dyno_input mpirun -n 6 $I_MPI_ROOT/bin/IMB-MPI1 allreduce
#------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.1, MPI-1 part
#------------------------------------------------------------
# Date : Mon Jun 7 10:04:33 2021
# Machine : x86_64
# System : Linux
# Release : 5.4.70-amd64-desktop
# Version : #2 SMP Wed Jan 6 13:39:30 CST 2021
# MPI Version : 3.1
# MPI Thread Environment:
# Calling sequence was:
# /opt/intel/oneapi/mpi/2021.1.1/bin/IMB-MPI1 allreduce
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Allreduce
#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 2
# ( 4 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.03 0.03 0.03
4 1000 0.87 0.87 0.87
8 1000 0.87 0.87 0.87
16 1000 0.91 0.92 0.91
32 1000 0.86 0.88 0.87
64 1000 0.87 0.89 0.88
128 1000 0.87 0.92 0.89
256 1000 0.90 0.96 0.93
512 1000 1.02 1.07 1.04
1024 1000 1.10 1.13 1.12
2048 1000 1.25 1.31 1.28
4096 1000 1.54 1.59 1.56
8192 1000 2.14 2.22 2.18
16384 1000 2.96 3.07 3.02
32768 1000 4.80 4.95 4.87
65536 640 8.43 8.56 8.49
131072 320 15.22 15.37 15.30
262144 160 29.79 29.98 29.88
524288 80 56.65 56.79 56.72
1048576 40 108.10 108.71 108.40
2097152 20 259.35 266.87 263.11
4194304 10 788.20 789.28 788.74
#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 4
# ( 2 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.03 0.03 0.03
4 1000 1.63 1.66 1.65
8 1000 0.31 0.36 0.32
16 1000 1.65 1.69 1.67
32 1000 1.65 1.70 1.68
64 1000 1.65 1.72 1.69
128 1000 1.68 1.71 1.70
256 1000 1.71 1.75 1.73
512 1000 1.94 1.98 1.97
1024 1000 2.10 2.14 2.12
2048 1000 2.35 2.49 2.43
4096 1000 2.87 3.03 2.94
8192 1000 4.91 4.99 4.94
16384 1000 6.41 6.55 6.45
32768 1000 9.06 9.23 9.14
65536 640 13.71 13.90 13.80
131072 320 23.93 24.52 24.31
262144 160 46.32 47.76 47.35
524288 80 87.72 90.52 89.58
1048576 40 195.61 209.72 203.44
2097152 20 1109.84 1112.89 1111.39
4194304 10 2330.50 2404.74 2369.87
#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 6
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.03 0.03 0.03
4 1000 2.24 2.99 2.54
8 1000 2.25 2.99 2.56
16 1000 2.26 2.98 2.54
32 1000 2.26 3.00 2.56
64 1000 2.23 3.00 2.55
128 1000 2.19 3.02 2.53
256 1000 2.22 3.03 2.56
512 1000 2.53 3.19 2.86
1024 1000 3.16 4.81 4.03
2048 1000 3.54 5.40 4.54
4096 1000 3.73 5.09 4.38
8192 1000 6.38 8.07 7.10
16384 1000 8.61 11.02 9.70
32768 1000 12.60 16.16 14.33
65536 640 22.94 29.22 26.36
131072 320 42.30 54.26 49.43
262144 160 83.97 107.11 98.26
524288 80 168.37 213.34 196.98
1048576 40 585.68 727.82 677.37
2097152 20 2397.94 2746.91 2620.74
4194304 10 5380.48 6099.55 5866.43
# All processes entering MPI_Finalize
➜ dyno_input mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 allreduce
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) ^Cmpigdb: ending..
mpigdb: kill Cannot
mpigdb: kill Cannot
[mpiexec@Taishan] Sending Ctrl-C to processes as requested
[mpiexec@Taishan] Press Ctrl-C again to force abort
^C
➜ dyno_input cp /home/xsl/work/svn/peter/trunk/exe/debug/dyno ./
cp: overwrite './dyno'? y
'/home/xsl/work/svn/peter/trunk/exe/debug/dyno' -> './dyno'
➜ dyno_input ls
background_physical_property.txt data.inp dyno_ga.inp fox_par.txt inversion_sig.node mesh_input optim_ga.inp rect_reg.txt surf.inp
bgMesh.inp dyno emdata.inp inversion_data.node invMeshDivReg.txt obj_log.txt prop.inp surface_mesh.inp
➜ dyno_input gdb dyno
GNU gdb (Uos 8.2.1.1-1+security) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from dyno...done.
(gdb) q
Hi xsl,
Thanks for sharing the requested details. gdb ./dyno is indeed the way to invoke GDB on a non-MPI application, which you have already done.
Can you try upgrading your version of GDB? One possible route is sketched below. Please also check whether this is a known limitation of the GDB version shipped with your OS distribution.
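For example, a newer GDB can be built from source into your home directory without touching the system installation. This is only a sketch: the version number is an example (pick a current release from the GNU mirror), and the build assumes the usual development packages (make, gcc, g++, texinfo) are present:
$ wget https://ftp.gnu.org/gnu/gdb/gdb-10.2.tar.gz
$ tar xf gdb-10.2.tar.gz && cd gdb-10.2
$ ./configure --prefix=$HOME/opt/gdb
$ make && make install
$ export PATH=$HOME/opt/gdb/bin:$PATH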
Please also share the output of,
$ cat /proc/version
$ cat /proc/os-release
Best regards,
Amar
Hi Amar,
I do not know how to upgrade my GDB, and I do not know of any such limitation.
I am running Deepin OS (v20), which I believe is derived from Debian 9. Here is what I got from running 'cat /proc/version':
Linux version 5.4.70-amd64-desktop (deepin@deepin-PC) (gcc version 8.3.0 (Uos 8.3.0.3-3+rebuild)) #2 SMP Wed Jan 6 13:39:30 CST 2021
There is no /proc/os-release file.
If this is indeed causing the problem, then I might need to install another operating system.
Hi xsl,
Currently, the Intel MPI Library supports the following OS distributions:
- Red Hat* Enterprise Linux* 7, 8
- Fedora* 31
- CentOS* 7, 8
- SUSE* Linux Enterprise Server* 12, 15
- Ubuntu* LTS 16.04, 18.04, 20.04
- Debian* 9, 10
- Amazon Linux 2
See https://software.intel.com/content/www/us/en/develop/articles/intel-mpi-library-release-notes-linux.html for more details.
There is also the possibility of attaching GDB to a running PID. If this serves your requirements, you may try that approach, although there is no guarantee that it will work. Section 20.2.2 in the following link shows the procedure; a minimal sketch of the idea follows below.
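For illustration, here is a minimal sketch of that pattern, a common convention for holding an MPI rank until a debugger attaches; nothing in it is specific to Intel MPI, and the variable name holder is arbitrary:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    volatile int holder = 0;                 /* flip to 1 from the debugger */
    if (rank == 0) {
        printf("rank %d waiting for debugger, PID %d\n", rank, (int)getpid());
        fflush(stdout);
        while (holder == 0)
            sleep(1);                        /* spin until gdb releases us */
    }
    MPI_Barrier(MPI_COMM_WORLD);             /* other ranks wait here */
    /* ... rest of the program ... */
    MPI_Finalize();
    return 0;
}

Launch the program normally with mpirun, note the printed PID, then attach from a second terminal:
$ gdb -p <PID>
(gdb) set var holder = 1
(gdb) continue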
Please let me know if you have further questions.
Best regards,
Amar
Hi xsl,
Is there anything else I can help you with before closing this thread?
Best regards,
Amar
Hi xsl,
As the root cause of this issue has been identified and the next steps are clear, I am going ahead and closing this thread. Intel will no longer respond to this thread; if you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Happy computing!