Intel® MPI Library

Debugging MPI codes with -gtool or -gdb hangs (nothing happens when typing the command)

xsl
Beginner

Hello,

I am trying to debug my MPI code with mpirun -gdb or mpirun -gtool. Either way, after I enter the command, nothing comes up and the terminal just freezes until I terminate it with Ctrl+C.

 

For example, I use the following command to try to debug my code with the -gtool option:

mpirun -n 3 -gtool "gdb:0,1=attach" ./MY_EXECUTABLE

Then, nothing happens at all.

 

I am trying to debug this code on my desktop with 6 CPU cores (12 with hyper-threading on), and I have already run

export OMP_NUM_THREADS=4

BTW, when running 

which mpirun

here is what I got

/opt/intel/oneapi/mpi/2021.1.1//bin/mpirun

Why are there two slashes (//) before bin? I just put the following line in my .zshrc file (I am using zsh):

source /opt/intel/oneapi/setvars.sh

 

Can anyone tell me what the problem is? Thanks very much.

SantoshY_Intel
Moderator

Hi,


Thanks for reaching out to us.


We are able to reproduce the issue at our end. We are working on it and will get back to you soon.


Thanks & Regards,

Santosh


DrAmarpal_K_Intel

Hi xsl,


Can you please check if the following command returns anything?

$ which gdb


My guess is that GDB is not installed on your system or is not available through the paths in your PATH environment variable. If you know where gdb is installed, please use the full path to gdb in your command line.


The following is the expected behavior with the Intel MPI Library + GDB:

$ mpirun -n 3 -bootstrap ssh -gtool "gdb:0,1=attach" IMB-MPI1

mpigdb: attaching to 2244710 IMB-MPI1 epb801

mpigdb: attaching to 2244711 IMB-MPI1 epb801

mpigdb: gdb won't attach to a process with not specified rank

[0,1] (mpigdb) bt

[0]   #0 0x000014c2c9101805 in read () from /lib64/libc.so.6

[1]   #0 0x000014e98d2fb805 in read () from /lib64/libc.so.6

[0]   #1 0x000014c2ca3e509f in read (__fd=<optimized out>, __buf=<optimized out>,

[1]   #1 0x000014e98e5df09f in read (__fd=<optimized out>, __buf=<optimized out>,

....

....

....

....


Best regards,

Amar


xsl
Beginner

Hi Amar,

 

Yes, you are right. It turns out that I did not have gdb installed. Previously I only used the one that comes with Intel (something like gdb-ib) for debugging my sequential codes. Now I have the new oneAPI toolkit installed, and I had not used it before.

 

I now get 

mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb)

with mpirun -n 3 -gtool "gdb:0,1=attach" ./executable

But why is the debugger not working? I cannot move forward with any commands such as 'run' or 'break'.

 

 

DrAmarpal_K_Intel

Hi xsl,


Perhaps your application is causing this. Please try running gdb on another MPI application and report your findings.
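
For example, one quick way to rule out the application is to build a trivial MPI test case with debug symbols and attach to it the same way. This is only a rough sketch: the file name hello_mpi.c is made up for illustration, and mpicc can be substituted for mpiicc if the Intel compilers are not installed.

$ cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv)
{
    int rank = 0;
    MPI_Init(&argc, &argv);                 /* start MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which rank am I? */
    printf("Hello from rank %d\n", rank);
    MPI_Finalize();
    return 0;
}
EOF
$ mpiicc -g -O0 hello_mpi.c -o hello_mpi    # or mpicc, if the Intel compilers are not installed
$ mpirun -n 3 -gtool "gdb:0,1=attach" ./hello_mpi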


Best regards,

Amar


xsl
Beginner

Hi Amar,

 

I tried with another MPI program and got the same errors: "Cannot access memory" and "gdb won't attach to a process with not specified rank".

DrAmarpal_K_Intel

Hi xsl,


[1] Can you please retest with the IMB-MPI1 binary located in your Intel MPI Library installation, i.e., in $I_MPI_ROOT/bin?


[2] In addition, can you please share the output from the following commands,


$ which gdb

$ gdb --version


Best regards,

Amar


xsl
Beginner

Hi Amar,

 

Thanks for the reply. I tried $I_MPI_ROOT/bin/mpirun and I think things are the same. 

 

'which gdb' gives me '/usr/bin/gdb' and 'gdb --version' returns 

GNU gdb (Uos 8.2.1.1-1+security) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

 

DrAmarpal_K_Intel

Hi xsl,


Thanks for confirming. I hope you meant $I_MPI_ROOT/bin/IMB-MPI1 and not $I_MPI_ROOT/bin/mpirun in your last note? Please confirm; if not, please rerun. Just to clarify, in my last comment I wanted you to run the following command,


$ mpirun -n 3 -gtool "gdb:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1


Assuming that this is what you ran, the application does not seem to be causing this issue. Let's therefore also test with the Intel Distribution for GDB (gdb-oneapi instead of gdb). Can you please run the following command and share your findings,


$ mpirun -n 3 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1


Best regards,

Amar


xsl
Beginner

Hi Amar,

 

Thanks very much for your comments. I clearly did not do what you asked in the last test. Now I think I have done what you said, but it seems it is still not working:

 

➜  dyno_input export OMP_NUM_THREADS=2
➜  dyno_input mpirun -n 6 -gtool "gdb:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 ./dyno dyno_ga.inp ../dyno_output/test 1
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) ^Cmpigdb: ending..
mpigdb: kill Cannot
mpigdb: kill Cannot
[mpiexec@Taishan] Sending Ctrl-C to processes as requested
[mpiexec@Taishan] Press Ctrl-C again to force abort
^C
➜  dyno_input mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 ./dyno dyno_ga.inp ../dyno_output/test 1
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) ^Cmpigdb: ending..
mpigdb: kill Cannot
mpigdb: kill Cannot
[mpiexec@Taishan] Sending Ctrl-C to processes as requested
[mpiexec@Taishan] Press Ctrl-C again to force abort
^C
➜  dyno_input mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 ./dyno                                  
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) 

DrAmarpal_K_Intel

Hi xsl,


Thanks for your note.


You still seem to be running the following command line, which is not what I meant,

mpirun -n 6 -gtool "gdb:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 ./dyno dyno_ga.inp ../dyno_output/test 1


Please don't try to run ./dyno for this test. Kindly test gdb with IMB-MPI1 alone, using the following command line, which is complete,


$ mpirun -n 6 -gtool "gdb:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1


Please do not add additional parameters to the above command and run it as it is. Kindly report your findings.


Best regards,

Amar



xsl
Beginner

Hi Amar,

 

I am sorry that I missed your point. I think now I am doing what you said:

➜  dyno_input mpirun -n 6 -gtool "gdb:0,1=attach" $I_MPI_ROOT/bin/IMB_MPI1
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) 

DrAmarpal_K_Intel

Thanks, xsl.


[1] Can you please check with the following command as well?


mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB_MPI1


[2] Also, can you please share the output from the following command,


mpirun -n 6 $I_MPI_ROOT/bin/IMB_MPI1


[3] Please also help me understand what ➜ dyno_input means. Is it just your prompt, or are you running these commands in a custom environment?


[4] Can you please also share the output of the following command,

ps -p $$



xsl
Beginner

Hi Amar,

 

Thanks very much for your comments. That dyno_input is just something printed by my zsh prompt. Here is what I got when running all the commands you gave:

➜  ~ mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB_MPI1
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) ^Cmpigdb: ending..
mpigdb: kill Cannot
mpigdb: kill Cannot
[mpiexec@Taishan] Sending Ctrl-C to processes as requested
[mpiexec@Taishan] Press Ctrl-C again to force abort
^C
➜  ~ mpirun -n 6 $I_MPI_ROOT/bin/IMB_MPI1
[proxy:0:0@Taishan] HYD_spawn (../../../../../src/pm/i_hydra/libhydra/spawn/intel/hydra_spawn.c:145): execvp error on file /opt/intel/oneapi/mpi/2021.1.1/bin/IMB_MPI1 (No such file or directory)
[proxy:0:0@Taishan] HYD_spawn (../../../../../src/pm/i_hydra/libhydra/spawn/intel/hydra_spawn.c:145): execvp error on file /opt/intel/oneapi/mpi/2021.1.1/bin/IMB_MPI1 (No such file or directory)
[proxy:0:0@Taishan] HYD_spawn (../../../../../src/pm/i_hydra/libhydra/spawn/intel/hydra_spawn.c:145): execvp error on file /opt/intel/oneapi/mpi/2021.1.1/bin/IMB_MPI1 (No such file or directory)
[proxy:0:0@Taishan] HYD_spawn (../../../../../src/pm/i_hydra/libhydra/spawn/intel/hydra_spawn.c:145): execvp error on file /opt/intel/oneapi/mpi/2021.1.1/bin/IMB_MPI1 (No such file or directory)
➜  ~ ps -p $$
  PID TTY          TIME CMD
12667 pts/4    00:00:00 zsh

DrAmarpal_K_Intel

Hi xsl,


Thanks for reporting your findings. There was a typo in my last email. The correct binary name is IMB-MPI1 and not IMB_MPI1.


Please rerun the following commands and report your findings:


  1. mpirun -n 6 $I_MPI_ROOT/bin/IMB-MPI1 allreduce
  2. mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 allreduce
  3. If the above fail, could you please run gdb on a non-MPI application and report whether gdb attaches as expected? (A minimal sketch follows below.)
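
As a minimal sketch of what item 3 means (your_program here stands for any serial binary built with -g; it is not a specific file from this thread), the interaction would look roughly like this:

$ gdb ./your_program
(gdb) break main      # set a breakpoint at main
(gdb) run             # start the program under the debugger
(gdb) bt              # print a backtrace at the breakpoint
(gdb) quit

If gdb can stop at main and print a backtrace for a serial program, the debugger itself is working and the problem is specific to the MPI attach path.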


Many thanks,

Amar


xsl
Beginner

Hi Amar,

 

Thanks very much for your comments. Here is the output of the commands you listed. How can I attach gdb to a non-MPI program? I just ran gdb with that program as the argument, and the result is also attached (that program is non-MPI, although it has the same name).

➜  dyno_input mpirun -n 6 $I_MPI_ROOT/bin/IMB-MPI1 allreduce
#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2021.1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Mon Jun  7 10:04:33 2021
# Machine               : x86_64
# System                : Linux
# Release               : 5.4.70-amd64-desktop
# Version               : #2 SMP Wed Jan 6 13:39:30 CST 2021
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# /opt/intel/oneapi/mpi/2021.1.1/bin/IMB-MPI1 allreduce 

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT 
# MPI_Op                         :   MPI_SUM  
# 
# 

# List of Benchmarks to run:

# Allreduce

#----------------------------------------------------------------
# Benchmarking Allreduce 
# #processes = 2 
# ( 4 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.87         0.87         0.87
            8         1000         0.87         0.87         0.87
           16         1000         0.91         0.92         0.91
           32         1000         0.86         0.88         0.87
           64         1000         0.87         0.89         0.88
          128         1000         0.87         0.92         0.89
          256         1000         0.90         0.96         0.93
          512         1000         1.02         1.07         1.04
         1024         1000         1.10         1.13         1.12
         2048         1000         1.25         1.31         1.28
         4096         1000         1.54         1.59         1.56
         8192         1000         2.14         2.22         2.18
        16384         1000         2.96         3.07         3.02
        32768         1000         4.80         4.95         4.87
        65536          640         8.43         8.56         8.49
       131072          320        15.22        15.37        15.30
       262144          160        29.79        29.98        29.88
       524288           80        56.65        56.79        56.72
      1048576           40       108.10       108.71       108.40
      2097152           20       259.35       266.87       263.11
      4194304           10       788.20       789.28       788.74

#----------------------------------------------------------------
# Benchmarking Allreduce 
# #processes = 4 
# ( 2 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         1.63         1.66         1.65
            8         1000         0.31         0.36         0.32
           16         1000         1.65         1.69         1.67
           32         1000         1.65         1.70         1.68
           64         1000         1.65         1.72         1.69
          128         1000         1.68         1.71         1.70
          256         1000         1.71         1.75         1.73
          512         1000         1.94         1.98         1.97
         1024         1000         2.10         2.14         2.12
         2048         1000         2.35         2.49         2.43
         4096         1000         2.87         3.03         2.94
         8192         1000         4.91         4.99         4.94
        16384         1000         6.41         6.55         6.45
        32768         1000         9.06         9.23         9.14
        65536          640        13.71        13.90        13.80
       131072          320        23.93        24.52        24.31
       262144          160        46.32        47.76        47.35
       524288           80        87.72        90.52        89.58
      1048576           40       195.61       209.72       203.44
      2097152           20      1109.84      1112.89      1111.39
      4194304           10      2330.50      2404.74      2369.87

#----------------------------------------------------------------
# Benchmarking Allreduce 
# #processes = 6 
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         2.24         2.99         2.54
            8         1000         2.25         2.99         2.56
           16         1000         2.26         2.98         2.54
           32         1000         2.26         3.00         2.56
           64         1000         2.23         3.00         2.55
          128         1000         2.19         3.02         2.53
          256         1000         2.22         3.03         2.56
          512         1000         2.53         3.19         2.86
         1024         1000         3.16         4.81         4.03
         2048         1000         3.54         5.40         4.54
         4096         1000         3.73         5.09         4.38
         8192         1000         6.38         8.07         7.10
        16384         1000         8.61        11.02         9.70
        32768         1000        12.60        16.16        14.33
        65536          640        22.94        29.22        26.36
       131072          320        42.30        54.26        49.43
       262144          160        83.97       107.11        98.26
       524288           80       168.37       213.34       196.98
      1048576           40       585.68       727.82       677.37
      2097152           20      2397.94      2746.91      2620.74
      4194304           10      5380.48      6099.55      5866.43


# All processes entering MPI_Finalize

➜  dyno_input mpirun -n 6 -gtool "gdb-oneapi:0,1=attach" $I_MPI_ROOT/bin/IMB-MPI1 allreduce
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [0]
mpigdb: attaching to Cannot access memory
mpigdb: hangup detected: while read from [1]
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
mpigdb: gdb won't attach to a process with not specified rank
[0,1] (mpigdb) ^Cmpigdb: ending..
mpigdb: kill Cannot
mpigdb: kill Cannot
[mpiexec@Taishan] Sending Ctrl-C to processes as requested
[mpiexec@Taishan] Press Ctrl-C again to force abort
^C
➜  dyno_input cp /home/xsl/work/svn/peter/trunk/exe/debug/dyno ./      
cp: overwrite './dyno'? y
'/home/xsl/work/svn/peter/trunk/exe/debug/dyno' -> './dyno'
➜  dyno_input ls
background_physical_property.txt  data.inp  dyno_ga.inp  fox_par.txt          inversion_sig.node  mesh_input   optim_ga.inp  rect_reg.txt      surf.inp
bgMesh.inp                        dyno      emdata.inp   inversion_data.node  invMeshDivReg.txt   obj_log.txt  prop.inp      surface_mesh.inp
➜  dyno_input gdb dyno 
GNU gdb (Uos 8.2.1.1-1+security) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from dyno...done.
(gdb) q

DrAmarpal_K_Intel

Hi xsl,


Thanks for sharing the requested details. gdb ./dyno would be the invocation mechanism for a non-MPI application, which is what you have already done.


Can you try upgrading your version of GDB? Please also check if this is a known limitation of the version of GDB shipped with your OS distribution.
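
If you want to try a newer GDB, here is a rough sketch of two possible routes on a Debian-derived system. The version number 10.2 and the install prefix are only examples, and building from source may require additional development packages (e.g., texinfo) depending on your setup:

# Route 1: upgrade from the distribution's repositories, if a newer package is available
$ sudo apt update
$ sudo apt install --only-upgrade gdb
$ gdb --version

# Route 2: build a recent GDB from the GNU sources into your home directory
$ wget https://ftp.gnu.org/gnu/gdb/gdb-10.2.tar.gz
$ tar xzf gdb-10.2.tar.gz && cd gdb-10.2
$ ./configure --prefix=$HOME/opt/gdb
$ make -j$(nproc) && make install
$ export PATH=$HOME/opt/gdb/bin:$PATH       # make sure the new gdb is found first
$ which gdb && gdb --version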


Please also share the output of,

$ cat /proc/version

$ cat /proc/os-release


Best regards,

Amar


xsl
Beginner

Hi Amar,

 

I do not know how to upgrade my GDB and I do not know anything about the possible limitation. 

 

I am running Deepin OS (v20), which I believe is derived from Debian 9. Here is what I got from running 'cat /proc/version':

Linux version 5.4.70-amd64-desktop (deepin@deepin-PC) (gcc version 8.3.0 (Uos 8.3.0.3-3+rebuild)) #2 SMP Wed Jan 6 13:39:30 CST 2021

There is no /proc/os-release file.

 

If this is indeed causing the problem, then I might need to install another system.

 

DrAmarpal_K_Intel

Hi xsl,


Currently, the Intel MPI Library supports the following OS distributions:

  • Red Hat* Enterprise Linux* 7, 8
  • Fedora* 31
  • CentOS* 7, 8
  • SUSE* Linux Enterprise Server* 12, 15
  • Ubuntu* LTS 16.04, 18.04, 20.04
  • Debian* 9, 10
  • Amazon Linux 2


See https://software.intel.com/content/www/us/en/develop/articles/intel-mpi-library-release-notes-linux.html for more details.


There is also the possibility of attaching GDB to a running PID. If this serves your requirements, you may also try this approach, although there is no guarantee that it will work. Section 20.2.2 in the following link shows the procedure,

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/developer_guide/debugging-running-application
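
As a rough illustration of that manual approach (only a sketch; the binary name dyno and its arguments are taken from your earlier commands, and <PID> is a placeholder you would fill in yourself):

$ mpirun -n 6 ./dyno dyno_ga.inp ../dyno_output/test 1 &
$ pgrep -a dyno           # list the running ranks and their PIDs
$ gdb -p <PID>            # attach a separate gdb to the rank of interest
(gdb) bt                  # see where that rank currently is
(gdb) continue

The attach happens from outside mpirun, so it does not depend on the -gtool/mpigdb machinery at all.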


Please let me know if you have further questions.


Best regards,

Amar


DrAmarpal_K_Intel

Hi xsl,


Is there anything else I can help you with before closing this thread?


Best regards,

Amar


DrAmarpal_K_Intel

Hi xsl,

 

As the root cause of this issue was identified and the next steps are clear, I am going ahead and closing this thread. We will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

 

Happy computing!

 

 
