Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Debugging hung MPI ranks

psing51
New Contributor I

Hi,
I am working with an MPI application that hangs after about 1 hour of running.

Here is an example of how the application is launched:
mpirun -np 32 -ppn 4 ./wrf.exe

At the moment I don't know which line of code the application hangs on, so I can't set breakpoints in advance. Also, the compute nodes don't have X11. Because the simulation gets stuck only after ~1 hour and the codebase is very large, it is very difficult to debug this issue with
mpirun -gdb -np 32 -ppn 4 ./wrf.exe

I have a debug build (-g) of the executables. Is there a way to attach to the hung MPI ranks/processes and check the current source code location of each rank?

Could you please share the sequence of commands needed to find the problematic source code location, if possible?
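
For illustration, here is a minimal sketch of the kind of workflow being asked about. It is not a confirmed answer from this thread. It assumes gdb is installed on the compute nodes, that your user is allowed to attach to its own processes, and that you can log in (e.g. over ssh) to each node where ranks are stuck. The helper name dump_backtraces and the output directory are hypothetical. The gdb options used (-p, -batch, -ex) and the "thread apply all bt" command are standard gdb features, and no X11 is needed.

```shell
#!/bin/sh
# Sketch: dump a backtrace for every process with a given name on THIS node.
# Run it on each compute node where ranks appear to be stuck.
# dump_backtraces is a hypothetical helper name, not part of Intel MPI.

dump_backtraces() {
    exe_name=$1                 # process name to look for, e.g. wrf.exe
    out_dir=${2:-./hang_bt}     # directory for the per-process backtrace files
    mkdir -p "$out_dir" || return 1

    # pgrep -x matches the process name exactly and prints one PID per line.
    for pid in $(pgrep -x "$exe_name"); do
        # -batch makes gdb run the -ex commands and then exit;
        # gdb detaches on exit, so the process is left as it was found.
        gdb -p "$pid" -batch -ex "thread apply all bt" \
            > "$out_dir/$(hostname)_$pid.txt" 2>&1
    done
}

# Example call (commented out so the file can be sourced safely):
# dump_backtraces wrf.exe ./hang_bt
```

With a -g build, each output file should show function names and source lines for the frames of that process. Comparing the files across ranks can hint at which MPI call the ranks are waiting in.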



 

3 Replies
ShivaniK_Intel
Moderator

Hi,

 

Thanks for reaching out to us.

 

Could you please let us know which version of Intel MPI you are using, along with your OS details?

 

Could you also provide the output of the command below?

 

I_MPI_DEBUG=30 mpirun -check-mpi -np 32 -ppn 4 ./wrf.exe
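
Since the hang appears only after about an hour, it may help to save the debug output to a file so it can be attached to a reply. The following is a minimal sketch under stated assumptions: run_logged is a hypothetical helper and wrf_debug.log is an arbitrary file name. Only standard shell redirection, tee, and env are used.

```shell
#!/bin/sh
# Sketch: run a command and keep a copy of everything it prints.
# run_logged is a hypothetical helper name; the log file name is arbitrary.

run_logged() {
    log_file=$1
    shift
    # 2>&1 merges stderr into stdout so messages from both streams are kept;
    # tee writes them to the log file while still showing them on screen.
    "$@" 2>&1 | tee "$log_file"
}

# Example call (commented out so the file can be sourced safely):
# run_logged wrf_debug.log env I_MPI_DEBUG=30 mpirun -check-mpi -np 32 -ppn 4 ./wrf.exe
```

Passing the variable through env keeps the setting local to that one command rather than changing the login shell's environment.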

 

Thanks & Regards

Shivani

 

ShivaniK_Intel
Moderator

Hi,

 

As we haven't heard back from you, could you please provide the details requested in my previous post so that we can investigate your issue further?

 

Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,


I have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & Regards

Shivani

