Dear all,
I have an MPI-based Fortran code that runs fine with one or two processes. However, when I launch the program with more processes, for example 4, it crashes with the following message:
forrtl: severe (157): Program Exception - access violation
forrtl: severe (157): Program Exception - access violation
job aborted:
rank: node: exit code[: error message]
0: N01: 123
1: N01: 123
2: n02: 157: process 2 exited without calling finalize
3: n02: 157: process 3 exited without calling finalize
I tried adding print statements and MPI_Barrier calls to trace the problem, but still could not locate it. Are there any debug tools or methods for debugging an MPI-based program? The command line I use to run the program is as follows:
mpiexec -wdir "\\N02\Debug\directional\for_debug\mytest" -mapall -hosts 10 n01 2 n02 2 n03 2 n04 2 n05 2 n06 2 n07 2 n08 2 n09 2 n10 2 \\N02\Debug\directional\for_debug\test
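For reference, the kind of rank-prefixed print/barrier tracing I mean looks roughly like the following minimal sketch (the step names are only placeholders, not my real code):

program trace_sketch
  use mpi
  implicit none
  integer :: ierr, myid, nprocs

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  write(*,*) 'rank ', myid, ' of ', nprocs, ': before step 1'
  flush(6)                                ! make sure the message is visible even if the next step crashes
  call MPI_Barrier(MPI_COMM_WORLD, ierr)  ! confirms that every rank reached this point

  ! ... the real work of the code would go here ...

  write(*,*) 'rank ', myid, ': after step 1'
  flush(6)

  call MPI_Finalize(ierr)
end program trace_sketch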
Thanks,
Zhanghong Tang
On further checking, I found that the program works when I run mpiexec on another host instead of n01, for example n10, or when I run mpiexec on n01 with the following command line:
mpiexec -wdir "\\N02\Debug\directional\for_debug\mytest" -mapall -hosts 10 n02 2 n01 2 n03 2 n04 2 n05 2 n06 2 n07 2 n08 2 n09 2 n10 2 \\N02\Debug\directional\for_debug\test
In that case the program also works. So it seems that the problem is related to myid = 0, even though all hosts use the same work folder. Could anyone help me take a look at it?
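To double-check which host each rank actually runs on, a small test like the following (a minimal sketch, not my real code) prints every rank together with its host name:

program rank_map
  use mpi
  implicit none
  integer :: ierr, myid, nprocs, namelen
  character(len=MPI_MAX_PROCESSOR_NAME) :: hostname

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  call MPI_Get_processor_name(hostname, namelen, ierr)

  ! each rank reports where it runs, so it is easy to see which host gets myid = 0
  write(*,*) 'rank ', myid, ' of ', nprocs, ' runs on ', hostname(1:namelen)
  flush(6)

  call MPI_Finalize(ierr)
end program rank_map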
Thanks
Hi Zhanghong,
In your case, you should be able to use a core dump to check what the problem is.
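For example, on a Linux system you could allow core files before launching and then inspect the core of the failing rank with gdb (file names are illustrative; the actual core file name depends on your system settings):

ulimit -c unlimited          # allow the crashing process to write a core file
mpiexec -n 4 ./test          # reproduce the crash
gdb ./test core              # open the core dump of the failing rank
(gdb) bt                     # show the backtrace at the point of the crash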
More generally, besides the commercial debuggers for parallel applications, there are some free tools that I use regularly to debug MPI programs:
- strace (from version 4.9): you can get a stack trace of your program at a specific system call with the option -k. Enable it for the system call 'exit_group':
strace -k -eexit_group -ostrace.out [my_application]
and it should give you a backtrace at the moment an MPI application stops. This is useful if your application stops gracefully (so no core dump) but doesn't tell you where or why it stopped.
- padb: http://padb.pittman.org.uk/. It gives you a 'unified' backtrace of all running MPI processes. This is especially useful if your MPI application hangs.
Dear Dr John,
Thank you very much for your kind reply. I work on a Windows 7 system, and I don't know whether the two tools you recommended work on Windows or not.
Thanks
Hi
It's unclear whether the cause is the code itself or the way you have used MPI; you should be open-minded to either possibility.
If your preferred (serial) debugger is Visual Studio (VS), then you can use it to help with the debugging.
Presuming you have already integrated your MPI implementation with VS, then launching
mpiexec -n 4 full_VS_Executable_name full_MPI_Executable_name
should start 4 instances of VS, each running one MPI process. Start each process, one by one, in its VS instance, and then off you go.
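For example, if your VS executable is devenv.exe, something along these lines should do it (the program path is copied from your original command, and you may need to give the full path to devenv.exe; I believe the /debugexe switch is what tells VS to load the program as the debuggee):

mpiexec -n 4 devenv /debugexe \\N02\Debug\directional\for_debug\test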
Yours, Michael
@highendcompute
Ah, I see. Indeed, these tools are available on Linux- and Unix-based systems, so I'm afraid they will not help you, unless you migrate to another OS, of course.
Hi Zhanghong,
You may try attaching to the problematic MPI process with WinDbg.
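For example, you can attach from the command line by process ID or by image name (the executable name below is illustrative, and -pn only works if a single instance with that name is running):

windbg -p <pid_of_the_failing_rank>
windbg -pn test.exe

Alternatively, use File > Attach to a Process inside WinDbg. You may need to pause the ranks early (for example with a rank-guarded read statement) so you have time to attach before the crash.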