Hi,
I'm using Intel MPI with PBS.
When I send a SIGTERM signal to my job using qdel, mpirun exits immediately, and the program that mpirun launched has no time to finish its cleanup work.
(I'm using

    if [ "x$PBS_ENVIRONMENT" != x ]; then
        trap "" SIGTERM
    fi

in my ~/.profile to keep any shell from exiting when it receives the SIGTERM.)
How can I tell Intel MPI's mpirun not to exit on SIGTERM?
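The desired behaviour can be sketched without PBS at all. In the self-contained demo below, everything is a hypothetical stand-in: wrapper.sh plays the role of the job script, and the sleep command plays the role of mpirun and the application. The wrapper traps SIGTERM, forwards it to its child, and waits so the child gets time to clean up:

```shell
#!/bin/sh
# Demo: a wrapper that traps SIGTERM, forwards it to the process it
# launched, and waits for that process to finish its cleanup.
# wrapper.sh and "sleep 30" are placeholders, not real PBS/MPI pieces.

cat > wrapper.sh <<'EOF'
#!/bin/sh
sleep 30 &                  # placeholder for: mpirun -n 4 ./my_app
child=$!
forward() {
    kill -TERM "$child" 2>/dev/null   # pass the signal on to the app
    wait "$child"                     # give it time to clean up
    echo "cleanup done"
    exit 0
}
trap forward TERM
wait "$child"
EOF

sh wrapper.sh > wrapper.out &
pid=$!
sleep 1
kill -TERM "$pid"           # emulate what qdel does to the job script
wait "$pid"
cat wrapper.out
```

Running this prints "cleanup done": the wrapper survived the SIGTERM long enough for the child to be signalled and reaped before the wrapper itself exited.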
Cheers,
Manuel
2 Replies
Quoting - manuels
Hi,
I'm using Intel MPI with PBS.
When I send a SIGTERM signal to my job using qdel, mpirun exits immediately, and the program that mpirun launched has no time to finish its cleanup work.
(I'm using

    if [ "x$PBS_ENVIRONMENT" != x ]; then
        trap "" SIGTERM
    fi

in my ~/.profile to keep any shell from exiting when it receives the SIGTERM.)
How can I tell Intel MPI's mpirun not to exit on SIGTERM?
Cheers,
Manuel
Hi Manuel,
Thanks for posting here.
Personally, I don't quite understand why you need both to send SIGTERM and to run cleanup code.
Anyway, I tried killing mpirun (it was actually SIGKILL rather than SIGTERM, but I don't think that matters much here):
[user1@mpiserver100 spawn1]$ mpirun -r ssh -f mpd.hosts -n 2 IMB-MPI1 > out_IMB
Killed
From another console:
[user1@mpiserver100 spawn1]$ ps xf
PID TTY STAT TIME COMMAND
20989 pts/0 Ss 0:00 -bash
23276 pts/0 R+ 0:00 _ ps xf
14865 pts/6 Ss+ 0:00 -bash
23269 pts/0 S 0:00 python /user1/intel/impi/4.0/intel64/bin/mpiexec -n 2 IMB-MPI1
23270 pts/0 Z 0:00 _ [sh]
23255 ? S 0:00 python /user1/intel/impi/4.0/intel64/bin/mpd.py --ncpus=1 --myhost=mpiserver100 -e -d -s 2
23271 ? S 0:00 _ python /user1/intel/impi/4.0/intel64/bin/mpd.py --ncpus=1 --myhost=mpiserver100 -e -d -s 2
23274 ? R 0:09 | _ IMB-MPI1
23272 ? S 0:00 _ python /user1/intel/impi/4.0/intel64/bin/mpd.py --ncpus=1 --myhost=mpiserver100 -e -d -s 2
23273 ? R 0:09 _ IMB-MPI1
So you can see that mpiexec and the application itself are still running; mpirun does not forward signals to them. The problem you describe is probably caused by PBS itself: it seems PBS can kill not only the parent process but all of its children as well. Could you tell me your PBS version? I'll try to reproduce the problem.
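That suspected behaviour, the batch system signalling the whole process group rather than just the job script, can be demonstrated in isolation. The sketch below assumes a Linux box with setsid(1) available; all names in it are made up. kill with a negative PID delivers the signal to every member of a process group, so trapping SIGTERM in one shell does not shield the children it started:

```shell
#!/bin/sh
# Demo: signal a whole process group, the way qdel is suspected to.
# The inner shell traps TERM and survives; its child (a stand-in for
# the MPI application) receives the same TERM directly and dies.

setsid sh -c '
    trap "echo parent survived" TERM   # this shell handles TERM itself
    sleep 30 &                         # stand-in for the MPI application
    appl=$!
    wait "$appl"                       # interrupted when TERM arrives
    sleep 1                            # let the child finish dying
    kill -0 "$appl" 2>/dev/null || echo "child was killed too"
' > group.out &
leader=$!

sleep 1
kill -TERM -- "-$leader"    # negative PID: signal every group member
wait "$leader" 2>/dev/null
cat group.out
```

The output shows "parent survived" followed by "child was killed too": the trap protected only the shell that set it, which matches what the ~/.profile workaround achieves (and why it is not enough).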
Best wishes,
Dmitry
I have the same problem.
If I send SIGUSR1, it gets passed to the subprocesses, which can then save their state and shut down cleanly.
If I send SIGINT (Ctrl-C), mpirun exits and my processes are killed without being able to save their state. How do I make mpirun ignore all signals and pass them on to the subprocesses?
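One possible workaround, sketched here as a hypothetical hand-rolled relay wrapper (nothing below is an Intel MPI feature): catch INT and TERM in a wrapper script and translate them into the SIGUSR1 that the application already handles, then keep waiting so the application has time to save its state. app.sh stands in for the real mpirun command line:

```shell
#!/bin/sh
# Demo: a relay wrapper that converts INT/TERM into SIGUSR1 for its
# child and waits for the child to finish saving state.
# app.sh and relay.sh are made-up names for this sketch.

cat > app.sh <<'EOF'
#!/bin/sh
trap 'echo state saved; kill "$slp" 2>/dev/null; exit 0' USR1
sleep 30 &                  # pretend to do long-running MPI work
slp=$!
wait "$slp"
EOF

cat > relay.sh <<'EOF'
#!/bin/sh
"$@" &                      # e.g.: mpirun -n 4 ./my_app
child=$!
trap 'kill -USR1 "$child" 2>/dev/null' INT TERM
while kill -0 "$child" 2>/dev/null; do
    wait "$child"           # re-enter wait after each trapped signal
done
EOF

sh relay.sh sh app.sh > relay.out &
pid=$!
sleep 1
kill -TERM "$pid"           # qdel sends SIGTERM; Ctrl-C would send SIGINT
wait "$pid"
cat relay.out
```

The demo prints "state saved": the wrapper absorbed the terminating signal and the child got its SIGUSR1. Note that a background shell started without job control ignores SIGINT by default, which is why this demo drives the wrapper with SIGTERM; an interactive Ctrl-C would reach the wrapper's INT trap directly.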
