Intel® MPI Library

Suspend an MPI job

jackyjngwn
Beginner
Hi,

How can I suspend all the processes in an MPI job? I tried to use I_MPI_JOB_SIGNAL_PROPAGATION but it didn't seem to work. I am using Intel MPI 4.0.1.007. Thanks.

Jacky
Dmitry_K_Intel2
Employee
Hi Jacky,

Well, I've just checked with 4.0.2 and it works.
[dk@cl210 ~]$ export I_MPI_JOB_SIGNAL_PROPAGATION=1
[dk@cl210 ~]$ mpiexec -n 8 IMB-MPI1

In another terminal window:
[dk@cl210 ~]$ ps ux | grep mpiexec
dk 13809 0.1 0.0 140860 9876 pts/11 T 12:06 0:00 python /users/dk/impi/4.0.2/intel64/bin/mpiexec -n 8 IMB-MPI1
[dk@cl210 ~]$ kill -20 13809   # send SIGTSTP

In the first window you'll see:
[1]+ Stopped mpiexec -n 8 IMB-MPI1

Then, again in the second window, type:
[dk@cl210 ~]$ kill -18 13809   # send SIGCONT

And IMB continues running.
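The manual check above can be scripted. This is a minimal sketch, assuming a Linux node where /proc is available; `pause_job`, `resume_job` and `proc_state` are hypothetical helper names, not part of Intel MPI. It sends signals by name (`-TSTP`, `-CONT`) instead of by number, because 20 and 18 are the x86 Linux numbers and differ on some other architectures.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the manual steps above (Linux only).

# Suspend a process the way Ctrl-Z would: send SIGTSTP (20 on x86 Linux).
pause_job() {
  kill -TSTP "$1"
}

# Resume a stopped process: send SIGCONT (18 on x86 Linux).
resume_job() {
  kill -CONT "$1"
}

# Print the one-letter state of a PID read from /proc/PID/status,
# e.g. R (running), S (sleeping) or T (stopped).
proc_state() {
  sed -n 's/^State:[[:space:]]*\([A-Z]\).*/\1/p' "/proc/$1/status"
}
```

With the PID from the `ps` output above, the sequence would be `pause_job 13809`, then `proc_state 13809` (expecting `T`), then `resume_job 13809`. Note that a process can catch or ignore SIGTSTP, so a `T` state is worth checking rather than assuming.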

Isn't that what happens in your case?

Regards!
Dmitry



jackyjngwn
Beginner
Dmitry,


Thanks for your reply. I tried what you did and unfortunately it didn't work in my case. Actually, nothing happened when I used "kill -18" in another terminal window. When I used "Ctrl-Z" in the terminal window where the program was running, only the first process was suspended and all the other processes kept running.

Is this because I am using Intel MPI 4.0.1.007? Or is there anything else I need to configure? Thanks.

Jacky
Dmitry_K_Intel2
Employee
Hi Jacky,

I've looked into the mpiexec code and, you know, you are absolutely right: the documentation and the actual behavior don't match. SIGTSTP and SIGCONT are not propagated to the application. It would be easy to change in mpiexec, but I doubt you'll be able to make that change yourself.
You can submit a tracker at premier.intel.com and I'll send you a patch for testing.
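Until mpiexec forwards these signals, one possible workaround is to signal the rank processes directly instead of the launcher. This is a minimal sketch, assuming you already know the PIDs of the ranks on a node (for example from `pgrep IMB-MPI1`); for a multi-node job it would have to be run on every node, e.g. over ssh. `signal_ranks` is a hypothetical helper name. It is meant to be used with SIGSTOP rather than SIGTSTP, because SIGSTOP cannot be caught or ignored by the application.

```shell
#!/usr/bin/env bash
# Hypothetical workaround: deliver one signal to every rank PID on this node.
# Usage: signal_ranks STOP pid1 pid2 ...   and later   signal_ranks CONT pid1 pid2 ...
signal_ranks() {
  local sig=$1
  shift
  local status=0 pid
  for pid in "$@"; do
    # Keep going if one PID has already exited, but report the failure.
    kill "-$sig" "$pid" || status=1
  done
  return "$status"
}
```

For the benchmark in this thread, that would be `signal_ranks STOP $(pgrep IMB-MPI1)` to suspend and `signal_ranks CONT $(pgrep IMB-MPI1)` to resume. The loop deliberately does not stop at the first failed `kill`, so one vanished rank does not leave the others running.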

Regards!
Dmitry
jackyjngwn
Beginner

Thanks for the reply. I tried to submit an issue at premier.intel.com, but the Intel cluster kit is not in my product list. What should I do? Thanks.

Dmitry_K_Intel2
Employee
What's your product? If it is Cluster Toolkit or Cluster Studio, you should be able to submit a tracker against Intel MPI Library for Linux.

Regards!
Dmitry
jackyjngwn
Beginner
Dmitry,

I have submitted the issue. Could you please take a look? Thanks.

Dmitry_K_Intel2
Employee
Hi Jacky,

Got it; I'll be working on that.

Regards!
Dmitry