Intel® MPI Library

Suspend an MPI job

jackyjngwn
Beginner
Hi,

How can I suspend all the processes in an MPI job? I tried to use I_MPI_JOB_SIGNAL_PROPAGATION but it didn't seem to work. I am using Intel MPI 4.0.1.007. Thanks.

Jacky
Dmitry_K_Intel2
Employee
Hi Jacky,

I've just checked with 4.0.2, and it works.
[dk@cl210 ~]$ export I_MPI_JOB_SIGNAL_PROPAGATION=1
[dk@cl210 ~]$ mpiexec -n 8 IMB-MPI1

In another terminal window:
[dk@cl210 ~]$ ps ux | grep mpiexec
dk 13809 0.1 0.0 140860 9876 pts/11 T 12:06 0:00 python /users/dk/impi/4.0.2/intel64/bin/mpiexec -n 8 IMB-MPI1
[dk@cl210 ~]$ kill -20 13809    # send SIGTSTP

In the first window you'll see:
[1]+ Stopped mpiexec -n 8 IMB-MPI1

Again in the second window type:
[dk@cl210 ~]$ kill -18 13809    # send SIGCONT

IMB then continues running.

Is that not what you see?

Regards!
Dmitry
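
The numeric signals in the commands above are platform-specific: 20 and 18 correspond to SIGTSTP and SIGCONT on x86 Linux, but other systems number them differently. A minimal sketch using symbolic names instead follows; `suspend_job` and `resume_job` are hypothetical helper names, not part of Intel MPI, and the PID passed to them would be that of mpiexec as found with ps.

```shell
#!/bin/sh
# Look up which signal names the numbers 20 and 18 map to on this system.
# On x86 Linux these are expected to be TSTP and CONT.
tstp_name=$(kill -l 20)
cont_name=$(kill -l 18)
echo "20 -> SIG$tstp_name, 18 -> SIG$cont_name"

# Hypothetical helpers that send the signals by name, so the commands do not
# depend on the numbering. The argument is the PID of the mpiexec process.
suspend_job() { kill -s TSTP "$1"; }
resume_job() { kill -s CONT "$1"; }
```

With these defined, `suspend_job 13809` followed by `resume_job 13809` would mirror the two kill commands shown in the post.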



jackyjngwn
Beginner
Dmitry,


Thanks for your reply. I tried what you did and unfortunately it didn't work in my case. Actually, nothing happened when I used "kill -18" in another terminal window. When I used "Ctrl-Z" in the terminal window where the program was running, only the first process was suspended and all the other processes kept running.

Is this because I am using Intel MPI 4.0.1.007? Or is there anything else I need to configure? Thanks.

Jacky
Dmitry_K_Intel2
Employee
Hi Jacky,

I've looked at the mpiexec code, and you are absolutely right: the documentation and the actual behavior do not match. SIGTSTP and SIGCONT are not propagated to the application. This is easy to change in the code, but I doubt you'll be able to do it yourself.
You can submit a tracker at premier.intel.com and I'll send you a patch for testing.

Regards!
Dmitry
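
While mpiexec does not forward these signals, one possible stopgap on a single node is to signal the rank processes directly. The sketch below is an assumption-laden illustration, not an Intel-provided method: `signal_all` and `proc_state` are hypothetical helpers, `sleep` processes stand in for the MPI ranks, SIGSTOP is used instead of SIGTSTP because it cannot be caught or ignored, and the state check relies on the Linux /proc filesystem. Whether a real MPI job tolerates being stopped this way has not been verified here.

```shell
#!/bin/sh
# Sketch: stop and resume a set of local processes by PID.
# `sleep` stands in for the MPI ranks; with a real job the PIDs would be
# those of the rank processes, not the placeholders started here.

# Send one signal (given by name) to every PID listed after it.
signal_all() {
    sig=$1
    shift
    for pid in "$@"; do
        kill -s "$sig" "$pid"
    done
}

# Print the one-letter process state from /proc (Linux only); T means stopped.
proc_state() {
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            echo "$value"
        fi
    done < "/proc/$1/status"
}

# Two placeholder "ranks".
sleep 60 &
rank0=$!
sleep 60 &
rank1=$!

# Suspend both, give the kernel a moment, then record their states.
signal_all STOP "$rank0" "$rank1"
sleep 1
state_stopped="$(proc_state "$rank0")$(proc_state "$rank1")"

# Resume both and record their states again.
signal_all CONT "$rank0" "$rank1"
sleep 1
state_resumed="$(proc_state "$rank0")$(proc_state "$rank1")"

# Clean up the placeholders.
signal_all TERM "$rank0" "$rank1"
echo "stopped: $state_stopped, resumed: $state_resumed"
```

For a job spread across several nodes, the same signals would have to be sent on each node separately, which this sketch does not attempt.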
jackyjngwn
Beginner

Thanks for the reply. I tried to submit an issue at premier.intel.com, but the Intel cluster kit is not in my product list. What can I do? Thanks.

Dmitry_K_Intel2
Employee
What's your product? If it is Cluster Toolkit or Cluster Studio, you should be able to submit a tracker against Intel MPI Library for Linux.

Regards!
Dmitry
jackyjngwn
Beginner
Dmitry,

I have submitted the issue. Could you please take a look? Thanks.

Dmitry_K_Intel2
Employee
Hi Jacky,

Got it. I'll be working on that.

Regards!
Dmitry