Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

How to kill an MPI program programmatically

Kin_Fai_T_
Beginner

Dear all,

I am having trouble killing an Intel MPI program. The original problem is described in this thread:

http://stackoverflow.com/questions/32222878/intel-mpi-mpirun-does-not-terminate-using-java-process-destroy

I am using impi/5.0.2.044/intel64, and my program is launched with "mpirun -machinefile mymachinefile ./myprogram"

I followed the suggestion to have the Java runtime execute "kill -<signal> <pid>", but it doesn't work for signals 1, 2, 9, or 15.


I have tried setting I_MPI_DEBUG=5, but still no PIDs are printed.

Is there an environment variable I can use to get ALL PIDs related to the current launch, so that I can send a kill signal to each process?
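For the processes that run on the local host, a Java 9+ program can enumerate every PID below the launcher with ProcessHandle, without relying on I_MPI_DEBUG output. This is a minimal sketch under stated assumptions: the mpirun command line is the one from this post, the 5-second startup delay is arbitrary, and ranks running on other hosts in the machinefile are not visible this way (on the local host you would at most see the remote-launch processes, not the remote ranks).

```java
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;

public class DescendantPids {

    // Collects the PIDs of every process below `root` in the local process tree
    // (children, grandchildren, ...). Processes on other hosts are not visible.
    static List<Long> descendantPids(ProcessHandle root) {
        return root.descendants()
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Launch line taken from this post; adjust the machinefile and program.
        Process p = new ProcessBuilder("mpirun", "-machinefile", "mymachinefile", "./myprogram")
                .inheritIO()
                .start();
        Thread.sleep(5000); // arbitrary delay so the launcher can start its children
        System.out.println("launcher pid: " + p.pid());
        System.out.println("local descendant pids: " + descendantPids(p.toHandle()));
    }
}
```

The resulting list could then be fed to "kill -<signal> <pid>" one PID at a time, or to ProcessHandle.of(pid) and destroy().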

Or is there a setting that ensures signals such as a keyboard interrupt propagate to the related processes?

Artem_R_Intel1
Employee

Hello Kin Fai,

You can try 'mpirun -cleanup' or the I_MPI_HYDRA_CLEANUP environment variable. With this option, the list of MPI processes is saved to a file, and the processes can then be cleaned up with the 'mpicleanup' utility. There may be some limitations for spawned processes. See the Intel® MPI Library for Linux* OS Reference Manual for details.
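In a Java launcher, the environment-variable form of this suggestion could look like the minimal sketch below. It assumes that I_MPI_HYDRA_CLEANUP accepts 1 as an enabling value and that the cleanup file follows the mpiexec_<user>_<pid>.log pattern used later in this thread, keyed on the PID of the launched process. Both assumptions should be checked against the Reference Manual for your Intel MPI version; the helper name cleanupLogName is hypothetical.

```java
import java.io.File;
import java.io.IOException;

public class MpiCleanup {

    // Hypothetical helper: builds the cleanup file name using the pattern seen
    // in this thread, mpiexec_<user>_<pid>.log (assumed, not confirmed here).
    static String cleanupLogName(String user, long pid) {
        return "mpiexec_" + user + "_" + pid + ".log";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        File tmpDir = new File(".");
        ProcessBuilder pb = new ProcessBuilder(
                "mpirun", "-tmpdir", tmpDir.getPath(), "-np", "6", "./a.out");
        // Assumed to have the same effect as passing -cleanup on the command line.
        pb.environment().put("I_MPI_HYDRA_CLEANUP", "1");
        Process p = pb.inheritIO().start();

        Thread.sleep(10000);

        // Hand the saved process list to mpicleanup and wait for it to finish.
        String user = System.getProperty("user.name");
        String log = new File(tmpDir, cleanupLogName(user, p.pid())).getPath();
        int rc = new ProcessBuilder("mpicleanup", "-i", log).inheritIO().start().waitFor();
        System.out.println("mpicleanup exit code: " + rc);
    }
}
```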

Regarding:

Or is there a setting that ensures signals such as a keyboard interrupt propagate to the related processes?

As far as I know, they should be propagated by default. Are you having problems with the propagation?

Artem_R_Intel1
Employee

Regarding the incorrect signal propagation: I've reproduced this with mpirun, and I'll submit an internal ticket to fix it.
As a workaround, you can use the mpiexec.hydra launcher instead of mpirun; it should propagate signals correctly.
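A minimal Java sketch of this workaround, assuming mpiexec.hydra is on the PATH: the launcher binary is started directly rather than through the mpirun shell script, so Process.destroy() (SIGTERM on Linux) reaches the launcher itself, which should then forward the signal to the ranks. The machinefile and program names are placeholders, and the escalation to destroyForcibly() is an addition not discussed in this thread.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class HydraLauncher {

    // Argument vector for mpiexec.hydra, started without the mpirun script,
    // so the Process handle refers to the launcher rather than to a shell.
    static List<String> command(String machinefile, int np, String program) {
        return Arrays.asList("mpiexec.hydra", "-machinefile", machinefile,
                "-np", String.valueOf(np), program);
    }

    // Asks the launcher to exit (SIGTERM on Unix) and waits. If it is still
    // alive after the timeout, escalates to SIGKILL; SIGKILL cannot be caught,
    // so the launcher gets no chance to forward it and ranks may be orphaned.
    static boolean terminate(Process p, long timeoutSeconds) throws InterruptedException {
        p.destroy();
        if (p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        p.destroyForcibly();
        return p.waitFor(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command("mymachinefile", 6, "./myprogram"))
                .inheritIO()
                .start();
        Thread.sleep(10000);
        System.out.println("terminated: " + terminate(p, 5));
    }
}
```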

Kin_Fai_T_
Beginner

For reference, here is some toy code in Java you can check. I've noticed the same issue when using Python.

The Process returned by the runtime is actually a handle to something like "sh /$PATH_TO_MPIRUN/mpirun -np 6 ./a.out", so p.destroy() only destroys the shell, and mpirun may never be notified. However, when you press Ctrl+C in a terminal running mpirun, the interrupt is propagated.
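Ctrl+C works because the terminal sends SIGINT to every process in its foreground process group, whereas p.destroy() signals only the shell at the top of the tree. A Java 9+ sketch that approximates this by signalling the whole local process tree follows; it is an illustration under assumptions, it sends SIGTERM rather than SIGINT, and it cannot reach ranks running on other hosts. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class TreeKiller {

    // Sends a termination request (SIGTERM on Unix) to every local descendant
    // of p and then to p itself, then waits for p to exit.
    static boolean killTree(Process p, long timeoutSeconds) throws InterruptedException {
        // Snapshot the tree first: once the shell dies, its children are
        // re-parented and can no longer be found through p.descendants().
        List<ProcessHandle> tree = p.descendants().collect(Collectors.toList());
        tree.forEach(ProcessHandle::destroy);
        p.destroy();
        return p.waitFor(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Launch line taken from this post.
        Process p = Runtime.getRuntime().exec("mpirun -np 6 ./a.out");
        Thread.sleep(10000);
        System.out.println("exited: " + killTree(p, 5));
    }
}
```

Processes started after the snapshot is taken are not signalled, so this is best used once the job has fully started.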

Anyway, thanks for your help. I've solved my problem with the workaround shown below.

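package test;

import java.io.IOException;
import java.lang.reflect.Field;

public class TestMain {

	public static void main(String[] args) throws IOException, InterruptedException {
		// -cleanup saves the list of launched processes to mpiexec_<user>_<pid>.log in -tmpdir
		Process p = Runtime.getRuntime().exec("mpirun -cleanup -tmpdir ./ -np 6 ./a.out");
		int pid = 0;
		// Up to Java 8 the PID is only reachable through the private "pid" field of
		// java.lang.UNIXProcess; on Java 9 and later, use p.pid() instead.
		if (p.getClass().getName().equals("java.lang.UNIXProcess")) {
			try {
				Field f = p.getClass().getDeclaredField("pid");
				f.setAccessible(true);
				pid = f.getInt(p);
				System.out.println(pid);
			} catch (ReflectiveOperationException e) {
				e.printStackTrace();
			}
		}
		Thread.sleep(10000);
		// p.destroy(); // Doesn't work: it only kills the shell running the mpirun script
		if (pid != 0) {
			// Clean up using the process list written by -cleanup (worked)
			String log = "mpiexec_" + System.getProperty("user.name") + "_" + pid + ".log";
			Runtime.getRuntime().exec("mpicleanup -i " + log).waitFor();
		}
	}
}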
#include <mpi.h>

int main(int argc, char* argv[]){
 MPI_Init(&argc, &argv);
 /* Spin forever so the job has to be killed from outside */
 while (1) {
 }
 MPI_Finalize(); /* never reached */
 return 0;
}

Best,

Kin Fai
