Dear all,
I am having trouble terminating an Intel MPI program; the original problem is described in this thread.
I am using impi/5.0.2.044/intel64, and my program is launched with "mpirun -machinefile mymachinefile ./myprogram".
I followed the suggestion to have the runtime execute "kill -<signal> <pid>", but it does not work for signals 1, 2, 9, or 15 (SIGHUP, SIGINT, SIGKILL, SIGTERM).
I have tried setting I_MPI_DEBUG=5, but still no PIDs are printed.
Is there any environment variable I can use to get ALL PIDs related to the current launch, so that I can send a kill signal to each process?
Or is there any setting that will ensure signals like a keyboard interrupt propagate to the related processes?
Hello Kin Fai,
You can try 'mpirun -cleanup' or the I_MPI_HYDRA_CLEANUP environment variable. With this option, the list of MPI processes is saved to a file, and the processes can then be cleaned up with the 'mpicleanup' utility. I suppose there may be some limitations for spawned processes. See the Intel® MPI Library for Linux* OS Reference Manual for details.
Regarding:
"Or is there any setting that will ensure signals like a keyboard interrupt propagate to the related processes?"
As far as I know, signals should be propagated by default. Are you seeing problems with the propagation?
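As a rough illustration of the cleanup-file approach: the workaround posted later in this thread passes mpicleanup a file named like "mpiexec_kftse_<pid>.log", with the user name hard-coded. The Java sketch below looks the file up by launcher PID instead. The class name CleanupFileFinder and the glob pattern are hypothetical, and the pattern only assumes that the file name follows the shape seen in that workaround; check the Reference Manual for the exact naming on your version.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of Intel MPI.
final class CleanupFileFinder {

    // Returns the first file in tmpDir whose name looks like
    // mpiexec_<anything>_<launcherPid>.log, if there is one.
    // ASSUMPTION: the name shape is taken from the workaround in this thread.
    static Optional<Path> find(Path tmpDir, long launcherPid) throws IOException {
        String glob = "mpiexec_*_" + launcherPid + ".log";
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(tmpDir, glob)) {
            for (Path candidate : matches) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: CleanupFileFinder <tmpdir> <launcher-pid>");
            return;
        }
        Optional<Path> file = find(Paths.get(args[0]), Long.parseLong(args[1]));
        System.out.println(file.map(Path::toString).orElse("no cleanup file found"));
    }
}
```

The path that comes back could then be handed to 'mpicleanup -i'. On Java 9 and later the launcher PID is available from Process.pid(), so no reflection is needed.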
Regarding the incorrect signal propagation: I have reproduced this with mpirun and will submit an internal ticket to get it fixed.
In the meantime, you can use the mpiexec.hydra launcher instead of mpirun; signals should propagate correctly there.
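For anyone launching from Java, the suggestion above might look roughly like the sketch below. It starts mpiexec.hydra directly through ProcessBuilder, so no shell sits between the JVM and the launcher, and then asks the launcher to stop. The class name DirectLauncher, the command line, and the grace period are illustrative assumptions; it also assumes mpiexec.hydra is on the PATH and forwards the termination signal to the ranks as described in the reply. Process.destroy() is implementation-dependent, although on typical Linux JVMs it sends SIGTERM.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Intel MPI.
final class DirectLauncher {

    // Start the command without an intermediate shell.
    static Process launch(List<String> command) throws IOException {
        return new ProcessBuilder(command).inheritIO().start();
    }

    // Ask the process to exit and wait up to graceSeconds; escalate if it does not.
    // Returns true once the process has exited.
    static boolean stop(Process p, long graceSeconds) throws InterruptedException {
        p.destroy(); // polite request (SIGTERM on typical Linux JVMs)
        if (p.waitFor(graceSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        // Last resort: a forcibly killed launcher cannot forward anything to its ranks.
        p.destroyForcibly();
        return p.waitFor(graceSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // ASSUMPTION: mpiexec.hydra is on the PATH and ./a.out exists.
            Process p = launch(Arrays.asList("mpiexec.hydra", "-np", "6", "./a.out"));
            Thread.sleep(10_000);
            System.out.println("launcher exited: " + stop(p, 5));
        } catch (IOException e) {
            System.err.println("could not start mpiexec.hydra: " + e.getMessage());
        }
    }
}
```

Whether the ranks on remote nodes actually exit still depends on the launcher forwarding the signal, so it is worth checking with ps on the compute nodes after a test run.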
For your reference, here is some toy Java code you can check. I have noticed the same issue when using Python.
The Process returned by the runtime is actually a handle to something like "sh /$PATH_TO_MPIRUN/mpirun -np 6 ./a.out", so p.destroy() actually destroys the shell, and mpirun may not be notified. However, when CTRL+C is pressed in a terminal running mpirun, the interrupt is propagated.
Anyway, thanks for your help. I have solved my problem with the workaround shown below.
package test;

import java.io.IOException;
import java.lang.reflect.Field;

public class TestMain {
    public static void main(String[] args) throws IOException, InterruptedException {
        // -cleanup saves the list of launched processes to a log file under -tmpdir.
        Process p = Runtime.getRuntime().exec("mpirun -cleanup -tmpdir ./ -np 6 ./a.out");
        int pid = 0;
        // Read the private "pid" field of the process handle via reflection.
        if (p.getClass().getName().equals("java.lang.UNIXProcess")) {
            try {
                Field f = p.getClass().getDeclaredField("pid");
                f.setAccessible(true);
                pid = f.getInt(p);
                System.out.println(pid);
            } catch (Exception e) {
                e.printStackTrace(); // pid stays 0 if reflection fails
            }
        }
        Thread.sleep(10000);
        // p.destroy(); // Doesn't work
        Runtime.getRuntime().exec("mpicleanup -i mpiexec_kftse_" + pid + ".log"); // worked
    }
}

#include <mpi.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    /* Spin forever so the job has to be killed from outside. */
    while (1) {
    }
    MPI_Finalize(); /* never reached */
    return 0;
}
Best,
Kin Fai
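A note for readers on Java 9 or later: the shell-wrapper problem described in the post above can also be approached with the ProcessHandle API, which lists the descendants of a process. The sketch below signals every descendant of the launched process before the process itself. The class name ProcessTreeKiller is hypothetical, and this is only an assumption-laden alternative: it reaches processes on the launching node only, so ranks started on other machines are not covered, and the mpicleanup workaround above remains the one reported to work in this thread.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Intel MPI. Requires Java 9+.
final class ProcessTreeKiller {

    // Requests termination of all local descendants of p, then of p itself.
    // Returns how many descendants accepted the termination request.
    static int destroyTree(Process p) {
        // Snapshot first: once the parent dies, children may be re-parented.
        List<ProcessHandle> descendants =
                p.toHandle().descendants().collect(Collectors.toList());
        int requested = 0;
        for (ProcessHandle h : descendants) {
            if (h.destroy()) {
                requested++;
            }
        }
        p.destroy();
        return requested;
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // ASSUMPTION: mpirun is on the PATH and ./a.out exists.
            Process p = new ProcessBuilder("mpirun", "-np", "6", "./a.out")
                    .inheritIO().start();
            Thread.sleep(10_000);
            System.out.println("descendants signalled: " + destroyTree(p));
            System.out.println("wrapper exited: " + p.waitFor(5, TimeUnit.SECONDS));
        } catch (IOException e) {
            System.err.println("could not start mpirun: " + e.getMessage());
        }
    }
}
```

Process.pid() is also available from Java 9, which removes the need for the reflection trick when building the cleanup file name.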