My application makes heavy use of MPI_Comm_spawn calls to dynamically create and abandon processes.
I am using Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130522 in a Linux cluster environment.
Unfortunately, each call of MPI_Comm_spawn leaves a process behind, even if the subprocess has finished normally. These leftover processes are only killed when the whole application finishes. They do not seem to take up any resources by themselves, but since I make about 2000 MPI_Comm_spawn calls, they can turn into a serious and hard-to-detect bug once the OS reaches its file handle limit.
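To see how close a process is to the file handle limit mentioned above, something like the following sketch could be run on the affected node. It is only a diagnostic aid, not part of Intel MPI or MPICH: the function name `count_open_fds` is made up for illustration, it assumes a Linux `/proc` filesystem, and it assumes the inspected process runs under the same limits as the script itself.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count the open file descriptors of a process via /proc (Linux only).

    pid is a process ID, or "self" for the calling process. When counting
    for "self", the directory listing itself briefly holds one descriptor,
    so the result is one higher than the number held otherwise.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_limit():
    """Return the (soft, hard) open-file limits of the calling process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = fd_limit()
    print(f"open file descriptors: {count_open_fds()}")
    print(f"soft limit: {soft}, hard limit: {hard}")
```

Passing the PID of the launcher or of the spawning rank instead of `"self"` (for example `count_open_fds(12345)`, where 12345 is a placeholder) would show whether the count grows with each MPI_Comm_spawn call. Reading another process's `/proc/<pid>/fd` requires owning that process or having root privileges.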
Searching the web turns up related reports on the MPICH bug tracker, namely tickets 670 and 1504 (the spam filter prevents me from posting convenient links), and on the MPICH discussion board:
Could this still be an issue in the Hydra implementation used by Intel MPI?
Thank you very much for your help!
It seems this issue still persists in Intel MPI 5.0.3.048.
Are there any workarounds? I'm also spawning lots of MPI processes dynamically, and the application eventually hits the ulimit -u limit.
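Not a workaround, but a rough way to watch how close the job gets to the `ulimit -u` limit is sketched below. The helper name `user_process_count` is hypothetical, the script assumes a Linux `/proc` filesystem, and it counts processes rather than threads, so it should be read as a lower bound on what the kernel charges against the limit.

```python
import os
import resource


def user_process_count(uid=None):
    """Count processes in /proc whose directory is owned by the given user.

    Defaults to the real UID of the caller. The owner of /proc/<pid> is
    normally the effective UID of that process; processes that exit while
    we scan are skipped.
    """
    uid = os.getuid() if uid is None else uid
    count = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            if os.stat(os.path.join("/proc", entry)).st_uid == uid:
                count += 1
        except OSError:
            continue  # process vanished between listdir and stat
    return count


def nproc_limit():
    """Return the (soft, hard) limits that `ulimit -u` reports."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    soft, hard = nproc_limit()
    print(f"processes owned by this user: {user_process_count()}")
    print(f"soft limit: {soft}, hard limit: {hard}")
```

Running this periodically while the application spawns processes would show whether the count climbs by one per MPI_Comm_spawn call, as described above.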