Intel® MPI Library

How is I_MPI_HYDRA_CLEANUP (or cleanup in general) handled in Intel MPI 2019 U7?

tamilalagan
Novice

Two issues with mpicleanup on Intel MPI 2019 U7.

1. The bin directory doesn't contain the mpicleanup script, which causes the error "mpirun: line 120: mpicleanup: command not found".

2. How is cleanup handled in Intel MPI 2019 U7?

I do not see any /tmp/mpiexe<processid> file being created to track the process IDs.

Is there a new way of handling cleanup?

PrasanthD_intel
Moderator

Hi,


Thanks for reaching out to us.

I_MPI_HYDRA_CLEANUP creates a file if Hydra terminates abnormally and the processes are still running.

This feature is not supported in Intel MPI 2019, because Hydra is expected to clean up all the processes automatically.

If you find the processes are not being cleaned up automatically, please let us know.


Regards

Prasanth


tamilalagan
Novice

Thank you Prasanth,

I am occasionally getting the error below while launching MPI.

[30] [proxy:0:18@<Node>] HYD_spawn (../../../../../src/pm/i_hydra/libhydra/spawn/intel/hydra_spawn.c:128): execvp error on file <MyProcess> (Too many open files).

Is this a side effect of processes not being cleaned up correctly?

 

PrasanthD_intel
Moderator

Hi,


You can use the top command to check whether the processes are still running after the job has finished or been terminated.
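If watching top by hand is inconvenient, the same check can be scripted. The following is only a minimal sketch, assuming a Linux node with /proc mounted; the function name find_processes and the search fragment "hydra" are illustrative choices, not part of Intel MPI, so substitute the name of your own executable. Note that /proc/<pid>/comm holds at most 15 characters of the command name.

```python
import os


def find_processes(name_fragment, proc_root="/proc"):
    """Return sorted (pid, command name) pairs whose name contains name_fragment."""
    matches = []
    if not os.path.isdir(proc_root):
        return matches  # not a Linux-style /proc; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories correspond to process IDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if name_fragment in comm:
            matches.append((int(entry), comm))
    return sorted(matches)


if __name__ == "__main__":
    # "hydra" is an assumed fragment; replace it with your application's name.
    for pid, comm in find_processes("hydra"):
        print(pid, comm)
```

Running this on each node after the job has ended should print nothing if everything was cleaned up; any lines it does print are leftover processes worth reporting.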

As for the error, it may be due to a limit imposed by Linux or by the job scheduler.

Could you tell us how many processes you were launching and your environment details (job scheduler, interconnect, provider, etc.)?

Please check whether a maximum-number-of-processes limit has been set in the job scheduler.
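For reference, "Too many open files" is the standard message for errno EMFILE, which means a process hit its per-process limit on open file descriptors (RLIMIT_NOFILE, the value shown by ulimit -n). The sketch below, using only the Python standard library, prints the open-files and process-count limits along with the number of descriptors currently open. The helper names are illustrative, the descriptor count relies on the Linux /proc layout, and the limits seen in an interactive shell may differ from those the remote proxy actually inherits.

```python
import os
import resource


def describe_limit(value):
    """Render a resource limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def limits_report():
    """Return {label: (soft, hard)} for the open-files and process limits."""
    report = {}
    for label, res in (("open files", resource.RLIMIT_NOFILE),
                       ("processes", resource.RLIMIT_NPROC)):
        soft, hard = resource.getrlimit(res)
        report[label] = (describe_limit(soft), describe_limit(hard))
    return report


def open_fd_count(pid="self", proc_root="/proc"):
    """Count the open file descriptors of a process (Linux /proc layout)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


if __name__ == "__main__":
    for label, (soft, hard) in limits_report().items():
        print(f"{label}: soft={soft} hard={hard}")
    try:
        print("descriptors open in this process:", open_fd_count())
    except OSError:
        print("could not read /proc/self/fd on this system")
```

If the soft limit for open files is low, or the descriptor count of a long-running launcher keeps growing between runs, that would point to a descriptor limit or leak rather than to the job scheduler.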


Regards

Prasanth


tamilalagan
Novice

Hello Prasanth,

You can use the top command to check if the processes are still running after they are finished/terminated.

Tamil >> I will check and post an update if the issue is reproduced.

And coming to the error it may be due to a limitation from the Linux/job scheduler.

Could you mention how many processes you were launching and your environment details (Job scheduler, interconnect, Provider) etc?

Tamil >> We are launching only 2 processes per node. We are not using a job scheduler. The interconnect is Mellanox IB and the provider is OFI.

Please check if you have set any maximum number of processes limit in job scheduler.

Tamil >> We are not setting any limit on the maximum number of processes.

PrasanthD_intel
Moderator

Hi,


Were you able to reproduce the issue, and did you get a chance to check whether the processes were still running?

Please confirm so that we can proceed.


Regards

Prasanth


PrasanthD_intel
Moderator

Hi,


Please let us know if you face the issue again. We were not able to reproduce the issue in our environment.


Regards

Prasanth


PrasanthD_intel
Moderator

Hi,

We are closing this thread on the assumption that your issue has been resolved. Please raise a new thread if you need any further assistance from Intel.

Any further interaction in this thread will be considered community only.

Regards

Prasanth

