Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

mpirun error "APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)"

Antonio_Claudio_P_

Hello, I am using Intel mpirun (Intel MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824) to run a program that I have compiled on our cluster. We use the PBS queue system (PBSPro_11.1.0.111761).

When I use

$ mpirun -n 8 -machinefile $PBS_NODEFILE -verbose /home/a.c.padilha/bin/vasp.teste.O0.debug.x 

I end up getting these error messages:

[proxy:0:1@n022] got crush from 5, 0
[proxy:0:2@n023] got crush from 5, 0
[proxy:0:2@n023] got crush from 4, 0
[proxy:0:0@n009] got crush from 6, 0
[proxy:0:0@n009] got crush from 9, 0
[proxy:0:0@n009] got crush from 17, 0
[proxy:0:1@n022] got crush from 4, 0
[proxy:0:0@n009] got crush from 10, 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

I have tried calling mpirun with -check_mpi and -env I_MPI_DEBUG=5, but so far I have no clue what is going on. This happens only when I use more than one compute node.
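For completeness, the mpirun line above is called from a PBS job script roughly like the following (the resource request, walltime, and job name are only a sketch of my setup, not the exact script):

#!/bin/bash
#PBS -l select=2:ncpus=8:mpiprocs=4
#PBS -l walltime=01:00:00
#PBS -N vasp.test

cd "$PBS_O_WORKDIR"

# $PBS_NODEFILE is generated by PBS and lists one entry per allocated MPI slot
mpirun -n 8 -machinefile $PBS_NODEFILE -verbose /home/a.c.padilha/bin/vasp.teste.O0.debug.x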

Any help would be much appreciated.

Michael_Intel
Moderator

Hi Claudio,

Could you please provide the full output of your MPI run with “-genv I_MPI_HYDRA_DEBUG=1” set? Also, please provide us the output of “cat $PBS_NODEFILE” after resource allocation.
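For example, from inside the allocated job, something along these lines (the log file name is just a placeholder):

$ cat $PBS_NODEFILE
$ mpirun -n 8 -machinefile $PBS_NODEFILE -genv I_MPI_HYDRA_DEBUG=1 /home/a.c.padilha/bin/vasp.teste.O0.debug.x > hydra_debug.log 2>&1

The "2>&1" ensures that anything printed to stderr is captured in the same file.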

Regards,

Michael

mat1
Beginner

Hi,

I'm also experiencing the same error, but in my case it happens with only one node (I didn't try a multi-node run).

I use the following MPI version.

$ mpirun -V

Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824

Copyright (C) 2003-2011, Intel Corporation. All rights reserved.

I don't use a queuing system, i.e. I execute my job from the command line with the following:

$ mpirun -verbose -check-mpi -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_DEBUG 1 -np 40 ~/bin/vasp5O0g > out 2>&1 &

Then the job ended with

[proxy:0:0@ebn13] got crush from 35, 0

[proxy:0:0@ebn13] got crush from 26, 0

snip

[proxy:0:0@ebn13] got crush from 41, 0

APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

The executable is compiled with mpiifort (using ifort version 12.1.2.273 Build 20111128) and is statically linked against the MKL library.
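The link step has roughly this form; the library names follow the usual static, sequential, LP64 MKL link line, but the exact set and paths on my machine may differ (this is a sketch, not my literal makefile):

$ mpiifort -o ~/bin/vasp5O0g *.o \
      -Wl,--start-group \
      $MKLROOT/lib/intel64/libmkl_intel_lp64.a \
      $MKLROOT/lib/intel64/libmkl_sequential.a \
      $MKLROOT/lib/intel64/libmkl_core.a \
      -Wl,--end-group -lpthread -lm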

The file containing the standard output/error is attached. If you need more information, please let me know.

Any kind of advice would be appreciated. Thank you.

Sincerely,

MM

Santos__Iván
Beginner

Dear Claudio,

I also had problems when trying to use more than one compute node with Intel MPI. These are my previous posts, in case you find some useful information in them:

http://software.intel.com/en-us/forums/topic/329053

http://software.intel.com/en-us/forums/topic/370967

Regards,

Ivan

Antonio_Claudio_P_

Hi Michael,

 The output using

$ mpirun -n 16 -machinefile $PBS_NODEFILE -verbose -genv I_MPI_HYDRA_DEBUG=1 -check_mpi /home/a.c.padilha/bin/vasp.teste.O0.debug.x > log

is in the file log.txt. Even when I redirect my output to a file, I still get this message

ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.

for each of the MPI processes. I looked up libVTmc.so and found that it is a debugging library, so I believe it is not related to the original problem in any way.
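As far as I understand, libVTmc.so is the message-checking library that -check_mpi tries to preload, so the warning only means the dynamic loader cannot find it. A few things I can check on our system (the install paths below are guesses, not confirmed locations):

# the warning goes to stderr, so "> log" alone does not capture it; "> log 2>&1" would
$ find /opt/intel -name 'libVTmc.so' 2>/dev/null

# if the Intel Trace Analyzer and Collector is installed, sourcing its environment
# script should add the library to LD_LIBRARY_PATH (path is an assumption)
$ source /opt/intel/itac/<version>/bin/itacvars.sh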

Thanks for your reply, Iván, but I do not get the same error messages you got in your posts, even though I used exactly the same flags in the mpirun call.

Regards,

Claudio
