Hello, I am using Intel's mpirun (Intel MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824) to run a program that I compiled on our cluster. We use the PBS queueing system (PBSPro_11.1.0.111761).
When I use
$ mpirun -n 8 -machinefile $PBS_NODEFILE -verbose /home/a.c.padilha/bin/vasp.teste.O0.debug.x
I end up getting these error messages:
[proxy:0:1@n022] got crush from 5, 0
[proxy:0:2@n023] got crush from 5, 0
[proxy:0:2@n023] got crush from 4, 0
[proxy:0:0@n009] got crush from 6, 0
[proxy:0:0@n009] got crush from 9, 0
[proxy:0:0@n009] got crush from 17, 0
[proxy:0:1@n022] got crush from 4, 0
[proxy:0:0@n009] got crush from 10, 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
I have tried calling mpirun with -check_mpi and -env I_MPI_DEBUG=5, but so far I have no clue what is going on. This happens only when I use more than one compute node.
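For completeness, the job is submitted through PBS with a script along these lines (the resource selection, walltime, and process count below are placeholders, not my exact script):

#!/bin/bash
#PBS -l select=2:ncpus=4:mpiprocs=4
#PBS -l walltime=01:00:00
#PBS -j oe
# PBS populates $PBS_NODEFILE with the hosts allocated to this job
cd $PBS_O_WORKDIR
mpirun -n 8 -machinefile $PBS_NODEFILE /home/a.c.padilha/bin/vasp.teste.O0.debug.x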
Any help would be much appreciated.
Hi Claudio,
Could you please provide the full output of your MPI run with "-genv I_MPI_HYDRA_DEBUG=1" added to the command line? Also, please provide the output of "cat $PBS_NODEFILE" after resource allocation.
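For example, reusing your earlier command (process count and binary path carried over from your post; the log file name is just a suggestion):

$ cat $PBS_NODEFILE
$ mpirun -n 8 -machinefile $PBS_NODEFILE -genv I_MPI_HYDRA_DEBUG=1 /home/a.c.padilha/bin/vasp.teste.O0.debug.x > hydra_debug.log 2>&1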
Regards,
Michael
Hi,
I'm also experiencing the same error, but in my case it happens with just one node (I have not tried running on multiple nodes).
I use the following MPI version.
$ mpirun -V
Intel(R) MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824
Copyright (C) 2003-2011, Intel Corporation. All rights reserved.
I don't use a queueing system; I launch the job directly from the command line with the following:
$ mpirun -verbose -check_mpi -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_DEBUG 1 -np 40 ~/bin/vasp5O0g > out 2>&1 &
Then the job ended with
[proxy:0:0@ebn13] got crush from 35, 0
[proxy:0:0@ebn13] got crush from 26, 0
snip
[proxy:0:0@ebn13] got crush from 41, 0
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
The executable was compiled with mpiifort (based on ifort version 12.1.2.273 Build 20111128) and is statically linked against the MKL library.
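The link step was roughly of this form (not my exact command; the object list is abbreviated and the libraries shown are the standard static sequential MKL set):

$ mpiifort -O0 -g -o vasp5O0g *.o \
    -Wl,--start-group \
    $MKLROOT/lib/intel64/libmkl_intel_lp64.a \
    $MKLROOT/lib/intel64/libmkl_sequential.a \
    $MKLROOT/lib/intel64/libmkl_core.a \
    -Wl,--end-group -lpthread -lm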
A file containing the standard output/error is attached. If you need more information, please let me know.
Any kind of advice would be appreciated. Thank you.
Sincerely,
MM
Dear Claudio,
I also had problems when trying to use more than one compute node with Intel MPI. These are my previous posts, in case you find some useful information there:
http://software.intel.com/en-us/forums/topic/329053
http://software.intel.com/en-us/forums/topic/370967
Regards,
Ivan
Hi Michael,
The output using
$ mpirun -n 16 -machinefile $PBS_NODEFILE -verbose -genv I_MPI_HYDRA_DEBUG=1 -check_mpi /home/a.c.padilha/bin/vasp.teste.O0.debug.x > log
is attached as log.txt. Even when I redirect the output to a file, I still get this message
ERROR: ld.so: object 'libVTmc.so' from LD_PRELOAD cannot be preloaded: ignored.
for each of the MPI processes. I looked up this libVTmc.so and found that it is a debugging library, so I believe it is not related to the original problem.
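In case anyone else hits the same LD_PRELOAD warning: it should go away once the directory containing libVTmc.so is on LD_LIBRARY_PATH on every node, e.g. (the path below is a placeholder; locate the real one on your install first):

$ find /opt/intel -name libVTmc.so 2>/dev/null
$ export LD_LIBRARY_PATH=/path/to/dir/with/libVTmc.so:$LD_LIBRARY_PATH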
Thanks for your reply, Iván, but I do not get the same error message you describe in your posts, even though I used exactly the same flags in the mpirun call.
Regards,
Claudio
