We have several new IBM iDataPlex systems. Some of our codes, compiled with Intel 12.1 and INTEL-MPI-4.0.3, sometimes fail with this error:
"APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)"
I can consistently replicate this error with the Intel IMB-MPI1 4.0.3 benchmark on two nodes (32 cores).
The error above happens in the Allgatherv benchmark with 32 processes, after the 8192-byte messages (see below).
*BUT*, if I run JUST the Allgatherv benchmark by itself, it works with no problems. It appears a previous MPI function call puts the system into some state that causes Allgatherv to fail (a minimal sketch of that pattern follows the output below).
#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 32
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.18 0.19 0.18
1 1000 55.13 55.16 55.14
2 1000 55.58 55.60 55.59
4 1000 55.42 55.44 55.43
8 1000 54.99 55.02 55.01
16 1000 56.93 56.95 56.94
32 1000 60.37 60.37 60.37
64 1000 60.45 60.45 60.45
128 1000 59.13 59.14 59.13
256 1000 152.55 152.59 152.57
512 1000 152.85 152.90 152.88
1024 1000 92.38 92.39 92.39
2048 1000 198.94 199.08 198.98
4096 1000 244.89 245.09 244.97
8192 1000 323.58 323.74 323.70
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
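For what it's worth, here is a minimal sketch of the pattern I mean: some earlier collective traffic, then Allgatherv at doubling message sizes. This is not the IMB source; the choice of MPI_Allreduce as the preceding call and the buffer sizes are just illustrative.

/* Sketch of the failure pattern: a preceding collective, then
 * MPI_Allgatherv at doubling message sizes.  Illustrative only;
 * MPI_Allreduce stands in for whatever benchmark IMB-MPI1 runs
 * before Allgatherv in the full suite. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int max_bytes = 16384;          /* go past the 8192-byte point */
    char *sendbuf = malloc(max_bytes);
    char *recvbuf = malloc((size_t)max_bytes * size);
    int  *counts  = malloc(size * sizeof(int));
    int  *displs  = malloc(size * sizeof(int));
    memset(sendbuf, 0, max_bytes);

    /* Earlier collective traffic, standing in for the benchmarks
     * that run before Allgatherv. */
    double x = rank, y = 0.0;
    for (int i = 0; i < 1000; i++)
        MPI_Allreduce(&x, &y, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    /* Allgatherv at doubling message sizes, as in the IMB output above. */
    for (int bytes = 1; bytes <= max_bytes; bytes *= 2) {
        for (int i = 0; i < size; i++) {
            counts[i] = bytes;
            displs[i] = i * bytes;
        }
        MPI_Allgatherv(sendbuf, bytes, MPI_CHAR,
                       recvbuf, counts, displs, MPI_CHAR, MPI_COMM_WORLD);
        if (rank == 0)
            printf("Allgatherv %d bytes OK\n", bytes);
    }

    free(sendbuf); free(recvbuf); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}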
Hi jianni,
Thanks for posting. Unfortunately, this is not enough information to determine the cause of the failure. It's a pretty generic error. Can you set I_MPI_DEBUG=5 and send us the output? That might provide more information on what the Intel MPI Library is doing before it hits the error.
Based on your description below, it seems like this happens when running 32 MPI ranks. Does it happen for any job over 32 ranks or specifically 32? How about below 32 ranks?
It would also be great to know your command line, whether you're setting any Intel MPI-specific environment variables, and whether you're running over an InfiniBand network or just TCP (the I_MPI_DEBUG output will give us some of this data). If you are running over IB, it'll be interesting to see whether using regular Ethernet improves the situation (you can do that by setting I_MPI_FABRICS=shm:tcp).
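In case it's useful while collecting that, below is a small sketch (illustrative only, not required) that has each rank print the Intel MPI variables it sees at run time, so we can confirm the settings actually reached all 32 ranks:

/* Illustrative sketch: each rank reports the Intel MPI environment
 * variables mentioned above as it sees them at run time. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const char *dbg = getenv("I_MPI_DEBUG");
    const char *fab = getenv("I_MPI_FABRICS");
    printf("rank %d: I_MPI_DEBUG=%s I_MPI_FABRICS=%s\n",
           rank, dbg ? dbg : "(unset)", fab ? fab : "(unset)");

    MPI_Finalize();
    return 0;
}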
Looking forward to hearing back soon.
Regards,
~Gergana