We encountered a nasty problem with the combination of OpenMP and MPI in one program:
the program uses OpenMP in several places and it can also employ MPI. If I let it run in
ordinary mode (so no MPI), it works fine. If, however, I use mpirun/mpiexec to
let it run on 2 or more nodes in our Linux cluster, it aborts. The messages that are
written to standard error look like this:
INTERNAL ERROR: Invalid error class (66) encountered while returning from
MPI_Probe. Please file a bug report. No error stack is available.
Fatal error in MPI_Probe: Error message texts are not available[cli_0]: aborting job:
Fatal error in MPI_Probe: Error message texts are not available
The core dump reveals that something nasty happens in the OpenMP library:
Program terminated with signal 6, Aborted.
[New process 13187]
[New process 13186]
[New process 13182]
#0 0xffffe405 in __kernel_vsyscall ()
(gdb) where
#0 0xffffe405 in __kernel_vsyscall ()
#1 0x003d9df0 in __gettextparse () from /lib/libc.so.6
#2 0x003db84e in modfl () from /lib/libc.so.6
#3 0x555f335c in __kmp_do_abort ()
from /opt/intel/Compiler/11.0/081/lib/ia32/libiomp5.so
#4 0x555ed4e1 in __kmp_wait_sleep ()
from /opt/intel/Compiler/11.0/081/lib/ia32/libiomp5.so
#5 0x555ecd5b in __kmp_barrier ()
from /opt/intel/Compiler/11.0/081/lib/ia32/libiomp5.so
#6 0x555dab95 in __kmpc_barrier ()
from /opt/intel/Compiler/11.0/081/lib/ia32/libiomp5.so
#7 0x0809af2d in __gxx_personality_v0 ()
#8 0x08420958 in __gxx_personality_v0 ()
#9 0x00000001 in ?? ()
#10 0x00000000 in ?? ()
We use MPICH2 (version 1.0.7) and Intel Fortran 11.0, should that be relevant to the problem.
Can anyone tell me what I can do about this (except for the obvious, of course: not using this combination)?
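To give an idea of the structure, here is a hypothetical minimal sketch along the same lines (not the actual program, just the shape of it): plain MPI_Init, an MPI_Probe/MPI_Recv pair for the messaging, and an OpenMP loop for the local work.

program hybrid_sketch
    use mpi
    implicit none
    integer :: ierr, rank, n, i, stat(MPI_STATUS_SIZE)
    real    :: work(1000), s

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    if (rank == 0) then
        work = 1.0
        call MPI_Send(work, size(work), MPI_REAL, 1, 0, MPI_COMM_WORLD, ierr)
    else if (rank == 1) then
        ! Probe for the incoming message, then receive it
        call MPI_Probe(0, 0, MPI_COMM_WORLD, stat, ierr)
        call MPI_Get_count(stat, MPI_REAL, n, ierr)
        call MPI_Recv(work, n, MPI_REAL, 0, 0, MPI_COMM_WORLD, stat, ierr)

        ! Local computation uses OpenMP
        s = 0.0
!$omp parallel do reduction(+:s)
        do i = 1, n
            s = s + work(i)
        end do
!$omp end parallel do
        write(*,*) 'rank 1: sum = ', s
    end if

    call MPI_Finalize(ierr)
end program hybrid_sketch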
Regards,
Arjen
Experts on this subject (at least for Intel MPI) are more likely to be found on the Intel forum for HPC and clusters.
Regards,
Arjen
If Tim's suggestion doesn't yield results: the error dump looks like OpenMP barriers are involved. The blend of code, combined with the libraries used, may have resulted in a thread (or threads) that is not part of the presumed team entering an OpenMP barrier. This can have various causes, one of which is:
(assumption) The MPI library internally uses OpenMP to provide front-end and back-end threads. The MPI library's thread team uses a barrier, while your OpenMP code establishes its own thread team that uses a barrier, and the two thread teams end up using the same barrier. This is a conflict in the design usage of the barrier.
Can you adjust your code such that only the main thread makes MPI calls, preferably from outside any OpenMP parallel regions?
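Something along these lines, for instance (a hypothetical sketch, not your code; requesting MPI_THREAD_FUNNELED at initialisation is one way of stating that intent to the MPI library):

program funneled_sketch
    use mpi
    implicit none
    integer :: ierr, rank, provided, i
    double precision :: local_sum, global_sum

    ! Request only funneled support: MPI will be called, but only by the
    ! main thread, never from inside an OpenMP parallel region.
    call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    local_sum = 0.0d0
!$omp parallel do reduction(+:local_sum)
    do i = 1, 1000000
        local_sum = local_sum + dble(i)    ! no MPI calls in here
    end do
!$omp end parallel do

    ! All communication happens outside the parallel region, on the main thread
    call MPI_Reduce(local_sum, global_sum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                    MPI_COMM_WORLD, ierr)
    if (rank == 0) write(*,*) 'global sum = ', global_sum

    call MPI_Finalize(ierr)
end program funneled_sketch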
An alternative approach is to try enabling OpenMP's nested parallelism feature. This may cause the OpenMP library to create separate barriers (as opposed to a single static barrier). It will have to do this if the same function with a parallel loop is called recursively, as can happen with nested parallelism.
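A rough illustration of switching nesting on (again only a sketch):

program nested_sketch
    use omp_lib
    implicit none
    integer :: outer

    call omp_set_nested(.true.)        ! or set OMP_NESTED=true in the environment
!$omp parallel num_threads(2) private(outer)
    outer = omp_get_thread_num()
    ! With nesting enabled, this inner region gets its own thread team
!$omp parallel num_threads(2)
    write(*,*) 'outer thread', outer, ', inner thread', omp_get_thread_num()
!$omp end parallel
!$omp end parallel
end program nested_sketch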
Jim Dempsey
As far as I am aware, the OpenMP parts do not contain any MPI calls, but I may be wrong there.
I will first try Tim's suggestion and then see if there is anything obvious vis-à-vis MPI calls in these OpenMP-parallelised regions.
Regards,
Arjen
That did the trick. My computation runs fine now (well, fine with respect to the previous MPI error; the
results are less satisfactory, but that is a totally different issue).
Thanks, Tim, Jim.
(I never realised that OpenMP and MPI could get entangled in this way.)
Regards,
Arjen
That is good news.
If the amount of work per MPI message is relatively low and you suspect the messaging overhead is significant, it might be worth your while to set up an experiment where you measure the overhead in a "simple" test app. Run this test with and without the MPI_THREAD_FUNNELED flag to see how much additional overhead is introduced. If it is insignificant, then address your performance issues elsewhere. If you do notice a significant difference, then it may be worth your while to redo your code such that you can run without MPI_THREAD_FUNNELED. This may entail a master-only calling strategy, as I mentioned earlier, or it may involve adding a non-OpenMP pthread for MPI messaging (assuming this does not exhibit the same problem).
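A rough sketch of such a test (the names and message size are just placeholders): run it once initialised with MPI_Init_thread(MPI_THREAD_FUNNELED, ...) and once with plain MPI_Init, and compare the timings.

program pingpong_sketch
    use mpi
    implicit none
    integer, parameter :: nrep = 10000
    integer :: ierr, rank, provided, i, stat(MPI_STATUS_SIZE)
    double precision :: t0, t1
    real :: buf(1024)

    ! Swap this for a plain call MPI_Init(ierr) in the comparison run
    call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    buf = 0.0
    t0 = MPI_Wtime()
    do i = 1, nrep
        if (rank == 0) then
            call MPI_Send(buf, size(buf), MPI_REAL, 1, 0, MPI_COMM_WORLD, ierr)
            call MPI_Recv(buf, size(buf), MPI_REAL, 1, 0, MPI_COMM_WORLD, stat, ierr)
        else if (rank == 1) then
            call MPI_Recv(buf, size(buf), MPI_REAL, 0, 0, MPI_COMM_WORLD, stat, ierr)
            call MPI_Send(buf, size(buf), MPI_REAL, 0, 0, MPI_COMM_WORLD, ierr)
        end if
    end do
    t1 = MPI_Wtime()

    if (rank == 0) write(*,*) 'average round trip (s): ', (t1 - t0) / nrep

    call MPI_Finalize(ierr)
end program pingpong_sketch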
The other route to explore is making your own Barrier (assuming the barrier thing was the problem).
Jim Dempsey
