Hello everybody,
I have a question regarding the libmkl_blacs_openmpi* libraries. Which OpenMPI version are these libraries supposed to be compatible with?
I could not find this information in the usual MKL or compiler release notes. By testing I determined that the libmkl_blacs_openmpi_lp64.so from the MKL bundled with Intel 2016 update 4 is compatible with OpenMPI 2.0, i.e. programs using libmkl_scalapack_lp64.so work and apparently give correct results. However, the libraries from the Intel 2017 update 2 distribution together with OpenMPI 2.0 or 2.1 produce programs that fail with a runtime error as soon as BLACS routines are called. I have not had time to test Intel 2017 update 4 yet, but an authoritative answer on the compatibility would be helpful even if it should work with update 4.
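For concreteness, the kind of minimal test I have in mind looks roughly like the sketch below. It only uses the standard BLACS C interface (the declarations are the usual BLACS C bindings, nothing MKL-specific), and the build line in the comment is indicative only; the exact libraries should come from the MKL Link Line Advisor.
/* blacs_smoke.c - minimal check that libmkl_blacs_openmpi_lp64.so and the
 * chosen OpenMPI build cooperate: set up a 1 x nprocs process grid,
 * synchronize, and tear it down again.
 *
 * Indicative build line (check the MKL Link Line Advisor for the exact one):
 *   mpicc blacs_smoke.c -lmkl_blacs_openmpi_lp64 -lmkl_intel_lp64 \
 *         -lmkl_sequential -lmkl_core -lpthread -lm -ldl
 */
#include <stdio.h>
#include <mpi.h>

/* Standard BLACS C interface, provided here by the MKL BLACS library. */
void Cblacs_pinfo(int *mypnum, int *nprocs);
void Cblacs_get(int ctxt, int what, int *val);
void Cblacs_gridinit(int *ctxt, char *order, int nprow, int npcol);
void Cblacs_gridinfo(int ctxt, int *nprow, int *npcol, int *myrow, int *mycol);
void Cblacs_barrier(int ctxt, char *scope);
void Cblacs_gridexit(int ctxt);
void Cblacs_exit(int notdone);

int main(int argc, char **argv)
{
    int me, nprocs, ctxt, nprow, npcol, myrow, mycol;

    MPI_Init(&argc, &argv);
    Cblacs_pinfo(&me, &nprocs);

    /* A 1 x nprocs grid is enough to exercise the BLACS/MPI glue. */
    Cblacs_get(-1, 0, &ctxt);
    Cblacs_gridinit(&ctxt, "Row", 1, nprocs);
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    printf("process %d of %d sits at (%d,%d) of a %dx%d grid\n",
           me, nprocs, myrow, mycol, nprow, npcol);

    Cblacs_barrier(ctxt, "All");
    Cblacs_gridexit(ctxt);
    Cblacs_exit(1);   /* 1 = keep MPI alive, we call MPI_Finalize ourselves */
    MPI_Finalize();
    return 0;
}
Running this with mpirun -np 4 should print one line per rank; if even this aborts under a given OpenMPI, the BLACS/OpenMPI combination is clearly the problem.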
Of course, if this is documented in detail somewhere, a pointer to the documentation would be appreciated, too.
Best Regards
Christof
Christof, please have a look at the MKL System Requirements page: https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2017-system-requirements. We have added information there about the OpenMPI version that MKL has been validated against.
Hello Gennady,
thank you very much for the information. So it was luck that it works (or appears to work :-) with OpenMPI 2.0 without a hitch; 1.10, for example, certainly does not work. This helps me quite a lot in deciding which additional libraries to make available and which MPIs to combine with which compiler.
I might add a question: are there any plans to upgrade the supported OpenMPI version at some point? As you can see on the OpenMPI web page, the 1.8 branch is distant history and unsupported. A modern interconnect (Intel Omni-Path comes to mind) will, IMHO, not work with 1.8, for example.
Cheers
Christof
hi Christof,
We don't have such plans for the upcoming MKL version 2018, at least. We also don't expect to see problems with OpenMPI v2.0.
In case someone does run into a problem with such a combination, then: 1) please report the case to us, and 2) try using the MPI wrappers; see this link for more details: https://software.intel.com/en-us/articles/using-intel-mkl-mpi-wrapper-with-the-intel-mkl-cluster-functions
thanks, Gennady
Hello Gennady,
I might try the MPI wrapper method at some point; as far as I can see, the OS should not make any difference. The web page leaves me a bit confused, though. I just replace intel_openmpi_blacs.a with the new one and include libmkl_scalapack_lp64.a as before, right? And I will not have to deal with the BLACS reference implementation from the official Netlib ScaLAPACK source, since the wrapper is independent of that?
With respect to my problems with Intel 2017 update 2: I see aborts/segfaults/hangs when linking with OpenMPI 2.1 on an Ubuntu 14.04 workstation (E3 Xeon), no network involved. I have seen similar problems on CentOS 7. With Intel 2016, the same programs appear to run without problems in the same environment.
In particular, the programs are vasp (www.vasp.at), cp2k (www.cp2k.org) and the MPI-parallel version of our in-house code dftb+ (http://www.dftb-plus.info/). I will concentrate on dftb+. Our developer reported the following problem when running with more than one core (MPI rank). The MPI error message is below and the call trace is attached as a separate file. If there is any interest on your side in looking into "out-of-the-box" OpenMPI compatibility for versions other than 1.8, we would make the effort to develop a simple reproducer; a first rough sketch of what I have in mind follows the dftb+ error output below. As already mentioned, with Intel 2016 there appear to be no problems: the same program in the same environment with the same input works fine.
dftb+ error message:
[core326:22337] *** An error occurred in MPI_Bcast
[core326:22337] *** reported by process [3466788865,0]
[core326:22337] *** on communicator MPI COMMUNICATOR 5 SPLIT FROM 3
[core326:22337] *** MPI_ERR_TRUNCATE: message truncated
[core326:22337] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[core326:22337] *** and potentially your MPI job)
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node core326 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
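As announced above, a first rough cut at such a reproducer could look like the following. This is an untested sketch that simply repeats a BLACS broadcast using Cdgebs2d/Cdgebr2d from the standard BLACS C interface, which I assume ends up in an MPI_Bcast on a communicator split from the base one, just like in the message above.
/* blacs_bcast_repro.c - skeleton reproducer: repeatedly broadcast a small
 * matrix within a BLACS grid, mimicking the MPI_Bcast-on-a-split-communicator
 * pattern that the error message complains about (assumption on my part).
 */
#include <stdio.h>
#include <mpi.h>

void Cblacs_pinfo(int *mypnum, int *nprocs);
void Cblacs_get(int ctxt, int what, int *val);
void Cblacs_gridinit(int *ctxt, char *order, int nprow, int npcol);
void Cblacs_gridinfo(int ctxt, int *nprow, int *npcol, int *myrow, int *mycol);
void Cdgebs2d(int ctxt, char *scope, char *top, int m, int n,
              double *a, int lda);
void Cdgebr2d(int ctxt, char *scope, char *top, int m, int n,
              double *a, int lda, int rsrc, int csrc);
void Cblacs_gridexit(int ctxt);
void Cblacs_exit(int notdone);

int main(int argc, char **argv)
{
    int me, nprocs, ctxt, nprow, npcol, myrow, mycol, i;
    double a[4];

    MPI_Init(&argc, &argv);
    Cblacs_pinfo(&me, &nprocs);
    Cblacs_get(-1, 0, &ctxt);
    Cblacs_gridinit(&ctxt, "Row", 1, nprocs);      /* 1 x nprocs grid */
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    for (i = 0; i < 1000; ++i) {                   /* repeat to shake out races */
        if (mycol == 0) {
            a[0] = a[1] = a[2] = a[3] = (double)i; /* 2x2 "matrix" to send */
            Cdgebs2d(ctxt, "All", " ", 2, 2, a, 2);
        } else {
            Cdgebr2d(ctxt, "All", " ", 2, 2, a, 2, 0, 0);
        }
        if (a[0] != (double)i)
            fprintf(stderr, "rank %d: wrong data in round %d\n", me, i);
    }

    if (me == 0)
        printf("all broadcasts completed\n");

    Cblacs_gridexit(ctxt);
    Cblacs_exit(1);   /* keep MPI alive, finalize it ourselves */
    MPI_Finalize();
    return 0;
}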
With vasp 5.3.5 (which we only use but do not develop) I get a very similar message, this time on a workstation running Ubuntu 16.04 with Intel 2017 update 4:
[core403:21200] *** An error occurred in MPI_Bcast
[core403:21200] *** reported by process [140736888832001,140733193388034]
[core403:21200] *** on communicator MPI COMMUNICATOR 11 SPLIT FROM 9
[core403:21200] *** MPI_ERR_TRUNCATE: message truncated
[core403:21200] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[core403:21200] *** and potentially your MPI job)
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
vasp-mpi 0000000005EAB28A for__signal_handl Unknown Unknown
libpthread-2.23.s 00007FFFF798C390 Unknown Unknown Unknown
libopen-pal.so.20 00007FFFF6062E03 Unknown Unknown Unknown
libopen-pal.so.20 00007FFFF6004198 opal_progress Unknown Unknown
libmpi.so.20.10.1 00007FFFF6B3A4AC ompi_request_defa Unknown Unknown
libmpi.so.20.10.1 00007FFFF6B70555 ompi_coll_base_bc Unknown Unknown
libmpi.so.20.10.1 00007FFFF6B70B08 ompi_coll_base_bc Unknown Unknown
mca_coll_tuned.so 00007FFFEA9ACF9F ompi_coll_tuned_b Unknown Unknown
libmpi.so.20.10.1 00007FFFF6B489D5 MPI_Bcast Unknown Unknown
libmkl_blacs_open 00007FFFF7BBEA9D MKLMPI_Bcast Unknown Unknown
libmkl_blacs_open 00007FFFF7BB0806 Czgebs2d Unknown Unknown
vasp-mpi 00000000011F32C9 Unknown Unknown Unknown
vasp-mpi 000000000120C420 Unknown Unknown Unknown
vasp-mpi 00000000011CE898 Unknown Unknown Unknown
vasp-mpi 00000000011A366F Unknown Unknown Unknown
vasp-mpi 00000000011A1888 Unknown Unknown Unknown
vasp-mpi 00000000004809F2 Unknown Unknown Unknown
vasp-mpi 0000000000B59456 Unknown Unknown Unknown
vasp-mpi 0000000000BC96C1 Unknown Unknown Unknown
vasp-mpi 00000000004337CB Unknown Unknown Unknown
vasp-mpi 000000000040DD2E Unknown Unknown Unknown
libc-2.23.so 00007FFFF6753830 __libc_start_main Unknown Unknown
vasp-mpi 000000000040DC29 Unknown Unknown Unknown
And another one (same input, new run):
[core403:21471] *** An error occurred in MPI_Bcast
[core403:21471] *** reported by process [46913346142209,46909632806914]
[core403:21471] *** on communicator MPI COMMUNICATOR 11 SPLIT FROM 9
[core403:21471] *** MPI_ERR_TRUNCATE: message truncated
[core403:21471] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[core403:21471] *** and potentially your MPI job)
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
vasp-mpi 0000000005EAB28A for__signal_handl Unknown Unknown
libpthread-2.23.s 00002AAAAAF23390 Unknown Unknown Unknown
libopen-pal.so.20 00002AAAAC891E03 Unknown Unknown Unknown
libopen-pal.so.20 00002AAAAC833198 opal_progress Unknown Unknown
mca_pml_ob1.so 00002AAAB6E7011E Unknown Unknown Unknown
mca_pml_ob1.so 00002AAAB6E6FF3E mca_pml_ob1_recv Unknown Unknown
libmpi.so.20.10.1 00002AAAABD34611 MPI_Recv Unknown Unknown
libmkl_blacs_open 00002AAAAACFB29B MKLMPI_Recv Unknown Unknown
libmkl_blacs_open 00002AAAAACFEDA9 BI_Srecv Unknown Unknown
libmkl_blacs_open 00002AAAAACEA56B Czgerv2d Unknown Unknown
vasp-mpi 00000000011FD4F1 Unknown Unknown Unknown
Again, on this machine everything works with Intel 2016 update 2 (used for compatibility reasons; apparently update 4 does not produce working binaries, see https://software.intel.com/en-us/forums/intel-c-compiler/topic/637950).
Best Regards
Christof
Hello again,
I should add that there are no problems when linking against the reference ScaLAPACK, so we assume that our MPI installations are OK in any case.
Best Regards
Christof