libmkl_blacs_openmpi compatibility

Hello everybody,

I have a question regarding the libmkl_blacs_openmpi* libraries. Which OpenMPI version are these libraries supposed to be compatible with?

I could not find this information in the usual MKL or compiler release notes. By testing I determined that the libmkl_blacs_openmpi_lp64.so from the MKL bundled with Intel 2016 Update 4 is compatible with OpenMPI 2.0, i.e. programs using libmkl_scalapack_lp64.so work and apparently give correct results. However, using the libraries from the Intel 2017 Update 2 distribution together with OpenMPI 2.0 or 2.1 produces programs that fail with a runtime error as soon as BLACS routines are called. I have not had time to test Intel 2017 Update 4 yet, but an authoritative answer on compatibility would be helpful even if it turns out to work with Update 4.

Of course, if this is documented in detail somewhere, a pointer to the documentation would be appreciated, too.
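For reference, a typical link line for such a ScaLAPACK program looks roughly like the following. This is a sketch based on the usual MKL link-line conventions; MKLROOT, the threading layer, and the exact library set depend on the installation (the MKL Link Line Advisor gives the authoritative list):

```shell
# Hypothetical link line for an LP64 ScaLAPACK program built with Open MPI's
# mpifort; my_prog.f90 is a placeholder. Adjust MKLROOT and the library list
# for your MKL version and threading choice.
mpifort my_prog.f90 -o my_prog \
    -L${MKLROOT}/lib/intel64 \
    -lmkl_scalapack_lp64 \
    -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \
    -lmkl_blacs_openmpi_lp64 \
    -lpthread -lm -ldl
```

The only MPI-specific part of this line is -lmkl_blacs_openmpi_lp64, which is exactly the layer whose compatibility I am asking about.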

Best Regards

Christof

Moderator

Christof, please have a look at the MKL System Requirements page: https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2017-system-requirements. We have added information there about the OpenMPI versions that MKL has been validated against.

Hello Gennady,

thank you very much for the information. So it was luck that it works (or appears to work :-)) with OpenMPI 2.0 without a hitch; 1.10, for example, certainly does not work. This helps me quite a lot in deciding which additional libraries to make available and which MPIs to combine with which compiler.

I might add a question: are there any plans to upgrade the supported OpenMPI version at some point? As you can see on the OpenMPI web page, the 1.8 branch is distant history and unsupported. A modern interconnect (Intel Omni-Path comes to mind) will, IMHO, not work with 1.8, for example.

Cheers

Christof

Moderator

Hi Christof,

We don't have such plans, at least not for the upcoming MKL version 2018. We also don't expect to see problems with OpenMPI v2.0.

In case someone does hit a problem in such a setup, then: 1/ please report the case to us, and 2/ try using the MPI wrappers – for more details, follow this link: https://software.intel.com/en-us/articles/using-intel-mkl-mpi-wrapper-with-the-intel-mkl-cluster-fun...
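The wrapper approach amounts to rebuilding MKL's BLACS adapter against the locally installed MPI. A rough sketch, assuming the interface sources ship under ${MKLROOT}/interfaces/mklmpi as in recent MKL releases (the directory name and make targets may differ between versions; the article linked above is authoritative):

```shell
# Hypothetical rebuild of the MKL BLACS wrapper against the local Open MPI.
# Check ${MKLROOT}/interfaces/ for the actual directory and makefile targets.
cd ${MKLROOT}/interfaces/mklmpi
make libintel64 interface=lp64 MPICC=mpicc
# Link the resulting library in place of libmkl_blacs_openmpi_lp64.
```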

thanks, Gennady


Hello Gennady,

I might try the MPI wrapper method at some point; as far as I can see, the OS should not make any difference. The web page leaves me a bit confused, though. I just replace intel_openmpi_blacs.a with the new one and include libmkl_scalapack_lp64.a as before, right? And I will not have to deal with the BLACS reference implementation from the official netlib ScaLAPACK source, since the wrapper is independent of that?

With respect to my problems with Intel 2017 Update 2: I see aborts/segfaults/hangs when linking with OpenMPI 2.1 on an Ubuntu 14.04 workstation (E3 Xeon), no network involved. I have seen similar problems on CentOS 7. With Intel 2016, the same programs appear to run without problems in the same environment.

In particular, the programs are VASP (www.vasp.at), CP2K (www.cp2k.org), and the MPI-parallel version of our in-house code DFTB+ (http://www.dftb-plus.info/). I will concentrate on DFTB+. Our developer reported the following problem when running with more than one core (MPI rank). The MPI error message is below, and the call trace is attached as a separate file. If there is any interest on your side in looking into "out-of-the-box" OpenMPI compatibility for versions other than 1.8, we would make the effort to develop a simple reproducer. As already mentioned, with Intel 2016 there appear to be no problems; the same program in the same environment with the same input works fine.

dftb+ error message:

[core326:22337] *** An error occurred in MPI_Bcast
[core326:22337] *** reported by process [3466788865,0]
[core326:22337] *** on communicator MPI COMMUNICATOR 5 SPLIT FROM 3
[core326:22337] *** MPI_ERR_TRUNCATE: message truncated
[core326:22337] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[core326:22337] ***    and potentially your MPI job)
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node core326 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

With VASP 5.3.5 (which we only use but do not develop) I get a very similar message, this time on a workstation running Ubuntu 16.04 with Intel 2017 Update 4:

[core403:21200] *** An error occurred in MPI_Bcast
[core403:21200] *** reported by process [140736888832001,140733193388034]
[core403:21200] *** on communicator MPI COMMUNICATOR 11 SPLIT FROM 9
[core403:21200] *** MPI_ERR_TRUNCATE: message truncated
[core403:21200] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[core403:21200] ***    and potentially your MPI job)
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
vasp-mpi           0000000005EAB28A  for__signal_handl     Unknown  Unknown
libpthread-2.23.s  00007FFFF798C390  Unknown               Unknown  Unknown
libopen-pal.so.20  00007FFFF6062E03  Unknown               Unknown  Unknown
libopen-pal.so.20  00007FFFF6004198  opal_progress         Unknown  Unknown
libmpi.so.20.10.1  00007FFFF6B3A4AC  ompi_request_defa     Unknown  Unknown

libmpi.so.20.10.1  00007FFFF6B70555  ompi_coll_base_bc     Unknown  Unknown
libmpi.so.20.10.1  00007FFFF6B70B08  ompi_coll_base_bc     Unknown  Unknown
mca_coll_tuned.so  00007FFFEA9ACF9F  ompi_coll_tuned_b     Unknown  Unknown
libmpi.so.20.10.1  00007FFFF6B489D5  MPI_Bcast             Unknown  Unknown
libmkl_blacs_open  00007FFFF7BBEA9D  MKLMPI_Bcast          Unknown  Unknown
libmkl_blacs_open  00007FFFF7BB0806  Czgebs2d              Unknown  Unknown
vasp-mpi           00000000011F32C9  Unknown               Unknown  Unknown
vasp-mpi           000000000120C420  Unknown               Unknown  Unknown
vasp-mpi           00000000011CE898  Unknown               Unknown  Unknown
vasp-mpi           00000000011A366F  Unknown               Unknown  Unknown
vasp-mpi           00000000011A1888  Unknown               Unknown  Unknown
vasp-mpi           00000000004809F2  Unknown               Unknown  Unknown
vasp-mpi           0000000000B59456  Unknown               Unknown  Unknown
vasp-mpi           0000000000BC96C1  Unknown               Unknown  Unknown
vasp-mpi           00000000004337CB  Unknown               Unknown  Unknown
vasp-mpi           000000000040DD2E  Unknown               Unknown  Unknown
libc-2.23.so       00007FFFF6753830  __libc_start_main     Unknown  Unknown
vasp-mpi           000000000040DC29  Unknown               Unknown  Unknown

and another one (same input, new run)

[core403:21471] *** An error occurred in MPI_Bcast
[core403:21471] *** reported by process [46913346142209,46909632806914]
[core403:21471] *** on communicator MPI COMMUNICATOR 11 SPLIT FROM 9
[core403:21471] *** MPI_ERR_TRUNCATE: message truncated
[core403:21471] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[core403:21471] ***    and potentially your MPI job)
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
vasp-mpi           0000000005EAB28A  for__signal_handl     Unknown  Unknown
libpthread-2.23.s  00002AAAAAF23390  Unknown               Unknown  Unknown
libopen-pal.so.20  00002AAAAC891E03  Unknown               Unknown  Unknown
libopen-pal.so.20  00002AAAAC833198  opal_progress         Unknown  Unknown
mca_pml_ob1.so     00002AAAB6E7011E  Unknown               Unknown  Unknown
mca_pml_ob1.so     00002AAAB6E6FF3E  mca_pml_ob1_recv      Unknown  Unknown
libmpi.so.20.10.1  00002AAAABD34611  MPI_Recv              Unknown  Unknown
libmkl_blacs_open  00002AAAAACFB29B  MKLMPI_Recv           Unknown  Unknown
libmkl_blacs_open  00002AAAAACFEDA9  BI_Srecv              Unknown  Unknown
libmkl_blacs_open  00002AAAAACEA56B  Czgerv2d              Unknown  Unknown
vasp-mpi           00000000011FD4F1  Unknown               Unknown  Unknown

Again, on this machine everything works with Intel 2016 Update 2 (chosen for compatibility reasons; Update 4 apparently does not produce working binaries, see https://software.intel.com/en-us/forums/intel-c-compiler/topic/637950).

Best Regards

Christof


Hello again,

I should add that there are no problems when linking against a reference ScaLAPACK, so we assume that our MPIs are OK in any case.
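For completeness, this A/B test amounts to swapping only the ScaLAPACK/BLACS part of the link line. A sketch, where dftbplus.o and the install path are placeholders, and the library name assumes a self-built netlib reference ScaLAPACK (which bundles its own BLACS):

```shell
# Variant A: MKL ScaLAPACK + MKL BLACS for Open MPI (shows the MPI_Bcast error)
mpifort dftbplus.o -o dftb+ \
    -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 \
    -lmkl_intel_lp64 -lmkl_sequential -lmkl_core

# Variant B: reference netlib ScaLAPACK (runs fine with the same MPI and input)
mpifort dftbplus.o -o dftb+ \
    -L/opt/scalapack/lib -lscalapack \
    -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
```

Since only the ScaLAPACK/BLACS layer changes between the two variants, the MPI installation itself can be ruled out as the cause.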

Best Regards

Christof

 
