thanhwru
Beginner
327 Views

MPI issues

Dear all,

I am getting an error when I run an RCM coupled with CLM. The error is as follows:


[thanh:10120] *** An error occurred in MPI_Allgather
[thanh:10120] *** on communicator MPI_COMM_SELF
[thanh:10120] *** MPI_ERR_TYPE: invalid datatype
[thanh:10120] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 10120 on
node thanh exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[thanh:10118] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[thanh:10118] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Could you please help me solve this issue?

Thanks in advance.

Kind regards,
Nguyen Tien Thanh
thanhwru83@gmail.com

3 Replies
TimP
Black Belt

This error can come from using mismatched MPI wrappers for different parts of your build. You need to be certain that you compile everything with the same MPI version, link the MKL libraries built for that MPI version, and run with that version as well. It looks like you are running under OpenMPI, which is one of the MPI implementations supported by MKL, but I don't know the specific range of OpenMPI versions that is supported.
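
One rough way to spot mismatched wrappers is to check whether the compile and launch tools found on PATH come from the same install prefix. The sketch below is only a heuristic: the tool names (mpicc, mpif90, mpirun) and the <prefix>/bin layout are assumptions, and symlinked or distribution-managed installs can hide a mismatch or report a false one.

```python
import os
import shutil

# Tool names are assumptions; adjust to the wrappers your build actually calls.
TOOLS = ("mpicc", "mpif90", "mpirun")


def install_prefix(path):
    """Return the install prefix for an executable assumed to live in <prefix>/bin."""
    real = os.path.realpath(path)  # resolve symlinks to the real file
    return os.path.dirname(os.path.dirname(real))


def find_prefixes(tools=TOOLS):
    """Map each tool to its install prefix, or None if it is not on PATH."""
    result = {}
    for tool in tools:
        path = shutil.which(tool)
        result[tool] = install_prefix(path) if path else None
    return result


def is_consistent(prefixes):
    """True when every tool was found and all of them share one install prefix."""
    values = list(prefixes.values())
    return None not in values and len(set(values)) == 1


if __name__ == "__main__":
    prefixes = find_prefixes()
    for tool, prefix in prefixes.items():
        print(f"{tool}: {prefix or 'not found on PATH'}")
    print("consistent" if is_consistent(prefixes) else "mismatch or missing tool")
```

A consistent result does not prove that the application was built with these wrappers; it only shows that the current shell environment would pick up a single MPI installation.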
thanhwru
Beginner

Hi,

Yes, I am running OpenMPI now. Can you help me solve this issue?

Thanks a lot.

thanhwru83@gmail.com
TimP
Black Belt

OpenMPI is notoriously full of incompatibilities among major versions.
My MKL installation points to:
http://software.intel.com/en-us/articles/intel-mkl-110-system-requirements/
where it says the following was tested:
  • Open MPI 1.4.3 (http://www.open-mpi.org)
If you don't have the latest MKL, you should check whether the system requirements for your installation refer to an earlier OpenMPI version.
So, for example, you should check whether using an OpenMPI other than 1.4 would introduce such an incompatibility. You should also make sure that you built everything against the same set of OpenMPI header files and that you linked the MKL libraries intended for OpenMPI.
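
As a rough illustration of the version check, the sketch below reads the banner printed by `mpirun --version` and compares it with the 1.4.3 release listed above. It assumes the banner has the form "mpirun (Open MPI) X.Y.Z"; the helper names and the "same major.minor" rule are illustrative choices, not a rule stated by Intel or the Open MPI project.

```python
import re
import subprocess

TESTED = (1, 4, 3)  # Open MPI release listed as tested in the MKL 11.0 requirements


def parse_open_mpi_version(text):
    """Extract (major, minor, patch) from text such as 'mpirun (Open MPI) 1.4.3'."""
    match = re.search(r"\(Open MPI\)\s+(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None  # not an Open MPI banner, or an unexpected format
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def same_series(version, tested=TESTED):
    """True when major and minor numbers match the tested release."""
    return version is not None and version[:2] == tested[:2]


if __name__ == "__main__":
    try:
        out = subprocess.run(["mpirun", "--version"], capture_output=True, text=True)
        # Check both streams, since the banner may be written to either one.
        version = parse_open_mpi_version(out.stdout + out.stderr)
    except FileNotFoundError:
        version = None
    print("Open MPI version:", version)
    print("same series as tested 1.4.3:", same_series(version))
```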
Major applications are poorly set up to ensure use of the correct headers. For example, the one I am testing now requires changing the Makefile to use 'mpif90 -E' as the preprocessor; otherwise it will not get consistent headers for preprocessing.
If you use ifort, of course, you must build and use your own copy of OpenMPI from source so that mpif90 and its libraries are built with your ifort.
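
To see which compiler an OpenMPI wrapper actually invokes, the wrapper's `--showme` option prints the underlying command line without running it. The sketch below takes the first token of that output; the helper name is hypothetical, and it assumes the compiler is the first token, which will not hold for other MPI implementations with different wrapper options.

```python
import os
import shlex
import subprocess


def underlying_compiler(showme_output):
    """Return the base name of the first token of the wrapper's command line."""
    tokens = shlex.split(showme_output)
    return os.path.basename(tokens[0]) if tokens else None


if __name__ == "__main__":
    try:
        out = subprocess.run(["mpif90", "--showme"], capture_output=True, text=True)
        print("mpif90 invokes:", underlying_compiler(out.stdout))
    except FileNotFoundError:
        print("mpif90 not found on PATH")
```

If this reports something other than ifort while the rest of the build uses ifort, that would point to the kind of mismatch described above.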
A more recent version of OpenMPI 1.4 ought to be OK, but you might consider upgrading if you are using 1.3 or earlier.

If you can reproduce the problem in an example that uses a single version of OpenMPI consistent with your MKL libraries, that would be an apparent bug, which you should submit on premier.intel.com or post as a reproducer on this forum.