I have an MPI code that works fine on my Windows machine (VS2010). It has one master process that has accepted, via MPI_COMM_ACCEPT, a connection to another job that is running two MPI processes. This setup also works when the master process runs on my Intel cluster node, as long as the accepted job has only one process. But when I try two, I get the message:
Internal Error: invalid error code 489e0e (Ring ids do not match) in MPIR_Barrier_impl:712
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(949).....: MPI_Barrier(comm=0x84000000) failed
MPIR_Barrier_impl(720): Failure during collective
MPIR_Barrier_impl(712):
I note that there have been some complaints of 'Ring ids do not match' for the latest MPICH2 release.
Any help would be appreciated.
I am running the Intel13 level of software. It is a Fortran code.
composer_xe_2013.1.117/ipp/lib/intel64:/opt/lic/intel13/composer_xe_2013.1.117/compiler/lib/intel64 rt version 13.0.1
I also have since tried using MPICH-3.0.2 with the same results.
Any ideas out there?
I don't think this will solve the problem, but try running both jobs with I_MPI_ADJUST_BARRIER=1. It is possible that the jobs are selecting different algorithms.
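As a rough sketch of what that looks like in practice: the variable needs to be set in the environment of both jobs before they are launched. The executable names below (./master, ./worker) are hypothetical placeholders, and the assumption that the two-process job is the one calling MPI_COMM_CONNECT is mine, not something stated in the thread.

```shell
# Force the same MPI_Barrier algorithm in every job started from this shell.
# Run this in the environment of BOTH the accepting and the connecting job.
export I_MPI_ADJUST_BARRIER=1

# Launch lines (hypothetical executable names; adjust to your own setup):
#   mpirun -n 1 ./master    # job that calls MPI_COMM_ACCEPT
#   mpirun -n 2 ./worker    # job assumed to call MPI_COMM_CONNECT
#
# Alternatively, pass the setting per job instead of exporting it:
#   mpirun -genv I_MPI_ADJUST_BARRIER 1 -n 2 ./worker
```

If the error persists with the same algorithm forced on both sides, a mismatch in algorithm selection is probably not the cause.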
Do you have a reproducer you can share?
Technical Consulting Engineer
Intel® Cluster Tools