Operating system and version: CentOS Linux release 7.5.1804
Intel MPI version: 2019.5.281
Compiler and version: 19.0.5.281
Fabric: Mellanox Technologies MT27500
Libfabric version: 1.7.2
Would anyone be able to help me with an "invalid communicator" error I've been getting with Intel MPI plus Intel compilers (not present with OpenMPI plus GNU or Intel compilers) in one subroutine in a large code?
I receive the error when I use MPI_ALLREDUCE in this subroutine, but if I replace it with an MPI_REDUCE followed by an MPI_BCAST, the code works fine. There are many other instances of MPI_ALLREDUCE in other subroutines that work without problems. The snippet that works:
CALL MPI_REDUCE(EX,FEX,NATOMX,MPI_REAL8,MPI_SUM,0, &
     COMM_CHARMM,IERROR)
CALL MPI_REDUCE(EY,FEY,NATOMX,MPI_REAL8,MPI_SUM,0, &
     COMM_CHARMM,IERROR)
CALL MPI_REDUCE(EZ,FEZ,NATOMX,MPI_REAL8,MPI_SUM,0, &
     COMM_CHARMM,IERROR)
CALL MPI_BCAST(FEX,NATOMX,MPI_REAL8,0,COMM_CHARMM,IERROR)
CALL MPI_BCAST(FEY,NATOMX,MPI_REAL8,0,COMM_CHARM M,IERROR)
CALL MPI_BCAST(FEZ,NATOMX,MPI_REAL8,0,COMM_CHARMM,IERROR)
The snippet that causes the error:
CALL MPI_ALLREDUCE(EX,FEX,NATOMX,MPI_REAL8,MPI_SUM,0, &
     COMM_CHARMM,IERROR)
CALL MPI_ALLREDUCE(EY,FEY,NATOMX,MPI_REAL8,MPI_SUM,0, &
     COMM_CHARMM,IERROR)
CALL MPI_ALLREDUCE(EZ,FEZ,NATOMX,MPI_REAL8,MPI_SUM,0, &
     COMM_CHARMM,IERROR)
After setting I_MPI_DEBUG=6 and I_MPI_HYDRA_DEBUG=on, the error message is:
Abort(1007228933) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Allreduce: Invalid communicator, error stack:
PMPI_Allreduce(434): MPI_Allreduce(sbuf=0x2b5015d1b6c0, rbuf=0x2b5004ff8740, count=1536, datatype=dtype=0x4c000829, op=MPI_SUM, comm=comm=0x0) failed
PMPI_Allreduce(355): Invalid communicator
The problem persists even when running on a single core with mpirun. The initial MPI debug output in that case is:
$ mpirun -ppn 1 -n 1 ../build/cmake/charmm-bug -i c45test/dcm-ti.inp
[mpiexec@pc-beethoven.cluster] Launch arguments: /opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin//hydra_bstrap_proxy --upstream-host pc-beethoven.cluster --upstream-port 36326 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_maxes
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_appnum
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=appnum appnum=0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=my_kvsname kvsname=kvs_24913_0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=barrier_in
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=barrier_out
[0] MPI startup(): libfabric version: 1.7.2a-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=my_kvsname kvsname=kvs_24913_0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=put kvsname=kvs_24913_0 key=bc-0 value=mpi#0200ADFEC0A864030000000000000000$
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=barrier_in
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=barrier_out
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get kvsname=kvs_24913_0 key=bc-0
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=get_result rc=0 msg=success value=mpi#0200ADFEC0A864030000000000000000$
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 24917 pc-beethoven.cluster {0,1,2,3,4,5,6,7}
[0] MPI startup(): I_MPI_CC=icc
[0] MPI startup(): I_MPI_CXX=icpc
[0] MPI startup(): I_MPI_F90=ifort
[0] MPI startup(): I_MPI_F77=ifort
[0] MPI startup(): I_MPI_ROOT=/opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_DEBUG=on
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=6
Note that I also tried setting FI_PROVIDER=sockets, with the same result. Any ideas?
- Tags:
- Cluster Computing
- General Support
- Intel® Cluster Ready
- Message Passing Interface (MPI)
- Parallel Computing
Hi Prasanth,
Thanks! That was a careless syntax error on my part. The strange thing is that the code runs anyway with OpenMPI; otherwise I'd have spotted the mistake a lot sooner.
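For anyone who hits the same error: unlike MPI_REDUCE, MPI_ALLREDUCE takes no root argument, so the stray 0 in my calls was being passed in the communicator's position (hence comm=0x0 in the error stack). As far as I can tell, the corrected calls simply drop that argument:

```fortran
! MPI_ALLREDUCE has no root rank: every rank receives the result,
! so the argument list is (sendbuf, recvbuf, count, datatype, op, comm, ierror)
CALL MPI_ALLREDUCE(EX,FEX,NATOMX,MPI_REAL8,MPI_SUM, &
     COMM_CHARMM,IERROR)
CALL MPI_ALLREDUCE(EY,FEY,NATOMX,MPI_REAL8,MPI_SUM, &
     COMM_CHARMM,IERROR)
CALL MPI_ALLREDUCE(EZ,FEZ,NATOMX,MPI_REAL8,MPI_SUM, &
     COMM_CHARMM,IERROR)
```

OpenMPI's Fortran mpi module apparently tolerated the extra argument, which is why the bug only surfaced with Intel MPI.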
Anyway, thanks again for pointing it out!
Best,
Mike
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you, Mike
Glad to hear that the solution provided helped. We are closing this thread as the issue has been resolved. Please feel free to raise a new thread if you run into any further issues.
Have a good day!
Regards,
Prasanth