Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Invalid communicator issue with PMPI_Allreduce

Mike_D_6
Beginner

Operating system and version: CentOS Linux release 7.5.1804
Intel MPI version: 2019.5.281
Compiler and version: 19.0.5.281
Fabric: Mellanox Technologies MT27500
Libfabric version: 1.7.2

Could anyone help me with an "invalid communicator" error I've been getting with Intel MPI and the Intel compilers (it does not occur with OpenMPI and either the GNU or Intel compilers) in one subroutine of a large code?

The error appears when I call MPI_ALLREDUCE in this subroutine, but if I replace it with MPI_REDUCE followed by MPI_BCAST the code works fine. There are many other MPI_ALLREDUCE calls in other subroutines that run without any problem. The snippet that works:

     CALL MPI_REDUCE(EX,FEX,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)
     CALL MPI_REDUCE(EY,FEY,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)
     CALL MPI_REDUCE(EZ,FEZ,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)
     CALL MPI_BCAST(FEX,NATOMX,MPI_REAL8,0,COMM_CHARMM,IERROR)
     CALL MPI_BCAST(FEY,NATOMX,MPI_REAL8,0,COMM_CHARMM,IERROR)
     CALL MPI_BCAST(FEZ,NATOMX,MPI_REAL8,0,COMM_CHARMM,IERROR)

The snippet that causes the error:

     CALL MPI_ALLREDUCE(EX,FEX,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)
     CALL MPI_ALLREDUCE(EY,FEY,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)
     CALL MPI_ALLREDUCE(EZ,FEZ,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)

With I_MPI_DEBUG=6 and I_MPI_HYDRA_DEBUG=on set, the error message is:

Abort(1007228933) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Allreduce: Invalid communicator, error stack:
PMPI_Allreduce(434): MPI_Allreduce(sbuf=0x2b5015d1b6c0, rbuf=0x2b5004ff8740, count=1536, datatype=dtype=0x4c000829, op=MPI_SUM, comm=comm=0x0) failed
PMPI_Allreduce(355): Invalid communicator

The problem persists even when running on a single core with mpirun. The initial MPI debug output in that case is:

$ mpirun -ppn 1 -n 1 ../build/cmake/charmm-bug -i c45test/dcm-ti.inp

[mpiexec@pc-beethoven.cluster] Launch arguments: /opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin//hydra_bstrap_proxy --upstream-host pc-beethoven.cluster --upstream-port 36326 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_maxes
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_appnum
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=appnum appnum=0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=my_kvsname kvsname=kvs_24913_0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=barrier_in
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=barrier_out
[0] MPI startup(): libfabric version: 1.7.2a-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=my_kvsname kvsname=kvs_24913_0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=put kvsname=kvs_24913_0 key=bc-0 value=mpi#0200ADFEC0A864030000000000000000$
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=barrier_in
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=barrier_out
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get kvsname=kvs_24913_0 key=bc-0
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=get_result rc=0 msg=success value=mpi#0200ADFEC0A864030000000000000000$
[0] MPI startup(): Rank    Pid      Node name             Pin cpu
[0] MPI startup(): 0       24917    pc-beethoven.cluster  {0,1,2,3,4,5,6,7}
[0] MPI startup(): I_MPI_CC=icc
[0] MPI startup(): I_MPI_CXX=icpc
[0] MPI startup(): I_MPI_F90=ifort
[0] MPI startup(): I_MPI_F77=ifort
[0] MPI startup(): I_MPI_ROOT=/opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_DEBUG=on
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=6

Note that I also tried using FI_PROVIDER=sockets, with the same result. Any ideas?

3 Replies
PrasanthD_intel
Moderator
Hi Mike,

We have tried to reproduce your issue and found an error in the MPI_ALLREDUCE call. Below is the correct call:

     CALL MPI_ALLREDUCE(EX,FEX,NATOMX,MPI_REAL8,MPI_SUM, &
       COMM_CHARMM,IERROR)

There is no root argument in MPI_ALLREDUCE, so no root rank should be passed. Hope this resolves your issue. Let us know if the problem still persists.

Thanks,
Prasanth
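
For reference, a minimal sketch of the corrected snippet from the original post, assuming the same arrays (EX/EY/EZ reduced into FEX/FEY/FEZ), count NATOMX, and communicator COMM_CHARMM, with the misplaced root argument removed from all three calls:

     ! MPI_ALLREDUCE has no root rank; its Fortran interface is
     ! (SENDBUF, RECVBUF, COUNT, DATATYPE, OP, COMM, IERROR),
     ! and every rank receives the summed result.
     CALL MPI_ALLREDUCE(EX,FEX,NATOMX,MPI_REAL8,MPI_SUM, &
       COMM_CHARMM,IERROR)
     CALL MPI_ALLREDUCE(EY,FEY,NATOMX,MPI_REAL8,MPI_SUM, &
       COMM_CHARMM,IERROR)
     CALL MPI_ALLREDUCE(EZ,FEZ,NATOMX,MPI_REAL8,MPI_SUM, &
       COMM_CHARMM,IERROR)

With the extra root argument in the original call, the literal 0 ends up in the communicator position of the argument list, which is presumably why the error stack above reports comm=comm=0x0.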
Mike_D_6
Beginner

Hi Prasanth,

Thanks! That was a careless syntax error on my part. The strange thing is that the code runs anyway with OpenMPI; otherwise I'd have spotted the mistake a lot sooner.

Anyway, thanks again for pointing it out!

Best,
Mike

PrasanthD_intel
Moderator

Thank you, Mike.
Glad to hear that the solution provided helped. We are closing this thread as the issue is resolved. Feel free to raise a new thread in case of any further issues.
Have a good day!
Regards,
Prasanth
