Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

PMPI_Bcast: Message truncated

AndrewC
New Contributor III

Hi,

I am trying to debug some problems getting an exe developed by another group in our company to run on Intel MPI. I am using Intel MPI Library 4.1 on Linux.

Debug output is below:

Does the error indicate a "programming error" on their part (buffers not sized correctly?) or some other issue?

Thanks

[0] MPI startup(): Intel(R) MPI Library, Version 4.1 Update 2  Build 20131023
[0] MPI startup(): Copyright (C) 2003-2013 Intel Corporation.  All rights reserved.
[0] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8

[0] MPI startup(): Rank    Pid      Node name     Pin cpu
[0] MPI startup(): 0       30601    linuxdev      {0,1,2,3}
[0] MPI startup(): 1       15240    centosserver  {0,1,2,3}
[0] MPI startup(): Recognition=2 Platform(code=8 ippn=0 dev=5) Fabric(intra=1 inter=6 flags=0x0)
[0] MPI startup(): Topology split mode = 1

[1] MPI startup(): Recognition=2 Platform(code=8 ippn=0 dev=5) Fabric(intra=1 inter=6 flags=0x0)
| rank | node | space=2
|  0  |  0  |
|  1  |  1  |
[0] MPI startup(): I_MPI_DEBUG=100
[0] MPI startup(): I_MPI_FABRICS=shm:tcp
[0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R)
[0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3
[0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3
[0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0
[0] MPI startup(): I_MPI_INFO_CACHES=3
[0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,16
[0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,6291456
[0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3
[0] MPI startup(): I_MPI_INFO_C_NAME=Wolfdale
[0] MPI startup(): I_MPI_INFO_DESC=1342208505
[0] MPI startup(): I_MPI_INFO_FLGB=0
[0] MPI startup(): I_MPI_INFO_FLGC=398124031
[0] MPI startup(): I_MPI_INFO_FLGD=-1075053569
[0] MPI startup(): I_MPI_INFO_LCPU=4
[0] MPI startup(): I_MPI_INFO_MODE=263
[0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0
[0] MPI startup(): I_MPI_INFO_SERIAL=E31225
[0] MPI startup(): I_MPI_INFO_SIGN=132775
[0] MPI startup(): I_MPI_INFO_STATE=0
[0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0
[0] MPI startup(): I_MPI_INFO_VEND=1
[0] MPI startup(): I_MPI_PIN_INFO=x0,1,2,3
[0] MPI startup(): I_MPI_PIN_MAPPING=1:0 0

.....

Fatal error in PMPI_Bcast: Message truncated, error stack:
PMPI_Bcast(2112)......................: MPI_Bcast(buf=0x2ae6d2ef9010, count=1, dtype=USER<vector>, root=0, comm=0x84000000) failed
MPIR_Bcast_impl(1670).................:
I_MPIR_Bcast_intra(1887)..............: Failure during collective
MPIR_Bcast_intra(1524)................: Failure during collective
MPIR_Bcast_intra(1510)................:
MPIR_Bcast_scatter_ring_allgather(841):
MPIDI_CH3U_Receive_data_found(129)....: Message from rank 0 and tag 2 truncated; 50000000 bytes received but buffer size is 20000000
MPIR_Bcast_scatter_ring_allgather(789):
scatter_for_bcast(301)................:
MPIDI_CH3U_Receive_data_found(129)....: Message from rank 0 and tag 2 truncated; 50000000 bytes received but buffer size is 20000000
rank = 1, revents = 8, state = 8
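
The last lines of the stack pinpoint the mismatch: rank 1 received 50,000,000 bytes from rank 0 into a receive buffer declared as only 20,000,000 bytes. A minimal sketch that reproduces this class of error (this is not the original application; the byte counts are copied from the trace purely for illustration):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 broadcasts 50,000,000 bytes, but rank 1 only declares
       room for 20,000,000 -- the same counts as in the stack above. */
    int count = (rank == 0) ? 50000000 : 20000000;
    char *buf = malloc(50000000);

    /* Every rank's (count, datatype) signature must match what the
       root sends; a smaller receive signature triggers the
       "Message truncated" error. */
    MPI_Bcast(buf, count, MPI_BYTE, 0, MPI_COMM_WORLD);

    free(buf);
    MPI_Finalize();
    return 0;
}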

James_T_Intel
Moderator

That error indicates that the MPI_Bcast call is trying to send too large a message. Keep the message under 2 GB and it should work.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
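
As a follow-up to the 2 GB limit: the count argument of MPI_Bcast is a C int, so a single call cannot describe more than INT_MAX elements. If the payload genuinely must be larger, one common workaround is to loop over sub-2 GB chunks. A sketch, assuming a contiguous byte buffer (the helper name bcast_large is made up for illustration):

#include <mpi.h>
#include <stddef.h>

/* Broadcast an arbitrarily large contiguous buffer in chunks small
   enough for the int-typed "count" argument of MPI_Bcast. */
void bcast_large(void *buf, size_t nbytes, int root, MPI_Comm comm)
{
    const size_t chunk = (size_t)1 << 30;   /* 1 GiB per call */
    char *p = (char *)buf;

    while (nbytes > 0) {
        size_t n = nbytes < chunk ? nbytes : chunk;
        MPI_Bcast(p, (int)n, MPI_BYTE, root, comm);
        p += n;
        nbytes -= n;
    }
}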

AndrewC
New Contributor III

Excellent, thanks for the tip!

Sanjiv_T_
Beginner

Hi,

I have compiled espresso with Intel MPI and the MKL library, but I am getting a "Failure during collective" error, whereas it works fine with OpenMPI.

Is there a problem with Intel MPI?


Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x516f460, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x5300310, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x6b295c0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x67183d0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x4f794c0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
[0:n125] unexpected disconnect completion event from [22:n122]
Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
internal ABORT - process 0
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x56bfe30, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
/var/spool/PBS/mom_priv/epilogue: line 30: kill: (5089) - No such process


Kindly help us resolve this.


Thanks,
Sanjiv
