Hi,
I am trying to debug some problems getting an exe, developed by another group in our company, to run under Intel MPI. I am using Intel MPI Library 4.1 on Linux.
Debug output is below.
Does the error indicate a "programming error" on their part (buffers not sized correctly?) or some other issue?
Thanks
[0] MPI startup(): Intel(R) MPI Library, Version 4.1 Update 2 Build 20131023
[0] MPI startup(): Copyright (C) 2003-2013 Intel Corporation. All rights reserved.
[0] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[1] MPI startup(): Recognition mode: 2, selected platform: 8 own platform: 8
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 30601 linuxdev {0,1,2,3}
[0] MPI startup(): 1 15240 centosserver {0,1,2,3}
[0] MPI startup(): Recognition=2 Platform(code=8 ippn=0 dev=5) Fabric(intra=1 inter=6 flags=0x0)
[0] MPI startup(): Topology split mode = 1
[1] MPI startup(): Recognition=2 Platform(code=8 ippn=0 dev=5) Fabric(intra=1 inter=6 flags=0x0)
| rank | node | space=2
| 0 | 0 |
| 1 | 1 |
[0] MPI startup(): I_MPI_DEBUG=100
[0] MPI startup(): I_MPI_FABRICS=shm:tcp
[0] MPI startup(): I_MPI_INFO_BRAND=Intel(R) Xeon(R)
[0] MPI startup(): I_MPI_INFO_CACHE1=0,1,2,3
[0] MPI startup(): I_MPI_INFO_CACHE2=0,1,2,3
[0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0
[0] MPI startup(): I_MPI_INFO_CACHES=3
[0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,16
[0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,262144,6291456
[0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3
[0] MPI startup(): I_MPI_INFO_C_NAME=Wolfdale
[0] MPI startup(): I_MPI_INFO_DESC=1342208505
[0] MPI startup(): I_MPI_INFO_FLGB=0
[0] MPI startup(): I_MPI_INFO_FLGC=398124031
[0] MPI startup(): I_MPI_INFO_FLGD=-1075053569
[0] MPI startup(): I_MPI_INFO_LCPU=4
[0] MPI startup(): I_MPI_INFO_MODE=263
[0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0
[0] MPI startup(): I_MPI_INFO_SERIAL=E31225
[0] MPI startup(): I_MPI_INFO_SIGN=132775
[0] MPI startup(): I_MPI_INFO_STATE=0
[0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0
[0] MPI startup(): I_MPI_INFO_VEND=1
[0] MPI startup(): I_MPI_PIN_INFO=x0,1,2,3
[0] MPI startup(): I_MPI_PIN_MAPPING=1:0 0
.....
Fatal error in PMPI_Bcast: Message truncated, error stack:
PMPI_Bcast(2112)......................: MPI_Bcast(buf=0x2ae6d2ef9010, count=1, dtype=USER<vector>, root=0, comm=0x84000000) failed
MPIR_Bcast_impl(1670).................:
I_MPIR_Bcast_intra(1887)..............: Failure during collective
MPIR_Bcast_intra(1524)................: Failure during collective
MPIR_Bcast_intra(1510)................:
MPIR_Bcast_scatter_ring_allgather(841):
MPIDI_CH3U_Receive_data_found(129)....: Message from rank 0 and tag 2 truncated; 50000000 bytes received but buffer size is 20000000
MPIR_Bcast_scatter_ring_allgather(789):
scatter_for_bcast(301)................:
MPIDI_CH3U_Receive_data_found(129)....: Message from rank 0 and tag 2 truncated; 50000000 bytes received but buffer size is 20000000
rank = 1, revents = 8, state = 8
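For context, the "truncated; 50000000 bytes received but buffer size is 20000000" line is the kind of failure you get when the root and a receiving rank pass mismatched count/datatype signatures to MPI_Bcast. Below is a minimal sketch, not the original program: MPI_BYTE and the hard-coded sizes are stand-ins for the USER<vector> datatype and real buffers in the log.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Root believes the message is 50,000,000 bytes; rank 1 only expects
     * 20,000,000, mirroring the sizes reported in the error stack above. */
    int nbytes = (rank == 0) ? 50000000 : 20000000;
    char *buf = malloc((size_t)nbytes);

    /* The count/datatype signature must match on every rank; here it does not,
     * so the receiving side reports "Message truncated" (more data arrived
     * than the posted buffer can hold). */
    MPI_Bcast(buf, nbytes, MPI_BYTE, 0, MPI_COMM_WORLD);

    free(buf);
    MPI_Finalize();
    return 0;
}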
That error indicates that the MPI_Bcast call is trying to send too large a message. Keep the message under 2 GB and it should work.
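For reference, one way to apply that advice is to split a large buffer across several broadcasts so no single MPI_Bcast count approaches the 2 GB int limit. This is only a sketch; the helper name and the 1 GiB chunk size are arbitrary choices, not part of the original program.

#include <mpi.h>
#include <stddef.h>

/* bcast_large: hypothetical helper that broadcasts a big buffer in pieces so
 * each individual MPI_Bcast stays well under 2 GB (the count argument is a C int). */
static void bcast_large(void *data, size_t total_bytes, int root, MPI_Comm comm)
{
    const size_t chunk = (size_t)1 << 30;   /* 1 GiB per call; arbitrary, but < 2 GB */
    char *p = (char *)data;
    size_t remaining = total_bytes;

    while (remaining > 0) {
        int n = (int)(remaining < chunk ? remaining : chunk);
        MPI_Bcast(p, n, MPI_BYTE, root, comm);
        p += n;
        remaining -= (size_t)n;
    }
}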
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Excellent, thanks for the tip!
Hi,
I have compiled espresso with Intel MPI and the MKL library, but I am getting a "Failure during collective" error, whereas it works fine with OpenMPI.
Is there a problem with Intel MPI?
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x516f460, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x5300310, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x6b295c0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x67183d0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x4f794c0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
[0:n125] unexpected disconnect completion event from [22:n122]
Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
internal ABORT - process 0
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x56bfe30, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
/var/spool/PBS/mom_priv/epilogue: line 30: kill: (5089) - No such process
Kindly help us resolve this.
Thanks
sanjiv