Nicolay
Beginner
122 Views

Intel MPI

Hello.

I am developing a fault-tolerant, MPI-like communication layer. At the moment it is slow.

Could I learn something about the internal architecture of Intel MPI, so that I can understand the basic principles of transferring large amounts of data over different networks?

Could I also contact the developers of the Intel MPI library to share experiences?

Nicolay.

2 Replies
Gergana_S_Intel
Employee

Hi Nicolay,

Since our implementation is proprietary, I doubt we'll be able to share any knowledge with you.  But I'll forward your request to the developers.

One thing I can recommend is taking a look at the MPICH source base, since Intel MPI is based on it.  Furthermore, we work closely with and support the OpenFabrics Alliance, whose networking stack is also open source.

Hope this helps.

Regards,
~Gergana

Sanjiv_T_
Beginner

Hi,

I have compiled espresso with Intel MPI and the MKL library, but I get a "Failure during collective" error, whereas the same build works fine with Open MPI.

Is there a problem with Intel MPI?

Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x516f460, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x5300310, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x6b295c0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x67183d0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x4f794c0, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
[0:n125] unexpected disconnect completion event from [22:n122]
Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
internal ABORT - process 0
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x56bfe30, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
/var/spool/PBS/mom_priv/epilogue: line 30: kill: (5089) - No such process

Kindly help us resolve this.

Thanks
sanjiv
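
A log like the one above repeats the same error stack once per failing rank, which makes it hard to spot the lines that differ. The following is a minimal sketch, not part of the original post, of how such output could be condensed in Python: it counts the distinct MPI_Bcast calls that failed and separates out the lines that do not belong to the MPI error stack (here, the DAPL disconnect and assertion messages). The helper name `summarize` and the `SAMPLE` excerpt are illustrative assumptions, and the regular expression only covers the line format shown in this thread.

```python
import re
from collections import Counter

# Top frame of each error stack as it appears in the log above, e.g.
# "PMPI_Bcast(2112)...: MPI_Bcast(buf=0x516f460, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed"
BCAST_RE = re.compile(
    r"MPI_Bcast\(buf=(0x[0-9a-f]+), count=(\d+), (\w+), root=(\d+), comm=(0x[0-9a-f]+)\) failed"
)

# Prefixes of lines that are part of the repeated MPI error stack.
STACK_PREFIXES = ("Fatal error in", "PMPI_", "MPIR_", "I_MPIR_")


def summarize(log_text):
    """Hypothetical helper: return (failed_calls, other_lines).

    failed_calls counts each distinct (count, datatype, root, comm) tuple;
    other_lines keeps every non-empty line outside the MPI error stack.
    """
    failed_calls = Counter()
    other_lines = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = BCAST_RE.search(line)
        if match:
            _buf, count, dtype, root, comm = match.groups()
            failed_calls[(int(count), dtype, int(root), comm)] += 1
        elif not line.startswith(STACK_PREFIXES):
            other_lines.append(line)
    return failed_calls, other_lines


# Short excerpt of the log posted above, used only to exercise the helper.
SAMPLE = """\
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x516f460, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
MPIR_Bcast_impl(1670)...:
I_MPIR_Bcast_intra(1887): Failure during collective
MPIR_Bcast_intra(1524)..: Failure during collective
[0:n125] unexpected disconnect completion event from [22:n122]
Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
internal ABORT - process 0
Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2112)........: MPI_Bcast(buf=0x56bfe30, count=96, MPI_DOUBLE_PRECISION, root=4, comm=0x84000004) failed
"""

if __name__ == "__main__":
    calls, others = summarize(SAMPLE)
    for key, n in calls.items():
        print(n, "failed call(s):", key)
    for line in others:
        print("other:", line)
```

Applied to this thread's log, a summary of this kind makes two things easier to see: every reported failure is the same broadcast (count=96, root=4, same communicator), and the only lines outside the MPI stack come from the DAPL connection layer and the PBS epilogue.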
