Intel® oneAPI HPC Toolkit

Internal MPI error on user but not as root

Luis_Diego_C_
Beginner

I'm trying to run a Fortran code, but when I run it I get this message:

Fatal error in PMPI_Waitall: Other MPI error, error stack:
PMPI_Waitall(405)...............: MPI_Waitall(count=5, req_array=0xb0b8c8, status_array=0xb14e68) failed
MPIR_Waitall_impl(221)..........: fail failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error!  cannot read from remote process

 

If I run it as root, it works.

I tried updating my Parallel Studio XE from 2017 to 2017 Update 1, but it didn't make any difference.

Any news or help with this issue?

James_T_Intel
Moderator

Check "ulimit -a" for both root and a normal user.  These should be the same.
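One way to capture the limits so the two runs can be compared (the file name is arbitrary):

```shell
#!/bin/bash
# Dump the current shell's resource limits to a per-user file so the
# root and non-root outputs can be diffed side by side.
limits_file="limits_$(id -un).txt"
ulimit -a > "$limits_file"
cat "$limits_file"
# After running this once as each user:
#   diff limits_root.txt limits_<user>.txt
```

A mismatch in limits such as max locked memory (memlock) is often the culprit when MPI shared-memory transfers fail for one user but not another.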

Victor_G_1
Beginner

Hi

I have got the same error:

Fatal error in PMPI_Gatherv: Other MPI error, error stack:
PMPI_Gatherv(1001)..............: MPI_Gatherv failed(sbuf=0x256d2c0, scount=9210, MPI_DOUBLE, rbuf=0x258c560, rcnts=0x2559f00, displs=0x2559f20, MPI_DOUBLE, root=0, MPI_COMM_WORLD) failed
MPIR_Gatherv_impl(545)..........: fail failed
I_MPIR_Gatherv_intra(611).......: fail failed
MPIR_Gatherv(422)...............: fail failed
MPIC_Irecv(857).................: fail failed
MPID_Irecv(160).................: fail failed
MPID_nem_lmt_RndvRecv(208)......: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(302): fail failed
dcp_recv(165)...................: Internal MPI error!  Cannot read from remote process

Does Intel have an MPI version with a fix for this?

Victor

McCalpinJohn
Black Belt

It is important to check the "ulimit" from inside the context of the MPI job -- if you log in to the remote node, a different set of limits may apply.

You can easily do this by launching a script instead of the MPI executable and having the script echo the hostname and then execute "ulimit -a".
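A minimal wrapper along those lines (script name and rank count below are placeholders):

```shell
#!/bin/bash
# check_limits.sh -- launch this via mpirun in place of the real MPI
# executable; each rank reports its host and the limits in effect there.
host="$(hostname)"
limits="$(ulimit -a)"
echo "=== $host ==="
echo "$limits"
```

Launched as, e.g., "mpirun -n 4 ./check_limits.sh", this shows the limits the MPI processes actually inherit, which can differ from what an interactive login on the same node reports.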

Victor_G_1
Beginner

Actually, I used another workaround: I set "export I_MPI_SHM_LMT=shm", and the issue was resolved.

But it would be nice to just update MPI. Has the issue been fixed, or will it be fixed in the near future?

Our group is going to register our software locally (target date is October), and any workaround we use has to be properly documented :(
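For anyone else hitting this, the workaround described above amounts to the following (the application name is a placeholder):

```shell
#!/bin/bash
# Workaround from this thread: make Intel MPI use plain shared memory
# for large-message transfers instead of the direct-copy (dcp) path
# that fails with "cannot read from remote process" for non-root users.
export I_MPI_SHM_LMT=shm
# Then launch as usual, e.g.:
#   mpirun -n 16 ./my_app
```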

Victor
