Intel® MPI Library

Internal MPI error! cannot read from remote process

paul_g_1
Beginner
6,282 views

On my Dell laptop (E7440 with an Intel i7), mpirun on my program terminates with the following:

Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x8cb720, count=9882, MPI_DOUBLE_PRECISION, src=1, tag=100, MPI_COMM_WORLD, status=0x7ffc71f7f0b0) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error!  cannot read from remote process

On my Dell desktop (T7810 with Intel Xeon chips), the same code executes just fine.

6 Replies
James_T_Intel
Moderator

Can you provide output with I_MPI_DEBUG=6 and I_MPI_HYDRA_DEBUG=1?  Please attach as a file.

paul_g_1
Beginner

Here it is.

James_T_Intel
Moderator

Hmm, nothing immediately obvious here.  Can you post the code you're trying to run?

Bruce_R_
Beginner

I am having a similar issue when attempting to run code in parallel for the first time. The code is written in Fortran and is generally run at HPC facilities. Was there ever any resolution? I can provide more details if warranted. The code runs fine in single-processor mode.

paul_g_1
Beginner

Hi Bruce,

The issue wasn't really resolved. Unfortunately, some of the code I'm using cannot be redistributed without the permission of the author, so I can't just upload my code to let Intel take a look at it.

I think the problem was that I was trying to send too much data through a single MPI_SEND. My workaround was to send smaller slices of data.

Paul

James_T_Intel
Moderator

If you are attempting to send a message over 2 GB, you will need to split it apart.
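
For anyone trying the slicing approach mentioned above, here is a minimal sketch of the bookkeeping only, not of the MPI calls themselves. It assumes 8-byte elements (as for MPI_DOUBLE_PRECISION) and a byte cap of your choosing; the helper name `chunk_ranges` and the 32 KiB cap in the demo are hypothetical, picked just to make the output small. In a real program, each (offset, count) pair would correspond to one matching MPI_SEND / MPI_RECV pair on that slice of the buffer.

```python
from typing import Iterator, Tuple

DOUBLE_BYTES = 8  # assumed size of one double-precision element


def chunk_ranges(total_count: int, max_bytes: int,
                 elem_bytes: int = DOUBLE_BYTES) -> Iterator[Tuple[int, int]]:
    """Yield (offset, count) pairs covering total_count elements,
    with every chunk at most max_bytes in size."""
    if total_count < 0:
        raise ValueError("total_count must be non-negative")
    max_count = max_bytes // elem_bytes  # elements that fit under the cap
    if max_count < 1:
        raise ValueError("max_bytes is smaller than one element")
    offset = 0
    while offset < total_count:
        count = min(max_count, total_count - offset)
        yield offset, count
        offset += count


if __name__ == "__main__":
    # 9882 elements is the count from the error stack in the first post;
    # the 32 KiB cap is a hypothetical value for illustration.
    for offset, count in chunk_ranges(9882, 32 * 1024):
        print(offset, count)
```

Sender and receiver need to iterate over the same ranges in the same order so that each send is matched by a receive of the same size.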
