Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI_Send/MPI_Recv don't work with more than 8182 doubles

Paolo_M_
Beginner
2,440 Views

Hi, I'm having some troubles with the attached code SendReceive.c.

The idea is to open a dataset with process p-1 and then distribute it to the remaining processes. This works as long as the variable ln (the local number of elements) is at most 8182. When I increase the number of elements, I get the following error:

mpiexec -np 2 ./sendreceive 16366
Process 0 is receiving 8183 elements from process 1
Process 1 is sending 8183 elements to process 0
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x2000590, count=8183, MPI_DOUBLE, src=1, tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0x1) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process

I'm using the student license of the Intel implementation of MPI (obtained by installing Intel® Parallel Studio XE Cluster Edition, which includes Fortran and C/C++).

Is this a limitation of the license? If not, what am I doing wrong?
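(SendReceive.c is attached to the original post and not reproduced here; the following is a minimal sketch of the pattern, reconstructed from the messages printed above. All names are illustrative, not taken from the attachment.)

/* Minimal sketch: the last rank (p-1) sends ln doubles to rank 0.
 * Reconstructed from the printed output above; the attached
 * SendReceive.c is the authoritative version. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (argc < 2) MPI_Abort(MPI_COMM_WORLD, 1);   /* usage: a.out <n> */

    int n  = atoi(argv[1]);              /* total number of elements */
    int ln = n / size;                   /* local number of elements */
    double *buf = malloc(ln * sizeof(double));

    if (rank == size - 1) {              /* last rank owns the dataset */
        printf("Process %d is sending %d elements to process 0\n", rank, ln);
        MPI_Send(buf, ln, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        printf("Process %d completed sending to process 0\n", rank);
    } else if (rank == 0) {
        printf("Process 0 is receiving %d elements from process %d\n",
               ln, size - 1);
        MPI_Recv(buf, ln, MPI_DOUBLE, size - 1, MPI_ANY_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 0 received from process %d\n", size - 1);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}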

  

9 Replies
Gregg_S_Intel
Employee

This is a broadcast, use MPI_Bcast.
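(For reference, a minimal sketch of that suggestion; the count and root here are illustrative, taken from the failing case above:)

/* Hedged sketch of the MPI_Bcast suggestion: one collective call
 * delivers the root's buffer to every rank. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int ln = 8183;                            /* count from the failing case  */
    double *buf = calloc(ln, sizeof(double)); /* root fills; others receive   */
    MPI_Bcast(buf, ln, MPI_DOUBLE, size - 1, MPI_COMM_WORLD);

    free(buf);
    MPI_Finalize();
    return 0;
}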

Paolo_M_
Beginner

This is a simplified example; in my real code, process p-1 sends different elements to each of the processes {0, 1, ..., p-2}.

The idea is to open a dataset with process p-1, split it, and then distribute the chunks to the remaining processes.
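(That one-root, different-chunk-per-rank pattern can also be written as a single collective; below is a minimal MPI_Scatterv sketch with root p-1. The names and the total count are illustrative, not taken from SendReceive.c.)

/* Hedged sketch: rank p-1 scatters uneven chunks of n doubles. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = 16366;                        /* total elements, as in the example */
    int root = size - 1;                  /* process p-1 owns the dataset      */

    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    int base = n / size, rem = n % size, off = 0;
    for (int r = 0; r < size; ++r) {      /* uneven split of n elements */
        counts[r] = base + (r < rem ? 1 : 0);
        displs[r] = off;
        off += counts[r];
    }

    /* dataset only has to be a valid buffer at the root */
    double *dataset = (rank == root) ? calloc(n, sizeof(double)) : NULL;
    double *chunk = malloc(counts[rank] * sizeof(double));

    MPI_Scatterv(dataset, counts, displs, MPI_DOUBLE,
                 chunk, counts[rank], MPI_DOUBLE, root, MPI_COMM_WORLD);
    printf("Process %d received %d elements\n", rank, counts[rank]);

    free(chunk); free(counts); free(displs); free(dataset);
    MPI_Finalize();
    return 0;
}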

Gregg_S_Intel
Employee
(Accepted solution)

Your test runs fine.

% mpirun -n 2 a.out 16366
Process 0 is receiving 8183 elements from process 1
Process 1 is sending 8183 elements to process 0
Process 0 received from process 1
Process 1 completed sending to process 0
% mpirun -n 2 a.out 1000000
Process 1 is sending 500000 elements to process 0
Process 0 is receiving 500000 elements from process 1
Process 0 received from process 1
Process 1 completed sending to process 0

 

Paolo_M_
Beginner
Thank you Gregg; the problem is probably in my MPI installation.
jimdempseyatthecove
Honored Contributor III

There is a Linux system parameter in /etc/security/limits.conf that you can configure to raise the memlock limit. You may need a larger value than the default. See this for additional information.
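(A sketch of the kind of entries involved; the domain and values here are illustrative, not a recommendation:)

# /etc/security/limits.conf -- illustrative memlock entries
# format: <domain> <type> <item> <value>; takes effect at next login
*    soft    memlock    unlimited
*    hard    memlock    unlimited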

Jim Dempsey

hmli_l_
Beginner

I have the same problem on Ubuntu 14.04.5 LTS (4.4.0-31-generic, x86_64), compiled with Intel 2017.0.098 or 2016.3.210. The root account can run the program correctly, but other accounts cannot; the same program runs fine on other systems compiled with Intel MPI, or with Open MPI 2.0.1.

iptables and the firewall are disabled.

The ulimit settings for a normal account:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63773
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63773
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

source code:

! Ring exchange reproducer: each rank sends 2,000,000 doubles to its
! right neighbour and receives from its left neighbour, 10 times.
program mpiring
use mpi
implicit none
real*8 send_data(2000000), recv_data(2000000)
integer ierr, mynode, numnodes, status(MPI_STATUS_SIZE)
integer i, j, k

send_data = 0.0d0
call mpi_init(ierr)
call mpi_comm_rank(MPI_Comm_World, mynode, ierr)
call mpi_comm_size(MPI_Comm_World, numnodes, ierr)
do k = 1, 10
   i = mod(mynode-1+numnodes, numnodes)   ! left neighbour (receive source)
   j = mod(mynode+1, numnodes)            ! right neighbour (send destination)
   call mpi_sendrecv(send_data, 2000000, mpi_double_precision, j, 0, &
      & recv_data, 2000000, mpi_double_precision, i, 0, MPI_Comm_World, status, ierr)
   print *, i, '->', mynode, '->', j
   if (mynode == 0) print *, k
enddo
call mpi_barrier(MPI_Comm_World, ierr)
call mpi_finalize(ierr)
end program mpiring
   
  • compile: mpiifort -o mpiring mpiring.f90
  • run: hpc@ntd01:~$ mpiexec -n 4 ./mpiring
    Fatal error in MPI_Sendrecv: Other MPI error, error stack:
    MPI_Sendrecv(259)...............: MPI_Sendrecv(sbuf=0x6b5820, scount=8183, MPI_DOUBLE_PRECISION, dest=3, stag=0, rbuf=0x83c220, rcount=8183, MPI_DOUBLE_PRECISION, src=1, rtag=0, MPI_COMM_WORLD, status=0x9c2c20) failed
    PMPIDI_CH3I_Progress(623).......: fail failed
    pkt_RTS_handler(317)............: fail failed
    do_cts(662).....................: fail failed
    MPID_nem_lmt_dcp_start_recv(288): fail failed
    dcp_recv(154)...................: Internal MPI error!  cannot read from remote process
    Fatal error in MPI_Sendrecv: Other MPI error, error stack:
    MPI_Sendrecv(259)...............: MPI_Sendrecv(sbuf=0x6b5820, scount=8183, MPI_DOUBLE_PRECISION, dest=0, stag=0, rbuf=0x83c220, rcount=8183, MPI_DOUBLE_PRECISION, src=2, rtag=0, MPI_COMM_WORLD, status=0x9c2c20) failed
    PMPIDI_CH3I_Progress(623).......: fail failed
    pkt_RTS_handler(317)............: fail failed
    do_cts(662).....................: fail failed
    MPID_nem_lmt_dcp_start_recv(288): fail failed
    dcp_recv(154)...................: Internal MPI error!  cannot read from remote process
    Fatal error in MPI_Sendrecv: Other MPI error, error stack:
    MPI_Sendrecv(259)...............: MPI_Sendrecv(sbuf=0x6b5820, scount=8183, MPI_DOUBLE_PRECISION, dest=1, stag=0, rbuf=0x83c220, rcount=8183, MPI_DOUBLE_PRECISION, src=3, rtag=0, MPI_COMM_WORLD, status=0x9c2c20) failed
    MPID_Irecv(160).................: fail failed
    MPID_nem_lmt_RndvRecv(208)......: fail failed
    do_cts(662).....................: fail failed
    MPID_nem_lmt_dcp_start_recv(288): fail failed
    dcp_recv(154)...................: Internal MPI error!  cannot read from remote process
    
    

It works if recv_data(8183) and send_data(8183) are changed to recv_data(8182) and send_data(8182); then it can even run with 200 processes on one node (4 CPU cores). But with 8183 elements it fails even with 2 processes.

  • /etc/hosts:
127.0.0.1    localhost ntd01
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
  • ifconfig:
    eth0      Link encap:Ethernet  HWaddr 74:d4:35:b7:dd:72
              inet addr:xxx.xxx.xx.xx  Bcast:xxx.xxx.xx.255  Mask:255.255.254.0
              inet6 addr: fe80::76d4:35ff:feb7:dd72/64 Scope:Link
              inet6 addr: 2001:da8:d800:144:76d4:35ff:feb7:dd72/64 Scope:Global
              inet6 addr: 2001:da8:d800:144:1c9e:3c0:ab1f:44c/64 Scope:Global
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:15169 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1042 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:1310326 (1.3 MB)  TX bytes:214986 (214.9 KB)
    
    lo        Link encap:Local Loopback 
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:82 errors:0 dropped:0 overruns:0 frame:0
              TX packets:82 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1
              RX bytes:6392 (6.3 KB)  TX bytes:6392 (6.3 KB)
    
    
Paolo_M_
Beginner
Hi, I haven't solved the problem. I also tried Jim's suggestion, but with no luck. My interest was mainly in the correctness of my code. I hope someone else can help solve the problem.
Luis_Diego_C_
Beginner

Any news about this issue?

I updated Parallel Studio XE to 2017 Update 1 hoping this would be fixed, but I'm still seeing the same thing.

And there's no answer from anyone.

Is there at least a workaround so I can keep working despite this issue?

I cannot find any way to solve this.

James_T_Intel
Moderator

Paolo,

Please run with I_MPI_HYDRA_DEBUG=1 and attach the output as a file.
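For example (assuming the binary from the first post; both output streams are redirected to a file):

    I_MPI_HYDRA_DEBUG=1 mpiexec -np 2 ./sendreceive 16366 > hydra_debug.log 2>&1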
