Hi, I'm having some trouble with the attached code SendReceive.c.
The idea is to open a dataset on process p-1 and then distribute it to the remaining processes. This solution works when the variable ln (the local number of elements) is less than 8182. When I increase the number of elements, I get the following error:
mpiexec -np 2 ./sendreceive 16366
Process 0 is receiving 8183 elements from process 1
Process 1 is sending 8183 elements to process 0
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x2000590, count=8183, MPI_DOUBLE, src=1, tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0x1) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
I'm using the student license of the Intel implementation of MPI (obtained by installing Intel® Parallel Studio XE Cluster Edition, which includes Fortran and C/C++).
Is this a limitation of the license? If not, what am I doing wrong?
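In case the attachment is not visible, the failing pattern is roughly the following minimal sketch (the variable names such as ln and the initialization are placeholders, not the exact contents of SendReceive.c; it needs at least 2 ranks):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n  = (argc > 1) ? atoi(argv[1]) : 16366;  /* total number of elements */
    int ln = n / size;                            /* local number of elements */
    double *buf = malloc(ln * sizeof(double));

    if (rank == size - 1) {
        /* the last rank owns the data and sends its chunk to rank 0 */
        for (int i = 0; i < ln; i++) buf[i] = i;
        printf("Process %d is sending %d elements to process 0\n", rank, ln);
        MPI_Send(buf, ln, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        printf("Process 0 is receiving %d elements from process %d\n", ln, size - 1);
        MPI_Recv(buf, ln, MPI_DOUBLE, size - 1, MPI_ANY_TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}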
Your test runs fine.
% mpirun -n 2 a.out 16366
Process 0 is receiving 8183 elements from process 1
Process 1 is sending 8183 elements to process 0
Process 0 received from process 1
Process 1 completed sending to process 0
% mpirun -n 2 a.out 1000000
Process 1 is sending 500000 elements to process 0
Process 0 is receiving 500000 elements from process 1
Process 0 received from process 1
Process 1 completed sending to process 0
This is a broadcast, use MPI_Bcast.
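A minimal sketch of what that would look like (the element count and initialization here are only illustrative):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = 16366;                       /* illustrative element count */
    int root = size - 1;                 /* the rank that read the dataset */
    double *data = malloc(n * sizeof(double));
    if (rank == root)
        for (int i = 0; i < n; i++) data[i] = i;

    /* one collective call replaces the per-rank MPI_Send/MPI_Recv pairs */
    MPI_Bcast(data, n, MPI_DOUBLE, root, MPI_COMM_WORLD);

    free(data);
    MPI_Finalize();
    return 0;
}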
This is a simplified example; in my real code, process p-1 sends different elements to each of the processes {1, 2, ..., p-2}.
The idea is to open a dataset with process p-1, split it, and then distribute the chunks to the remaining processes.
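In other words it is really a scatter. A minimal MPI_Scatterv sketch of that pattern (chunk sizes and variable names here are illustrative, not my actual code):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = 16366;                       /* illustrative total element count */
    int root = size - 1;                 /* rank that reads and splits the dataset */

    /* per-rank chunk sizes and offsets into the root's buffer */
    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    for (int r = 0, off = 0; r < size; r++) {
        counts[r] = n / size + (r < n % size ? 1 : 0);
        displs[r] = off;
        off += counts[r];
    }

    double *dataset = NULL;
    if (rank == root) {
        dataset = malloc(n * sizeof(double));   /* the dataset read by the root */
        for (int i = 0; i < n; i++) dataset[i] = i;
    }

    double *chunk = malloc(counts[rank] * sizeof(double));
    MPI_Scatterv(dataset, counts, displs, MPI_DOUBLE,
                 chunk, counts[rank], MPI_DOUBLE, root, MPI_COMM_WORLD);

    free(chunk); free(counts); free(displs); free(dataset);
    MPI_Finalize();
    return 0;
}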
There is a Linux system parameter in /etc/security/limits.conf that you can configure to set the limit for locked memory (memlock). You may need a larger value than the default. See this for additional information.
Jim Dempsey
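For reference, the current limit can be checked with ulimit -l, and a typical limits.conf entry to raise it looks like the following (example values only, not a recommendation from this thread):

# /etc/security/limits.conf
*    soft    memlock    unlimited
*    hard    memlock    unlimited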
I have the same problem on Ubuntu 14.04.5 LTS (4.4.0-31-generic, x86_64), compiled with Intel 2017.0.098 or 2016.3.210. Running as root works correctly, but other accounts cannot run the program. The same program runs fine on other systems compiled with Intel MPI, or with Open MPI 2.0.1.
iptables and the firewall are disabled.
The ulimit output for a normal account:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63773
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63773
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
source code:
use mpi
implicit none
real*8 send_data(2000000), recv_data(2000000)
integer ierr, mynode, numnodes, status(5000)
integer i, j, k

send_data = 0.0d0
call mpi_init(ierr)
call mpi_comm_rank(MPI_Comm_World, mynode, ierr)
call mpi_comm_size(MPI_Comm_World, numnodes, ierr)
do k = 1, 10
   i = mod(mynode-1+numnodes, numnodes)
   j = mod(mynode+1, numnodes)
   call mpi_sendrecv(send_data, 2000000, mpi_double_precision, j, 0, &
                     recv_data, 2000000, mpi_double_precision, i, 0, &
                     MPI_Comm_World, status, ierr)
   print*, i, '->', mynode, '->', j
   if (mynode == 0) print*, k
enddo
call mpi_barrier(MPI_Comm_World, ierr)
call mpi_finalize(ierr)
end
- compile: mpiifort -o mpiring mpiring.f90
- run: hpc@ntd01:~$ mpiexec -n 4 ./mpiring
Fatal error in MPI_Sendrecv: Other MPI error, error stack:
MPI_Sendrecv(259)...............: MPI_Sendrecv(sbuf=0x6b5820, scount=8183, MPI_DOUBLE_PRECISION, dest=3, stag=0, rbuf=0x83c220, rcount=8183, MPI_DOUBLE_PRECISION, src=1, rtag=0, MPI_COMM_WORLD, status=0x9c2c20) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Sendrecv: Other MPI error, error stack:
MPI_Sendrecv(259)...............: MPI_Sendrecv(sbuf=0x6b5820, scount=8183, MPI_DOUBLE_PRECISION, dest=0, stag=0, rbuf=0x83c220, rcount=8183, MPI_DOUBLE_PRECISION, src=2, rtag=0, MPI_COMM_WORLD, status=0x9c2c20) failed
PMPIDI_CH3I_Progress(623).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
Fatal error in MPI_Sendrecv: Other MPI error, error stack:
MPI_Sendrecv(259)...............: MPI_Sendrecv(sbuf=0x6b5820, scount=8183, MPI_DOUBLE_PRECISION, dest=1, stag=0, rbuf=0x83c220, rcount=8183, MPI_DOUBLE_PRECISION, src=3, rtag=0, MPI_COMM_WORLD, status=0x9c2c20) failed
MPID_Irecv(160).................: fail failed
MPID_nem_lmt_RndvRecv(208)......: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(288): fail failed
dcp_recv(154)...................: Internal MPI error! cannot read from remote process
It works if I change recv_data(8183) and send_data(8183) to recv_data(8182) and send_data(8182); with 8182 it can run even with 200 processes on one node (4 CPU cores), but with 8183 it fails even on 2 processes.
- /etc/hosts:
127.0.0.1       localhost ntd01
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
- ifconfig:
eth0      Link encap:Ethernet  HWaddr 74:d4:35:b7:dd:72
          inet addr:xxx.xxx.xx.xx  Bcast:xxx.xxx.xx.255  Mask:255.255.254.0
          inet6 addr: fe80::76d4:35ff:feb7:dd72/64 Scope:Link
          inet6 addr: 2001:da8:d800:144:76d4:35ff:feb7:dd72/64 Scope:Global
          inet6 addr: 2001:da8:d800:144:1c9e:3c0:ab1f:44c/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15169 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1042 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1310326 (1.3 MB)  TX bytes:214986 (214.9 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:82 errors:0 dropped:0 overruns:0 frame:0
          TX packets:82 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:6392 (6.3 KB)  TX bytes:6392 (6.3 KB)
Any news about this issue?
I have updated Parallel Studio XE to 2017 Update 1 hoping this would be fixed, but it is still the same.
And there is no answer from anyone.
Is there at least a workaround that would let me keep working despite this issue?
I cannot find any way to solve it.
Paolo,
Please run with I_MPI_HYDRA_DEBUG=1 and attach the output as a file.
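For example (reusing the command line from the first post; the log file name is arbitrary):

I_MPI_HYDRA_DEBUG=1 mpiexec -np 2 ./sendreceive 16366 2>&1 | tee hydra_debug.log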
