Dave_K_
Beginner

problem with mpi_comm_accept

I am trying to set up a server/client pair that establishes a connection (after each is launched independently) using MPI_Comm_accept and MPI_Comm_connect. The server successfully waits for a connection request in MPI_Comm_accept, and the client successfully connects with MPI_Comm_connect. Both the server and the client return without any error, but a negative handle is returned in 'newcomm' to both the server and the client. I launch both using mpiexec and have mpd running.

I cannot figure out what is wrong and it is probably something simple.  Any ideas?

SERVER:

Integer*4 ierr,rt_gang_comm

call MPI_OPEN_PORT(MPI_INFO_NULL,portA,ierr)

write(contape,*) 'Gang Master port = ' // trim(portA)

! ... server writes the port name to a file

call MPI_COMM_ACCEPT(portA,MPI_INFO_NULL,0,MPI_COMM_SELF,rt_gang_comm, ierr)

if (ierr /= 0) then
  stop 'server problem'
else
  write(contape,*) 'MPI_COMM_WORLD = ',MPI_COMM_WORLD,' rt_gang_com = ',rt_gang_comm
endif

call MPI_Bcast(i_rec,n_i_s,MPI_INTEGER4,id_mpi_rank,rt_gang_comm, ierr)

if (ierr /= 0) then
  call MPI_ERROR_STRING(ierr,errs,err_len,ierr1)
  stop
endif

SERVER OUTPUT:

 

Gang Master port = tag#0$rdma_port0#5124$rdma_host0#2:0:0:192:168:11:220:0:0:0:0:0:0:0:0$

MPI_COMM_WORLD =   1140850688  rt_gang_com =  -2080374784

  

CLIENT:

 

Integer*4 ierr,intercomm

! ... client reads the port name from the file written by the server

write(contape,*) 'Opening port ', trim(portA)

call MPI_COMM_CONNECT(portA, MPI_INFO_NULL, 0, MPI_COMM_WORLD, intercomm, ierr)

if (ierr /= 0) then
  stop 'client problem'
else
  print*,"MPI_COMM_WORLD=",MPI_COMM_WORLD,"intercomm=", intercomm
endif

 

CLIENT OUTPUT:

 

Opening port  tag#0$rdma_port0#5124$rdma_host0#2:0:0:192:168:11:220:0:0:0:0:0:0:0:0$

MPI_COMM_WORLD=  1140850688 intercomm= -2080374783

 

 

A subsequent broadcast hangs. The server and client appear to be trying to communicate over communicator handles that differ by one.
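For context on the broadcast itself: the communicator returned by MPI_COMM_ACCEPT/MPI_COMM_CONNECT is an intercommunicator, and MPI defines intercommunicator collectives with split root arguments on the two sides. The following is only an illustrative sketch of that calling pattern, reusing i_rec, n_i_s, rt_gang_comm and intercomm from the snippets above; it is not taken from the actual program.

[plain]
! Server side: the process that supplies the data passes MPI_ROOT as the
! root argument (any other process in the server's local group would
! pass MPI_PROC_NULL).
call MPI_BCAST(i_rec, n_i_s, MPI_INTEGER4, MPI_ROOT, rt_gang_comm, ierr)

! Client side: processes in the remote group pass the rank of the root
! within the server's group, here 0 for a one-process server.
call MPI_BCAST(i_rec, n_i_s, MPI_INTEGER4, 0, intercomm, ierr)
[/plain]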

 

I cannot find anything that seems to help, so I would appreciate any ideas to try.

 

Thanks,

 

Dave

Dave_K_
Beginner

A few particulars: ifort version 12.1.0, CentOS release 5.7, impi/4.0.3/lib64.
TimP
Black Belt

If you should happen to use an architecture option -xHost on AVX hardware, or one including AVX, I don't know whether it will work, since CentOS 5.7 doesn't support AVX.
Dave_K_
Beginner

Thanks for your input, Tim. I was compiling with -xSSE4.2 because it helped with optimization (at some point). I removed it and recompiled with defaults for both the client and server, and I still end up with a negative handle.
Dave_K_
Beginner

Are there any other ideas out there? I am so stumped.
James_T_Intel
Moderator

Hi Dave,

I've tested the basic functionality of your program, and it appears to work even with a negative value for the communicator. That is simply a handle to the communicator, and as long as the two have the same communicator, the connection should work.

From the server side:

[plain]
$ mpirun -n 1 ./server
MPI_COMM_WORLD = 1140850688 newcomm = -2080374784
value = 25
$ cat portname.txt
tag#0$rdma_port0#25033$rdma_host0#2:0:0:36:101:26:30:0:0:0:0:0:0:0:0$
[/plain]

And the client side:

[plain]
$ mpirun -n 1 ./client
MPI_COMM_WORLD = 1140850688 newcomm = -2080374784
value = 25
[/plain]

If you're still having problems, please run with I_MPI_DEBUG=5 and attach the output.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
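Below is a minimal, self-contained sketch of the kind of single-process server/client pair described here. It is not the attached test case: the file name portname.txt, the use of MPI_COMM_SELF on both sides, and the single broadcast value are illustrative assumptions.

[plain]
! server.f90 -- minimal sketch of the accept side
program server
  use mpi
  implicit none
  integer :: ierr, newcomm, val
  character(len=MPI_MAX_PORT_NAME) :: port

  call MPI_INIT(ierr)
  call MPI_OPEN_PORT(MPI_INFO_NULL, port, ierr)

  ! Publish the port name through a file the client can read.
  open(10, file='portname.txt', status='replace')
  write(10,'(a)') trim(port)
  close(10)

  call MPI_COMM_ACCEPT(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, newcomm, ierr)

  ! Intercommunicator broadcast: the root passes MPI_ROOT.
  val = 25
  call MPI_BCAST(val, 1, MPI_INTEGER, MPI_ROOT, newcomm, ierr)

  call MPI_COMM_DISCONNECT(newcomm, ierr)
  call MPI_CLOSE_PORT(port, ierr)
  call MPI_FINALIZE(ierr)
end program server
[/plain]

[plain]
! client.f90 -- minimal sketch of the connect side
program client
  use mpi
  implicit none
  integer :: ierr, newcomm, val
  character(len=MPI_MAX_PORT_NAME) :: port

  call MPI_INIT(ierr)

  ! Read the port name published by the server.
  open(10, file='portname.txt', status='old')
  read(10,'(a)') port
  close(10)

  call MPI_COMM_CONNECT(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, newcomm, ierr)

  ! Processes in the remote group pass the rank of the root in the
  ! server's group (0, since the server runs as a single process).
  call MPI_BCAST(val, 1, MPI_INTEGER, 0, newcomm, ierr)
  print *, 'value =', val

  call MPI_COMM_DISCONNECT(newcomm, ierr)
  call MPI_FINALIZE(ierr)
end program client
[/plain]

Each program would be built with the Intel MPI Fortran compiler wrapper and launched separately with mpirun -n 1, as in the output above.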
Dave_K_
Beginner

James, when I run it the negative communicator handles differ by one, as reflected in the output. Both the client and server have valid communicators, but they do not match, so each hangs on a broadcast. Dave
Dave_K_
Beginner

James, my example was snipped out, so the output does not match exactly, but it is attached.
Dave_K_
Beginner

James, attached is the output with debug=10. Dave
Dave_K_
Beginner

I don't think my last attachment made it.
James_T_Intel
Moderator

Hi Dave,

I don't see anything immediately wrong in your output (other than the mismatched communicator handle). Can you try using the current version of the Intel® MPI Library (Version 4.1)? Can you send me a reproducer code to test here?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Dave_K_
Beginner

James, I sent the perfect test case to you but have not heard back. Dave
James_T_Intel
Moderator

Hi Dave,

I have not received the test case. Did you send it via private message? How large is the file?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools