Intel® MPI Library

Problem with MPI_Comm_accept

Dave_K_
Beginner
2,402 Views

I am trying to set up a server/client pair that establishes a connection (after the two are launched independently) using MPI_Comm_accept and MPI_Comm_connect. The server successfully waits for a connection request in MPI_Comm_accept, and the client successfully connects to it with MPI_Comm_connect. Both calls return without any error, but a negative handle is returned in 'newcomm' to both the server and the client. I launch both using mpiexec and have mpd running.

I cannot figure out what is wrong and it is probably something simple.  Any ideas?

SERVER:

integer*4 ierr, rt_gang_comm

! Open a port and report its name
call MPI_OPEN_PORT(MPI_INFO_NULL, portA, ierr)
write(contape,*) 'Gang Master port = ' // trim(portA)

… server writes the name to a file

! Block until a client connects; rt_gang_comm receives the new communicator
call MPI_COMM_ACCEPT(portA, MPI_INFO_NULL, 0, MPI_COMM_SELF, rt_gang_comm, ierr)
if (ierr /= MPI_SUCCESS) then
  stop 'server problem'
else
  write(contape,*) 'MPI_COMM_WORLD = ',MPI_COMM_WORLD,' rt_gang_com = ',rt_gang_comm
endif

call MPI_Bcast(i_rec, n_i_s, MPI_INTEGER4, id_mpi_rank, rt_gang_comm, ierr)
if (ierr /= MPI_SUCCESS) then
   call MPI_ERROR_STRING(ierr, errs, err_len, ierr1)
   stop
endif

SERVER OUTPUT:

Gang Master port = tag#0$rdma_port0#5124$rdma_host0#2:0:0:192:168:11:220:0:0:0:
 0:0:0:0:0$
MPI_COMM_WORLD =   1140850688  rt_gang_com =  -2080374784

CLIENT:

integer*4 ierr, intercomm

… client reads port name from the server written file

write(contape,*)'Opening port', trim(portA)

! Connect to the server's port; intercomm receives the new communicator
call MPI_COMM_CONNECT(portA, MPI_INFO_NULL, 0, MPI_COMM_WORLD, intercomm, ierr)
if (ierr /= MPI_SUCCESS) then
  stop 'client problem'
else
  print*,"MPI_COMM_WORLD=",MPI_COMM_WORLD,"intercomm=", intercomm
endif

CLIENT OUTPUT:

Opening port  tag#0$rdma_port0#5124$rdma_host0#2:0:0:192:168:11:220:0:0:0:0:0:0:0:0$
MPI_COMM_WORLD=  1140850688 intercomm= -2080374783

A subsequent broadcast hangs. The two sides are trying to communicate over communicator handles that differ by one.

I cannot find anything that helps, so I would appreciate even ideas to try.

Thanks,

Dave

14 Replies
Dave_K_
A few particulars: ifort version 12.1.0, CentOS release 5.7, impi/4.0.3/lib64.
TimP
Honored Contributor III
If you should happen to use an architecture option -xHost on AVX hardware, or one including AVX, I don't know whether it will work, since CentOS 5.7 doesn't support AVX.
Dave_K_
Thanks for your input, Tim. I was compiling with -xSSE4.2 because it helped with optimization (at some point). I removed it and recompiled with defaults for both the client and server, and I still end up with a negative handle.
Dave_K_
Are there any other ideas out there? I am so stumped.
James_T_Intel
Moderator
Hi Dave,

I've tested the basic functionality of your program, and it appears to work even with a negative value for the communicator. That is simply a handle to the communicator, and as long as the two have the same communicator, the connection should work. From the server side:

[plain]
$ mpirun -n 1 ./server
MPI_COMM_WORLD = 1140850688 newcomm = -2080374784 value = 25
$ cat portname.txt
tag#0$rdma_port0#25033$rdma_host0#2:0:0:36:101:26:30:0:0:0:0:0:0:0:0$
[/plain]

And the client side:

[plain]
$ mpirun -n 1 ./client
MPI_COMM_WORLD = 1140850688 newcomm = -2080374784 value = 25
[/plain]

If you're still having problems, please run with I_MPI_DEBUG=5 and attach the output.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
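To make the "it is simply a handle" point concrete, the small sketch below prints the three handle values reported in this thread as 32-bit hexadecimal patterns. It does not use MPI at all. The program name is made up for illustration, and the reading of the bit layout in the note that follows is an assumption based on MPICH-style handles (Intel MPI is derived from MPICH), not something stated in the thread.

```fortran
! Sketch: show the handle values from this thread as 32-bit hex patterns.
! No MPI calls are made; the numbers are copied from the posted output.
program show_handle_bits
  implicit none
  integer, parameter :: i4 = selected_int_kind(9)   ! 32-bit integer kind
  integer(kind=i4) :: handles(3)
  integer :: i

  ! MPI_COMM_WORLD, the server's new communicator, the client's new communicator
  handles = (/ 1140850688, -2080374784, -2080374783 /)

  do i = 1, 3
    ! Z8.8 writes the integer in hexadecimal, zero-padded to 8 digits
    write(*, '(I12, " = 0x", Z8.8)') handles(i), handles(i)
  end do
end program show_handle_bits
```

As 32-bit two's-complement patterns, 1140850688 is 0x44000000, -2080374784 is 0x84000000, and -2080374783 is 0x84000001. The values are negative only because the top bit is set, and the two new handles differ only in the lowest bit. That is consistent with each process storing its new communicator in a different local slot, so the handle values on two separate processes would not be expected to match in general.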
Dave_K_
James, when I run it, the negative communicators differ by one, as reflected in the output. Both the client and the server have valid communicators, but they do not match, so each hangs on a broadcast. Dave
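One thing that may be worth checking when a broadcast hangs on a communicator returned by MPI_COMM_ACCEPT or MPI_COMM_CONNECT: that communicator is an intercommunicator, and the MPI standard defines the root argument of MPI_BCAST differently for it. The sending process passes MPI_ROOT, and each process in the other group passes the sender's rank within the sending group. Whether this is the cause of the hang reported here is not confirmed anywhere in the thread. The following is a minimal sketch under these assumptions: exactly one server process and one client process, a port file named portname.txt (the name used in the reply above), and an illustrative value of 25. The program name and the 'server' command-line argument are made up for the example.

```fortran
! Sketch: one executable that acts as server or client depending on its
! first command-line argument. Start the server first, wait until
! portname.txt exists, then start the client. Return codes are not checked
! because the default MPI error handler aborts on failure.
program intercomm_bcast
  use mpi
  implicit none
  character(len=MPI_MAX_PORT_NAME) :: port
  character(len=16) :: role
  integer :: ierr, intercomm, root, val

  call MPI_INIT(ierr)
  call get_command_argument(1, role)

  if (trim(role) == 'server') then
    call MPI_OPEN_PORT(MPI_INFO_NULL, port, ierr)
    ! Write the port name on a single line so the client can read it back
    open(unit=10, file='portname.txt', status='replace', action='write')
    write(10, '(A)') trim(port)
    close(10)
    call MPI_COMM_ACCEPT(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm, ierr)
    val  = 25
    root = MPI_ROOT   ! this process is the sender on the intercommunicator
  else
    open(unit=10, file='portname.txt', status='old', action='read')
    read(10, '(A)') port
    close(10)
    call MPI_COMM_CONNECT(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm, ierr)
    val  = -1
    root = 0          ! rank of the sender within the remote (server) group
  end if

  ! Same call on both sides; only the root argument differs
  call MPI_BCAST(val, 1, MPI_INTEGER, root, intercomm, ierr)
  print *, trim(role), ': handle = ', intercomm, ' value = ', val

  call MPI_COMM_DISCONNECT(intercomm, ierr)
  if (trim(role) == 'server') call MPI_CLOSE_PORT(port, ierr)
  call MPI_FINALIZE(ierr)
end program intercomm_bcast
```

A possible way to try it, assuming the executable is built as ./a.out: run `mpirun -n 1 ./a.out server` in one shell and `mpirun -n 1 ./a.out client` in another. If the sending side has more than one process, the other processes in that group would pass MPI_PROC_NULL as the root argument.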
Dave_K_
James, my example was snipped out of a larger code, so the output does not match exactly, but it is attached.
Dave_K_
James, attached is the output with debug=10. Dave
Dave_K_
I don't think my last attachment made it.
James_T_Intel
Hi Dave,

I don't see anything immediately wrong in your output (other than the mismatched communicator handle). Can you try using the current version of the Intel® MPI Library (Version 4.1)? Can you send me a reproducer code to test here?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Dave_K_
James, I sent the perfect test case to you but have not heard back. Dave
James_T_Intel
Hi Dave,

I have not received the test case. Did you send it via private message? How large is the file?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools