Intel® MPI Library

mpirun errors

lakshitha_p_

Hi,

I've been using the Intel compilers for a long time and I'm really impressed with them. Recently I built a small cluster and installed Intel Parallel Studio XE Cluster Edition so that I could use MPI for my scientific computing work. I installed everything correctly and configured all the environment variables. The cluster's operating system is Linux (openSUSE Leap). I manually set up a password-less SSH connection, tested it, and it works properly. The mpiifort wrapper works fine. Everything works until the point where I have to run mpirun. When I give the following mpirun command

       mpirun -n 2 -ppn 1 -f hosts ./a.out 

it goes into an infinite loop, and when I break the process by force it gives the following messages:

^C[mpiexec@master] Sending Ctrl-C to processes as requested
[mpiexec@master] Press Ctrl-C again to force abort
[mpiexec@master] HYDU_sock_write (../../utils/sock/sock.c:418): write error (Bad file descriptor)
[mpiexec@master] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:252): unable to write data to proxy
[mpiexec@master] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:174): unable to send signal downstream
[mpiexec@master] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@master] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:501): error waiting for event
[mpiexec@master] main (../../ui/mpich/mpiexec.c:1147): process manager error waiting for completion

I had previously installed Open MPI and MPICH, but I removed both of them and tried again, with the same result. I checked which mpirun is being called with the command 'which mpirun', and it gives me:

/home/lakshitha/intel/compilers_and_libraries_2017.1.132/linux/mpi/intel64/bin/mpirun

 

I've attached the hosts file, which contains the names of the hosts. I have tried several things, but none of them worked. I would really appreciate it if somebody could sort this out.
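
For reference, the password-less SSH test mentioned above can be repeated for every entry in the hosts file. This is only a minimal sketch, not the exact commands that were run: the function names `list_hosts` and `check_hosts` are made up for illustration, and it assumes the hosts file has one host name per line, optionally followed by `:N` and with `#` starting a comment.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Intel MPI) for checking a hosts file.

# Print one host name per line from the given file.
# Assumed format: "hostname" or "hostname:N"; "#" starts a comment.
list_hosts() {
    awk '{ sub(/#.*/, "") } NF { sub(/:.*/, "", $1); print $1 }' "$1"
}

# Try a non-interactive SSH login to every host in the file.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so a host that still needs a password shows up as FAILED.
check_hosts() {
    failed=0
    for host in $(list_hosts "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

After sourcing the script, `check_hosts hosts` prints one line per host and returns non-zero if any host could not be reached without a password prompt.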
