We are trying to run an application compiled with ifort + gcc under Intel MPI. Please have a look at the problems we are facing:
Problem 1: When trying to mpirun the software we receive:
mpdboot_wn1 (handle_mpd_output 752): from mpd on wn2, invalid port info:
wn2.cluster: Connection refused
Problem 2: When trying to avoid Problem 1 by running under a different MPI (MVAPICH2), we receive an error about a missing symbol mpi_init_. I thought the binary should be portable between MPI implementations.
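(As background: MPI implementations do not share a common binary interface, so an executable linked against Intel MPI generally cannot run under MVAPICH2. As a sketch, you can check which MPI libraries a binary was actually linked against; `./a.out` is a placeholder for the real binary name:)

```shell
# Shared MPI libraries the executable was linked against
# (./a.out is a placeholder for the actual binary name):
ldd ./a.out | grep -i mpi

# Undefined MPI symbols the executable expects to resolve at load time:
nm -D -u ./a.out | grep -i mpi_init
```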
We followed the procedure:
1. Created mpd.hosts file:
Software was compiled against Intel MPI.
One more question: what is the _proper_ way of linking code compiled with GCC and with Intel Fortran?
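(One common approach, sketched here with placeholder file names and not necessarily what the poster used: compile the C part with gcc, the Fortran part with the Intel MPI Fortran wrapper, and do the final link with that wrapper so the Fortran and MPI runtime libraries are pulled in automatically:)

```shell
# Compile each part with its own compiler
# (cpart.c and fpart.f90 are placeholder file names):
gcc -c cpart.c -o cpart.o
mpiifort -c fpart.f90 -o fpart.o

# Link with the Intel MPI Fortran wrapper so the Fortran and MPI
# runtime libraries are added automatically:
mpiifort cpart.o fpart.o -o app
```

Linking with the Fortran driver (rather than gcc) avoids having to list the Fortran runtime libraries by hand.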
The steps you completed up to #7 are actually all correct. What you have to understand is how the mpirun script functions. You can think of mpirun as a "wrapper" that actually executes three other commands:
mpdboot (to start the MPD daemons),
mpiexec (to run your app),
mpdallexit (to close out the MPD ring).
So steps 1 - 6 are fine, but then you use mpirun, which actually closes out your MPD daemons and tries to restart them. When it restarts them, it no longer uses ssh (since you don't specify that option), so mpirun falls back to rsh (the default), is unable to connect to the other nodes, and you see the error.
That means you have two options here:

A. Use the three commands separately:

mpdboot -r ssh -f mpd.hosts -n 4   # start the MPD daemons via ssh
mpiexec -n 4 [executable_and_params]
mpdallexit   # close out the MPD daemons

B. Use mpirun only (the arguments for mpirun are a combination of the mpdboot and mpiexec arguments), for example with 4 or 8 processes:

mpirun -r ssh -f mpd.hosts -n 4 [executable_and_params]
mpirun -r ssh -f mpd.hosts -n 8 [executable_and_params]
So if I want all the cores to be used by MPI processes (4 nodes * 8 cores), I need to run:
mpirun -r ssh -f mpd.hosts -n 32 [executable_and_params]
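(The rank count here is just nodes times cores per node, assuming one MPI rank per core:)

```shell
# One MPI rank per core (assumed layout): 4 nodes, 8 cores each.
nodes=4
cores_per_node=8
echo $((nodes * cores_per_node))   # prints 32
```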