My steps were:
1. Got Intel MPI 3.0 Evaluation (30-day license)
2. Installed it in a shared directory
3. Configured password-less SSH between the nodes
4. Configured IPoIB (for other purposes) - confirmed working
5. Compiled the test MPI application that comes with Intel MPI
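For reference, steps 2 and 5 amount to something like the following sketch (the install prefix follows my shared-directory layout; adjust it to your own setup):

```shell
# Sketch of steps 2 and 5 above (INSTALL_DIR follows my layout; adjust to yours).
INSTALL_DIR=/gpfs/loadl/HPL/prefix/intel/mpi/3.0

# Pick up the Intel MPI environment (bin64 for a 64-bit build), if present.
[ -f "$INSTALL_DIR/bin64/mpivars.sh" ] && . "$INSTALL_DIR/bin64/mpivars.sh"

# Build the bundled test program with the mpicc compiler driver, if available.
command -v mpicc >/dev/null \
    && mpicc -o "$INSTALL_DIR/test/test" "$INSTALL_DIR/test/test.c" \
    || true
echo "environment prepared for $INSTALL_DIR"
```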
Now it works over Ethernet, but I can't run it over IB:
$ mpirun -n 4 -r ssh /gpfs/loadl/HPL/prefix/intel/mpi/3.0/test/test
Hello world: rank 0 of 4 running on n1
Hello world: rank 1 of 4 running on n3
Hello world: rank 2 of 4 running on n4
Hello world: rank 3 of 4 running on n2
$ mpirun -n 4 -r ssh -env I_MPI_DEVICE rdssm:OpenIB-cma -env I_MPI_FALLBACK_DEVICE 0 -env I_MPI_DEBUG 5 /gpfs/loadl/HPL/prefix/intel/mpi/3.0/test/test
DAPL provider is not found and fallback device is not enabled
[cli_0]: aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(925): Initialization failed
MPIDD_Init(95).......: channel initialization failed
MPIDI_CH3_Init(144)..: generic failure with errno = -1
(unknown)(): rank 3 in job 1 n1_36568 caused collective abort of all ranks
exit status of rank 3: return code 13
[output from other nodes skipped]
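As a sanity check on the "DAPL provider is not found" message, I also verified that the name after the colon in I_MPI_DEVICE (here OpenIB-cma) matches the first field of a dat.conf entry and that the uDAPL library that entry points to actually exists on disk. A quick sketch of that check (the /etc/dat.conf default is an assumption; override DAT_CONF if your file lives elsewhere):

```shell
# Sketch: check that the provider named after the colon in I_MPI_DEVICE
# (here OpenIB-cma) has a dat.conf entry whose uDAPL library exists.
DAT_CONF=${DAT_CONF:-/etc/dat.conf}
PROVIDER=OpenIB-cma

entry=$(grep "^$PROVIDER " "$DAT_CONF" 2>/dev/null)
if [ -z "$entry" ]; then
    echo "no entry for $PROVIDER in $DAT_CONF"
else
    # Field 5 of a dat.conf entry is the provider library path.
    lib=$(echo "$entry" | awk '{print $5}')
    if [ -f "$lib" ]; then
        echo "$PROVIDER -> $lib (library present)"
    else
        echo "$PROVIDER -> $lib (library MISSING)"
    fi
fi
```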
My IB configuration: OFED 1.2.5 from Cisco (OFED-1.2.5).
$ cat /etc/dat.conf
#
# DAT 1.2 configuration file
#
# Each entry should have the following fields:
#
# <ia_name> <api_version> <threadsafety> <default> <lib_path> <provider_version> <ia_params> <platform_params>
#
# For the uDAPL cma provider, specify <ia_params> as one of the following:
# network address, network hostname, or netdev name and 0 for port
#
# Simple (OpenIB-cma) default with netdev name provided first on list
# to enable use of same dat.conf version on all nodes
#
# Add examples for multiple interfaces and IPoIB HA fail over, and bonding
#
OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "ib1 0" ""
OpenIB-cma-2 u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "ib2 0" ""
OpenIB-cma-3 u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "ib3 0" ""
OpenIB-bond u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "bond0 0" ""
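If I understand the mapping correctly, the first field of each dat.conf entry is the name that goes after the colon in I_MPI_DEVICE, so selecting the ib1 port instead of ib0 would look something like this hypothetical invocation:

```shell
# Hypothetical example: select the OpenIB-cma-1 entry (ib1 port) instead of ib0.
TEST_BIN=/gpfs/loadl/HPL/prefix/intel/mpi/3.0/test/test
command -v mpirun >/dev/null \
    && mpirun -n 4 -r ssh -env I_MPI_DEVICE rdssm:OpenIB-cma-1 "$TEST_BIN" \
    || true
```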
Please make sure that you have set up the 64-bit MPI environment. Source the mpivars.sh file from the $install_dir/bin64 directory to be able to build 64-bit MPI applications. You should also have the 64-bit version of gcc as your default gcc compiler when using the mpicc compiler driver.
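A quick way to confirm the toolchain really is 64-bit - a sketch, assuming the install prefix shown earlier in the thread:

```shell
# Sketch: confirm a 64-bit environment before building with mpicc.
# The mpivars.sh path is an assumption based on the install prefix above.
MPI_HOME=/gpfs/loadl/HPL/prefix/intel/mpi/3.0
[ -f "$MPI_HOME/bin64/mpivars.sh" ] && . "$MPI_HOME/bin64/mpivars.sh"

# The default gcc should target a 64-bit machine (e.g. x86_64-redhat-linux)...
command -v gcc >/dev/null && gcc -dumpmachine

# ...and the userland word size should be 64.
getconf LONG_BIT
```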