Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.
2022 Discussions

IMPI and DAPL fabrics on an InfiniBand cluster

New Contributor I

Hello, I have been trying to submit a job on our cluster for an Intel 17-compiled, Intel MPI-enabled code. I keep running into trouble at startup when launching through PBS.

This is the submission script:

#PBS -N propane_XO2_ramp_dx_p3125cm(IMPI)
#PBS -W umask=0022
#PBS -e /home4/mnv/FIREMODELS_ISSUES/fds/Validation/UMD_Line_Burner/Test_Valgrind/propane_XO2_ramp_dx_p3125cm.err
#PBS -o /home4/mnv/FIREMODELS_ISSUES/fds/Validation/UMD_Line_Burner/Test_Valgrind/propane_XO2_ramp_dx_p3125cm.log
#PBS -l nodes=16:ppn=12
#PBS -l walltime=999:0:0
module purge
module load null modules torque-maui intel/17
export I_MPI_FABRICS=shm:dapl
export I_MPI_DEBUG=100
cd /home4/mnv/FIREMODELS_ISSUES/fds/Validation/UMD_Line_Burner/Test_Valgrind
echo $PBS_O_HOME
echo `date`
echo "Input file: propane_XO2_ramp_dx_p3125cm.fds"
echo " Directory: `pwd`"
echo "      Host: `hostname`"
/opt/intel17/compilers_and_libraries/linux/mpi/bin64/mpiexec   -np 184 /home4/mnv/FIREMODELS_ISSUES/fds/Build/impi_intel_linux_64/fds_impi_intel_linux_64 propane_XO2_ramp_dx_p3125cm.fds

As you can see, I'm selecting the DAPL fabric, with OpenIB-cma as the (default) DAPL provider. This is the /etc/dat.conf I see on my login node:

OpenIB-cma u1.2 nonthreadsafe default dapl.1.2 "ib0 0" ""
OpenIB-cma-1 u1.2 nonthreadsafe default dapl.1.2 "ib1 0" ""
OpenIB-cma-2 u1.2 nonthreadsafe default dapl.1.2 "ib2 0" ""
OpenIB-cma-3 u1.2 nonthreadsafe default dapl.1.2 "ib3 0" ""
OpenIB-bond u1.2 nonthreadsafe default dapl.1.2 "bond0 0" ""
ofa-v2-ib0 u2.0 nonthreadsafe default dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default dapl.2.0 "ib1 0" ""
ofa-v2-ib2 u2.0 nonthreadsafe default dapl.2.0 "ib2 0" ""
ofa-v2-ib3 u2.0 nonthreadsafe default dapl.2.0 "ib3 0" ""
ofa-v2-bond u2.0 nonthreadsafe default dapl.2.0 "bond0 0" ""
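To double-check which provider names are available for I_MPI_DAPL_PROVIDER, I list the first column of that file like this (a sketch, assuming the standard whitespace-separated dat.conf layout shown above):

```shell
# List the DAPL provider names (first column of /etc/dat.conf) that
# I_MPI_DAPL_PROVIDER can select. Skips comment and blank lines.
conf=/etc/dat.conf
if [ -r "$conf" ]; then
    awk '!/^#/ && NF { print $1 }' "$conf"
fi
```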

Logging in to the actual compute nodes, however, I don't see an /etc/dat.conf there at all. I don't know whether that is normal or an issue in itself.

Anyway, when I submit the job I get the attached stdout file, where it seems some of the ranks fail to load OpenIB-cma (and there is no fallback fabric).

To be clear, some nodes in the cluster use QLogic InfiniBand cards and others use Mellanox.

At this point I've tried several combinations, with and without specifying the IB fabrics, without success. I'd really appreciate your help troubleshooting this.

Thank you,




3 Replies
New Contributor I

An extra note:

You can see in the attached file that the MPI processes that fail to load OpenIB-cma are not tied to nodes with a particular qib0:0 or mlx4_0:0 NUMA map. See, for example, process [57] or [108].
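For what it's worth, with I_MPI_DEBUG set, the fabric each rank ends up on can be pulled out of the stdout file with something like the following (the log file name and exact message wording are assumptions; adjust them to what your run actually prints):

```shell
# Show which data-transfer mode each rank reported at startup.
# "job.log" is an assumed name for the PBS stdout file; with I_MPI_DEBUG
# set, Intel MPI 2017 prints per-rank "MPI startup():" lines.
log=job.log
if [ -r "$log" ]; then
    grep 'MPI startup' "$log" | grep -i -E 'fabric|data transfer mode'
fi
```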

Thank you,



Hi Marcos. I have run Fire and Smoke simulations quite a few times, most recently on an Omni-Path fabric, but that is another story. I would suggest asking whoever runs your cluster to set a node property in PBS so that you can choose all-Mellanox or all-QLogic cards. Also, can you run with I_MPI_FABRICS left unset, or with ofa?
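In the job script that would look something like this (a sketch; with Intel MPI 2017, I_MPI_FALLBACK lets the library fall back to the next available fabric instead of aborting at startup):

```shell
# Option 1: let Intel MPI pick the fabric itself.
unset I_MPI_FABRICS

# Option 2: use OFA (verbs) instead of DAPL.
export I_MPI_FABRICS=shm:ofa

# In either case, allow fallback rather than failing at startup.
export I_MPI_FALLBACK=1
```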










New Contributor I

Hi John, thank you for your reply! Yes, we do have dedicated queues for QLogic (24 nodes, I think) and Mellanox (12 nodes, I think). We have been trying for some time to run large jobs that span more than one dedicated queue, and have been somewhat successful with OpenMPI (though there have been other issues, like a constant memory leak we can't track down to our source code).

I have noticed that Intel MPI (when it runs) is quite a bit faster than the OpenMPI we have available, hence the attempt to span impi jobs across both sets of nodes.

I did try running the job with ofa instead of dapl, and also with dapl selecting ofa-v2-ib0 from the configuration list above. The problem has always been that the calculation randomly times out at different communication steps. I also ran the case using tcp, and although it is extremely slow, it ran overnight without interruption.
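Concretely, the variants I tried amounted to replacing the fabrics line in the submission script with one of the following (sketches of the relevant lines only; I_MPI_DAPL_PROVIDER picks an entry from /etc/dat.conf):

```shell
# Variant 1: OFA (verbs) instead of DAPL.
export I_MPI_FABRICS=shm:ofa

# Variant 2: DAPL with an explicit provider from /etc/dat.conf.
export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_PROVIDER=ofa-v2-ib0

# Variant 3: TCP over IP -- very slow, but it ran overnight without timeouts.
export I_MPI_FABRICS=shm:tcp
```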

Best Regards,