
MPI and QLogic and SLURM


We recently upgraded a 36-node Linux compute cluster to CentOS 7.8, and we now use SLURM instead of Torque. We have a Fortran code compiled with Intel Fortran 19.1, and we run it with Intel MPI 2019 Update 7. The cluster uses QLogic InfiniPath_QLE7340 cards. Long story short, since the upgrade our MPI jobs have been flaky and fail at random. We had recently upgraded another compute cluster in exactly the same way, except that the other cluster uses Mellanox cards.

Our sysadmin thinks the MPI errors come from the libfabric library on the compute nodes. The OS upgrade installed the distribution's default libfabric from its RPM package, but the Intel suite also ships its own copy of the same library. Because we have to load the Intel compiler module to use the Intel compiler, the Intel copy of libfabric is the one picked up and loaded into the executable at runtime. However, the current Intel version of libfabric does not support the older QLogic IB cards well: sometimes it works, sometimes it doesn't. So we built and installed libfabric from source and reset the necessary library paths so that the dynamic linker loads our build instead. Since then, everything has worked.
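To check which copy of libfabric a job will pick up after changing the paths, here is a minimal Python sketch that mimics only the `LD_LIBRARY_PATH` part of the dynamic linker's search: it scans the directories in order and reports the first one containing `libfabric.so.1`. The helper names are hypothetical, and this is a simplification: the real loader also consults RPATH/RUNPATH entries in the binary, `/etc/ld.so.cache`, and the default system directories, and it treats empty entries as the current directory. On a real node, `ldd ./your_app | grep libfabric` (with `your_app` standing in for your executable) gives the authoritative answer.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Sequence


def ld_library_path_dirs(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: split LD_LIBRARY_PATH into directories, in search order.

    Empty entries are dropped for simplicity (the real loader treats them
    as the current directory).
    """
    env = os.environ if env is None else env
    return [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]


def first_match_on_path(libname: str, search_dirs: Sequence[str]) -> Optional[Path]:
    """Hypothetical helper: return the first dir/libname that exists, scanning in order.

    Simplified model: ignores RPATH/RUNPATH, /etc/ld.so.cache and the
    default system directories that the real dynamic linker also searches.
    """
    for d in search_dirs:
        if not d:
            continue  # skip empty entries, consistent with ld_library_path_dirs()
        candidate = Path(d) / libname
        # is_file() follows symlinks such as libfabric.so.1 -> libfabric.so.1.x.y
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # libfabric.so.1 is libfabric's soname; adjust if your build differs.
    hit = first_match_on_path("libfabric.so.1", ld_library_path_dirs())
    print(hit if hit else "libfabric.so.1 not found on LD_LIBRARY_PATH")
```

If the printed path points into the Intel installation rather than your own build, the module load order is still putting the bundled copy first.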

Does this sound like the right course of action, or are we overlooking some option or configuration that would have avoided a complete recompile of the libraries?


Accepted Solutions
Moderator

Hi Kevin,


Yes, you can use your own build of libfabric. Please check this page (https://software.intel.com/content/www/us/en/develop/articles/intel-mpi-library-2019-over-libfabric....) to make sure you have followed all the necessary steps.
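If the setup lives in a launcher script rather than in each user's shell, the settings can be kept in one place. The Python sketch below builds an environment that prepends a self-built libfabric to `LD_LIBRARY_PATH`, sets `I_MPI_OFI_LIBRARY_INTERNAL=0` (the Intel MPI 2019 variable for not using the bundled libfabric), and optionally pins the provider with libfabric's `FI_PROVIDER` variable. The helper name, the install prefix `/opt/libfabric`, the `lib` subdirectory (some builds install to `lib64`), and the choice of the `psm` provider for the QLE7340 are all assumptions. The page linked above is the authority on which steps Intel MPI actually requires; depending on the version, `I_MPI_OFI_LIBRARY_INTERNAL` may need to be set before the Intel environment script is sourced to take full effect.

```python
import os
from typing import Dict, Mapping, Optional


def external_libfabric_env(
    prefix: str,
    base: Optional[Mapping[str, str]] = None,
    provider: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: copy `base` (default os.environ) for a self-built libfabric.

    Assumes the build was installed with --prefix=<prefix> and that its
    shared libraries landed in <prefix>/lib.
    """
    env = dict(os.environ if base is None else base)

    # Prepend our libfabric so the dynamic linker finds it before any copy
    # that the Intel module put on LD_LIBRARY_PATH.
    libdir = os.path.join(prefix, "lib")
    old = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = f"{libdir}:{old}" if old else libdir

    # Intel MPI 2019: "0" means do not use the libfabric bundled with Intel MPI.
    env["I_MPI_OFI_LIBRARY_INTERNAL"] = "0"

    # Optionally pin the libfabric provider (for example "psm" for
    # QLogic/True Scale cards, assuming your build includes that provider).
    if provider:
        env["FI_PROVIDER"] = provider
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when a Python wrapper launches `mpirun`; equivalently, the same three settings can be written as `export` lines in the SLURM batch script.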

Regarding support for the QLogic InfiniPath_QLE7340 hardware, we are transferring your query to our subject matter experts, who will guide you.


Regards

Prasanth

