We recently upgraded a 36-node Linux compute cluster to CentOS 7.8, and we're now using SLURM instead of Torque. We have a Fortran code that we compiled with Intel Fortran 19.1, and we're using Intel MPI 2019 Update 7. The cluster uses QLogic InfiniPath_QLE7340 cards. Long story short, our MPI jobs became unreliable and would fail at random. We had recently upgraded another compute cluster in exactly the same way, except that the other cluster uses Mellanox cards.
Our sysadmin thinks the MPI errors come from the libfabric library on the compute nodes. The OS upgrade installed the default libfabric from its RPM package, and the Intel compiler suite also ships its own build of the same library. Because we have to load the Intel compiler module, the Intel build of libfabric is the one the dynamic linker loads into the executable at run time. However, the current Intel build does not seem to support the older QLogic IB cards well: sometimes it works, sometimes it doesn't. So we built and installed libfabric from source and reset the necessary library paths so that the dynamic linker loads our build instead. Since then, everything has worked.
Does this sound like the right course of action? Or are we overlooking some option or configuration that would not have necessitated a complete re-compile of the libraries?
Hi Kevin,
Yes, you can use your own build of libfabric. Please check this page (https://software.intel.com/content/www/us/en/develop/articles/intel-mpi-library-2019-over-libfabric.html) to see whether you have followed all the necessary steps.
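For reference, here is a rough sketch of what the environment for a self-built libfabric might look like. The install prefix `/opt/libfabric` is only a placeholder (it is not from your post), and this assumes your build installed its libraries under `lib` and the `fi_info` utility under `bin`. `I_MPI_OFI_LIBRARY_INTERNAL` is the Intel MPI variable that controls whether the bundled libfabric is used; please check the page above for the exact order in which it has to be set relative to loading the Intel MPI environment.

```shell
# Hypothetical install prefix for the self-built libfabric; adjust to your site.
LIBFABRIC_PREFIX="${LIBFABRIC_PREFIX:-/opt/libfabric}"

# Ask Intel MPI not to set up its bundled libfabric.
# Assumption: this is exported before the Intel MPI environment script or
# module is loaded, so that the bundled library path is not added.
export I_MPI_OFI_LIBRARY_INTERNAL=0

# Put the self-built library first on the dynamic linker search path.
export LD_LIBRARY_PATH="${LIBFABRIC_PREFIX}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Ask Intel MPI for verbose startup output, which should help confirm
# which libfabric and which provider a job actually picks up.
export I_MPI_DEBUG=5

# If the build installed the fi_info utility, list the providers it sees.
if [ -x "${LIBFABRIC_PREFIX}/bin/fi_info" ]; then
    "${LIBFABRIC_PREFIX}/bin/fi_info" -l
fi
```

If the provider list does not include one that matches your adapters, the library was probably built without the support it needs for those cards.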
Regarding support for the QLogic InfiniPath_QLE7340 hardware, we are transferring your query to the subject matter experts, who will guide you.
Regards
Prasanth