I am using the oneAPI "latest" version of Intel MPI with Fortran on a Linux cluster. Things are working fine. However, to check my MPI calls, I added -check_mpi to my link step and ran a simple case. The MPI checking works, but the program hangs in MPI_FINALIZE. If I compile without -check_mpi, it does not hang. With or without -check_mpi, the calculation itself runs fine; it just gets stuck in MPI_FINALIZE with -check_mpi.
I did some searching, and there are numerous posts about calculations getting stuck in MPI_FINALIZE, regardless of -check_mpi. The usual response is to ensure that all communications have completed. In my case, however, that is exactly what I want -check_mpi to tell me. I don't think there are outstanding communications, but who knows. Is there a way I can force my way out of MPI_FINALIZE, or prompt it to give me a coherent error message?
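For reference, the setup is of this sort (a sketch, not the actual build line; the file and program names are hypothetical, and mpiifort is the Intel MPI Fortran wrapper):

    # Link with the Intel MPI correctness-checking library
    mpiifort -check_mpi -o my_app my_app.f90
    mpirun -n 4 ./my_app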
Here is the standard error output with VT_VERBOSE=5 set.
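(VT_VERBOSE was exported in the batch script before the run; a sketch of what that looks like, with a hypothetical application name:)

    # Raise Intel Trace Collector verbosity and capture stderr
    export VT_VERBOSE=5
    mpirun -n 4 ./my_app 2> stderr.log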
Thank you, I have sent this to our development team.
Please try running with FI_PROVIDER=verbs.
When I set FI_PROVIDER to verbs, the job now finishes successfully, but I still get a warning about not freeing user-defined datatypes. Previously I was using shm, since this is a small job that runs on only a single node.
Progress! If you run without specifying the provider at all, what happens?
If I do not specify FI_PROVIDER explicitly in the batch script, the case runs and finishes successfully. We have been explicitly exporting FI_PROVIDER=shm for jobs that run on only one node, and not specifying anything otherwise. We are using psm and libfabric/1.10.1 because this cluster has older QLogic cards in it, and we cannot use the precompiled fabrics that come with oneAPI.
I forget now why we explicitly set FI_PROVIDER=shm for one-node jobs and nothing otherwise. My guess is that we had some trouble getting things to work and just landed on this particular setup. It works, with the one exception of the hangs in MPI_FINALIZE when -check_mpi is used at link time. There is also the warning about an unfreed user-defined MPI datatype, which I have not figured out, since I do explicitly free this datatype each time it is created. Perhaps the fact that I commit it multiple times is the problem.
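For comparison, the intended lifecycle is one commit and one free per created handle; MPI_TYPE_FREE marks the handle for deallocation once any pending operations using it complete. A minimal Fortran sketch of that pattern (the vector type here is hypothetical):

    program type_lifecycle
      use mpi
      implicit none
      integer :: ierr, row_type

      call MPI_INIT(ierr)

      ! Create and commit the derived type once ...
      call MPI_TYPE_VECTOR(5, 1, 10, MPI_DOUBLE_PRECISION, row_type, ierr)
      call MPI_TYPE_COMMIT(row_type, ierr)

      ! ... use row_type in as many sends/receives as needed ...

      ! ... and free it exactly once, before MPI_FINALIZE.
      call MPI_TYPE_FREE(row_type, ierr)

      call MPI_FINALIZE(ierr)
    end program type_lifecycle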
I recommend not setting the provider unless it is explicitly necessary.
I'll check with development regarding the possibility of multiple commits causing the warning.
I remember now why we export FI_PROVIDER=shm when we run MPI jobs on a single node. Normally we use the libfabric psm provider for jobs that run across multiple nodes, and when we do so we use the SLURM SBATCH parameter --exclusive so that only one job uses those nodes. For small jobs that use only a few cores, though, we want to allow multiple jobs to share a node. We've noticed that when we use psm, a file called
psm_shm.0fff0fff-0000-0000-0000-0fff0fff0fff
owned by the current person using the node is placed in /dev/shm, blocking all other potential jobs from using that node. We do not have this problem when we use FI_PROVIDER=shm.
So: shm lets us run multiple jobs on the same node, but it causes MPI_FINALIZE to hang when using -check_mpi. psm does not exhibit that hang, but it also does not allow multiple jobs to run on the same node, or at least we cannot figure out how to make it do so.
I know this is very confusing. It all started with the installation of QLogic cards on this cluster, which led us to use psm, and psm has its quirks.
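For anyone hitting the same thing, a quick way to look for a leftover psm segment on a node is a check along these lines (a sketch; the exact file name will differ per job):

    # List any psm shared-memory segments left in /dev/shm
    ls -l /dev/shm/psm_shm.* 2>/dev/null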
What happens if you use I_MPI_FABRICS=shm instead?
Then it works. But why?
To recap: if I compile my code with -check_mpi and export FI_PROVIDER=shm in my batch script, the job hangs in MPI_FINALIZE. However, if I export I_MPI_FABRICS=shm instead, the job does not hang.
So this solves the problem, but dare I ask why? What is the difference between FI_PROVIDER and I_MPI_FABRICS?
Short version: I_MPI_FABRICS=shm will use the Intel® MPI Library shared memory implementation; FI_PROVIDER=shm will use the libfabric shared memory implementation.
I_MPI_FABRICS sets the communication provider used by Intel® MPI Library. In older versions, this was the primary mechanism for specifying the interconnect. Starting with the 2019 release, this was modified, along with other major internal changes, to run all inter-node communications through libfabric. There are now three options for I_MPI_FABRICS: shm (shared memory only, valid only for a single-node run), ofi (libfabric only), and shm:ofi (shared memory intranode, libfabric internode).
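In batch-script terms, the three choices look like this (a sketch; pick one per job):

    export I_MPI_FABRICS=shm       # shared memory only; single-node jobs
    export I_MPI_FABRICS=ofi       # libfabric for all communication
    export I_MPI_FABRICS=shm:ofi   # shared memory intranode, libfabric internode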
FI_PROVIDER sets the provider to be used by libfabric. By choosing shm here, we will still go through libfabric, and libfabric will use its own shared memory communications. See https://software.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top/running-applications/fabrics-control/ofi-providers-support.html for our documentation regarding provider selection and https://github.com/ofiwg/libfabric for full details on libfabric.
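So the two settings that came up in this thread select two different shared-memory paths; roughly (a sketch, assuming a single-node job):

    # Intel MPI's own shared-memory implementation, libfabric not involved:
    export I_MPI_FABRICS=shm

    # libfabric's shared-memory provider; traffic still goes through libfabric:
    export FI_PROVIDER=shm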
Intel support will no longer be monitoring this thread. Any further posts are considered community only. For additional assistance related to this issue, please start a new thread.