
Oversubscription of MPI processes

Kevin_McGrattan

I have a Fortran program that uses Intel MPI. My Windows 10 computer has 1 socket, 4 cores, and 8 logical processors. When I run my code with 32 MPI processes under Intel MPI 18 Update 2, things run reasonably well. When I repeat the experiment with MPI 19 Update 4, the code runs much more slowly, and I notice that most of the processes are mapped to CPU 3 according to the Task Manager. With MPI 18, the processes are spread around more randomly. Has there been a major change in the default affinities between MPI 18 and 19? I am not using any special options.

jimdempseyatthecove
Honored Contributor III

Try: mpirun -genv I_MPI_DEBUG=5 -n 32 ./yourProgram
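On Windows the same launch is usually done with mpiexec; a rough equivalent, assuming the executable is named yourProgram.exe:

    mpiexec -genv I_MPI_DEBUG 5 -n 32 yourProgram.exe

The extra output should list which logical processors each rank gets pinned to.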

Kevin_McGrattan

Does the I_MPI_DEBUG option do anything other than print extra info? I ask because my application seems to be running better now, and I see that the ranks are all being pinned in a round-robin style, which is what I would have expected. I guess the Task Manager is confusing me: mpiexec tells me that my 32 MPI processes are being mapped to pins 0 through 7, but the Task Manager's detailed list of processes indicates that most of my MPI processes are assigned to CPU 3. I'm not sure whether CPU 3 refers to the core or the logical processor.
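One way to take the Task Manager out of the equation is to have each rank ask Windows which logical processor it is currently running on. A minimal sketch, assuming 64-bit Windows and the Intel MPI Fortran module (the program name is made up):

    program which_cpu
      use mpi
      use, intrinsic :: iso_c_binding, only: c_int
      implicit none
      ! kernel32 routine: returns the index of the logical processor the
      ! calling thread is running on at this instant (Vista and later)
      interface
        function GetCurrentProcessorNumber() bind(C, name='GetCurrentProcessorNumber')
          import :: c_int
          integer(c_int) :: GetCurrentProcessorNumber
        end function GetCurrentProcessorNumber
      end interface
      integer :: ierr, rank
      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      print *, 'rank', rank, 'is on logical processor', GetCurrentProcessorNumber()
      call MPI_Finalize(ierr)
    end program which_cpu

The value is sampled at a single instant, so with oversubscription it can change from call to call, but the number it reports is a logical-processor index rather than a core number.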

jimdempseyatthecove
Honored Contributor III

>>Does the I_MPI_DEBUG option do anything other than print extra info?

Not that I am aware of.

Note: the launch and monitor managers (mpirun, mpiexec, hydra, and others) all have different behaviors, and their environment setups are distinct from that of the Fortran environment. Ensure that the MPI and Fortran versions you use are consistent with each other. While this should not make a difference, it avoids adding to the soup of confusion.

On Windows there are three affinity pinning collections (Task Manager only shows one): Process Affinity, Process Group Affinity, Thread Affinity.

Added to the confusion, mpirun (mpiexec, hydra, ...) launch one or more additional threads, which are likely not pinned (though may be). What you are seeing in Task Manager may not be what is happening under the hood.

Set up the configuration with the poor performance. Add to the MPI process a compute function that runs a long time (after the MPI initialization). Launch with 32 processes, then run the Task Manager.
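Something along these lines would do as the long-running compute function (a throwaway sketch, not the real application; the loop count is arbitrary, and the result is printed only so the loop cannot be optimized away):

    program pin_watch
      use mpi
      implicit none
      integer :: ierr, rank, i
      double precision :: x
      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      ! burn CPU for a while so Task Manager has time to show
      ! where each of the oversubscribed processes actually ends up
      x = 0.0d0
      do i = 1, 500000000
         x = x + sin(dble(i))
      end do
      print *, 'rank', rank, 'finished, x =', x
      call MPI_Finalize(ierr)
    end program pin_watch

Launch it with the same mpiexec command as the real code and watch the per-logical-processor load while it runs.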

Also, will all 32 processes fit in the available RAM?

Jim Dempsey

Kevin_McGrattan

Actually, my situation has not improved, because I was still running my code with the Intel MPI 18 library. When running with MPI 19 and I_MPI_DEBUG=5, I get these messages:

libfabric provider: sockets

Unable to read tuning file for ch4 level

Unable to read tuning file for net level

Unable to read tuning file for shm level

The MPI processes seem to be mapped properly to the CPU pins. What is the meaning of these messages? I googled them but did not come up with any explanation or resolution. Thanks.

jimdempseyatthecove
Honored Contributor III

I've never seen those reports. When I Google:

      "Unable to read tuning file for" site:intel.com

The first two hits are:

https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/799716
https://software.intel.com/content/www/us/en/develop/articles/tuning-the-intel-mpi-library-basic-techniques.html

These may help in understanding the error messages. Hopefully the second one will also help with the pinning issue.

Hint: When you Google, it helps to qualify what and where you search.

Jim Dempsey

Kevin_McGrattan

Thanks. I also found these articles, but I cannot figure out which options might solve the problem, so I put in a support ticket. I don't think pinning is the issue, as the pinning appears to be the same with both MPI 18 and 19. I'm running this on one computer, so the fabric is shm, or at least it appears to be.
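(For reference, on a single machine Intel MPI 2019 can be told to use only the shared-memory transport, which should take the libfabric provider selection out of the picture. A sketch from a Windows command prompt, with yourProgram.exe standing in for the actual executable:

    set I_MPI_FABRICS=shm
    mpiexec -n 32 yourProgram.exe

Whether that changes the oversubscription behavior is another question, but it removes the "sockets provider" variable from the experiment.)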

jimdempseyatthecove
Honored Contributor III

As a side note, on one of the IDZ Forum threads there was a posting relating to an affinity-pinning issue. The user had a C++ main program and dialogs together with a Fortran DLL that used OpenMP. To complicate his pinning problem, his C++ code had an option to create/use or not create/use affinity-pinned ancillary threads. His complaint was that when selecting "do not create/use affinity-pinned ancillary threads", the Fortran DLL's affinity pinning wasn't taking effect. After he disclosed his circumstances, I suggested he look at his "do not create/use" code path to see whether it was mucking with affinity pinning (in other words, removing pinning). This would have affected MPI process pinning had his application been multi-process. Do you have anything like this going on that you have not disclosed?

Jim Dempsey

Kevin_McGrattan

No, my application is a large Fortran computational fluid dynamics code in which the computational domain is divided into 32 grids and run with 32 MPI processes. We are intentionally oversubscribing the cores because our users typically do this (against our advice, of course). This test case runs well with MPI 18 and very slowly with 19. Your suggestion to include the debug option produced those warning messages. I have a ticket in to Intel, and hopefully they can clarify.

jimdempseyatthecove
Honored Contributor III
Kevin_McGrattan

The message "I_MPI_WAIT_MODE environment variable is not supported" gets printed out in debug mode.

I am waiting for Intel support to tell me what the "unable to read tuning file ..." messages mean.

There is clearly something different between MPI 18 and 19 with regard to oversubscription.

jimdempseyatthecove
Honored Contributor III

>>I_MPI_WAIT_MODE environment variable is not supported

There may be a different option that does the same thing.

Also, with 32 processes on 8 hardware threads, I'd suggest turning process pinning off:

I_MPI_PIN=off

See: https://software.intel.com/content/www/us/en/develop/documentation/mpi-developer-reference-linux/top/environment-variable-reference/process-pinning/environment-variables-for-process-pinning.html
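On Windows it can be set for the session (set I_MPI_PIN=off) or passed on the launch line; a sketch, with the executable name assumed:

    mpiexec -genv I_MPI_PIN off -n 32 yourProgram.exe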

Jim Dempsey

Kevin_McGrattan

I_MPI_PIN=off did not help. I then tried I_MPI_THREAD_YIELD, which is said to be for over-subscribed cases. Setting this to 2 or 3 did not help. 2 is the default when I_MPI_WAIT_MODE=1, but that variable isn't supported in 19u4, which I am using.

We have had better luck with later versions of MPI 19 and I_MPI_WAIT_MODE=1. We were just looking for a work-around for our users who have a version of our code built with 19u4. We are trying to avoid a re-release, but it looks like that is what we must do.
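For anyone who finds this thread later: with the later 2019 updates, the work-around amounts to setting that variable before the launch, e.g. from a Windows command prompt (executable name assumed):

    set I_MPI_WAIT_MODE=1
    mpiexec -n 32 yourProgram.exe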

Thanks for your help.
