Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

mpirun seems to set GOMP_CPU_AFFINITY

Ronald_G_2
Beginner

It appears Intel MPI is setting GOMP_CPU_AFFINITY. Why? How do I prevent this?

When I print my env I get:

bash-4.2$ env | grep OMP
OMP_PROC_BIND=true
OMP_PLACES=threads
OMP_NUM_THREADS=2

When I run env under mpirun, I see that GOMP_CPU_AFFINITY has been set for me. Why?

bash-4.2$
bash-4.2$ mpirun -n 1 env | grep OMP
OMP_PROC_BIND=true
OMP_NUM_THREADS=2
OMP_PLACES=threads
GOMP_CPU_AFFINITY=0,1

This is a problem because I'm using the OMP environment variables to control affinity. For reference, here are my I_MPI settings:

bash-4.2$ env | grep I_MPI
I_MPI_PIN_DOMAIN=2:compact
I_MPI_FABRICS=shm:tmi
I_MPI_RESPECT_PROCESS_PLACEMENT=0
I_MPI_CC=icc
I_MPI_DEBUG=4
I_MPI_PIN_ORDER=bunch
I_MPI_PIN_RESPECT_CPUSET=off
I_MPI_ROOT=/opt/intel-mpi/2017

Why this is a problem: I get warnings that GOMP_CPU_AFFINITY is overriding my OpenMP settings, plus "invalid OS proc ID" warnings for the procs listed in GOMP_CPU_AFFINITY, like this:

bash-4.2$ mpirun -n 1 ./hello_mpi
OMP: Warning #181: OMP_PROC_BIND: ignored because GOMP_CPU_AFFINITY has been defined
OMP: Warning #181: OMP_PLACES: ignored because GOMP_CPU_AFFINITY has been defined
OMP: Warning #123: Ignoring invalid OS proc ID 1.

 hello from master thread
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): shm and tmi data transfer modes
[0] MPI startup(): Rank    Pid      Node name           Pin cpu
[0] MPI startup(): 0       81689    kit002.localdomain  {0,36}
hello_parallel.f: Number of tasks=  1 My rank=  0 My name=kit002.localdomain

I have a hybrid MPI/OpenMP code compiled with Intel 2017 and run with Intel MPI 2017 on a Linux cluster under SLURM. The code has a simple OMP master region that prints "hello from master thread"; after that it prints the number of ranks, which rank this is, and the host name of the node. Simple stuff:

program hello_parallel

  ! Include the MPI library definitions:
  include 'mpif.h'

  integer numtasks, rank, ierr, rc, len, i
  character*(MPI_MAX_PROCESSOR_NAME) name

  !$omp master
   print*, "hello from master thread"
  !$omp end master

  ! Initialize the MPI library:
  call MPI_INIT(ierr)
  if (ierr .ne. MPI_SUCCESS) then
     print *,'Error starting MPI program. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  ! Get the number of processors this job is using:
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

  ! Get the rank of the processor this thread is running on.  (Each
  ! processor has a unique rank.)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  ! Get the name of this processor (usually the hostname)
  call MPI_GET_PROCESSOR_NAME(name, len, ierr)
  if (ierr .ne. MPI_SUCCESS) then
     print *,'Error getting processor name. Terminating.'
     call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
  end if

  print "('hello_parallel.f: Number of tasks=',I3,' My rank=',I3,' My name=',A,'')",&
       numtasks, rank, trim(name)

  ! Tell the MPI library to release all resources it is using:
  call MPI_FINALIZE(ierr)

end program hello_parallel

Compiled simply:   mpiifort -g -qopenmp -o hello_mpi hello_mpi.f90

 

6 Replies
McCalpinJohn
Honored Contributor III

In the Intel MPI Developer Reference Guide, available at https://software.intel.com/en-us/articles/intel-mpi-library-documentation, Section 3.2 contains an 18-page discussion of exactly how to control process binding and how the various binding mechanisms interact.

dogunter
Beginner

Thank you for pointing out these reference manuals, but they do not address the problem we are seeing.

Somehow, when we invoke 'mpirun', GOMP_CPU_AFFINITY gets set, and thus any of the I_MPI_* variables we would like to use to control process affinity are ignored.

Do you have an answer to this? If not, could you pass this along to someone who has an idea?

McCalpinJohn
Honored Contributor III

It sounds like the "mpirun" that you are executing is not the standard version from Intel.   Many sites wrap the "mpirun" executable so that it will work with their batch queue management system, and these often set additional environment variables.
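A quick way to check is to look at what the mpirun on your PATH actually is (a generic shell sketch, nothing specific to your installation):

bash-4.2$ which mpirun                 # is it under $I_MPI_ROOT, or somewhere site-specific?
bash-4.2$ file $(which mpirun)         # Intel's mpirun is a shell script that wraps mpiexec.hydra
bash-4.2$ head -40 $(which mpirun)     # a site wrapper would typically export extra variables here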

Ronald_G_2
Beginner

John, I got to the bottom of this. No, our 'mpirun' is not wrapped; it's the stock version from Intel.

What I found is that on our SLURM cluster, Intel's mpirun is actually calling SLURM's 'srun' to launch the application. It's srun that is setting GOMP_CPU_AFFINITY, not mpirun.
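A quick way to see this (a sketch; exact behavior will depend on the site's SLURM/mpibind configuration) is to run env directly under srun, bypassing mpirun entirely, and check whether GOMP_CPU_AFFINITY shows up there too:

bash-4.2$ srun -n 1 env | grep GOMP_CPU_AFFINITY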

So, if I can pass srun the argument "--mpibind=off", I can disable this behavior.

So my question: how do I pass an srun argument through Intel MPI's mpirun? In other words, I want to give mpirun the option --mpibind=off and have mpirun pass it through untouched to srun, without editing the mpirun script. I'd prefer not to touch mpirun, since I don't want to have to re-edit it for every release we install.

Ron

 

McCalpinJohn
Honored Contributor III

Hmm... on my systems the nesting is the other way around: I run an srun (or sbatch) command to get a shell, and that shell then contains the mpirun (or mpiexec.hydra) job launch.
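Roughly along these lines (a sketch; node and task counts are just illustrative):

bash-4.2$ srun -N 1 -n 1 --pty /bin/bash      # interactive allocation that drops me into a shell on a compute node
bash-4.2$ mpirun -n 1 ./hello_mpi             # the MPI launch happens inside that allocation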

Looking over the control scripts on the TACC systems, it looks like there is support for nesting in the opposite direction as well (i.e., the same order you are using), but I have never tried it.

Section 2.2 "Simplified Job Startup" of the document "Intel MPI Library for Linux OS Developer Reference" (file "intelmpi-2017-update1-developer-reference-linux.pdf") also shows that the "mpirun" command does support launching a job using SLURM.   I did not see any direct mention of how to pass options to SLURM in this discussion, but there is a general mechanism for passing environment variables through mpirun.  These are described in Section 2.3 "Hydra Process Manager Command", and include the "-genv <ENVAR> <value>" option, the "-genvlist" option, and a few others. 

I often use "--cpu_bind=none" as a direct option to srun.  The srun man page seems to suggest that setting the SLURM_CPU_BIND environment variable will have the effect of adding a "--cpu_bind" option, so adding "-genv SLURM_CPU_BIND verbose,none" to the mpirun command line might do what you need.  The srun man page also says that when "--cpu_bind" is in use, SLURM will set the environment variables SLURM_CPU_BIND_VERBOSE, SLURM_CPU_BIND_TYPE, and SLURM_CPU_BIND_LIST, which you may want to inspect as well.
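Putting those pieces together, something like this might do what you want (untested on my end; the rank count and binary name are just placeholders taken from your earlier example):

bash-4.2$ mpirun -genv SLURM_CPU_BIND verbose,none -n 1 ./hello_mpi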

Ronald_G_2
Beginner

srun --mpibind=off works, and it was a good clue that helped me solve this.

mpirun calls mpiexec.hydra, which in turn calls SLURM's srun as its bootstrap launcher. So you can use the bootstrap exec args to work around the issue above. Here are two solutions:

mpirun --bootstrap-exec-args="--mpibind=off"  ... etc

or

export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--mpibind=off"

Either of these works to prevent GOMP_CPU_AFFINITY from being set by srun.
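For anyone who hits this later, the quick check is to repeat the env test from my first post with the workaround in place (either form of the workaround should behave the same):

bash-4.2$ export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--mpibind=off"
bash-4.2$ mpirun -n 1 env | grep OMP

With the extra arg in place, GOMP_CPU_AFFINITY no longer shows up; only the OMP_* variables from my own environment are left.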
I consider this issue closed. Thanks, John!

ron
