Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

How to interpret this KMP_AFFINITY info

Øyvind_Jensen
Beginner

I am using MPI and OpenMP on a single node with 4 CPUs, each with 18 cores. I am trying to analyze the performance of my application by launching varying combinations of MPI processes and OpenMP threads. What confuses me is that the KMP_AFFINITY information indicates that each MPI process gets pinned to exactly the same hardware threads.

I start my program like this:

export OMP_NUM_THREADS=8
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_ORDER=compact
export KMP_AFFINITY=verbose
mpirun --ppn 2 --np 2 ./my_exe
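
To double-check the kernel-level mask independently of the OpenMP runtime, I can run a quick probe in place of my executable. This is a rough sketch; it assumes a Linux node and that mpirun exports PMI_RANK to each rank, which I have not verified:

# Print the CPU list the kernel actually allows each rank to run on.
# /proc/self/status is read by the probe process, which inherits the
# pinning that mpirun applied before exec.
mpirun --ppn 2 --np 2 sh -c 'echo "rank ${PMI_RANK:-?}: $(grep Cpus_allowed_list /proc/self/status)"'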

stderr of MPI process 0:

OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0-3,72-75
OMP: Info #217: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 8 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #288: KMP_AFFINITY: topology layer "NUMA domain" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #288: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #192: KMP_AFFINITY: 1 socket x 4 cores/socket x 2 threads/core (4 total cores)
OMP: Info #219: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 72 maps to socket 0 core 0 thread 1
OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 1 thread 2
OMP: Info #172: KMP_AFFINITY: OS proc 73 maps to socket 0 core 1 thread 3
OMP: Info #172: KMP_AFFINITY: OS proc 2 maps to socket 0 core 2 thread 4
OMP: Info #172: KMP_AFFINITY: OS proc 74 maps to socket 0 core 2 thread 5
OMP: Info #172: KMP_AFFINITY: OS proc 3 maps to socket 0 core 3 thread 6
OMP: Info #172: KMP_AFFINITY: OS proc 75 maps to socket 0 core 3 thread 7

stderr of MPI process 1:

OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 4-7,76-79
OMP: Info #217: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 8 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #288: KMP_AFFINITY: topology layer "NUMA domain" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #288: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #192: KMP_AFFINITY: 1 socket x 4 cores/socket x 2 threads/core (4 total cores)
OMP: Info #219: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 4 maps to socket 0 core 0 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 76 maps to socket 0 core 0 thread 1
OMP: Info #172: KMP_AFFINITY: OS proc 5 maps to socket 0 core 1 thread 2
OMP: Info #172: KMP_AFFINITY: OS proc 77 maps to socket 0 core 1 thread 3
OMP: Info #172: KMP_AFFINITY: OS proc 6 maps to socket 0 core 2 thread 4
OMP: Info #172: KMP_AFFINITY: OS proc 78 maps to socket 0 core 2 thread 5
OMP: Info #172: KMP_AFFINITY: OS proc 7 maps to socket 0 core 3 thread 6
OMP: Info #172: KMP_AFFINITY: OS proc 79 maps to socket 0 core 3 thread 7

As expected, each MPI process is using a unique set of OS procs. However, the mappings from OS proc to physical thread conflict with each other, which seems to indicate that the processes are competing for the same hardware resources. For example, MPI process 0 maps OS proc 0 to (socket, core, thread) = (0, 0, 0), but MPI process 1 maps OS proc 4 to (0, 0, 0).

I have tried other combinations of MPI processes and OpenMP threads. Using I_MPI_PIN_DOMAIN=socket and I_MPI_PIN_ORDER=scatter also produces the same conflicting mapping of OS procs to hardware resources. A cross-check I am considering is sketched below.
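
If I understand the OpenMP 5.0 affinity-display controls correctly, I can also have each thread report the OS procs it is actually bound to, which bypasses the per-process topology numbering in the KMP_AFFINITY messages:

# OpenMP 5.0 affinity display; %P = process id, %n = thread number,
# %A = the OS procs the thread is bound to.
export OMP_DISPLAY_AFFINITY=TRUE
export OMP_AFFINITY_FORMAT="pid %P thread %n bound to OS procs %A"
mpirun --ppn 2 --np 2 ./my_exe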

Is there an error in how I start the program, or is it my interpretation of the KMP_AFFINITY information that is wrong?

TobiasK
Moderator

@Øyvind_Jensen please provide the output of a run with I_MPI_DEBUG=10 set. Are you using Slurm, PBS, or another job management system?
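
For reference, a rerun along these lines should capture it (I_MPI_DEBUG output goes to stdout by default):

# At level 10, I_MPI_DEBUG prints, among other things, the pinning map.
export I_MPI_DEBUG=10
mpirun --ppn 2 --np 2 ./my_exe > debug_run.log 2>&1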
