I am using MPI and OpenMP on a single node with 4 CPUs, each with 18 cores. I am trying to analyze the performance of my application by launching varying combinations of MPI processes and OpenMP threads. What confuses me is that the KMP_AFFINITY output indicates that each MPI process gets pinned to exactly the same hardware threads.
I start my program like this:
export OMP_NUM_THREADS=8
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_ORDER=compact
export KMP_AFFINITY=verbose
mpirun --ppn 2 --np 2 ./my_exe
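A quick way to cross-check the masks, independent of KMP_AFFINITY, is to print each rank's Cpus_allowed_list straight from /proc. A rough sketch, assuming Linux and that the Hydra launcher exports PMI_RANK to each process (with the same environment as above):
mpirun --ppn 2 --np 2 bash -c 'echo "rank $PMI_RANK: $(grep Cpus_allowed_list /proc/self/status)"'
If the two ranks print disjoint lists (e.g. 0-3,72-75 vs. 4-7,76-79), the OS-level masks themselves do not overlap.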
stderr of MPI process 0:
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0-3,72-75
OMP: Info #217: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 8 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #288: KMP_AFFINITY: topology layer "NUMA domain" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #288: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #192: KMP_AFFINITY: 1 socket x 4 cores/socket x 2 threads/core (4 total cores)
OMP: Info #219: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 72 maps to socket 0 core 0 thread 1
OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 1 thread 2
OMP: Info #172: KMP_AFFINITY: OS proc 73 maps to socket 0 core 1 thread 3
OMP: Info #172: KMP_AFFINITY: OS proc 2 maps to socket 0 core 2 thread 4
OMP: Info #172: KMP_AFFINITY: OS proc 74 maps to socket 0 core 2 thread 5
OMP: Info #172: KMP_AFFINITY: OS proc 3 maps to socket 0 core 3 thread 6
OMP: Info #172: KMP_AFFINITY: OS proc 75 maps to socket 0 core 3 thread 7
stderr of MPI process 1:
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 4-7,76-79
OMP: Info #217: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #157: KMP_AFFINITY: 8 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #288: KMP_AFFINITY: topology layer "NUMA domain" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #288: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #288: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #192: KMP_AFFINITY: 1 socket x 4 cores/socket x 2 threads/core (4 total cores)
OMP: Info #219: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 4 maps to socket 0 core 0 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 76 maps to socket 0 core 0 thread 1
OMP: Info #172: KMP_AFFINITY: OS proc 5 maps to socket 0 core 1 thread 2
OMP: Info #172: KMP_AFFINITY: OS proc 77 maps to socket 0 core 1 thread 3
OMP: Info #172: KMP_AFFINITY: OS proc 6 maps to socket 0 core 2 thread 4
OMP: Info #172: KMP_AFFINITY: OS proc 78 maps to socket 0 core 2 thread 5
OMP: Info #172: KMP_AFFINITY: OS proc 7 maps to socket 0 core 3 thread 6
OMP: Info #172: KMP_AFFINITY: OS proc 79 maps to socket 0 core 3 thread 7
As expected, each MPI process is using a unique set of OS procs. However, the mapping from OS proc to physical thread is conflicting: it indicates that the processes are competing for the same hardware resource. For example, MPI process 0 maps OS proc 0 to (socket, core, thread) = (0, 0, 0), but MPI process 1 maps OS proc 4 to the same (0, 0, 0).
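To double-check whether two OS procs really share a physical core, the kernel's topology files can be read directly; their core_id numbering should be global rather than relative to any affinity mask. A sketch, assuming Linux sysfs is available:
for cpu in 0 72 4 76; do
    echo "OS proc $cpu: package $(cat /sys/devices/system/cpu/cpu$cpu/topology/physical_package_id), core $(cat /sys/devices/system/cpu/cpu$cpu/topology/core_id)"
done
If OS procs 0 and 4 report different core_id values, the two ranks sit on different physical cores despite the identical-looking (socket, core, thread) triples in the verbose output.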
I have tried other combinations of MPI processes and OpenMP threads. I_MPI_PIN_DOMAIN=socket and I_MPI_PIN_ORDER=scatter also give the same conflicting mapping of OS procs to hardware resources.
Is there an error in how I start the program, or is my interpretation of the KMP_AFFINITY information wrong?
@Øyvind_Jensen, please provide the output with I_MPI_DEBUG=10 set. Are you using Slurm, PBS, or any other job management system?
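For example, with the launch line from above:
export I_MPI_DEBUG=10
mpirun --ppn 2 --np 2 ./my_exe
At that debug level Intel MPI should print its process pinning map at startup, showing the CPU ranges it actually assigned to each rank.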
