Intel® C++ Compiler

OpenMP detects only a single core on multi-socket/multi-core system

seongyun_k_

I program in C++ and use MPI/OpenMP for parallelization. The machine has 2 CPU sockets with 8 cores per socket.
The program additionally uses C++11 threads, whose affinity is controlled by 'pthread_setaffinity_np' calls.

I compile with the Intel compiler (16.0.1) and set the following environment variables:

export I_MPI_PERHOST=1
export I_MPI_FABRICS=tcp
export I_MPI_FALLBACK=0
export I_MPI_PIN=1
export I_MPI_PIN_DOMAIN=node
export OMP_NUM_THREADS=16
export KMP_AFFINITY=verbose,scatter

With the verbose option, I see the following messages when running the binary:

    [0] OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
    [0] OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    [0] OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0}
    [0] OMP: Info #156: KMP_AFFINITY: 1 available OS procs
    [0] OMP: Info #157: KMP_AFFINITY: Uniform topology
    [0] OMP: Info #159: KMP_AFFINITY: 1 packages x 1 cores/pkg x 1 threads/core (1 total cores)
    [0] OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
    [0] OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 0 bound to OS proc set {0}
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 14 bound to OS proc set {0}
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 15 bound to OS proc set {0}
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 11 bound to OS proc set {0}
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 6 bound to OS proc set {0}
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 7 bound to OS proc set {0}
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 8 bound to OS proc set {0}
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 9 bound to OS proc set {0}
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 10 bound to OS proc set {0}
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 13 bound to OS proc set {0}
    [0] OMP: Info #242: KMP_AFFINITY: pid 12759 thread 12 bound to OS proc set {0}
    

As you can see, the OpenMP runtime does not detect the correct number of packages (sockets) or cores per package. As a result, all the threads are pinned to a single core.

How can I resolve this issue? Where should I start?
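
One place to start is the line "Initial OS proc set respected: {0}" in the log: it indicates that the process already had an affinity mask containing only OS proc 0 when the OpenMP runtime initialized. A minimal sketch for checking this from inside the program, assuming Linux with glibc (the function names are hypothetical, not part of any library):

```cpp
#include <sched.h>    // sched_getaffinity, cpu_set_t, CPU_COUNT (Linux/glibc)
#include <unistd.h>   // sysconf
#include <iostream>

// Number of OS procs the calling thread may run on, or -1 on error.
// (Hypothetical helper name.)
int allowed_cpu_count() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
    return CPU_COUNT(&mask);
}

// Number of processors currently online on the node.
long online_cpu_count() {
    return sysconf(_SC_NPROCESSORS_ONLN);
}

// Prints both numbers. Calling this before the first OpenMP parallel
// region shows what the runtime is about to inherit.
void report_affinity() {
    std::cout << "allowed OS procs: " << allowed_cpu_count()
              << ", online OS procs: " << online_cpu_count() << std::endl;
}
```

If the allowed count is much smaller than the online count, something (the MPI launcher, a job scheduler, or the program itself) restricted the mask before OpenMP started.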

seongyun_k_

I am answering my own question.

My program sets the CPU affinity of the main thread as follows:

...

CPU_ZERO(&cpuset);
CPU_SET(0, &cpuset);  // allow OS proc 0 only
pid_t tid = (pid_t) syscall(SYS_gettid);
// Restrict the calling (main) thread to OS proc 0.
// Threads created afterwards inherit this mask.
sched_setaffinity(tid, sizeof(cpu_set_t), &cpuset);

// Read the mask back as a check.
unsigned long mask = -1;
int rc = sched_getaffinity(tid, sizeof(unsigned long), (cpu_set_t*) &mask);
if (rc != 0) {
  std::cout << "ERROR calling sched_getaffinity; " << rc << std::endl;
  abort();
}

...

The OpenMP threads spawned after the sched_setaffinity call are all bound to the same core as the main thread.
So I removed that part of the code.
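
The inheritance behind this can be reproduced without OpenMP. The following is a minimal sketch, assuming Linux with glibc, that pins the calling thread to a single OS proc and then asks a newly created std::thread how many OS procs it may use. The function name is hypothetical, and the sketch uses the first allowed OS proc rather than hard-coding proc 0 as the code above does.

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity, CPU_* macros
#include <thread>

// Pins the calling thread to the first OS proc in its current mask,
// starts a child thread, and returns the number of OS procs the child
// is allowed to use (-1 on error). The original mask is restored
// before returning. (Hypothetical helper name.)
int child_cpus_after_pinning_parent() {
    cpu_set_t original;
    CPU_ZERO(&original);
    if (sched_getaffinity(0, sizeof(original), &original) != 0) return -1;

    // Find the first OS proc this thread is currently allowed to use.
    int first = -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &original)) { first = cpu; break; }
    }
    if (first < 0) return -1;

    // Restrict the calling thread to that single OS proc.
    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(first, &one);
    if (sched_setaffinity(0, sizeof(one), &one) != 0) return -1;

    // A thread created now starts with a copy of the creator's mask.
    int child_count = -1;
    std::thread child([&child_count] {
        cpu_set_t m;
        CPU_ZERO(&m);
        if (sched_getaffinity(0, sizeof(m), &m) == 0) {
            child_count = CPU_COUNT(&m);
        }
    });
    child.join();

    // Undo the restriction on the calling thread.
    sched_setaffinity(0, sizeof(original), &original);
    return child_count;
}
```

On Linux the child reports a single allowed OS proc, which matches the "1 available OS procs" line in the verbose output.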

 
jimdempseyatthecove

When you use MPI, and place multiple ranks on the same system (SMP or single CPU), then each rank will start as a separate process...
*** each with a subset of the bits set in the process affinity mask ***

IOW, on the 2-socket system with 2 MPI ranks (processes), one rank (process) will get half of the system's available processors (bit positions), and the other rank will get the other half. For example:

rank 0 may get bit positions 0:15 of the cpuset
rank 1 may get bit positions 16:31 of the cpuset

The actual bit positions in the CPU set will vary depending on various factors (environment variables and/or BIOS settings).

Do not assume each rank's (process's) affinity is 0:nn.

At the start of each rank (process), use sched_getaffinity(...) to obtain the rank's (process's) affinity mask. Then, within the rank, pin your ancillary threads to one or more of the bits in the affinity mask presented to your process.
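
A minimal sketch of that approach, assuming Linux with glibc and C++11 threads: it reads the mask the process was given, then has each worker pin itself to one of the allowed OS procs in round-robin order. The function names and the empty worker body are hypothetical placeholders.

```cpp
#include <pthread.h>  // pthread_setaffinity_np, pthread_self
#include <sched.h>    // sched_getaffinity, CPU_* macros
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// OS proc ids present in the calling thread's affinity mask.
// (Hypothetical helper name.)
std::vector<int> allowed_cpus() {
    std::vector<int> cpus;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return cpus;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) cpus.push_back(cpu);
    }
    return cpus;
}

// Pins the calling thread to a single OS proc; true on success.
bool pin_self(int cpu) {
    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    return pthread_setaffinity_np(pthread_self(), sizeof(one), &one) == 0;
}

// Starts n worker threads, each pinned to one of the OS procs the
// process is allowed to use. Returns how many pinned successfully.
int run_pinned_workers(int n) {
    const std::vector<int> cpus = allowed_cpus();
    if (cpus.empty()) return 0;

    std::atomic<int> pinned{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        // Only ids taken from the process mask are used, never 0..n-1.
        const int cpu = cpus[static_cast<std::size_t>(i) % cpus.size()];
        workers.emplace_back([cpu, &pinned] {
            if (pin_self(cpu)) ++pinned;
            // Ancillary work would go here.
        });
    }
    for (auto& w : workers) w.join();
    return pinned.load();
}
```

Letting each worker pin itself avoids touching the main thread's mask, so threads created later (including OpenMP threads) still see the full set of OS procs given to the rank.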

Jim Dempsey

SergeyKostrov

>>...Their affinity is controlled by 'pthread_setaffinity_np' calls...

Linux is a highly portable operating system, and it is not clear why pthread_setaffinity_np (_np stands for Non Portable) was introduced. I see that the sched_setaffinity function (Portable) is used in the updated code (Post #2).