I'm answering my own question.
My program sets the CPU affinity of the main thread as follows:
...
CPU_ZERO(&cpuset);
CPU_SET(0, &cpuset);
pid_t tid = (pid_t) syscall(SYS_gettid);
sched_setaffinity(tid, sizeof(cpu_set_t), &cpuset);
unsigned long mask = -1;
int rc = sched_getaffinity(tid, sizeof(unsigned long), (cpu_set_t*) &mask);
if (rc != 0) {
    // The call checked here is sched_getaffinity, so report that name.
    std::cout << "ERROR calling sched_getaffinity; " << rc << std::endl;
    abort();
}
...
The OpenMP threads spawned after the sched_setaffinity syscall are all bound to the same core as the main thread, because a new thread inherits the CPU affinity mask of the thread that creates it.
So I removed that part of the code.
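For anyone who wants to see the inheritance for themselves, here is a minimal sketch, assuming Linux with glibc and a compiler that defines _GNU_SOURCE by default (as g++ does). It uses std::thread as a stand-in for the OpenMP worker threads, and the helper names (allowed_cpu_count, pin_to_first_allowed_cpu, child_cpu_count) are made up for this illustration. They are not part of my original program.

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity, CPU_* macros (glibc)
#include <thread>

// Hypothetical helper: number of CPUs the calling thread may run on, or -1 on error.
int allowed_cpu_count() {
    cpu_set_t set;
    CPU_ZERO(&set);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    return CPU_COUNT(&set);
}

// Hypothetical helper: pin the calling thread to the first CPU in its current mask.
// Picking from the current mask (rather than hard-coding CPU 0) avoids failures
// when CPU 0 is not available to the process.
bool pin_to_first_allowed_cpu() {
    cpu_set_t current;
    CPU_ZERO(&current);
    if (sched_getaffinity(0, sizeof(current), &current) != 0) return false;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &current)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0;
        }
    }
    return false;
}

// Hypothetical helper: start a new thread and report how many CPUs it is allowed to use.
int child_cpu_count() {
    int n = -1;
    std::thread t([&n] { n = allowed_cpu_count(); });
    t.join();
    return n;
}
```

Calling pin_to_first_allowed_cpu() in the main thread and then child_cpu_count() should report a single allowed CPU for the new thread, which matches what I saw with the OpenMP threads. Calling child_cpu_count() without pinning first should report the same count as the main thread's own mask.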