Andrey_Vladimirov
New Contributor III

Hard-wire 1 thread/core into an OpenMP application

I would like to hard-wire the number of threads per core (along with thread affinity) into an OpenMP executable, so that 1 thread per core is used:

  1. Regardless of how many physical cores the processor has
  2. Regardless of whether hyper-threading is enabled or not
  3. Regardless of whether the user has set OMP_NUM_THREADS or not

The Intel C++ compiler supports -qopt-threads-per-core for the Intel MIC architecture. It would have been the ideal solution for me, but for some reason this argument is not supported for general-purpose CPU architectures.

So is there a way to hard-wire 1 thread/core in an application for a general-purpose CPU that uses OpenMP?

14 Replies
TimP
Black Belt

Probably not the answer you are looking for. Your application could choose among utilities such as /proc/cpuinfo (a Windows emulation is available), the cpuinfo tool that ships with Intel MPI (and may require an Intel CPU), or hwloc, which is used with Open MPI, and set num_threads and omp_places accordingly.

MKL does almost what you ask, but the code may be proprietary.

Kittur_G_Intel
Employee

Thanks Tim. Andrey, I am checking with the OpenMP runtime group and will update you as soon as I get a confirmation/response.

Thanks,
Kittur

Andrey_Vladimirov
New Contributor III

I am sure that something as fundamental as this should have a simple and portable solution.

Parsing system parameters could work, but it is not simple and probably not very portable. By the way, along these lines, somebody recently pointed me to "lscpu". Like "cpuinfo", it reports human-readable data, but it does not require MPI; it is part of standard Linux distributions.

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               1200.000
BogoMIPS:              5186.75
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-5,12-17
NUMA node1 CPU(s):     6-11,18-23


Kittur_G_Intel
Employee

Thanks Andrey, yes, I agree. I'll update you as soon as I sync up with the runtime group.

Kittur

Kittur_G_Intel
Employee

Hi Andrey,
Try the below and see if it works for you:

--------------------------------------
1) You can set OMP_NUM_THREADS=<number of cores> and OMP_PROC_BIND=spread before launching the application.

2) Or you can set the relevant environment variables inside the program itself. Setting places to cores should get you as many places as there are cores, and from that you can get the number of cores with omp_get_num_places (from the 4.5 spec, so you'd need the 16.0.3 compiler at minimum). Then set the number of threads to the number of cores. The spread proc_bind strategy then distributes the n_cores threads over the n_cores places, so you get exactly one thread per core.

#include <cstdio>
#include <cstdlib>
#include <omp.h>

int main(int argc, char* argv[])
{
  setenv("OMP_PLACES", "cores", 1);
  setenv("OMP_PROC_BIND", "spread", 1);
  int n_cores = omp_get_num_places();
  omp_set_num_threads(n_cores);
  #pragma omp parallel
  printf("Thread %d\n", omp_get_thread_num());
  return 0;
}

Let me know if the above helps

Regards,
Kittur


Kittur_G_Intel
Employee

Andrey, BTW, option 2 I indicated in my earlier response should work everywhere without having to set environment variables before executing the application.

Also, you can try out the KMP_PLACE_THREADS (also called now KMP_HW_SUBSET) which allows you to set the process affinity mask in advance. You can for example say KMP_PLACE_THREADS=1t, which limits the resources to just one thread per core. And then you can set OMP_PROC_BIND=spread and OMP_PLACES=threads or cores to do the affinity binding. That way, you don't need to figure out the number of cores and restrict the number of threads to it accordingly.

Regards,
Kittur

Kittur_G_Intel
Employee

Andrey, any update? Did the solutions I suggested in my previous communication resolve your issue?

Kittur

TimP
Black Belt

According to the program posted at http://stackoverflow.com/questions/1304363/how-to-check-the-version-of-openmp-on-linux the _OPENMP macro in icl 16.0.3 hasn't been updated to report OpenMP 4.5 support to programs that check the macro.

#include <unordered_map>
#include <string>
#include <cstdio>
#include <omp.h>

int main(int argc, char *argv[])
{
  std::unordered_map<unsigned, std::string> map{
    {200505, "2.5"},
    {200805, "3.0"},
    {201107, "3.1"},
    {201307, "4.0"},
    {201511, "4.5"}};
  printf("We have OpenMP %s.\n", map.at(_OPENMP).c_str());
  return 0;
}

We have OpenMP 4.0.

Incidentally, recent posts indicate that acceptance of the OpenMP 4.5 function syntax has been checked into gcc/gfortran. That doesn't mean the functions will be acted upon. Even the OpenMP 4.0 support is more complete on Linux than on Windows.

TimP
Black Belt

VS2015.2 accepts the program but produces no output, regardless of whether ICL is on PATH (in which case it complains about the Intel headers). Apparently, _OPENMP is defined only in the Intel headers, but those don't pass into CL in spite of the warning. So we have to assume OpenMP 2.5 for CL, regardless of whether we link libiomp5.

g++ 7 reports 4.5, so I think the program works. I have my doubts about the completeness of the implementation, though.

Kittur_G_Intel
Employee

Thanks Tim, understood and I've passed your feedback to the team as well.

Kittur

Andrey_Vladimirov
New Contributor III

Kittur,

your solution #2 sounds like what I need, but it does not work with the Intel C++ compiler 16.0.1.150 on Linux:

#include <cstdio>
#include <cstdlib>
#include <omp.h>
int main(int argc, char* argv[])
{
  setenv("OMP_PLACES","cores", 1);
  setenv("OMP_PROC_BIND","spread", 1);
  int n_cores = omp_get_num_places();
  omp_set_num_threads(n_cores);
  #pragma omp parallel
  printf("Thread %d\n", omp_get_thread_num());
  return 0;
}
[avladim@alma-ata omptest]$ icpc -qopenmp foo.cc 
foo.cc(8): error: identifier "omp_get_num_places" is undefined
    int n_cores = omp_get_num_places();
                  ^

compilation aborted for foo.cc (code 2)


What can I do to fix this?

Andrey

Andrey_Vladimirov
New Contributor III

This does not work because KMP_PLACE_THREADS is only supported for Intel Xeon Phi architecture. I need a solution for a multi-core CPU.

Kittur Ganesh (Intel) wrote:

Andrey, BTW, option 2 I indicated in my earlier response should work everywhere without having to set environment variables before executing the application.

Also, you can try out the KMP_PLACE_THREADS (also called now KMP_HW_SUBSET) which allows you to set the process affinity mask in advance. You can for example say KMP_PLACE_THREADS=1t, which limits the resources to just one thread per core. And then you can set OMP_PROC_BIND=spread and OMP_PLACES=threads or cores to do the affinity binding. That way, you don't need to figure out the number of cores and restrict the number of threads to it accordingly.

Regards,
Kittur

Kittur_G_Intel
Employee

Hi Andrey,
Yes, that error is expected: omp_get_num_places is from the 4.5 spec and is supported from the 16.0 update 3 release onwards.
So you'll need to download 16.0 update 3 from the Intel Registration Center (it's already released), and the code snippet should work.
I tried it as well, and it works fine with the 16.0 update 3 release.

%icc -V
Intel(R) C Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 16.0.3.210 Build 20160415

Regards,
Kittur


Kittur_G_Intel
Employee

Andrey, BTW, KMP_PLACE_THREADS was extended to work everywhere from ICC version 16.0 update 2 onwards. So if you try with 16.0 update 3 (mentioned in my earlier communication), this should work as well.

Thanks,
Kittur

