I'm working on a Linux server with 16 processors x 4 cores x 2 HT (128 logical CPUs). The processor is a Xeon E7420.
I use pthread_setaffinity_np to assign a pthread to a processor.
The problem is that when I assign 8 threads to 1 processor, the performance is the same as assigning 1 thread to 1 processor.
I wonder: does NPTL not support core or HT affinity, or does NPTL only support affinity at the processor level?
The bits in the affinity mask you pass to pthread_setaffinity_np represent hardware threads (logical CPUs) on your system.
Assign your 8 software threads to 8 different bits out of those 128.
Depending on your code, it may be beneficial to congregate the 8 bits within 1 physical processor (4 cores x 2 HT = 8 logical processors; bit stride of 1), to spread them out to one logical processor per physical processor (bit stride of 8), or to use one logical processor per NUMA node, assuming you have NUMA capability (bit stride of 8/6?).
By assigning all 8 threads to one bit in the bitmask, you make all of them share a single hardware thread on the system (the same run time as serial code, plus thread start/stop overhead).
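A minimal sketch of the fix for the original problem: give each software thread its own bit instead of putting all 8 on the same bit. The helper names pick_cpu and bind_thread_to_cpu are hypothetical; pthread_setaffinity_np, CPU_ZERO and CPU_SET are the glibc calls (they need _GNU_SOURCE). The stride argument corresponds to the stride-1 vs stride-8 choice above, with the caveat from the next paragraph that adjacent bits are not guaranteed to be adjacent in the hardware.

```c
#define _GNU_SOURCE /* must come before any #include for the _np calls */
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: logical CPU (bit) for software thread i when the
 * threads are placed `stride` bits apart, starting at bit `first`.
 * stride 1 packs threads onto adjacent bits, stride 8 spreads them out.
 * Wraps around modulo ncpus, so large i can reuse a bit. */
static int pick_cpu(int i, int first, int stride, int ncpus)
{
    return (first + i * stride) % ncpus;
}

/* Pin thread t to exactly one logical CPU, i.e. a mask with one bit set.
 * Returns 0 on success, or the errno value from pthread_setaffinity_np
 * (e.g. EINVAL if that CPU is not available to this process). */
static int bind_thread_to_cpu(pthread_t t, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(t, sizeof(set), &set);
}
```

For example, thread i could call bind_thread_to_cpu(pthread_self(), pick_cpu(i, 0, 8, 128)) at the top of its start routine, so the 8 threads end up on bits 0, 8, 16, ..., 56 rather than all on one bit.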
Actually, part of the problem with using native POSIX threads is that the bit positions in the bitmask are not necessarily in any particular order when compared to the actual processor/memory topology. Just because bits are close in the mask doesn't mean that they are "close" in the sense of sharing a cache.
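Without the Intel runtime, one way to find out which bits really are "close" on Linux is to read the topology the kernel exports under /sys/devices/system/cpu/cpuN/topology/. A minimal sketch, assuming the physical_package_id and core_id files are present (they are on typical x86 Linux kernels); the function names are hypothetical. Note that core_id is only unique within a socket, so two logical CPUs are HT siblings only if both values match.

```c
#include <stdio.h>
#include <unistd.h>

/* Read one integer from /sys/devices/system/cpu/cpuN/topology/<field>.
 * Returns -1 if the file is missing or does not contain an integer. */
static int read_topology(int cpu, const char *field)
{
    char path[128];
    int value = -1;
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, field);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* 1 if logical CPUs a and b are on the same socket and core (HT siblings),
 * 0 if not, -1 if the topology information is unavailable. */
static int same_core(int a, int b)
{
    int pa = read_topology(a, "physical_package_id");
    int pb = read_topology(b, "physical_package_id");
    int ca = read_topology(a, "core_id");
    int cb = read_topology(b, "core_id");
    if (pa < 0 || pb < 0 || ca < 0 || cb < 0)
        return -1;
    return pa == pb && ca == cb;
}

/* Print socket and core for every configured logical CPU (-1 = unknown). */
static void print_topology(void)
{
    long n = sysconf(_SC_NPROCESSORS_CONF);
    for (long cpu = 0; cpu < n; cpu++)
        printf("cpu%ld: socket %d, core %d\n", cpu,
               read_topology((int)cpu, "physical_package_id"),
               read_topology((int)cpu, "core_id"));
}
```

Running print_topology() once on the server shows how the 128 bit positions map onto sockets and cores, which is the information needed to choose bits for pthread_setaffinity_np deliberately.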
The easiest way to bind threads to hardware threads (HyperThreading technology), cores, or processors (sockets) is to use the KMP_AFFINITY interface provided in the OpenMP Run-time library shipped with any of the Intel Compilers. For full details, see the following link:
And search for the page called "Thread Affinity Interface".
Interestingly, you don't even need to have an OpenMP program to use this interface, it will work just as well with native POSIX threads. Here are the steps to follow:
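First, set the KMP_AFFINITY environment variable:
setenv KMP_AFFINITY "verbose,compact,<level>"
(this is csh/tcsh syntax; in bash, use export KMP_AFFINITY="verbose,compact,<level>")
where <level> is 0 to bind each thread to successive HT thread contexts, 1 to bind to successive cores, or 2 to bind to successive sockets.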
Next, insert a call to omp_get_num_procs() in each POSIX thread you create, at the point in the code where you want that thread to be bound. This just makes sure the OpenMP RTL sees the POSIX thread and binds it according to the KMP_AFFINITY settings. Note that threads are bound in round-robin fashion, in the order in which they call this function.
Finally, compile your code with the Intel C/C++ Compiler using "icc -openmp" and the usual arguments, and your POSIX threads will be bound to processors. The "verbose" option to KMP_AFFINITY also prints a map of all the HT thread contexts, cores, and processors, showing their relationships to each other with respect to memory proximity as well as their bit positions in the affinity mask.
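To make the three level values more concrete, here is a hypothetical model of the orderings as described above (successive HT contexts, successive cores, successive sockets) for the 16 x 4 x 2 machine in the question. It only illustrates that description; it is not the Intel runtime's actual placement algorithm, and the struct and function names are made up.

```c
/* Hypothetical model: where software thread i lands on a machine with
 * S sockets, C cores per socket and T hardware threads (HT) per core. */
struct slot { int socket, core, ht; };

static struct slot place(int i, int level, int S, int C, int T)
{
    struct slot s;
    switch (level) {
    case 0: /* successive HT contexts: fill both HTs of a core first */
        s.ht = i % T;
        s.core = (i / T) % C;
        s.socket = (i / (T * C)) % S;
        break;
    case 1: /* successive cores: one thread per core, socket by socket */
        s.core = i % C;
        s.socket = (i / C) % S;
        s.ht = (i / (C * S)) % T;
        break;
    default: /* successive sockets: one thread per socket first */
        s.socket = i % S;
        s.core = (i / S) % C;
        s.ht = (i / (S * C)) % T;
        break;
    }
    return s;
}
```

In this model, 8 threads with level 0 fill the 4 cores x 2 HT contexts of socket 0, level 1 puts one thread on each core of sockets 0 and 1, and level 2 puts one thread on each of sockets 0 to 7, which mirrors the packed vs. spread-out trade-off discussed earlier.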
Hope this helps.