Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Thread Affinity with low-level affinity interface & KMP_AFFINITY environment variable

remireins
Beginner
Hi everyone,

I am programming with the low-level affinity interface on Linux, and I have a question.

My machine has an Intel Xeon CPU X3430 @ 2.40GHz, with 4 cores in a single processor (no hardware multi-threading).


After compiling the program, I set the following environment variables in the shell:

export OMP_NUM_THREADS=1
export KMP_AFFINITY=verbose,granularity=thread,proclist=[0],explicit

In the code, I check the affinity mask of each thread against every core,
where:
"nCores" is the number of cores in the system.
"tId" is the identifier of the thread.
"mask" is the affinity mask (the same mask is used for all threads).

for(int i=0;i<nCores;i++) {
if(kmp_get_affinity_mask_proc(i, &mask) == 0)
{
cout << "The thread " << tId << " is not executing on the core " << i << endl;
}
else
{
cout << "The thread " << tId << " is executing on the core " << i << endl;
}
}

and the following appears on the screen:

The thread 0 is executing on the core 0.
The thread 0 is not executing on the core 1.
The thread 0 is not executing on the core 2.
The thread 0 is not executing on the core 3.

This is correct: the thread is executing only on core 0. Then I call:

kmp_unset_affinity_mask_proc(0,&mask);

and check the affinity mask again against every core with the same code as above. This time the screen shows:

The thread 0 is not executing on the core 0.
The thread 0 is not executing on the core 1.
The thread 0 is not executing on the core 2.
The thread 0 is not executing on the core 3.

What I don't understand is this: if thread 0 is not executing on any core, how were these messages printed at all, when no core could have processed them?

I expected my program to stall at run time, but it completes successfully.


P.S. I have run the same program on other architectures and I get the same results.

Can anyone help me with this question?


Regards,

Ramsés López
Feilong_H_Intel
Employee
Hi Ramses,

If you can post or upload your test program, it would be much easier for other users and support engineers to reproduce and investigate the issue.

Thank you.
--
Feilong H.
Intel Developer Support

Tools Knowledge Base: http://software.intel.com/en-us/articles/tools
jimdempseyatthecove
Black Belt
The affinity mask is a bit mask. Each bit that is set marks a logical processor the thread prefers to run on. The mask does not tell you which processor the thread is actually running on at a given moment. I think a mask with all bits 0 is a special case that means any processor is valid (as opposed to no processor being valid). Depending on your system, behavior may differ when the mask is non-zero but none of the set bits refers to an existing logical processor (i.e. it may behave as if no bits were set in the mask).

To get the current processor you will need to use one of the sched_... functions (which I don't have handy right now).

Jim Dempsey
remireins
Beginner
I wrote a small program that shows this behaviour.

#########################################################################################################

#include <iostream>
#include <omp.h>

using namespace std;

int main(void)
{

int nCores; // Number of cores in the system.
int tId; // Thread identifier.
int numThreads; // Number of threads inside of the parallel region.

#pragma omp parallel
{

numThreads = omp_get_num_threads();

cout << "Number of threads in the parallel region: " << numThreads << endl;

kmp_affinity_mask_t mask; // Thread Affinity Mask.

nCores = kmp_get_affinity_max_proc(); // Gets number of cores in the system.
cout << "Number of cores in the system: " << nCores << endl;

kmp_create_affinity_mask(&mask); // Creates an affinity mask.

tId = omp_get_thread_num(); // Obtains thread identifier.

// Add cores to the affinity mask
kmp_set_affinity_mask_proc(0, &mask);
kmp_set_affinity_mask_proc(1, &mask);
kmp_set_affinity_mask_proc(2, &mask);
kmp_set_affinity_mask_proc(3, &mask);

//
// In this part the thread should be executing across all cores.
//

// Check the affinity mask of the thread against every core
for(int i=0;i<nCores;i++) {
if(kmp_get_affinity_mask_proc(i, &mask) == 0)
{
cout << "The thread " << tId << " is not executing on the core " << i << endl;
}
else
{
cout << "The thread " << tId << " is executing on the core " << i << endl;
}
}

// Remove cores from the affinity mask
kmp_unset_affinity_mask_proc(0,&mask);
kmp_unset_affinity_mask_proc(1,&mask);
kmp_unset_affinity_mask_proc(2,&mask);
kmp_unset_affinity_mask_proc(3,&mask);


//
// In this part the thread should not be executing on any core.
//


// Check the affinity mask of the thread against every core
for(int i=0;i<nCores;i++) {
if(kmp_get_affinity_mask_proc(i, &mask) == 0)
{
cout << "The thread " << tId << " is not executing on the core " << i << endl;
}
else
{
cout << "The thread " << tId << " is executing on the core " << i << endl;
}
}

kmp_destroy_affinity_mask(&mask);

}

return 0;

}

#########################################################################################################

Here is the output of a run.

[ramses@cluster2]$ export OMP_NUM_THREADS=1
[ramses@cluster2]$ icpc -openmp -o arch.out Arquitectura.cpp
[ramses@cluster2]$ ./arch.out
Number of threads in the parallel region: 1
Number of cores in the system: 4
The thread 0 is executing on the core 0
The thread 0 is executing on the core 1
The thread 0 is executing on the core 2
The thread 0 is executing on the core 3
The thread 0 is not executing on the core 0
The thread 0 is not executing on the core 1
The thread 0 is not executing on the core 2
The thread 0 is not executing on the core 3

##########################################################################################################

Thanks for your answers.

I am still looking into the sched_... functions to get the current core and to find out which cores share the execution of the thread. Once I have that working I will reply to this thread.


P.S. Jim wrote: "I think when all mask bits are 0 is a special case which implies any processor is valid (as opposed to no processors are valid)." <- I think this is the most reasonable as well.


Regards,
jimdempseyatthecove
Black Belt
The ...get_affinity_mask call does not tell you which CPU number you are running on.
It gives you the mask (list) of CPUs that you asked your thread to be restricted to.
Note that you can set a mask with more than one bit (e.g. you want to run on CPUs 0 or 2 but not on any others).
Also, this is a request: it is usually granted, but not always. For example, suppose the system has 8 logical processors (0:7), and when your process starts the sysadmin or some other circumstance restricts the process to processors (4:7). If you ask the system for the number of system processors you get 0:7; if you ask the system for the process affinity mask you get 4:7. Depending on which one you use, you may think you have access to 0:7 and try to schedule a thread to, say, number 2. But 2 is not a permissible CPU number for your process. The request will be denied (you may or may not see an error) and your thread will be permitted to float within the process default affinity mask.

To get the CPU number your thread is currently running on, use either of the following.

in sched.h

int sched_getcpu(void);

or

in linux/getcpu.h

int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);

Also, your program has a latent bug.

The available CPU numbers are not necessarily a contiguous series of bits starting with bit 0 in the mask.
And the available CPUs can change on some systems (hot-swap processors). Your little test for a desktop is likely OK. However, a desktop system could conceivably have 2 processors (sockets), each with 3 or 6 cores (not a power of 2), and _potentially_ number the CPUs with gaps ({0,1,2,4,5,6} or {0,1,2,3,4,5,8,9,10,11,12,13}). Generally the CPU numbers are sequential, but you are not assured of this. Try to future-proof your code (a few minutes of attention now will save you hours or days later).

Jim Dempsey

remireins
Beginner
Thank you for your answer, Jim. It has been very helpful.
I think this thread can be closed.
Thanks and regards,
Ramsés López