Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

task_scheduler_init and processor-affinity

Is there any way to create N distinct thread pools for TBB? We would like to use the processor-affinity support added in 3.0 to restrict TBB's threads to a particular processor, so that threads only process records stored in their own NUMA node. I can see how to restrict TBB to a single processor, but that leaves the 3 other processors idle.
2 Replies
I think that TBB has thread affinity for tasks, not processor affinity for threads, so if threads are moved about (which shouldn't happen often) they take "their" tasks with them. If you want to run a TBB process on a single NUMA node and the OS doesn't arrange that for you based on the number of threads in use (I don't know whether that could be expected, or whether it would even be a good thing to do), use task_scheduler_observer to tell the OS what you want.
So far, we don't have direct support to do what you want; but we plan to add certain things in order to make it easier.

The very first thing is already done; it's so-called master isolation: the work spawned by different user threads (masters) is never mixed. In a sense, it creates distinct thread pools (though worker threads can migrate from one master to another).

The next thing we plan to add is a so-called "local observer": a mechanism similar to task_scheduler_observer, but with callbacks issued for a worker thread when it joins and leaves the work-sharing substrate (an arena) associated with a certain master. The current global semantics of task_scheduler_observer give no information about arenas, so at the moment it is hard (if not impossible) to set affinity in the desired way.

When this is ready, you will need to determine the number of separate NUMA nodes on the system, create a separate user thread for each node, and in each thread specify task_scheduler_init for the right number of threads (presumably calculated as default_num_threads()/num_of_nodes). Each user thread will also create an instance of a local observer, whose methods will set affinity the way it should be.
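
To make the shape of that plan concrete, here is a minimal sketch using only the C++ standard library, not TBB itself. The names threads_per_node and run_masters are hypothetical helpers invented for illustration, and the node count is passed in by the caller because how to discover it is left open above. The comments mark where the task_scheduler_init and the future local observer would go; the observer is not written out, since its interface does not exist yet.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: how many threads each per-node master should request.
// Mirrors default_num_threads()/num_of_nodes, but never goes below 1 so that
// every node still gets a usable pool when there are more nodes than threads.
inline unsigned threads_per_node(unsigned total_threads, unsigned num_nodes) {
    assert(total_threads > 0 && "expected a positive thread count");
    if (num_nodes == 0) return 0;  // nothing to partition
    return std::max(1u, total_threads / num_nodes);
}

// Hypothetical helper: start one user (master) thread per NUMA node and
// return the thread budget each master was given.
inline std::vector<unsigned> run_masters(unsigned total_threads, unsigned num_nodes) {
    std::vector<unsigned> budget(num_nodes, 0);
    std::vector<std::thread> masters;
    const unsigned per_node = threads_per_node(total_threads, num_nodes);

    for (unsigned node = 0; node < num_nodes; ++node) {
        masters.emplace_back([&budget, node, per_node] {
            // In a real TBB program this is where the master would construct
            // its task_scheduler_init with per_node threads, register the
            // (future) local observer that pins workers to this node, and
            // then run its parallel algorithms on that node's records.
            budget[node] = per_node;  // each master writes only its own slot
        });
    }
    for (auto& m : masters) m.join();
    return budget;
}
```

For example, run_masters(std::thread::hardware_concurrency(), 4) on a 16-thread, 4-node machine would start four masters with a budget of 4 threads each. Note that hardware_concurrency() may return 0 when the value is unknown, so a real program should check it before passing it in.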