Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

NUMA Affinity

jonathandekock
Beginner
I have been working with TBB for the past 6 months and have found it to be very helpful. The project I'm working on tends to allocate some data and then spawn off a task to work on that data, while the original task(s) carry on allocating more data and spawning off more tasks until each original task runs out of data to create. Each successive task will spawn off another job to do the next phase of operation on that data, similar to a pipeline model of data processing. Naturally these tasks are commonly stolen by other worker threads, and TBB is doing a great job of that, keeping all the cores busy doing useful work. What appears to happen on a NUMA system is that the thread working on a job is essentially random relative to the NUMA node that originally allocated the data. I have tried using the task affinity_id information to suggest to TBB which thread might do well with this data, but it has little to no impact on the final run time. My analysis indicates this is because so many of the tasks are stolen that very few of them run where the hint suggested they should.

I am wondering if there are any plans for TBB to offer some sort of NUMA affinity_id, analogous to the task affinity_id. That way, in theory, I could pass the NUMA affinity_id along with my data, and TBB would be more likely to run a task on the same NUMA node that originally allocated the data. I suspect this could provide an extra 10 to 15% performance boost relative to the effectively random work-stealing placement that I'm getting now.
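To make the idea concrete, here is a minimal sketch of node-preferring scheduling written with the C++ standard library only. It is not a TBB API and not how TBB schedules work; the names NodeTask, NodeAwareQueues, home_node, push and try_pop are all hypothetical, and real NUMA placement would additionally require pinning worker threads to nodes, which is omitted here. Each task carries the node that allocated its data, and a worker takes work from its own node's queue first, stealing from other nodes only when the local queue is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical: a task tagged with the NUMA node that allocated its data.
struct NodeTask {
    std::size_t home_node;        // node where the data was allocated
    std::function<void()> work;   // the next phase to run on that data
};

// Hypothetical: one mutex-protected FIFO queue per NUMA node.
class NodeAwareQueues {
public:
    explicit NodeAwareQueues(std::size_t node_count)
        : queues_(node_count), locks_(node_count) {}

    // Enqueue the task on the queue of the node that owns its data.
    void push(NodeTask task) {
        assert(task.home_node < queues_.size());
        const std::size_t n = task.home_node;
        std::lock_guard<std::mutex> guard(locks_[n]);
        queues_[n].push_back(std::move(task));
    }

    // A worker running on `my_node` checks its own queue first and
    // visits the other nodes' queues only if the local one is empty.
    // Returns false when every queue is empty.
    bool try_pop(std::size_t my_node, NodeTask& out) {
        assert(my_node < queues_.size());
        for (std::size_t i = 0; i < queues_.size(); ++i) {
            const std::size_t n = (my_node + i) % queues_.size();
            std::lock_guard<std::mutex> guard(locks_[n]);
            if (!queues_[n].empty()) {
                out = std::move(queues_[n].front());
                queues_[n].pop_front();
                return true;
            }
        }
        return false;
    }

private:
    std::vector<std::deque<NodeTask>> queues_;
    std::vector<std::mutex> locks_;
};
```

In this sketch a worker thread would loop on try_pop with its own node index and call work() on each task it receives. Cross-node stealing still happens, so cores stay busy, but only after local work has run out.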

Thanks,

Jonathan
1 Reply
RafSchietekat
Valued Contributor III
"Each successive task will spawn off another job to do the next Phase of operation on that data similar to a pipeline model of data processing."
Please be careful with your assumptions: in a TBB pipeline, each data item is led from input to output by a single task.