Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Using more concurrency than set by task_scheduler_init in task_arena

e4lam
Beginner
477 Views

Hi,

I'm running into a situation where the application creates a single global task_scheduler_init object with a low concurrency limit, say 1. I've confirmed that no other task_scheduler_init objects are created anywhere in the application. Then, at some later point, a task_arena is created with a larger max concurrency, say 8 (on a computer with 8 logical cores).

Is it expected that something like a parallel_for() inside this arena will now use up to 8 threads instead of being bounded by the global task_scheduler_init object's limit of 1? That is what we appear to be seeing, whereas I expected the task_arena's max concurrency to be capped by the global task_scheduler_init setting.
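One way to check how many threads really take part in a loop is to record the distinct thread IDs seen by the loop body. The following is a minimal sketch of that idea using only the C++ standard library. `ThreadProbe` and `probe_with_std_threads` are hypothetical names introduced here for illustration, and plain `std::thread` workers stand in for the TBB scheduler so that the sketch builds without TBB.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB): remembers every distinct thread
// that calls record().
class ThreadProbe {
public:
    // Call this from inside the parallel loop body.
    void record() {
        std::lock_guard<std::mutex> lock(mutex_);
        ids_.insert(std::this_thread::get_id());
    }

    // Number of distinct threads that have called record() so far.
    std::size_t distinct_threads() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return ids_.size();
    }

private:
    mutable std::mutex mutex_;
    std::set<std::thread::id> ids_;
};

// Stand-in for the parallel loop: each of num_threads std::thread workers
// calls probe.record() calls_per_thread times. All workers are created
// before any is joined, so their IDs cannot be reused during the run.
std::size_t probe_with_std_threads(unsigned num_threads, int calls_per_thread) {
    assert(num_threads > 0);
    ThreadProbe probe;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&probe, calls_per_thread] {
            for (int i = 0; i < calls_per_thread; ++i) {
                probe.record();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return probe.distinct_threads();
}
```

In the scenario described above, `probe.record()` would instead be called from the parallel_for() body that runs inside the arena, and `distinct_threads()` would be read once the loop has returned. The mutex serializes the recording, so this is meant for diagnosis rather than for production loops.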

Thanks!

4 Replies
GouthamK_Intel
Moderator
451 Views

Hi,

Thanks for reaching out to us!

We are forwarding this thread to the relevant internal team, who will guide you further.

Have a good day!


Thanks & Regards

Goutham


Aleksei_F_Intel
Employee
444 Views

Hi e4lam,

Could you please tell us whether "task_arena::enqueue()" (that is, fire-and-forget tasks) is used anywhere in the code while "task_scheduler_init" is initialized with a concurrency of one?

Also, please consider switching to oneTBB.

Regards,

Aleksei


e4lam
Beginner
436 Views

Hi Aleksei,

Thanks for the pointer! As far as I can tell, there are no calls to "task_arena::enqueue()", but there might be calls to the non-task_arena version, "task::enqueue()". The application itself makes no calls to enqueue at all, and I think the issue arises from our use of the USD library. The closest that I can find at the moment (in all of USD) is this line: detachedTask.h#L62. And the line that I think we're hitting with the task_arena is here. The WorkGetConcurrencyLimit() call there is the one returning 8, for example. All we know is that if we manually change this function to always return 1 (i.e. the same value that we pass to task_scheduler_init on the application side), then we get no extraneous parallelism. Otherwise, tasks appear to be dispatched using either the methods in that file or those in dispatcher.h. Both of these explicitly spawn tbb::task objects.

Since the docs are very unclear at the moment as to how all these max concurrency controls are applied, I wanted to reach out first to figure out how such situations can arise. I count 3 (or more?) places to do this currently: task_arena, task_scheduler_init, and global_control.

How does oneTBB differ from regular TBB in this respect? Unfortunately, we're bound by the VFX Platform, which is still set to use TBB 2020 for this year (and yes, I know we're 2 years behind).

Thanks!


e4lam
Beginner
434 Views

PS. I suspect that other forms of parallel work are also done within the arena, e.g. via loops.h and reduce.h.
