Re: Using more concurrency than set by task_scheduler_init in task_arena

e4lam · ‎01-13-2021

Hi,

I'm running into a situation where the application creates a single global task_scheduler_init object with a low concurrency limit, say 1. I've confirmed that throughout the application, no other task_scheduler_init objects are created. Then, at some later point in time, a task_arena is created with a larger max concurrency value, say 8 (on a computer with 8 logical cores).

Is it expected that something like a parallel_for() inside this arena will now use up to 8 threads instead of being bounded by the global task_scheduler_init object's max of 1? That seems to be what is happening for us, whereas I expected the task_arena max concurrency to be bounded by the global task_scheduler_init setting.

Thanks!

GouthamK_Intel · ‎01-17-2021

Hi,

Thanks for reaching out to us!

We are forwarding this thread to the concerned internal team who will guide you further.

Have a Good day!

Thanks & Regards

Goutham

Aleksei_F_Intel · ‎01-18-2021

Hi e4lam,

Could you please tell us whether fire-and-forget tasks, that is, calls to "task_arena::enqueue()", are used anywhere in the code while "task_scheduler_init" is initialized with a concurrency of one?

Also, please consider switching to oneTBB.

Regards,

Aleksei

e4lam · ‎01-18-2021

Hi Aleksei,

Thanks for the pointer! As far as I can tell, there are no calls to "task_arena::enqueue()", but there might be calls to the non-task_arena version, "task::enqueue()". The application itself makes no calls to enqueue at all, and I think the issue is arising out of our use of the USD library. The closest that I can find at the moment (in all of USD) is this line: detachedTask.h#L62. And the line that I think we're hitting with the task_arena is here. The WorkGetConcurrencyLimit() call there is the one returning 8, for example. All we know is that if we manually change this function to always return 1 (i.e. the same value as what we used for task_scheduler_init on the application side), then we get no extraneous parallelism. Otherwise, tasks look like they are dispatched using either the methods in that file or those in dispatcher.h. Both of these explicitly spawn tbb::task objects.

Since the docs are very unclear at the moment as to how all these max concurrency controls are applied, I wanted to reach out first to figure out how such situations can arise. I count 3 (or more?) places to do this currently: task_arena, task_scheduler_init, and global_control.

How does oneTBB differ from the regular TBB in this respect? Unfortunately, we're bound by the VFX Platform, which is still set to use TBB 2020 for this year (and yes, I know we're 2 years behind).

Thanks!

e4lam · ‎01-18-2021

PS. I suspect there are other forms of parallel work done within the arenas, e.g. in loops.h and reduce.h.