Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

task_scheduler_observer behavior change after upgrade

jefffaust
New Contributor I

We recently upgraded from 2019.3 to 2021.6.0, and I'm investigating a code path that became 3x slower after the upgrade. To work with a third-party library, we need to initialize thread-specific data for each new TBB thread, and we use task_scheduler_observer for that. Before the upgrade, on_scheduler_entry was called once per core, which is 10 times in my case. After the upgrade, it is called over 3000 times in the case I'm investigating. Each call makes the third-party library rebuild its cache, so the cache is rebuilt many times per thread, which causes the slowdown.

I'm not sure what the right question is, but one of these might be it:

  • Is this change in behavior expected?
  • Is this the right way to hook into thread creation for thread-specific setup?
  • Should I be looking into task_arena to achieve this?

Thank you,

Jeff

NoorjahanSk_Intel
Moderator

Hi,


Thanks for reaching out to us.


Could you please provide us with a sample reproducer and the steps you have followed to reproduce the issue so that we can try it from our end?

Please let us know how you are measuring the performance of your code.

Also please provide the OS details.


Thanks & Regards, 

Noorjahan


jefffaust
New Contributor I

Hi Noorjahan,

In lieu of a reproducible example, I think I can provide enough details. This is happening in parallel_pipeline, and it happens throughout the execution of the pipeline. Previously, we got one on_scheduler_entry call per thread. It's just a guess, but I think we are now getting one per task. Based on the name "task_scheduler_observer", this is arguably how it is supposed to work; it just happens to be a breaking change for us.

We measure performance with a regression test suite that runs on a dedicated computer and records wall-clock time, and we track performance trends over time. When a test shows an issue, we use VTune to diagnose it.

Thank you,

-Jeff

NoorjahanSk_Intel
Moderator

Hi,


You can use task_arena to achieve this: an observer created for a specific task_arena receives callbacks only for threads that enter and exit that arena.

Please refer to the Pro TBB book, page 359.


Thanks & Regards,

Noorjahan.


NoorjahanSk_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide an update on your issue?


Thanks & Regards,

Noorjahan.


NoorjahanSk_Intel
Moderator

Hi,


I have not heard back from you, so I will close this inquiry now. If you need further assistance, please post a new question.


Thanks & Regards,

Noorjahan.


jefffaust
New Contributor I

Hi Noorjahan,

Sorry, I was on vacation and then got caught up with other priorities. The task_arena suggestion does not solve my problem. I will follow up with a new question that includes example code.

-Jeff
