Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Library Design using TBB Philosophy

DBS4261
Beginner

Hi folks, I've been working with an open-source project that uses a combination of OpenMP and TBB for parallelism. I have been developing my own library on top of it, which I have also designed for parallelism. Recently I was adding some Embree-based functions to it, so I started looking at shifting from OpenMP to TBB, and I have a few questions about design philosophy.

 

Firstly, in some of the example code, I saw that TBB works well with parallelized functions calling other parallelized functions. TBB works through the functions in a depth-first manner, which keeps memory usage from fragmenting. I was wondering whether this also applies across statically and/or dynamically linked libraries. Essentially, if my library has a tbb::parallel_for that calls a function in another library, and that function also has a tbb::parallel_for, will the depth-first execution order be maintained?

 

Secondly, my impression from reading the documentation is that tbb::global_control sets values globally, so I assumed it would have static storage duration. That does not seem to be the case: setting the maximum concurrency inside a scope does not persist after the scope ends. However, I don't see a way to have a static tbb::global_control object, since the constructor is what sets the parameters. Is there a better way to set the maximum concurrency for the entire executable?

 

Thirdly, and I understand this might be outside the scope of this forum: when I call functions that use OpenMP for parallelism inside a TBB parallel function, the runtime just vomits out threads. It seems that each TBB thread is able to create as many threads as the OpenMP maximum concurrency. Is there a way to fix this behavior, or do I need to spend the time porting all of the OpenMP-parallelized functions to TBB?

 

Lastly, I am using Embree for some functions in my library. How do I make Embree use the same TBB context as the rest of my application? Specifically, each thread of my application will be assembling work that eventually might need to go through a section accelerated by Embree. I would want the threads to prioritize the chunks created by Embree so that I can deallocate that scene as soon as possible.

Mark_L_Intel
Moderator

May I ask which open-source library you are referring to? In general, we don't recommend mixing OpenMP and oneTBB, since doing so forfeits the composability advantages of oneTBB.

 

Indeed, oneTBB has been designed with nested parallelism in mind. So if your application is designed such that a tbb::parallel_for in one part of the application calls yet another tbb::parallel_for in one of your libraries, this scenario should work without oversubscription. The answer to the Stack Overflow question "c++ - Parallel more than one nested loops with tbb" includes an illustrative example that might help too.

 

Regarding how the oneTBB scheduler works, please see How Task Scheduler Works — oneTBB documentation (oneapi-src.github.io). It follows certain algorithmic rules, one of which is: "The overall effect of rule 2 is to execute the youngest task spawned by the thread, which causes the depth-first execution until the thread runs out of work." However, rules 1 and 3 also apply, so the behavior is more complex than just rule 2. I would recommend the freely available Pro TBB book (Pro TBB: C++ Parallel Programming with Threading Building Blocks | SpringerLink) for further reading on this subject.

 

tbb::global_control is a control variable, not a static variable (global_control — oneAPI Specification 1.3-rev-1 documentation), and its setting stays in effect only while the object exists. If you want to set the maximum concurrency for the entire executable, create a global_control object at the top of main() so that it lives for the whole program:

#include <oneapi/tbb/global_control.h>

int main() {
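    // Limit the total number of threads oneTBB may use to 16 while gc is alive
    oneapi::tbb::global_control gc(oneapi::tbb::global_control::max_allowed_parallelism, 16);

    // ... the rest of the program's parallel work runs here ...

    // gc is destroyed when main returns, which removes the limit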
    return 0;
}

However, sometimes you may need to set tbb::global_control for only a portion of the application. This is described in more detail in one of the chapters of the Pro TBB book mentioned earlier, Controlling the Number of Threads Used for Execution | SpringerLink. Just be careful, since the book was written for an older version of TBB and some of its APIs have since been deprecated. Another good reference on the subject is the Migration_Guide.

DBS4261
Beginner

That is a very well-sourced response. I really appreciate it. I think the sharp edges I've found with TBB are where it differs in philosophy from standard libraries, like global_control not having static lifetime and blocked ranges not providing true iterators. So the extra info is really helping me better understand how to use TBB. The library I have traced my thread vomit issue to is Open3D. It seems to have relied on OpenMP for a long time, but newer code tends to be based on TBB. This seems to stem from using the same functions for CUDA compilation and changing as little as possible with macros. As I am fairly new to TBB, I wanted to get some more information here first before diving in and creating such a large PR.
