The Intel® oneAPI Threading Building Blocks (oneTBB) library is an open-source library, available on GitHub, that supports parallelism on CPUs. It is a template-based library built on standard C++ that enables advanced task parallelism together with data parallelism. Because it is based on standard C++, it is composable and can coexist with other parallel programming models. It lets you define the parallel execution blocks for your workload from the ground up so that they scale well as the number of available cores or execution units increases or decreases.
See the video: Introduction to oneTBB: A Modern C++ Library for Parallelism on CPUs
oneTBB was introduced in 2021 as an improved and modernized version of the original open-source Threading Building Blocks product, which has been available and under continuous development for over 15 years. It has been widely adopted across several industries and research institutions. The oneTBB source distribution on GitHub supports multiple platforms such as x86, ARM, and MIPS, among others. As an open-source project aimed at advanced parallelism on multiple platforms, it is also part of the specification of the oneAPI open programming model initiative.
As the name implies, oneTBB can be visualized as a set of “building blocks” that are categorized by their various functionalities and levels of system interactions.
Broadly speaking, the blocks fall into two categories. On the left of the diagram are the blocks that provide parallelism (the parallel execution interfaces); on the right are the tools that assist developers with multi-threaded applications but do not depend on the execution model on the left.
Some key components to understand about the makeup of oneTBB are as follows:
- Task Scheduler: The task scheduler is the heart of oneTBB, responsible for managing and distributing tasks across available threads. It dynamically balances the workload among threads, making efficient use of available processor resources.
- Task Arena: A task arena is a container for tasks. It provides a way to isolate the tasks created within the arena from tasks outside the arena, offering better control over the task scheduling and execution.
- Parallel Algorithms: oneTBB includes parallel implementations of many common algorithms (e.g., parallel_for, parallel_reduce, parallel_sort) that allow developers to take advantage of parallelism without having to manually manage threads.
- Concurrent Containers: oneTBB provides thread-safe data structures, such as queues, hash tables, and vectors, that can be used in multi-threaded applications.
- Flow Graph: oneTBB's flow graph is an extension that enables the creation of complex dataflow networks to model and execute parallel computations with data dependencies.
In other words, oneTBB is thread-safe; it dynamically balances the workload across threads with the integrated task scheduler; parallel tasks live in independent containers that can be instantiated independently in different namespaces. Essentially, if you live and breathe C++ and object-oriented programming, oneTBB is going to be a very natural extension of C++ for these reasons.
So why use oneTBB to express parallelism? Because it provides a composable way to access CPU resources: it avoids the efficiency problems that often arise when several multi-threaded components coexist in one program. oneTBB lets developers express parallelism widely and remain efficient. Many common parallel algorithms are ready to use, so no explicit management of threads (for example, pthreads) is required. Developers can nest parallelism within parallelism without oversubscribing the hardware, because oneTBB maps all tasks onto a single thread pool through its task scheduler. oneTBB also avoids greedily consuming resources while idle. This makes oneTBB a good fit for the oneAPI stack.
Benefits of oneTBB over TBB
oneTBB requires a more modern C++ standard (C++11 at minimum) and takes advantage of newer language features, such as deduction guides and constraints, when the compiler supports them. Furthermore, since only oneTBB uses the newer C++ standards, future enhancements and functionality, including all bug fixes and community contributions, will apply only to oneTBB and not to TBB. Likewise, future binary releases will include only oneTBB and no longer include TBB.
With oneTBB, the performance benefits of modern C++ become apparent as well. For example, because atomics and other synchronization primitives are now part of the C++ standard rather than library-specific types, the compiler can optimize them. oneTBB is also more portable: relying on std::atomic and the C++ memory model lets it support non-x86 platforms, including those with weak memory models. By using modern C++ standards, oneTBB also avoids the unspecified multithreading behavior of earlier C++ standards. Before C++11, the language had no concept of threads, so applications and libraries could behave unexpectedly as a result. Thread Sanitizer and other sanitizers also work out of the box, which helps detect data races quickly.
Examples of New Functionality for oneTBB
Some of the new features introduced with oneTBB that were not available in its predecessor include:
- Support for new platforms, including new hardware, operating systems, and WebAssembly (WASM), among others
- task_arena interface extensions for NUMA and hybrid cores to specify underlying hardware on which task_arena will be executed (e.g., the specific core type if running on hybrid cores).
- oneTBB thread pool termination (tbb::task_scheduler_handle) makes it possible to wait until all worker threads have fully completed their work, which prevents runtime errors, notably in libraries that can be unloaded during runtime.
- Resumable tasks (tbb::task::suspend, tbb::task::resume) are coroutine-like tasks that can be suspended and later resumed, which makes it possible to express dependencies on concurrent work.
- Adaptive mutexes (tbb::mutex, tbb::rw_mutex)
- Lazy initialization (tbb::collaborative_call_once)
- task_group extensions (tbb::task_handle) emulate advanced techniques from the low-level task API that was deprecated in TBB, such as task bypassing.
- tbb::concurrent_map and tbb::concurrent_set are new ordered containers that can safely be used concurrently.
Migration to oneTBB
Features have been added to oneTBB, but there are also other changes and removals compared to TBB. In some cases, because APIs were removed to improve overall performance, source code may require some alterations. The Tasks block was also removed: it was already deprecated in TBB, and because it was a complicated and often error-prone feature, it has now been fully removed in oneTBB.
But the vast majority of algorithms and containers require no source code changes when migrating from TBB to oneTBB; simply recompiling the application or library should be enough. For example, basic parallel constructs, such as parallel_for, parallel_reduce, task_group, and flow_graph, have a very similar API, so recompiling the application will do the trick. It is worth noting that in rare instances where an application cannot be fully migrated to oneTBB, it is still safe to use TBB and oneTBB in the same application. For more details on migrating your application to oneTBB, please refer to the official Migration Guide. We also encourage you to check out Intel’s other AI Tools and Framework optimizations and learn about the unified, open, standards-based oneAPI programming model that forms the foundation of Intel’s AI Software Portfolio.
Add Scalable Parallel Execution to your C++ Application today!
Download the Intel® oneAPI Threading Building Blocks (oneTBB) binary distribution standalone or as part of the Intel® oneAPI Toolkits.
Download it in source and contribute to its developer community on GitHub.
About our Speakers
Pavel Kumbrasev
Middleware Engineer
Intel
Pavel is a Middleware Engineer in the Software and Advanced Technologies Group at Intel and is a lead developer of the oneAPI Threading Building Blocks library (oneTBB). He played a key role in the modernization of TBB during the creation of the oneTBB library.
Michael Voss
Principal Engineer
Intel
Mike Voss is a Principal Engineer in the Software and Advanced Technologies Group at Intel and is the oneAPI Threading Building Blocks (oneTBB) architect. Mike has coauthored over 40 published papers and articles on topics related to parallel programming, is the co-author of the book “Pro TBB: C++ Parallel Programming with Threading Building Blocks,” and is actively involved as one of Intel’s representatives to the ISO C++ Committee on topics related to concurrency and parallelism.