Intel® oneAPI Threading Building Blocks

TBB_NUM_THREADS to control number of threads in TBB

rpsmic001
Beginner
Does Threading Building Blocks provide an environment variable to control the number of threads spawned? A sensible name for such a variable would be TBB_NUM_THREADS, by analogy with OMP_NUM_THREADS, but I have not seen any reference to it in the documentation. I know that in general it is best to let TBB choose the number of threads, and that the number can be set explicitly by an early call to task_scheduler_init, but I think an optional environment variable that sets the number of threads when present would be useful to many users.

As an example, a software library that I use has chosen to use Threading Building Blocks behind the scenes. This has worked well overall, but it leads to problems on large SMP machines. On these machines users request a small number of CPUs from the scheduler, but unless the number of threads is actively controlled, TBB tries to initialize N_CPUS + 1 threads, which in one case is 156 threads. The resulting program is inefficient and ignores the resources allocated by the scheduler.

If TBB_NUM_THREADS is not available in TBB itself, I have to implement similar functionality in any code that I want to run on an SMP machine. A Google search for TBB_NUM_THREADS suggests that others have already taken this approach. Are there plans to provide an environment variable like this in future releases?
11 Replies
RafSchietekat
Valued Contributor III
If I've understood correctly, recent TBB versions let you use a different number of threads from a user thread with its own task_scheduler_init, which is even better than a global override. Have a look at the documentation and some recent forum threads to see if that works for you, and then please summarise your findings here.
rpsmic001
Beginner
Thank you for your reply.

If you mean that I can control the number of threads by invoking task_scheduler_init (num_threads) sufficiently early in my application, then I am aware of this and it is the method I am currently using.

I am suggesting, as a new feature, that task_scheduler_init(), which is often invoked internally, check for an environment variable TBB_NUM_THREADS before falling back on another method to determine how many threads to spawn. This would not be a global override, since it can be set differently in each environment. In fact, many queue managers, e.g. Torque and LoadLeveler, allow users to set up the environment variables for each job submitted.

Overall, providing an environment variable seems to be a good way to offer control over the number of threads. I presume it has been considered, and I would like to know whether something like this is planned or, if not, why it is considered a bad idea.

Thanks again.
ARCH_R_Intel
Employee
The lack of an environment variable for controlling the number of threads is a deliberate design decision in TBB. Experience with OpenMP indicated that some customers did not want users to be able to fiddle with their programs. Instead, we provided a programmatic interface via task_scheduler_init. I believe that .NET went down a similar philosophical path in providing programmatic interfaces for controlling the virtual machine instead of environment variables.
mgjf61
Beginner
Taking away needed control from all customers to satisfy some customers who do not need it sounds a bit cynical. So let me explain why we need easy access to controlling the number of threads, ideally by an environment variable:

We (a small HPC team at a university) are running several moderately sized compute clusters with the Slurm resource manager. Our customers are dozens of domain scientists from multiple disciplines of science and engineering, running their own or preinstalled codes.

MKL with TBB threading nicely obeys the resource restrictions set up by Slurm, which is definite progress compared to OpenMP, where users had to set OMP_NUM_THREADS consistently with their Slurm resource request; in our experience, an error-prone requirement. So in theory we would like to recommend that our users switch to TBB threading to make life easier with Slurm.

Problem is: Users need to test their setup interactively on the cluster's login node before submitting jobs to Slurm. And for this purpose I have so far found no way to prevent MKL/TBB from grabbing all cores on the machine (actually even all hardware-threads), which is clearly unacceptable on a shared multiuser machine. Previously, we had set OMP_NUM_THREADS to a reasonable default value in our login scripts to prevent programs from running amok in that way. Users who know what they are doing can change this setting.

Modifying existing programs to call TBB library routines is a no-go for two reasons:

  1. these changes would most likely interfere with the Slurm integration
  2. having domain scientists fiddle with their (often legacy Fortran) codes just to control CPU allocation for interactive use is something that will not happen for - hopefully - obvious reasons.

So at the moment I am afraid we will unfortunately have to continue recommending users to compile their programs with OpenMP threading.

Alexey-Kukanov
Employee
Is there a standard way to determine at run time how many cores are dedicated by a job dispatcher to the given process? We could take it into account when deciding on the default number of threads.

Also, application developers can always provide their own environment variable and read its value before initializing the TBB thread pool. That would be a "pay as you go" solution that keeps control in the hands of the application programmer.
jimdempseyatthecove
Honored Contributor III
You can add your own getenv for TBB_NUM_THREADS and use the supplied value or default.

Jim
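
A minimal sketch of the getenv approach suggested here, assuming the application picks its own variable name. The helper names parse_thread_count and thread_count_from_env, and the variable MYAPP_NUM_THREADS in the note below, are hypothetical and are not part of TBB. Anything that is not a positive whole number falls back to the supplied default.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parse a thread count from text.
// Returns `fallback` when the text is missing, empty, not a whole
// number, out of range, or not positive.
int parse_thread_count(const char* text, int fallback) {
    if (text == nullptr || *text == '\0') return fallback;
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') return fallback;
    if (value < 1 || value > INT_MAX) return fallback;
    return static_cast<int>(value);
}

// Hypothetical helper: read the limit from an application-defined
// environment variable, or use `fallback` when it is unset or invalid.
int thread_count_from_env(const char* name, int fallback) {
    assert(name != nullptr);
    return parse_thread_count(std::getenv(name), fallback);
}
```

With the TBB releases discussed in this thread, the result could be passed to the scheduler early in main(), for example tbb::task_scheduler_init init(thread_count_from_env("MYAPP_NUM_THREADS", tbb::task_scheduler_init::automatic)); so that an unset or malformed variable leaves TBB's default in place. Note that task_scheduler_init has since been removed from oneTBB.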
rpsmic001
Beginner
Thanks for all those answers. It is good to know that this has been considered and deliberately avoided, along with some of the reasons. It means that I can suggest options to the developers of the finite element (FEM) library I am using for giving its users control over the number of threads, knowing that the situation won't change in the next release.

Providing a getenv call in the FEM library to read an environment variable and set the number of threads is an option I will put forward. I think that if we go this way the name TBB_NUM_THREADS should be avoided, because it would give the impression that the variable is provided at the TBB library level and would cause further confusion.
Vladimir_P_1234567890
Intel TBB 4.3 Update 5 introduced the global_control class for application-wide control of allowed parallelism and thread stack size.

https://www.threadingbuildingblocks.org/docs/help/reference/appendices/community_preview_features/tbb_global_control.htm

--Vladimir
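
A minimal sketch of how global_control might be used, under stated assumptions: the macro SKETCH_USE_TBB and the helper run_with_thread_limit are hypothetical, introduced only so that the snippet also compiles where TBB is not installed. The TBB calls themselves (the tbb/global_control.h header, the max_allowed_parallelism parameter, and the TBB_PREVIEW_GLOBAL_CONTROL macro that was needed while the class was a preview feature) follow the linked reference.

```cpp
#include <cassert>
#include <cstddef>

#if defined(SKETCH_USE_TBB)            // hypothetical switch: define when building against TBB
#define TBB_PREVIEW_GLOBAL_CONTROL 1   // only needed while global_control was a preview feature
#include <tbb/global_control.h>
#endif

// Hypothetical helper: run `work` while the parallelism TBB may use is
// capped at `limit`. The cap applies for the lifetime of the
// global_control object and is dropped when the function returns.
template <typename Work>
auto run_with_thread_limit(std::size_t limit, Work work) -> decltype(work()) {
    assert(limit >= 1);
#if defined(SKETCH_USE_TBB)
    tbb::global_control cap(tbb::global_control::max_allowed_parallelism, limit);
#endif
    return work();
}
```

In a real application the global_control object would more likely live directly in main() for the whole run, with the limit taken from a command-line option or an application-specific environment variable.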

RafSchietekat
Valued Contributor III
What is the intended usage, why does it take this form (with a "selection" rule), and why is there no way to query the current selection?

I would suggest a shorter name, though: "max_num_threads" (instead of the mouthful "max_allowed_parallelism"), to go with "default_num_threads". You could also rename task_scheduler_init's parameter (now "max_threads") for consistency (no API change).

I would also want to come back to the original suggestion of an environment variable. Maybe it's true that developers don't want users to be able to "fiddle" with their programs (why not, exactly?), but it's also true that sometimes users of a shared server don't want others to hog the machine whenever they run an off-the-shelf program using TBB. Those others currently have no way to avoid causing resentment (i.e. to be polite) other than not running those programs, which isn't much of a solution compared to being able to just set TBB_MAX_NUM_THREADS (also shorter than TBB_MAX_ALLOWED_PARALLELISM). Such a variable would logically participate in the selection rule and could also limit market capacity (the latter would just be an invisible implementation matter, but might be relevant to reliably avoid overshooting the mark and then having to park the excess threads).

BTW, if there are multiple master threads, does TBB try to compensate for that by parking one or more idle worker threads?

Alexey-Kukanov
Employee
Raf, thank you for the feedback.

The intended usage for global_control is at the top level of an application (e.g. main()) to limit the number of threads TBB can use, no matter what was specified in various program modules by task_scheduler_init or task_arena. One particular use case is to facilitate implementation of an application-specific environment variable to control the number of threads.

The selection rule is there to provide a limited form of composability in case more than one global_control object is activated at the same time. We think it's more composable than if a new setting always overrode the previous one.
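
The selection rule can be illustrated with a toy model. This is not TBB code: effective_limit is a hypothetical function, and the sketch assumes, as the global_control reference describes for max_allowed_parallelism, that the smallest of the simultaneously active values is the one in effect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (not TBB's implementation): given the values of all
// currently active limits, the smallest one is selected. With no
// active limit, the library default applies.
std::size_t effective_limit(const std::vector<std::size_t>& active,
                            std::size_t default_limit) {
    assert(default_limit >= 1);
    if (active.empty()) return default_limit;
    return *std::min_element(active.begin(), active.end());
}
```

Under this model a module that activates its own limit can tighten the cap set at the top level but cannot loosen it, and when that module's limit goes away the top-level value is selected again.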

To query the current selection, there is the static method global_control::active_value().

We discussed various names and decided that "max_allowed_parallelism" best describes the semantics of the setting, though of course it's subjective and the difference with e.g. max_num_threads is subtle. Thank you for bringing this up, we might reconsider the name later.

I still believe that providing the developers with a way to implement a max-threads environment variable (if they wish so) is better than doing it ourselves and leaving them no control. An environment variable recognized directly by TBB could cause undesirable effects for end users as well: being set for/by one application, it could inadvertently affect other TBB-based applications on a system (think of Windows where it's common to set an environment variable globally).

Resource distribution on shared servers is controlled by job managers, and those usually use taskset or similar utilities to run a program on a given subset of cores. TBB recognizes and respects the process affinity mask, so job managers have a way to limit how much hardware is available to a TBB-based app.
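
For readers who want to see what the affinity mask reports, here is a minimal sketch of how a program might count the CPUs it is allowed to run on. It is Linux-specific (sched_getaffinity and the CPU_COUNT macro from glibc), the helper name allowed_cpu_count is hypothetical, and it is not a description of how TBB itself performs the check.

```cpp
#include <cassert>
#include <thread>

#if defined(__linux__)
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_COUNT (glibc; g++ defines _GNU_SOURCE)
#endif

// Hypothetical helper: number of CPUs the calling process may use.
// On Linux this counts the bits in the process affinity mask; elsewhere,
// or if the query fails, it falls back to the hardware thread count.
unsigned allowed_cpu_count() {
#if defined(__linux__)
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        const int n = CPU_COUNT(&mask);
        assert(n <= CPU_SETSIZE);
        if (n > 0) return static_cast<unsigned>(n);
    }
#endif
    const unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? hw : 1u;   // hardware_concurrency() may return 0 if unknown
}
```

On Linux, a program started as taskset -c 0-3 ./app would be expected to see a count of 4 here, regardless of how many cores the machine has.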

TBB still does not try to compensate for multiple master threads running at the same time. 

RafSchietekat
Valued Contributor III
"To query the current selection, there is global_control::active_value() static method." Oops, I missed that...

"I still believe that providing the developers with a way to implement a max-threads environment variable (if they wish so) is better than doing it ourselves and leaving them no control." Call me old-fashioned (using first person to mean any user), but if I bought the computer, and I decided to run a particular program on it, shouldn't I also be in control?

"think of Windows where it's common to set an environment variable globally" And how exactly does that mean that nobody else can have nice things? What exactly would be the scenario where a program developer would be aware, in advance, of a need to restrain execution and could easily build in a program-specific setting (following the recommendation of TBB's newly adapted documentation), but decides to instead advise his users to use a global setting and ignore their inevitable complaints that this interferes with other programs? Besides, I probably even would want to (be able to) set this in my login profile so I would have to explicitly disable it in a shell or for a specific program if I wanted to use the whole machine (for a specific need, or during quiet time): I would want to use this as an "environment" variable. It's not unlikely that a server's administrator would want to have a say in the matter as well, by setting it up as a default for all users.

"Resource distribution on shared servers is controlled by job managers" This isn't (only) about scheduled jobs: maybe I have an account on a server that I'm sharing with others, and I'm already using "-j 8" instead of "-j" to restrict my use of parallelism to build programs because I don't want to keep having lunch by myself all the time. Using taskset(1) is awkward because it involves some kind of partitioning that has to be coordinated with others, which is annoying enough by itself if it only has to be done occasionally. And if a machine has 32 cores and 25 potential users, do I only get a single core, or how do I coordinate with others in real time so our respective subsets don't overlap, without also restricting who can use the machine at each particular time? Why couldn't I just let the O.S. figure out for itself how to allocate resources like it normally would? And on top of that I also have to remember to use this each and every time I start a program!

Parallel programs haven't made scheduling shared resources any easier, and there's no definitive solution in sight, but why withhold this as an obvious workaround?
